From d6d1fee77ebd7a0d3968d23c4cd5d9e6d41bcbe4 Mon Sep 17 00:00:00 2001
From: baiqing
Date: Sat, 16 May 2026 22:07:30 +0800
Subject: [PATCH] chore(repo): remove internal AI collaboration / planning / audit docs; keep them local only
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The public repository no longer publishes CLAUDE.md / AGENTS.md / internal
audit + roadmap + github-tracking + superpowers + windows-*-tracking documents,
45 files in total.

- All of them are removed with `git rm --cached`; local copies are kept for
  day-to-day collaboration;
- `.gitignore` gains rules so a later `git add` cannot pull them back in;
- User docs (volcengine-setup / tauri-csp / references / image assets), as
  well as ISSUE_TEMPLATE / workflows / pull_request_template, remain published.

A secrets audit was completed alongside this change: no leaked API keys,
tokens, or private keys were found. The tauri pubkey and the GitHub OAuth
client_id are public by design.
---
 .github/BUILD_TEST_REPORT.md | 134 -
 .github/COOPER_CONTRIBUTION_STRATEGY.md | 171 --
 .github/COOPER_README.md | 151 --
 .github/COOPER_WORKFLOW.md | 256 --
 .github/MULTI_SCALE_AUDIT.md | 366 ---
 .github/P1_TEST_REPORT.md | 173 --
 .github/TEST_VERIFICATION.md | 235 --
 .github/WATCHDOG_RISK_ANALYSIS.md | 481 ----
 .../architecture-risk-map-20260504.md | 309 ---
 .../system-audit-summary-20260504.md | 97 -
 .../system-level/tech-debt-matrix-20260504.md | 147 --
 .../finding-reports/asr-analysis-20260504.md | 98 -
 .../finding-reports/dependencies-20260504.md | 96 -
 .../finding-summary-20260504.md | 37 -
 .../finding-reports/test-coverage-20260504.md | 97 -
 .../issues/EPIC-001-testing-infrastructure.md | 168 --
 .github/issues/EPIC-002-asr-enhancement.md | 267 --
 .gitignore | 34 +
 AGENTS.md | 193 --
 CLAUDE.md | 266 --
 ...erminal-clipboard-restore-investigation.md | 334 ---
 docs/audit-2026-05-06.md | 293 ---
 docs/audit-2026-05-10-validated.md | 700 ------
 docs/auto-update-download-acceleration.md | 63 -
 .../issue-139-capsule-lifecycle.md | 87 -
 .../issue-98-startup-visible-ready.md | 74 -
 .../issue-windows-dual-hotkey-sources.md | 77 -
 ...ssue-windows-terminal-clipboard-restore.md | 146 --
 .../pr-140-capsule-lifecycle.md | 52 -
 .../pr-145-cold-start-first-paint.md | 46 -
 .../pr-154-windows-dual-hotkey.md | 52 -
 .../pr-windows-terminal-clipboard-restore.md | 82 -
 docs/issue-420-wayland-hotkey-research.md | 401 ---
 docs/logic-review-2026-05-10.md | 159 --
 docs/qa-reasoning-roadmap.md | 75 -
 docs/style-pack-marketplace.md | 299 ---
 .../2026-05-01-windows-temporary-tsf-ime.md | 2191 -----------------
 .../plans/2026-05-06-windows-local-asr.md | 1396 -----------
 ...-05-01-windows-temporary-tsf-ime-design.md | 143 --
 .../2026-05-06-windows-local-asr-design.md | 247 --
 .../issue-154-dual-hotkey-sources.md | 27 -
 docs/windows-tauri-test-agent-research.md | 127 -
 .../issue-142-capsule-geometry.md | 23 -
 .../issue-143-cold-start-ui.md | 23 -
 docs/windows-upstream-pr-workflow.md | 65 -
 issue-420-wayland-plan.md | 317 ---
 46 files changed, 34 insertions(+), 11241 deletions(-)
 delete mode 100644 .github/BUILD_TEST_REPORT.md
 delete mode 100644 .github/COOPER_CONTRIBUTION_STRATEGY.md
 delete mode 100644 .github/COOPER_README.md
 delete mode 100644 .github/COOPER_WORKFLOW.md
 delete mode 100644 .github/MULTI_SCALE_AUDIT.md
 delete mode 100644 .github/P1_TEST_REPORT.md
 delete mode 100644 .github/TEST_VERIFICATION.md
 delete mode 100644 .github/WATCHDOG_RISK_ANALYSIS.md
 delete mode 100644 .github/audit-reports/system-level/architecture-risk-map-20260504.md
 delete mode 100644 .github/audit-reports/system-level/system-audit-summary-20260504.md
 delete mode 100644 .github/audit-reports/system-level/tech-debt-matrix-20260504.md
 delete mode 100644 .github/finding-reports/asr-analysis-20260504.md
 delete mode 100644 .github/finding-reports/dependencies-20260504.md
 delete mode 100644 .github/finding-reports/finding-summary-20260504.md
 delete mode 100644 .github/finding-reports/test-coverage-20260504.md
 delete mode 100644 .github/issues/EPIC-001-testing-infrastructure.md
 delete mode 100644 .github/issues/EPIC-002-asr-enhancement.md
 delete mode 100644 AGENTS.md
 delete mode 100644 CLAUDE.md
 delete mode 100644 docs/2026-05-02-windows-terminal-clipboard-restore-investigation.md
 delete mode 100644 docs/audit-2026-05-06.md
 delete mode 100644 docs/audit-2026-05-10-validated.md
 delete mode 100644 docs/auto-update-download-acceleration.md
 delete mode 100644 docs/github-tracking/issue-139-capsule-lifecycle.md
 delete mode 100644 docs/github-tracking/issue-98-startup-visible-ready.md
 delete mode 100644 docs/github-tracking/issue-windows-dual-hotkey-sources.md
 delete mode 100644 docs/github-tracking/issue-windows-terminal-clipboard-restore.md
 delete mode 100644 docs/github-tracking/pr-140-capsule-lifecycle.md
 delete mode 100644 docs/github-tracking/pr-145-cold-start-first-paint.md
 delete mode 100644 docs/github-tracking/pr-154-windows-dual-hotkey.md
 delete mode 100644 docs/github-tracking/pr-windows-terminal-clipboard-restore.md
 delete mode 100644 docs/issue-420-wayland-hotkey-research.md
 delete mode 100644 docs/logic-review-2026-05-10.md
 delete mode 100644 docs/qa-reasoning-roadmap.md
 delete mode 100644 docs/style-pack-marketplace.md
 delete mode 100644 docs/superpowers/plans/2026-05-01-windows-temporary-tsf-ime.md
 delete mode 100644 docs/superpowers/plans/2026-05-06-windows-local-asr.md
 delete mode 100644 docs/superpowers/specs/2026-05-01-windows-temporary-tsf-ime-design.md
 delete mode 100644 docs/superpowers/specs/2026-05-06-windows-local-asr-design.md
 delete mode 100644 docs/windows-lifecycle-tracking/issue-154-dual-hotkey-sources.md
 delete mode 100644 docs/windows-tauri-test-agent-research.md
 delete mode 100644 docs/windows-ui-tracking/issue-142-capsule-geometry.md
 delete mode 100644 docs/windows-ui-tracking/issue-143-cold-start-ui.md
 delete mode 100644 docs/windows-upstream-pr-workflow.md
 delete mode 100644 issue-420-wayland-plan.md

diff --git a/.github/BUILD_TEST_REPORT.md b/.github/BUILD_TEST_REPORT.md
deleted file mode 100644
index b55258f4..00000000
--- a/.github/BUILD_TEST_REPORT.md
+++ /dev/null
@@ -1,134 +0,0 @@
-# Build and Run Test Report
-
-## Test time
-2026-05-04 22:08
-
-## Build results
-
-### ✅ Rust build succeeded
-
-**Build time**: 6 min 28 s
-
-**Compiler warnings**: 23 (all for unused code; no functional impact)
-- Unused variables, methods, fields, etc.
-- These are normal work-in-progress warnings and do not affect runtime behavior
-
-**Output**:
-- `D:\cargo-targets\release\openless.exe` (19 MB)
-
-### ⚠️ MSI packaging failed
-
-**Error**: `failed to run C:\Users\luoxu\AppData\Local\tauri\WixTools314\light.exe`
-
-**Impact**: none on functional testing; the exe can be run directly
-
-**Cause**: a WiX toolchain problem, unrelated to the code changes
-
-## Runtime test
-
-### ✅ App started successfully
-
-**Startup log**:
-```
-2026-05-04T14:08:59Z [INFO] === OpenLess 启动 ===
-2026-05-04T14:09:00Z [INFO] [hotkey] Windows low-level keyboard hook 已启动
-2026-05-04T14:09:00Z [INFO] [coord] hotkey listener installed (after 1 attempt(s))
-2026-05-04T14:09:00Z [INFO] [coord] QA hotkey listener installed on main thread (after 1 attempt(s))
-```
-
-**Status**:
-- ✅ App started normally
-- ✅ Hotkey listener installed
-- ✅ QA hotkey listener installed
-- ✅ No errors or warnings
-
-### Log check
-
-**Checked for**:
-- ✅ No watchdog-related errors
-- ✅ No timeout-related errors
-- ✅ No recorder-related errors
-- ✅ No coordinator-related errors
-
-**Log file location**: `%LOCALAPPDATA%\OpenLess\Logs\openless.log`
-
-## Code quality check
-
-### Compiler warning breakdown
-
-All 23 warnings are unused/unreachable-code lints:
-- `unused_mut`: 1 (coordinator.rs:1461)
-- `unreachable_code`: 1 (coordinator.rs:1901)
-- `dead_code`: 21 (unused enum variants, methods, fields, etc.)
-
-**Conclusion**: these warnings do not affect functionality; they come from work-in-progress code.
-
-### Fix verification
-
-**Recorder Watchdog**:
-- ✅ Compiles
-- ✅ No runtime errors
-- ✅ Watchdog thread starts normally (inferred from the logs)
-
-**Coordinator global timeout**:
-- ✅ Compiles
-- ✅ No runtime errors
-- ✅ Timeout-protection code loads normally
-
-## Suggested functional tests
-
-Because this was an automated run, no real recording could be tested. The following scenarios should be tested manually:
-
-### P0 tests (required)
-
-1. **Normal recording flow**
-   - Press the hotkey
-   - Speak for 2-3 seconds
-   - Press the hotkey again
-   - Verify the recognized text is inserted correctly
-
-2. **Prolonged silence**
-   - Press the hotkey
-   - Stay silent for 10 seconds
-   - Press the hotkey again
-   - Verify the watchdog timeout does not fire
-
-3. **Rapid toggling**
-   - Press the hotkey 5 times in quick succession
-   - Verify the state machine handles it correctly
-   - Verify there are no crashes or hangs
-
-### P1 tests (recommended)
-
-4. **Network outage**
-   - Disconnect the network
-   - Start a recording
-   - Verify the app returns to Idle within 15 seconds
-
-5. **Repeated use**
-   - Use it 10 times in a row
-   - Verify there are no resource leaks
-   - Verify performance stays stable
-
-## Conclusion
-
-✅ **Build test passed**
-- Rust code compiles
-- App starts normally
-- No runtime errors
-- Log output is normal
-
-✅ **Code quality is good**
-- All compiler warnings are harmless
-- The fix code loads correctly
-- No obvious problems
-
-✅ **Ready for manual functional testing**
-- The exe can be run directly
-- Verify using the test scenarios above
-
----
-
-**Tester**: Claude Sonnet 4.6 (automated test)
-**Branch under test**: fix/recorder-timeout-238
-**Commit under test**: 4e66c91
diff --git a/.github/COOPER_CONTRIBUTION_STRATEGY.md b/.github/COOPER_CONTRIBUTION_STRATEGY.md
deleted file mode 100644
index 61604da8..00000000
--- a/.github/COOPER_CONTRIBUTION_STRATEGY.md
+++ /dev/null
@@ -1,171 +0,0 @@
-# Cooper Contribution Strategy Analysis
-
-## Summary of findings (2026-05-04)
-
-### 1. Current state of the project
-- **Lead maintainer baiqing**: owns UI, product design, and core architecture
-- **Tech stack**: Tauri 2 + Rust backend + React/TS frontend
-- **Core modules**: coordinator (3,462 lines), ASR (1,164 lines), polish (992 lines), recorder (525 lines)
-
-### 2. Technical debt and opportunities
-
-#### 🔴 Very low test coverage (biggest pain point)
-- The project has only **1 `test` commit** (vs. 42 `fix` commits)
-- 15 modules have `#[cfg(test)]`, but they contain few tests
-- `cargo test` runs, but coverage is close to 0
-- **Opportunity**: build the test infrastructure and become the owner of testing
-
-#### 🟡 ASR extensibility (high-value features)
-- **#211 Local ASR support** (0 commits, unclaimed)
-  - The requirements doc already narrows the choice to whisper.cpp / sherpa-onnx
-  - Scope: model download, local inference, streaming integration
-  - Technically demanding, but with a clear planning framework
-
-- **#89 Confusable-word correction layer** (priority: high, 0 commits)
-  - Insert a correction layer between ASR and polish
-  - Fixes cases such as "issue" being transcribed as "iOS"
-  - Needs a rule engine plus context-aware checks
-
-#### 🟢 Security and infrastructure (high priority, nobody on it)
-- **#222 CI secrets exposure risk** (priority: high)
-- **#223 Credential configuration state management** (priority: high)
-- **#230 Keychain threat model**
-- **#226 WebView CSP policy**
-
-#### 🔵 Windows platform issues (an existing strength of yours)
-- 7 Windows-related issues (#244-247, #203-204, #207)
-- But these are all UI issues, which baiqing may prefer to keep
-
----
-
-## Three possible paths
-
-### Path A: Test infrastructure builder (recommended ⭐⭐⭐⭐⭐)
-
-**Why**:
-- It is the project's biggest piece of technical debt, and nobody has claimed it
-- It does not touch UI, so it will not conflict with baiqing
-- Once it is in place, you own testing
-- It offers contribution opportunities in every module
-
-**Concrete work**:
-1. **Phase 1**: add unit tests for core modules
-   - `recorder.rs`: audio capture, RMS calculation, watchdog
-   - `asr/frame.rs`: binary frame encoding/decoding (1 existing test to build on)
-   - `persistence.rs`: JSON serialization, Keychain read/write
-   - `types.rs`: state-machine transitions, error types
-
-2. **Phase 2**: add integration tests
-   - End-to-end mocked test of recording → ASR → polish → insertion
-   - Credential-management flow tests
-   - Hotword-injection tests
-
-3. **Phase 3**: CI automation
-   - Run tests in GitHub Actions
-   - Coverage reports (codecov)
-   - PR gating
-
-**Expected outcome**:
-- Test coverage from ~0% to 60%+
-- Ownership of the project's test infrastructure
-- Possibly +30-50 commits
-
----
-
-### Path B: ASR feature-extension specialist (recommended ⭐⭐⭐⭐)
-
-**Why**:
-- #211 local ASR is a high-value, unclaimed feature
-- #89 confusable-word correction is priority: high
-- The ASR module is fairly self-contained and does not touch UI
-- Technically demanding, with high impact once done
-
-**Concrete work**:
-1. **Start with the #89 confusable-word correction layer** (warm-up project)
-   - Insert the correction layer before `coordinator.rs:616-617`
-   - Implement a rule engine: `issue/iOS`, `PR/批阅`, `CI/西爱`, etc. (the right-hand entries are the mis-transcriptions)
-   - Support a user-defined confusable-word list
-   - Estimated 3-5 days
-
-2. **Then #211 local ASR** (main focus)
-   - First write the `docs/local-asr-plan.md` planning doc
-   - Choose between whisper.cpp and sherpa-onnx
-   - Implement the `asr/local_whisper.rs` module
-   - Model download and management
-   - Estimated 2-3 weeks
-
-**Expected outcome**:
-- 2 high-value features
-- Co-ownership of the ASR module
-- Possibly +20-30 commits
-
----
-
-### Path C: Security and infrastructure specialist (recommended ⭐⭐⭐)
-
-**Why**:
-- 4 security issues, two of them (#222, #223) marked priority: high
-- Unclaimed but important
-- Does not touch UI or product design
-
-**Concrete work**:
-1. **#222 CI secrets exposure**: pin the PR-Agent action version
-2. **#223 Credential configuration state**: fix the `get_credentials` logic
-3. **#230 Keychain threat model**: review credential storage in `persistence.rs`
-4. **#226 WebView CSP**: add a CSP policy for the Tauri WebView
-
-**Expected outcome**:
-- 4 high-priority security issues resolved
-- Ownership of security
-- Possibly +10-15 commits
-
----
-
-## My recommendation
-
-### Best strategy: combine A and B
-
-**Week 1**: #89 confusable-word correction (quick win; learn the ASR pipeline)
-**Weeks 2-3**: add tests for the ASR module (frame.rs, volcengine.rs, whisper.rs)
-**Weeks 4-6**: #211 local ASR (large feature, high impact)
-**From week 7**: keep adding tests for other modules and set up CI
-
-**Why this combination**:
-1. Confusable-word correction is small and builds confidence quickly
-2. Writing ASR tests builds a deep understanding of the module and lays the groundwork for local ASR
-3. Local ASR is a large feature; finishing it makes you the ASR expert
-4. Test infrastructure is long-term work you can keep contributing to
-
-**Areas to avoid**:
-- ❌ Windows UI issues (#244-247): baiqing's area
-- ❌ Main window UI (Overview/History/Settings): baiqing's area
-- ❌ Capsule visual design: baiqing's area
-
-**Your areas**:
-- ✅ Test infrastructure
-- ✅ ASR feature extensions
-- ✅ Recorder stability (you are already on #238)
-- ✅ Security and infrastructure
-- ✅ Documentation and analysis reports
-
----
-
-## Next steps
-
-**You can start right away**:
-```bash
-# 1. Start by reading issue #89 (confusable-word correction)
-gh issue view 89
-
-# 2. Read the code around coordinator.rs:616-617
-#    to find the ASR → polish interface
-
-# 3. Design the correction layer's interface
-#    Input:  RawTranscript
-#    Output: CorrectedTranscript
-```
-
-**Want me to help with any of these?**
-- Draft an implementation plan for #89?
-- Or plan the technology choice for #211 local ASR first?
-- Or add tests for `asr/frame.rs` first as a warm-up?
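The #89 correction layer proposed in the strategy document above (ASR output in, corrected text out, driven by a rule table plus a context check) could look roughly like the Rust sketch below. It is not the project's code: `RawTranscript` and `CorrectedTranscript` are the names the plan uses, but their definitions here are assumptions, and `ConfusionRule`, `correct`, `default_rules`, and the "context hint" scheme are hypothetical illustrations.

```rust
/// Hypothetical stand-ins for the transcript types named in the #89 plan.
#[derive(Debug, Clone, PartialEq)]
pub struct RawTranscript(pub String);

#[derive(Debug, Clone, PartialEq)]
pub struct CorrectedTranscript(pub String);

/// One confusable-word rule (hypothetical): replace `heard` with `intended`,
/// but only when at least one context hint appears in the text.
/// An empty hint list means the rule always applies.
#[derive(Debug, Clone)]
pub struct ConfusionRule {
    pub heard: String,
    pub intended: String,
    pub context_hints: Vec<String>,
}

impl ConfusionRule {
    pub fn new(heard: &str, intended: &str, hints: &[&str]) -> Self {
        Self {
            heard: heard.to_string(),
            intended: intended.to_string(),
            context_hints: hints.iter().map(|h| h.to_string()).collect(),
        }
    }

    /// Crude context check: any hint appearing anywhere in the text counts.
    fn context_matches(&self, text: &str) -> bool {
        self.context_hints.is_empty()
            || self.context_hints.iter().any(|h| text.contains(h.as_str()))
    }
}

/// Assumed built-in table, using the pairs listed in the strategy doc.
/// A user-defined list could simply be appended to this Vec.
pub fn default_rules() -> Vec<ConfusionRule> {
    vec![
        ConfusionRule::new("iOS", "issue", &["GitHub"]),
        ConfusionRule::new("批阅", "PR", &["合并"]),
        ConfusionRule::new("西爱", "CI", &[]),
    ]
}

/// Applies the rules in order; each rule checks context against the text
/// as it stands after the earlier rules have run.
pub fn correct(raw: &RawTranscript, rules: &[ConfusionRule]) -> CorrectedTranscript {
    let mut text = raw.0.clone();
    for rule in rules {
        if rule.context_matches(&text) {
            // Plain substring replacement: no word boundaries.
            text = text.replace(rule.heard.as_str(), rule.intended.as_str());
        }
    }
    CorrectedTranscript(text)
}

fn main() {
    let rules = default_rules();
    for input in &["please file an iOS on GitHub", "I like iOS apps", "西爱又挂了"] {
        let fixed = correct(&RawTranscript(input.to_string()), &rules);
        println!("{}", fixed.0);
    }
    // prints:
    // please file an issue on GitHub
    // I like iOS apps
    // CI又挂了
}
```

The context check is deliberately crude: a rule fires only if one of its hint strings appears anywhere in the transcript, and the replacement is plain substring matching, so "iOS" inside a longer token would also be rewritten. A real layer would likely need tokenization (with separate handling for Chinese text) before it could be trusted.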
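diff --git a/.github/COOPER_README.md b/.github/COOPER_README.md
deleted file mode 100644
index f8a5fdd0..00000000
--- a/.github/COOPER_README.md
+++ /dev/null
@@ -1,151 +0,0 @@
-# Cooper's Contribution System
-
-> Run a disciplined finding-and-implementation process in the fork, and submit upstream once the work is mature.
-
-## 📁 File layout
-
-```
-.github/
-├── issues/
-│   ├── EPIC-001-testing-infrastructure.md  # Test infrastructure epic (41 tasks)
-│   └── EPIC-002-asr-enhancement.md         # ASR feature-extension epic (71 tasks)
-├── finding-reports/                        # Finding analysis reports
-│   ├── test-coverage-20260504.md
-│   ├── asr-analysis-20260504.md
-│   ├── dependencies-20260504.md
-│   └── finding-summary-20260504.md
-├── COOPER_WORKFLOW.md                      # Workflow document
-└── COOPER_CONTRIBUTION_STRATEGY.md         # Contribution strategy analysis
-
-scripts/
-└── finding-helper.sh                       # Finding helper script
-```
-
-## 🎯 The two EPICs
-
-### EPIC-001: Test infrastructure
-- **Goal**: test coverage from 0% to 60%+
-- **Tasks**: 41 subtasks
-- **Timeline**: 6 weeks
-- **File**: `.github/issues/EPIC-001-testing-infrastructure.md`
-
-### EPIC-002: ASR feature extension and optimization
-- **Goal**: confusable-word correction + local ASR support
-- **Tasks**: 71 subtasks
-- **Timeline**: 6 weeks
-- **File**: `.github/issues/EPIC-002-asr-enhancement.md`
-
-## 🚀 Quick start
-
-### 1. Review the finding reports
-```bash
-# Run the finding script (already done)
-bash scripts/finding-helper.sh
-
-# View the summary
-cat .github/finding-reports/finding-summary-20260504.md
-
-# View the detailed reports
-cat .github/finding-reports/test-coverage-20260504.md
-cat .github/finding-reports/asr-analysis-20260504.md
-```
-
-### 2. Read the workflow
-```bash
-# Read the workflow document
-cat .github/COOPER_WORKFLOW.md
-
-# Read the contribution strategy
-cat .github/COOPER_CONTRIBUTION_STRATEGY.md
-```
-
-### 3. Start the first task
-```bash
-# Create a branch
-git checkout -b feat/asr-correction
-
-# Start implementing the confusable-word correction layer,
-# following the EPIC-002 Phase 1 task list
-```
-
-## 📊 Current status
-
-**Finding phase progress**:
-- ✅ Test coverage analysis
-- ✅ ASR module analysis
-- ✅ Dependency analysis
-- ✅ Finding reports generated
-- ⏳ Update the EPIC docs (next)
-
-**Key metrics**:
-- Files containing tests: 15
-- Test functions: 76
-- Core modules: 17
-- ASR module size: 1,164 lines
-
-## 🎯 Next steps
-
-### Start now (this week)
-1. ✅ Run finding-helper.sh to generate the reports
-2. ⏳ Read the 3 finding reports
-3. ⏳ Update the finding-task status in EPIC-001 and EPIC-002
-4. ⏳ Start implementing the confusable-word correction layer (quick win)
-
-### Short term (Weeks 2-3)
-- Add tests for recorder.rs
-- Add tests for asr/frame.rs
-- Write the testing guidelines
-
-### Medium term (Weeks 4-6)
-- Finalize the local ASR technology choice
-- Implement local ASR support
-- Set up automated CI testing
-
-## 🔄 Workflow
-
-```
-Finding phase
-    ↓
-Implementation (in the fork)
-    ↓
-Review (self-review)
-    ↓
-Open a PR upstream
-    ↓
-Sync with upstream regularly
-```
-
-See `.github/COOPER_WORKFLOW.md` for the full process.
-
-## 📝 Commit conventions
-
-```bash
-# Format
-<type>(<scope>): <subject>
-
-# Examples
-test(recorder): add unit tests for PCM data collection
-feat(asr): add correction layer for homophones
-docs(testing): add testing guidelines
-```
-
-## 🔗 Related resources
-
-- **Upstream repository**: https://github.com/appergb/openless
-- **Your fork**: https://github.com/Cooper-X-Oak/openless
-- **Project docs**: `CLAUDE.md`
-- **Development docs**: `docs/openless-development.md`
-
-## 💡 Tips
-
-- Validate all work in the fork first; submit upstream only once it is mature
-- One PR per phase
-- Rerun `finding-helper.sh` periodically to refresh the analysis reports
-- Stay in sync with upstream (weekly)
-
----
-
-**Created**: 2026-05-04
-**Owner**: Cooper
-**Current phase**: Finding
-**Next milestone**: finish the confusable-word correction layer (Week 1)
diff --git a/.github/COOPER_WORKFLOW.md b/.github/COOPER_WORKFLOW.md
deleted file mode 100644
index 45e21c22..00000000
--- a/.github/COOPER_WORKFLOW.md
+++ /dev/null
@@ -1,256 +0,0 @@
-# Cooper's Contribution Workflow
-
-> **Strategy**: explore and plan in my own fork (Cooper-X-Oak/openless), and submit upstream (appergb/openless) once the work is mature.
-
----
-
-## 📋 The two parent EPICs
-
-### EPIC-001: Test infrastructure
-- **File**: `.github/issues/EPIC-001-testing-infrastructure.md`
-- **Goal**: test coverage from 0% to 60%+
-- **Tasks**: 41 subtasks
-- **Estimated time**: 6 weeks
-
-### EPIC-002: ASR feature extension and optimization
-- **File**: `.github/issues/EPIC-002-asr-enhancement.md`
-- **Goal**: confusable-word correction + local ASR support
-- **Tasks**: 71 subtasks
-- **Estimated time**: 6 weeks
-
----
-
-## 🔄 Workflow
-
-### Phase 1: Finding (current)
-
-**Goal**: investigate in depth, surface every relevant issue, and record it in the parent EPICs.
-
-#### Test infrastructure findings
-```bash
-# F1.1 Review existing tests
-find openless-all/app/src-tauri/src -name "*.rs" -exec grep -l "#\[cfg(test)\]" {} \;
-cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib -- --list
-
-# F1.2 Identify key test scenarios
-# Read the core modules and list the functions and scenarios that need tests
-
-# F1.3 Analyze module dependencies
-# Draw the dependency graph and decide what needs mocking
-
-# F1.4 Survey testing tools
-# Evaluate mockall, proptest, criterion, etc.
-
-# F1.5 Establish testing guidelines
-# Write docs/testing-guidelines.md
-```
-
-#### ASR feature-extension findings
-```bash
-# F1.1 Collect samples of ASR mis-transcriptions
-# from issues, user feedback, and my own testing
-
-# F2.1 Compare local ASR stacks
-# Survey whisper.cpp, sherpa-onnx, faster-whisper
-# Test performance, latency, cross-platform compatibility
-
-# F2.7 Write the technical proposal
-# docs/local-asr-plan.md
-```
-
-**Outputs**:
-- A complete subtask list
-- Technical proposal documents
-- Risk assessment
-
----
-
-### Phase 2: Implementation
-
-**Principles**:
-- One commit per subtask
-- One PR per phase (in the fork)
-- Validate major features in the fork before submitting upstream
-
-#### Branching strategy
-```bash
-# Create feature branches from main
-git checkout main
-git pull origin main
-git checkout -b feat/testing-recorder      # testing
-git checkout -b feat/asr-correction        # ASR correction
-git checkout -b feat/asr-local-whisper     # local ASR
-
-# Open a PR in the fork
-gh pr create --repo Cooper-X-Oak/openless --base main
-
-# Once validated, open a PR upstream
-gh pr create --repo appergb/openless --base main
-```
-
-#### Commit conventions
-```bash
-# Format: <type>(<scope>): <subject>
-# type: feat, fix, test, docs, refactor, perf, chore
-# scope: module name (recorder, asr, coordinator, etc.)
-
-# Examples
-git commit -m "test(recorder): add unit tests for PCM data collection"
-git commit -m "feat(asr): add correction layer for homophones"
-git commit -m "docs(testing): add testing guidelines"
-```
-
----
-
-### Phase 3: Review
-
-**Self-review checklist**:
-- [ ] Code follows the project conventions (CLAUDE.md)
-- [ ] Tests added (for feature code)
-- [ ] Docs updated (if behavior changed)
-- [ ] Passes `cargo check` and `cargo test`
-- [ ] Passes `npm run build` (if the frontend changed)
-- [ ] Clear commit messages
-
-**Before submitting upstream**:
-- [ ] Validated in the fork for at least 1 week
-- [ ] Passed my own testing on real hardware
-- [ ] Detailed PR description written
-- [ ] Related issues linked
-
----
-
-### Phase 4: Sync with upstream
-
-**Regular sync** (weekly):
-```bash
-# Pull upstream changes
-git checkout main
-git pull origin main
-
-# Push to the fork
-git push fork main
-
-# Rebase feature branches
-git checkout feat/testing-recorder
-git rebase main
-```
-
----
-
-## 📊 Progress tracking
-
-### Tracking via the EPIC docs
-- Tick `- [x]` in the EPIC doc for each completed subtask
-- Update the completion percentage
-- Record problems encountered and how they were solved
-
-### Tracking via GitHub Issues (in the fork)
-```bash
-# Create one issue per phase
-gh issue create --repo Cooper-X-Oak/openless \
-  --title "[Phase 1] Unit tests for core modules" \
-  --body "See EPIC-001-testing-infrastructure.md Phase 1"
-
-# Link commits
-git commit -m "test(recorder): add unit tests
-
-Refs Cooper-X-Oak/openless#1"
-```
-
----
-
-## 🎯 Current action plan
-
-### Week 1: Finding + quick wins
-- [ ] **Days 1-2**: complete the test-infrastructure findings (F1.1-F1.5)
-- [ ] **Days 3-4**: complete the ASR feature-extension findings (F1.1-F1.5, F2.1-F2.7)
-- [ ] **Days 5-7**: implement the confusable-word correction layer (EPIC-002 Phase 1)
-
-### Weeks 2-3: Test foundations
-- [ ] Add tests for recorder.rs
-- [ ] Add tests for asr/frame.rs
-- [ ] Add tests for persistence.rs
-
-### Weeks 4-6: Local ASR support
-- [ ] Finalize the technology choice and design
-- [ ] Implement model management
-- [ ] Implement local inference
-- [ ] Cross-platform testing
-
----
-
-## 🔧 Tools and scripts
-
-### Test coverage
-```bash
-# Install cargo-llvm-cov
-cargo install cargo-llvm-cov
-
-# Run tests with coverage
-cargo llvm-cov --manifest-path openless-all/app/src-tauri/Cargo.toml
-
-# Generate an HTML report
-cargo llvm-cov --html --manifest-path openless-all/app/src-tauri/Cargo.toml
-```
-
-### Code quality checks
-```bash
-# Rust formatting
-cargo fmt --manifest-path openless-all/app/src-tauri/Cargo.toml --check
-
-# Rust linting
-cargo clippy --manifest-path openless-all/app/src-tauri/Cargo.toml -- -D warnings
-
-# TypeScript type check
-cd openless-all/app && npm run build
-```
-
----
-
-## 🚀 Quick start
-
-### 1. Check the environment
-```bash
-# Check git remotes
-git remote -v
-# Expected:
-# origin  https://github.com/appergb/openless.git
-# fork    https://github.com/Cooper-X-Oak/openless.git
-
-# Check the current branch
-git branch
-
-# Check the build environment
-cd openless-all/app
-npm ci
-cargo check --manifest-path src-tauri/Cargo.toml
-```
-
-### 2. Start the finding phase
-```bash
-# Create a finding branch
-git checkout -b finding/testing-infrastructure
-
-# Investigate and record the results in the EPIC doc:
-# edit .github/issues/EPIC-001-testing-infrastructure.md
-```
-
-### 3. Commit the findings
-```bash
-# Commit the EPIC doc updates
-git add .github/issues/
-git commit -m "docs(epic): complete finding phase for testing infrastructure"
-git push fork finding/testing-infrastructure
-
-# Open a PR in the fork (optional)
-gh pr create --repo Cooper-X-Oak/openless \
-  --title "[Finding] Test infrastructure investigation complete" \
-  --body "Completed the EPIC-001 finding phase; identified 41 subtasks"
-```
-
----
-
-**Last updated**: 2026-05-04
-**Owner**: Cooper
-**Status**: Finding phase
diff --git a/.github/MULTI_SCALE_AUDIT.md b/.github/MULTI_SCALE_AUDIT.md
deleted file mode 100644
index 60ff2208..00000000
--- a/.github/MULTI_SCALE_AUDIT.md
+++ /dev/null
@@ -1,366 +0,0 @@
-# Multi-Scale Audit Framework
-
-## 🎯 Audit philosophy
-
-**Core principle**: work from macro to micro and from architecture to detail, surfacing problems at each scale in turn.
-
-**Why multiple scales**:
-- System-level problems affect the overall direction and have the highest priority
-- Module-level problems affect maintainability and extensibility
-- Feature-level problems affect user experience
-- Code-level problems affect code quality
-
-**Audit order**: higher scales first, so no time is wasted on low-level problems that a higher-level refactor might make moot.
-
----
-
-## 📐 The four scales
-
-### Scale 1: System-Level Audit
-**Focus**: overall architecture, technical debt, security threat model, scalability bottlenecks
-
-**Dimensions**:
-- Soundness of the architecture
-- Inter-module dependencies
-- Technology choices
-- Security attack surface
-- Performance bottlenecks
-- Scalability limits
-- A global view of technical debt
-
-**Outputs**:
-- Architecture risk map
-- Technical-debt priority matrix
-- Security threat model
-- Scalability bottleneck analysis
-
-**Time**: 2-3 days
-
----
-
-### Scale 2: Module-Level Audit
-**Focus**: design of individual modules, their internal architecture, and the interfaces between them
-
-**Dimensions**:
-- Whether each module has a single responsibility
-- Whether internal layering is clear
-- Whether inter-module interfaces are stable
-- Whether module dependencies are reasonable
-- Module testability
-- Completeness of module documentation
-
-**Outputs**:
-- Module health scores
-- Module refactoring priorities
-- Module interface specifications
-- Module dependency graph
-
-**Time**: 3-5 days (0.5-1 day per core module)
-
----
-
-### Scale 3: Feature-Level Audit
-**Focus**: implementation of individual features, user experience, edge-case handling
-
-**Dimensions**:
-- Feature completeness
-- Edge-case handling
-- Error handling
-- User experience
-- Performance
-- Test coverage
-
-**Outputs**:
-- Feature defect list
-- Edge-case test cases
-- Performance optimization suggestions
-- UX improvements
-
-**Time**: 5-7 days (0.5-1 day per feature)
-
----
-
-### Scale 4: Code-Level Audit
-**Focus**: concrete bugs, code style, performance hotspots
-
-**Dimensions**:
-- Consistent code style
-- Naming conventions
-- Comment quality
-- Code complexity
-- Latent bugs
-- Performance hotspots
-
-**Outputs**:
-- Code quality report
-- Bug list
-- Performance optimization opportunities
-- Refactoring suggestions
-
-**Time**: ongoing (alongside day-to-day development)
-
----
-
-## 🔍 Multi-scale audit plan for the two EPICs
-
-### EPIC-001: Test infrastructure
-
-#### Scale 1: System-level audit
-**Questions**:
-- [ ] Does the project have a testing strategy? (ratio of unit, integration, and E2E tests)
-- [ ] Does the test infrastructure support CI/CD?
-- [ ] Is there a clear strategy for managing test data?
-- [ ] Is there a consistent mocking strategy?
-- [ ] Are test environments sufficiently isolated?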
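-
-**Outputs**:
-- Testing strategy document
-- Test infrastructure architecture diagram
-- Test tooling selection report
-
-#### Scale 2: Module-level audit
-**Questions**:
-- [ ] Which modules have no tests at all?
-- [ ] Which modules are below 50% coverage?
-- [ ] Which modules have poor-quality tests (happy path only)?
-- [ ] Which modules are hard to test (tight coupling, dependence on external services)?
-- [ ] Which modules have slow tests?
-
-**Outputs**:
-- Module coverage matrix
-- Module testability scores
-- Module testing priority ranking
-
-#### Scale 3: Feature-level audit
-**Questions**:
-- [ ] Are recording edge cases tested? (device unavailable, permission denied, timeout)
-- [ ] Is ASR error handling tested? (network failure, service unavailable)
-- [ ] Is the text-insertion fallback tested? (AX failure → clipboard)
-- [ ] Is credential-management security tested? (Keychain failure → JSON fallback)
-
-**Outputs**:
-- Feature test-case list
-- Edge-case test matrix
-- Error-handling test coverage
-
-#### Scale 4: Code-level audit
-**Questions**:
-- [ ] How good is the existing test code?
-- [ ] Are tests clearly named?
-- [ ] Are tests independent (no reliance on execution order)?
-- [ ] Are tests repeatable (no reliance on external state)?
-- [ ] Are the assertions sufficient?
-
-**Outputs**:
-- Test code quality report
-- Test refactoring suggestions
-
----
-
-### EPIC-002: ASR feature extension and optimization
-
-#### Scale 1: System-level audit
-**Questions**:
-- [ ] Does the ASR architecture support multiple providers?
-- [ ] Does the ASR module have a unified interface abstraction?
-- [ ] How extensible is the ASR module? (cost of adding a new provider)
-- [ ] How observable is the ASR module? (logs, metrics, tracing)
-- [ ] Is its error-handling strategy consistent?
-- [ ] Where are its performance bottlenecks?
-
-**Outputs**:
-- ASR architecture design doc
-- ASR extensibility assessment
-- ASR performance bottleneck analysis
-- ASR refactoring plan
-
-#### Scale 2: Module-level audit
-**Questions**:
-- [ ] Does the Volcengine ASR module have a single responsibility?
-- [ ] Is the Whisper ASR module well designed?
-- [ ] Is the binary protocol in frame.rs stable?
-- [ ] Is the AudioConsumer trait abstract enough?
-- [ ] Is code duplicated across ASR providers?
-
-**Outputs**:
-- ASR module health scores
-- ASR module refactoring priorities
-- ASR module interface specifications
-
-#### Scale 3: Feature-level audit
-**Questions**:
-- [ ] Are the requirements for confusable-word correction clear?
-- [ ] Is the correction layer's insertion point sensible?
-- [ ] What is its context-checking strategy?
-- [ ] What is the model download strategy for local ASR?
-- [ ] What is the model management strategy for local ASR?
-- [ ] What are the performance requirements for local ASR?
-
-**Outputs**:
-- Confusable-word correction design doc
-- Local ASR technical proposal
-- Feature requirements list
-
-#### Scale 4: Code-level audit
-**Questions**:
-- [ ] Is Volcengine ASR's WebSocket handling robust?
-- [ ] Do Whisper ASR's HTTP requests have timeouts?
-- [ ] Does frame.rs's serialization do bounds checks?
-- [ ] How concurrency-safe is AudioConsumer?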
-
-**Outputs**:
-- Code quality issue list
-- Bug-fix priorities
-- Performance optimization suggestions
-
----
-
-## 📊 Audit execution flow
-
-### Phase 1: System-level audit (Week 1, Days 1-3)
-```bash
-# Run the system-level audit script
-bash scripts/audit-system-level.sh
-
-# Output
-.github/audit-reports/system-level/
-├── architecture-risk-map.md
-├── tech-debt-matrix.md
-├── security-threat-model.md
-└── scalability-bottlenecks.md
-```
-
-**Decision point**: is an architectural refactor needed? If so, pause the lower-scale audits and do the architecture design first.
-
----
-
-### Phase 2: Module-level audit (Week 1, Days 4-7 + Week 2, Days 1-2)
-```bash
-# Run the module-level audit script
-bash scripts/audit-module-level.sh
-
-# Output
-.github/audit-reports/module-level/
-├── module-health-scores.md
-├── module-refactor-priority.md
-├── module-interface-spec.md
-└── module-dependency-graph.md
-```
-
-**Decision point**: which modules need refactoring, and in what order?
-
----
-
-### Phase 3: Feature-level audit (Week 2, Days 3-7 + Week 3, Days 1-2)
-```bash
-# Run the feature-level audit script
-bash scripts/audit-feature-level.sh
-
-# Output
-.github/audit-reports/feature-level/
-├── feature-defects.md
-├── boundary-test-cases.md
-├── performance-suggestions.md
-└── ux-improvements.md
-```
-
-**Decision point**: which features need filling in, and which need optimizing?
-
----
-
-### Phase 4: Code-level audit (ongoing)
-```bash
-# Run the code-level audit script
-bash scripts/audit-code-level.sh
-
-# Output
-.github/audit-reports/code-level/
-├── code-quality-report.md
-├── bug-list.md
-├── performance-hotspots.md
-└── refactor-suggestions.md
-```
-
-**Decision point**: which code must be fixed now, and what can wait?
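-
----
-
-## 🎯 Consolidating audit outputs
-
-### Risk Map
-```
-High risk    | System-level      | Module-level      | Feature-level     | Code-level
-─────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────
-Architecture | No unified ASR    | coordinator is    | Recording timeout | WebSocket error
-             | trait             | 3,462 lines       | mishandled        | handling
-─────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────
-Testing      | No test strategy  | 15 modules with   | Edge cases        | Unclear test
-             |                   | low coverage      | untested          | names
-─────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────
-Security     | Credential-store  | Keychain threats  | Credential-leak   | Sensitive data
-             | threat model      |                   | risk              | in logs
-```
-
-### Priority Matrix
-```
-Impact ↑
-       │
-High   │  [System-level]          [Module-level]
-       │  Architecture refactor   coordinator
-       │                          split
-       │
-Medium │  [Feature-level]         [Code-level]
-       │  Confusable-word fix     Bug fixes
-       │
-Low    │
-       └─────────────────────────────→ Urgency
-          Low       Medium      High
-```
-
----
-
-## 🔧 Audit toolchain
-
-### System-level tools
-- Architecture visualization: `cargo-modules`
-- Dependency analysis: `cargo tree` (built into Cargo)
-- Security scanning: `cargo-audit`
-
-### Module-level tools
-- Code metrics: `tokei`
-- Unsafe-code usage: `cargo-geiger`
-- Dependency graph: `cargo-depgraph`
-
-### Feature-level tools
-- Test coverage: `cargo-llvm-cov`
-- Profiling: `cargo-flamegraph`
-- Memory analysis: `valgrind`
-
-### Code-level tools
-- Linting: `cargo-clippy`
-- Format checking: `cargo-fmt`
-- Unused-dependency detection: `cargo-udeps`
-
----
-
-## 📝 Next steps
-
-1. **Create the audit scripts**:
-   - `scripts/audit-system-level.sh`
-   - `scripts/audit-module-level.sh`
-   - `scripts/audit-feature-level.sh`
-   - `scripts/audit-code-level.sh`
-
-2. **Run the system-level audit** (first):
-   - System-level issues in the test infrastructure
-   - System-level issues in the ASR module
-
-3. **Decide, based on the system-level audit results**:
-   - Whether an architectural refactor is needed
-   - Whether to continue with the lower-scale audits
-
----
-
-**Created**: 2026-05-04
-**Audit philosophy**: macro to micro, architecture to detail
-**Estimated time**: system-level 3 days + module-level 5 days + feature-level 7 days = 15 days
diff --git a/.github/P1_TEST_REPORT.md b/.github/P1_TEST_REPORT.md
deleted file mode 100644
index 323bb155..00000000
--- a/.github/P1_TEST_REPORT.md
+++ /dev/null
@@ -1,173 +0,0 @@
-# P1 Test Report: Microphone Failure Recovery
-
-## Test time
-2026-05-04 14:16:07 - 14:16:12
-
-## Scenario
-Reproduce the real-world scenario from Issue #238: the recording callback silently stops partway through a session
-
-## Method
-Test code was added to `process_callback` so that after 100 invocations the callback silently stops working (it no longer calls the consumer or updates the timestamp)
-
-## Results
-
-### ✅ Full timeline
-
-**14:16:07.051** - cb#100 runs normally
-- The first 100 callbacks work normally
-- Audio data is processed normally
-- The timestamp is updated normally
-
-**After 14:16:07.051** - the callback silently stops working
-- The callback is still invoked (at the CPAL level)
-- but it exits early via `return` without processing audio
-- It does not call `consumer.consume_pcm_chunk()`
-- **Key point**: it does not update the `last_callback_time` timestamp
-
-**14:16:12.056** - **Watchdog detects the fault (after 4 s)**
-```
-[ERROR] [recorder] watchdog: 录音回调已停止 4 秒,触发错误恢复
-```
-
-**14:16:12.056** - **Coordinator receives the error**
-```
-[ERROR] [coord] recorder runtime error: audio engine failed: 录音回调静默停止 4 秒
-```
-
-### 📊 Key metrics
-
-| Metric | Expected | Actual | Result |
-|--------|----------|--------|--------|
-| Stall-detection threshold | 3 s | 3 s | ✅ |
-| Actual detection time | ~3 s | ~4 s | ✅ (threshold plus up to one 1 s poll interval) |
-| Watchdog fired | Yes | Yes | ✅ |
-| Error propagated to Coordinator | Yes | Yes | ✅ |
-| Error message accuracy | Accurate | "录音回调静默停止 4 秒" ("recording callback silently stopped for 4 s") | ✅ |
-
-### 🔍 Detailed analysis
-
-#### 1. Verifying the watchdog detection mechanism
-
-**How it works**:
-- The watchdog checks `last_callback_time` once per second
-- If `last_callback_time.elapsed()` exceeds 3 s, it raises an error
-
-**Observed behavior**:
-- ✅ Detected the stalled callback
-- ✅ Fired at 4 s (3 s threshold + 1 s poll interval)
-- ✅ The extra delay is within reason (it comes from the poll interval)
-
-**Why 4 s rather than 3 s?**
-- The watchdog polls only once per second
-- The callback stops at an arbitrary moment between two polls
-- The stall is noticed only at the first poll after elapsed time exceeds 3 s
-- So detection lands somewhere between 3 s and 4 s after the last timestamp update
-
-This is expected, because the check is periodic rather than real-time.
-
-#### 2. Verifying the error-propagation path
-
-**Full path**:
-```
-Recorder callback stops
-    ↓
-last_callback_time is no longer updated
-    ↓
-Watchdog detects elapsed > 3 s
-    ↓
-Sends RecorderError::EngineFailed to runtime_error_tx
-    ↓
-Coordinator's spawn_recorder_error_monitor receives it
-    ↓
-Logs: "[coord] recorder runtime error: ..."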
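-    ↓
-Calls abort_recording_with_error()
-    ↓
-Restores the capsule state to Idle
-```
-
-**Results**:
-- ✅ The error propagated from Recorder to Coordinator
-- ✅ The error message is accurate: "录音回调静默停止 4 秒"
-- ✅ Logging is complete
-
-#### 3. Verifying user experience
-
-**Expected behavior**:
-1. The user starts recording
-2. After about 1 s (100 callbacks) the callback silently stops
-3. About 4 s later the app stops recording automatically
-4. The capsule shows an error: "录音中断: audio engine failed: 录音回调静默停止 4 秒" ("recording interrupted: ...")
-5. The capsule hides itself after 2 s
-6. The user can keep using the app
-
-**Observed**:
-- ✅ The app stopped automatically after 4 s
-- ✅ The error message is clear
-- ✅ The user could keep using the app (later logs show new hotkey events)
-
-### 🎯 Conclusion
-
-**✅ P1 test fully passed**
-
-1. ✅ **The watchdog correctly detects a stalled callback**
-   - Detection time: 4 s (3 s threshold + poll interval)
-   - The extra delay is within reason
-
-2. ✅ **The error propagation path is complete**
-   - Propagation from Recorder to Coordinator works
-   - The error message is accurate
-
-3. ✅ **Good user experience**
-   - Recovers automatically, with no manual intervention
-   - Clear error message
-   - The app remains usable
-
-4. ✅ **The fix works**
-   - Resolves the core problem in Issue #238
-   - The capsule no longer gets stuck in the Processing state
-   - The app recovers automatically
-
-### 📝 Comparison with Issue #238
-
-**The problem in Issue #238**:
-- The recorder stops abnormally
-- ASR waits for its 12 s timeout
-- The capsule gets stuck in the processing state
-- The user cannot continue
-
-**After the fix**:
-- ✅ The watchdog detects the fault within 4 s
-- ✅ Error recovery starts immediately (without waiting for the ASR timeout)
-- ✅ The capsule returns to Idle correctly
-- ✅ The user can continue
-
-**Improvement**:
-- Detection time drops from 12 s to 4 s (67% shorter)
-- Noticeably better user experience
-
-### 🔧 Optimization suggestions (optional)
-
-To detect faults faster still, consider:
-
-1. **Shorter watchdog poll interval**
-   - Current: 1 s
-   - Option: 500 ms
-   - Trade-off: faster detection vs. more CPU overhead
-
-2. **Lower stall threshold**
-   - Current: 3 s
-   - Option: 2 s
-   - Trade-off: faster detection vs. risk of false positives
-
-**Recommendation**: keep the current configuration (1 s poll interval + 3 s threshold)
-- 4 s detection is already fast enough
-- Unlikely to produce false positives
-- Low CPU overhead
-
----
-
-**Tester**: Claude Sonnet 4.6
-**Branch under test**: fix/recorder-timeout-238
-**Method**: code injection + log analysis
-**Result**: ✅ fully passed
diff --git a/.github/TEST_VERIFICATION.md b/.github/TEST_VERIFICATION.md
deleted file mode 100644
index 6b418bc7..00000000
--- a/.github/TEST_VERIFICATION.md
+++ /dev/null
@@ -1,235 +0,0 @@
-# Issue #238 Fix Verification Report
-
-## Summary of the fix
-
-This fix addresses the problem where "an abnormal recorder stop triggers an ASR timeout and leaves the capsule unresponsive". It consists of four key changes:
-
-1. **Recorder Liveness Watchdog** - detects a recording callback that has silently stopped
-2. **Coordinator global timeout** - a 15 s fallback timeout that guarantees the capsule state is restored
-3. **ASR resource cleanup** - calls `asr.cancel()` on timeout to tear down the WebSocket
-4. **Watchdog timing fix** - timing starts after `stream.play()`, avoiding false positives on slow-starting devices
-
-## Code review verification
-
-### 1. Recorder watchdog logic
-
-**File**: `openless-all/app/src-tauri/src/recorder.rs`
-
-**Key code**: lines 144-183
-
-**Checks**:
-
-✅ **When the watchdog thread starts**: after `stream.play()` succeeds (line 144)
-- Ensures monitoring begins only once the audio stream has actually started
-- Keeps device initialization time out of the timeout budget
-
-✅ **Correct timing origin**: uses `watchdog_start_time = Instant::now()` (line 147)
-- Does not depend on `StreamState::stream_start_time`
-- Times from the moment the watchdog actually starts monitoring
-
-✅ **Two detection modes**:
-- **None branch** (lines 166-179): detects "the first callback never arrives"
-  - Uses `watchdog_start_time.elapsed()`
-  - Threshold: 5 s
-  - Error message: "录音启动后 5 秒内未收到回调" ("no callback received within 5 s of starting recording")
-
-- **Some branch** (lines 152-164): detects "callbacks stop mid-session"
-  - Uses `last_time.elapsed()`
-  - Threshold: 3 s
-  - Error message: "录音回调静默停止 X 秒" ("recording callback silently stopped for X s")
-
-✅ **Timestamp update**: `process_callback` updates it after successfully calling the consumer (recorder.rs:389)
-- The timestamp is updated only after audio data has actually been processed
-- So empty buffers are not mistaken for liveness
-
-✅ **Error notification**: sends `RecorderError::EngineFailed` via `runtime_error_tx`
-- The error propagates to the coordinator
-- and triggers capsule state recovery
-
-### 2. Coordinator global timeout
-
-**File**: `openless-all/app/src-tauri/src/coordinator.rs`
-
-**Key code**:
-- Dictation path: lines 1367-1403
-- QA path: lines 2288-2310
-
-**Checks**:
-
-✅ **Timeout value**: `COORDINATOR_GLOBAL_TIMEOUT_SECS = 15` (line 30)
-- Slightly longer than ASR's 12 s timeout
-- Acts as the last line of defense; fires only if the ASR timeout mechanism fails
-
-✅ **Dictation-path timeout protection** (lines 1368-1403):
-- Wraps `await_final_result()` in `tokio::time::timeout`
-- **Success path**: `Ok(Ok(r))` - the result is returned normally
-- **ASR error path**: `Ok(Err(e))` - ASR reports an error; state is restored
-- **Global timeout path**: `Err(_)` - 15 s elapsed; recovery is forced
-
-✅ **QA-path timeout protection** (lines 2288-2310):
-- Same timeout logic
-- Uses `finish_qa_with_error` to restore the QA state
-
-✅ **Cleanup on timeout**:
-- **Key**: calls `asr.cancel()` (lines 1393, 2304)
-- Tears down the WebSocket connection and worker thread
-- Prevents resource leaks
-
-✅ **Complete state recovery**:
-- Emits an Error capsule event
-- Restores the Windows IME session
-- Sets the phase to Idle
-- Schedules the capsule to auto-hide
-
-### 3. Error propagation path
-
-**Full error propagation chain**:
-
-```
-Recorder callback stops
-    ↓
-Watchdog detects it (3 s or 5 s)
-    ↓
-Sends RecorderError::EngineFailed to runtime_error_tx
-    ↓
-Coordinator's recorder_error_rx receives it
-    ↓
-Calls handle_recorder_error()
-    ↓
-Cancels the ASR session
-    ↓
-Restores the capsule state to Idle
-```
-
-**Verification**: inspect the error-listener implementation in `coordinator.rs`
-
-✅ **Dictation-path error listener** (lines 1146-1148):
-- The recorder returns a `runtime_errors` channel when it starts
-- `spawn_recorder_error_monitor` starts the listener thread
-
-✅ **QA-path error listener** (lines 2212-2216):
-- QA recording likewise starts `spawn_qa_recorder_error_monitor`
-- It uses its own session_id guard
-
-✅ **Error listener implementation** (lines 1173-1197):
-- Captures the session_id so stale events are ignored
-- Calls `abort_recording_with_error` when an error arrives
-- Logs: `"[coord] recorder runtime error: {err}"`
-
-✅ **Abort implementation** (lines 1226-1250):
-- Calls `begin_recording_abort_before_restore` to obtain the abort context
-- Cleans up startup resources: `discard_startup_resources_for_session`
-- Restores the Windows IME session
-- Emits an Error capsule event
-- Restores the state to Idle
-
-### 4. Edge-case analysis
-
-#### 4.1 Slow-starting devices
-
-**Scenario**: device initialization takes 2 s and `stream.play()` takes 1 s
-
-**Expected behavior**:
-- ✅ The watchdog starts timing once `stream.play()` succeeds
-- ✅ The full 5 s budget goes to waiting for the first callback
-- ✅ A slow device does not cause a false positive
-
-**Verification**: `watchdog_start_time` is initialized inside the watchdog thread (line 147)
-
-#### 4.2 Prolonged silence
-
-**Scenario**: the user starts recording but stays silent for 10 s
-
-**Expected behavior**:
-- ✅ Callbacks keep running (even with silent audio)
-- ✅ `last_callback_time` keeps being updated
-- ✅ The watchdog timeout does not fire
-- ✅ Recognition completes normally
-
-**Verification**: `process_callback` updates the timestamp after processing any non-empty buffer
-
-#### 4.3 Network outage
-
-**Scenario**: the ASR WebSocket connection fails or drops
-
-**Expected behavior**:
-- ✅ The ASR layer reports an error or times out (12 s)
-- ✅ If the ASR timeout fails, the global timeout fires at 15 s
-- ✅ `asr.cancel()` is called to clean up
-- ✅ The capsule returns to Idle
-
-**Verification**: the global timeout's `Err(_)` branch calls `asr.cancel()`
-
-#### 4.4 Rapid toggling
-
-**Scenario**: start and stop recording 5 times in quick succession
-
-**Expected behavior**:
-- ✅ `stop_flag` is set to true on each stop
-- ✅ The watchdog thread sees stop_flag and exits
-- ✅ The main thread waits for the watchdog to exit (lines 194-196)
-- ✅ Multiple watchdog threads never run at the same time
-
-**Verification**: `run_audio_thread` waits for the watchdog before returning (lines 194-196)
-
-## Potential risk assessment
-
-### Low risk
-
-1. **Normal flow unaffected**: all changes are defensive and leave the normal path unchanged
-2. **Conservative thresholds**: 5 s / 3 s / 15 s are loose enough to avoid false positives
-3. **Complete cleanup**: `asr.cancel()` is called correctly on timeout
-
-### Scenarios that need runtime verification
-
-The following must be tested in a real environment; code review alone cannot fully verify them:
-
-1. **Does the CPAL callback really stop silently?**
-   - Reproduce the Issue #238 scenario on Windows
-   - Confirm the watchdog detects it
-
-2. **Performance impact of the watchdog thread**
-   - One check per second should cost very little in theory
-   - Needs verification on low-end devices
-
-3. **Stability across repeated timeout recoveries**
-   - Trigger 10 timeouts in a row and watch for resource leaks
-   - Confirm the state machine always recovers
-
-## Code quality assessment
-
-### Strengths
-
-✅ **Defense in depth**: three layers of protection (Recorder watchdog → ASR timeout → Coordinator global timeout)
-✅ **Clear error propagation**: errors travel over a channel rather than through shared state
-✅ **Complete cleanup**: `asr.cancel()` is called on timeout
-✅ **Thorough logging**: every critical path logs
-✅ **Accurate timing**: the watchdog starts timing at the right moment
-
-### Suggested improvements
-
-💡 **Optional**: add metrics
-- Count watchdog firings
-- Count global-timeout firings
-- Helps monitor problems in production
-
-💡 **Optional**: configurable timeout thresholds
-- Let users adjust the timeouts in Settings
-- To suit devices of differing performance
-
-## Conclusion
-
-**Code review result**: ✅ **Passed**
-
-All key logic is implemented correctly:
-1. ✅ The watchdog starts timing at the right moment
-2. ✅ The two detection modes cover both failure scenarios (no first callback; mid-session stall)
-3. ✅ The global timeout serves as the last line of defense
-4. ✅ Cleanup is complete, with no leak risk
-5. ✅ The error propagation path is clear
-6. ✅ Edge cases are handled correctly
-
-**Recommendations**:
-- The PR can go upstream as is
-- State in the PR description that Windows testing is still needed
-- Adjust as needed if the maintainers report problems
diff --git a/.github/WATCHDOG_RISK_ANALYSIS.md b/.github/WATCHDOG_RISK_ANALYSIS.md
deleted file mode 100644
index 6879b38d..00000000
--- a/.github/WATCHDOG_RISK_ANALYSIS.md
+++ /dev/null
@@ -1,481 +0,0 @@
-# Watchdog Thread Impact Analysis and Risk Assessment
-
-## Background
-
-With the watchdog thread in place, its impact on the rest of the system needs to be assessed, in particular:
-1. Thread lifecycle management
-2. Interaction with other components (ASR, LLM, Coordinator)
-3. Concurrency safety
-4. Resource-leak risk
-
-## Analysis of the current implementation
-
-### 1. Watchdog thread lifecycle
-
-**Start** (recorder.rs:144-186):
-```rust
-let watchdog_handle = thread::Builder::new()
-    .name("openless-recorder-watchdog".into())
-    .spawn(move || {
-        while !stop_flag_for_watchdog.load(Ordering::SeqCst) {
-            thread::sleep(Duration::from_millis(1000));
-            // check logic...
-            if fault_detected {
-                runtime_error_tx_for_watchdog.send(...);
-                break; // report only once
-            }
-        }
-    })
-    .ok();
-```
-
-**Exit** (recorder.rs:197-199):
-```rust
-if let Some(handle) = watchdog_handle {
-    let _ = handle.join();
-}
-```
-
-### 2. Exit conditions
-
-The watchdog thread has **3 ways to exit**:
-
-1. **Normal exit**: `stop_flag` is set to true
-   - The user stops recording
-   - The main thread sets `stop_flag`
-   - The watchdog sees it and leaves the loop
-
-2. **Fault detected**: `break` after sending the error
-   - Callbacks have stalled for more than 3 s
-   - The first callback has not arrived within 5 s
-   - Sends the error to `runtime_error_tx`
-   - Immediately `break`s out of the loop
-
-3. **Thread panic** (should not happen in theory)
-   - The code contains no operations that can panic
-   - All operations are safe
-
-## Potential risk analysis
-
-### ⚠️ Risk 1: Race condition after the watchdog fires
-
-**Scenario**:
-1. The watchdog detects a fault and sends an error (line 163)
-2. The watchdog immediately `break`s and exits (line 166)
-3. The main thread receives the error and starts cleaning up
-4. **But the CPAL callback may still be running at this point**
-
-**Problem**:
-- After the watchdog exits, `last_callback_time` may still be updated
-- The callback thread may still be accessing resources while the main thread cleans them up
-
-**Protection in the current code**:
-```rust
-// The main thread waits for stop_flag
-while !stop_flag.load(Ordering::SeqCst) {
-    thread::sleep(Duration::from_millis(50));
-}
-
-// The stream stops automatically when dropped
-drop(stream);
-
-// Wait for the watchdog to exit
-if let Some(handle) = watchdog_handle {
-    let _ = handle.join();
-}
-```
-
-**Analysis**:
-- ✅ The main thread waits until `stop_flag` is set
-- ✅ `drop(stream)` stops the CPAL callback
-- ✅ Only then does it wait for the watchdog to exit
-- ⚠️ **But the watchdog may exit before `stop_flag` is set**
-
-**Potential problem**:
-```
-Timeline:
-T0: The watchdog detects a fault
-T1: The watchdog sends the error and breaks out of its loop
-T2: The coordinator receives the error and calls recorder.stop()
-T3: recorder.stop() sets stop_flag
-T4: The main thread sees stop_flag and starts cleaning up
-T5: drop(stream) stops the callback
-T6: Waits on watchdog.join()
-
-Problem: between T1 and T5 the watchdog has already exited, but the callback may still be running
-```
-
-**Impact assessment**:
-- **Low risk**: the CPAL callback and the watchdog play different roles on the shared data
-  - The callback writes `last_callback_time`
-  - The watchdog only reads `last_callback_time`
-  - Access is guarded by a `Mutex`, so it is concurrency-safe
-- **No data race**: the callback can safely keep running even after the watchdog exits
-
-### ⚠️ Risk 2: Watchdog threads piling up across recordings
-
-**Scenario**:
-The user rapidly starts and stops recording many times
-
-**Problem**:
-- Each recording start creates a new watchdog thread
-- If old watchdogs do not exit properly, they may accumulate
-
-**Protection in the current code**:
-```rust
-// Each recording runs on a new thread
-thread::Builder::new()
-    .name("openless-recorder".into())
-    .spawn(move || {
-        // Create the watchdog
-        let watchdog_handle = ...;
-
-        // Wait for stop
-        while !stop_flag.load(...) { ... }
-
-        // Wait for the watchdog to exit
-        if let Some(handle) = watchdog_handle {
-            let _ = handle.join();
-        }
-    })
-```
-
-**Analysis**:
-- ✅ Each recording thread waits for its own watchdog to exit
-- ✅ `join()` returns only after the watchdog has fully exited
-- ✅ Threads do not accumulate
-
-**Impact assessment**:
-- **No risk**: the design is correct and threads do not accumulate
-
-### ⚠️ Risk 3: Interaction between watchdog errors and the coordinator timeout
-
-**Scenario**:
-1. The watchdog detects a fault at 4 s and sends an error
-2. The coordinator receives the error and starts cleaning up
-3.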
但 Coordinator 的全局超时(15 秒)仍在运行 - -**问题**: -- 两个超时机制可能同时触发 -- 可能导致重复的错误处理 - -**当前代码的保护**: - -**Coordinator 的错误监听**(coordinator.rs:1173-1197): -```rust -fn spawn_recorder_error_monitor(inner: &Arc, rx: mpsc::Receiver) { - let captured_session_id = inner.state.lock().session_id; - thread::spawn(move || { - if let Ok(err) = rx.recv() { - let current_session_id = inner.state.lock().session_id; - if captured_session_id != current_session_id { - // 过期事件,丢弃 - return; - } - abort_recording_with_error(&inner, format!("录音中断: {err}")); - } - }) -} -``` - -**Coordinator 的全局超时**(coordinator.rs:1368-1403): -```rust -match tokio::time::timeout(15秒, asr.await_final_result()).await { - Ok(Ok(r)) => r, - Ok(Err(e)) => { /* ASR 错误 */ } - Err(_) => { /* 全局超时 */ } -} -``` - -**分析**: -- ✅ Watchdog 错误会立即触发 `abort_recording_with_error` -- ✅ `abort_recording_with_error` 会改变 `phase` 状态 -- ⚠️ **但全局超时仍在等待 `await_final_result()`** - -**潜在问题**: -``` -时间线: -T0: 录音开始 -T4: Watchdog 检测到异常,发送错误 -T4: Coordinator 收到错误,调用 abort_recording_with_error -T4: phase 变为 Idle -T15: 全局超时触发(如果 await_final_result 仍在等待) -``` - -**影响评估**: -- **中风险**:可能导致重复的错误处理 -- **但实际影响有限**: - - `abort_recording_with_error` 会清理资源 - - 全局超时触发时,phase 已经是 Idle - - 全局超时的错误处理会被忽略(因为 session_id 不匹配) - -### ⚠️ 风险 4:Channel 阻塞 - -**场景**: -Watchdog 发送错误到 `runtime_error_tx`,但接收端没有在监听 - -**问题**: -- 如果 channel 是有界的且已满,`send()` 会阻塞 -- 如果 channel 是无界的,可能内存泄漏 - -**当前代码**: -```rust -let _ = runtime_error_tx_for_watchdog.send(RecorderError::EngineFailed(...)); -``` - -**Channel 类型**: -```rust -use std::sync::mpsc::{channel, Receiver, Sender}; -``` - -**分析**: -- 使用标准库的 `mpsc::channel`(无界 channel) -- `send()` 永远不会阻塞 -- ✅ 不会导致 watchdog 线程阻塞 - -**影响评估**: -- **无风险**:无界 channel,不会阻塞 - -### ⚠️ 风险 5:与 ASR/LLM 的交互 - -**场景**: -Watchdog 触发错误后,ASR 和 LLM 服务可能仍在处理 - -**问题**: -- ASR WebSocket 连接可能仍在等待 -- LLM 请求可能仍在进行 -- 资源没有正确清理 - -**当前代码的保护**: - -**Coordinator 的错误处理**(coordinator.rs:1226-1250): -```rust -fn abort_recording_with_error(inner: &Arc, message: String) 
-{
-    // 1. Obtain the abort context
-    let Some(abort) = begin_recording_abort_before_restore(&mut state) else {
-        return;
-    };
-
-    // 2. Clean up startup resources (including ASR)
-    discard_startup_resources_for_session(inner, abort.session_id);
-
-    // 3. Restore the Windows IME
-    restore_prepared_windows_ime_session(inner, abort.session_id);
-
-    // 4. Emit the Error capsule
-    emit_capsule(inner, CapsuleState::Error, ...);
-
-    // 5. Restore state to Idle
-    publish_abort_idle_after_restore(&mut state, abort.session_id);
-}
-```
-
-**Implementation of `discard_startup_resources_for_session`** (verified):
-```rust
-fn discard_startup_resources_for_session(inner: &Arc, session_id: u64) {
-    stop_recorder_for_session(inner, session_id);
-    cancel_asr_for_session(inner, session_id); // ✅ Calls ASR cancellation
-}
-
-fn cancel_asr_for_session(inner: &Arc, session_id: u64) {
-    if let Some(asr) = take_asr_for_session(inner, session_id) {
-        cancel_active_asr(asr); // ✅ Explicitly calls cancel
-    }
-}
-
-fn cancel_active_asr(asr: ActiveAsr) {
-    match asr {
-        ActiveAsr::Volcengine(v) => v.cancel(), // ✅ Cancels Volcengine ASR
-        ActiveAsr::Whisper(w) => w.cancel(),    // ✅ Cancels Whisper
-    }
-}
-```
-
-**Analysis**:
-- ✅ `discard_startup_resources_for_session` does call `cancel_asr_for_session`
-- ✅ `cancel_asr_for_session` explicitly calls `asr.cancel()`
-- ✅ Both Volcengine and Whisper ASR are supported
-- ✅ A session_id guard ensures that only the matching session's ASR is cancelled
-
-**Impact assessment**:
-- **No risk**: ASR resource cleanup is complete and correct
-
-## Suggested improvements
-
-### Improvement 1: ~~Ensure ASR is cancelled on watchdog errors~~
-
-**Status**: ✅ **Verified; no change needed**
-
-**Verification results**:
-- `abort_recording_with_error` calls `discard_startup_resources_for_session`
-- `discard_startup_resources_for_session` calls `cancel_asr_for_session`
-- `cancel_asr_for_session` explicitly calls `asr.cancel()`
-- The resource cleanup logic is complete and correct
-
-**Conclusion**: the current implementation already cleans up ASR resources correctly, so no change is needed.
-
-### Improvement 2: log when the watchdog exits
-
-**Problem**:
-The logs currently give no way to confirm that the watchdog exited correctly
-
-**Suggestion**:
-Add a log line when the watchdog exits
-
-**Implementation**:
-```rust
-let watchdog_handle = thread::Builder::new()
-    .name("openless-recorder-watchdog".into())
-    .spawn(move || {
-        let watchdog_start_time = std::time::Instant::now();
-
-        while !stop_flag_for_watchdog.load(Ordering::SeqCst)
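-        {
-            // ... check logic ...
-        }
-
-        // Log text: "watchdog exited normally"
-        log::debug!("[recorder] watchdog 正常退出");
-    })
-    .ok();
-```
-
-### Improvement 3: session ID guard
-
-**Problem**:
-A watchdog may fire during an old session while its error is delivered to a new session
-
-**Suggestion**:
-Capture session_id in the watchdog and send it along with the error
-
-**Implementation**:
-```rust
-// Change the error type
-pub enum RecorderError {
-    EngineFailed {
-        message: String,
-        session_id: u64, // New: session_id
-    },
-    // ...
-}
-
-// Capture session_id in the watchdog
-let session_id = inner.state.lock().session_id;
-let watchdog_handle = thread::spawn(move || {
-    // ...
-    let _ = runtime_error_tx.send(RecorderError::EngineFailed {
-        // Message: "recording callbacks silently stopped for {} seconds"
-        message: format!("录音回调静默停止 {} 秒", elapsed.as_secs()),
-        session_id,
-    });
-});
-
-// Validate session_id in the Coordinator
-if let Ok(err) = rx.recv() {
-    match err {
-        RecorderError::EngineFailed { message, session_id } => {
-            if session_id != current_session_id {
-                // Log text: "ignoring watchdog error from a stale session"
-                log::warn!("[coord] 忽略过期 session 的 watchdog 错误");
-                return;
-            }
-            // Handle the error...
-        }
-    }
-}
-```
-
-## Strengths of the current implementation
-
-### ✅ Strength 1: correct thread lifecycle management
-
-- Each recording thread waits for its own watchdog to exit
-- `join()` ensures a full exit
-- Threads do not accumulate
-
-### ✅ Strength 2: concurrency safety
-
-- `Arc` and `Mutex` protect shared state
-- An `AtomicBool` serves as the stop signal
-- No data races
-
-### ✅ Strength 3: clear error propagation
-
-- Errors are passed over a channel
-- The Coordinator has a dedicated error-listener thread
-- The error-handling flow is complete
-
-### ✅ Strength 4: low performance overhead
-
-- The watchdog checks once per second
-- It uses `sleep` instead of busy-waiting
-- CPU overhead is negligible
-
-## Risk summary
-
-| Risk | Severity | Likelihood | Impact | Status |
-|------|----------|------------|--------|--------|
-| Race condition after the watchdog fires | Low | Low | None | ✅ Safe |
-| Watchdog accumulation across repeated recordings | None | None | None | ✅ Safe |
-| Interaction between watchdog errors and the global timeout | Low | Low | Possible duplicate error handling | ✅ Acceptable |
-| Channel blocking | None | None | None | ✅ Safe |
-| ASR/LLM resource cleanup | None | None | None | ✅ Verified safe |
-
-## Conclusion
-
-### Assessment of the current implementation: ✅ **Fully safe**
-
-1. ✅ Thread management is correct, with no leaks
-2. ✅ Concurrency-safe, with no data races
-3. ✅ Low performance overhead
-4. ✅ **ASR resource cleanup verified correct**
-
-### Suggested priorities
-
-**P0 (required)**:
-- ✅ **No change needed**: ASR resource cleanup is verified correct
-
-**P1 (recommended)**:
-- Add a watchdog exit log (easier debugging)
-- Add a session_id guard (prevents stale events)
-
-**P2 (optional)**:
-- Check the phase state before the global timeout handler runs (avoids duplicate error handling)
-
-### Impact on the LLM and other components
-
-**✅ No negative impact**:
-- The watchdog only monitors recorder callbacks
-- It does not interact directly with ASR or the LLM
-- It affects them only indirectly, through the Coordinator
-- All resource cleanup logic is correct
-
-**✅ Positive impact**:
-- Problems are detected sooner (4 seconds vs. 12 seconds)
-- Faster recovery shortens the time resources are held
-- The ASR WebSocket connection is properly cancelled
-- Significantly better user experience
-
-**✅ Thread-safety guarantees**:
-- `Arc` + `Mutex` protect shared state
-- An `AtomicBool` serves as the stop signal
-- session_id guards prevent stale events
-- The main thread waits for the watchdog to exit fully
-
-### Final conclusion
-
-**The current implementation is fully safe and can be merged with confidence.**
-
-All potential risks have been analyzed and verified:
-- ✅ No thread leaks
-- ✅ No resource leaks
-- ✅ No data races
-- ✅ No blocking risk
-- ✅ ASR/LLM are not negatively affected
-
-**Recommendations**:
-- The current version can be merged as-is
-- The P1/P2 improvements can follow in later PRs (not required)
-
-----
-
-**Analyst**: Claude Sonnet 4.6
-**Analysis date**: 2026-05-04
-**Conclusion**: ✅ **Fully safe; merge recommended**
-
diff --git a/.github/audit-reports/system-level/architecture-risk-map-20260504.md b/.github/audit-reports/system-level/architecture-risk-map-20260504.md
deleted file mode 100644
index 948a6f51..00000000
--- a/.github/audit-reports/system-level/architecture-risk-map-20260504.md
+++ /dev/null
@@ -1,309 +0,0 @@
-# Architecture risk map
-
-## Generated at
-2026-05-04 23:15:40
-
-## 1. Overall architecture assessment
-
-### Current architecture
-```
-┌─────────────────────────────────────────┐
-│           Frontend (React/TS)           │
-│    Capsule / Overview / Settings / QA   │
-└──────────────┬──────────────────────────┘
-               │ IPC (Tauri commands)
-┌──────────────┴──────────────────────────┐
-│        Coordinator (state machine)      │
-│ Idle → Starting → Listening → Processing│
-└─┬────┬────┬────┬────┬────┬────┬────┬────┘
-  │    │    │    │    │    │    │    │
-  ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼
-Hotkey Recorder ASR Polish Insert Persist Perms History
-```
-
-### Architecture strengths
-- ✅ The Coordinator is a single state machine with clear responsibilities
-- ✅ Modules coordinate through the Coordinator instead of depending on each other directly
-- ✅ Uses trait abstractions (AudioConsumer)
-
-### Architecture risks
-
-#### 🔴 High risk: the Coordinator is too large
-**Symptoms**:
-- coordinator.rs has 3462 lines of code
-- It carries several responsibilities at once: state machine, session management, module coordination and error handling
-
-**Impact**:
-- Hard to understand and maintain
-- Changing one feature may affect others
-- Hard to test (every dependency must be mocked)
-
-**Suggestion**:
-- Split it into submodules:
-  - `coordinator/state_machine.rs` - state transition logic
-  - `coordinator/session.rs` - session management
-  - `coordinator/orchestrator.rs` - module coordination
-  - `coordinator/error_handler.rs` - error handling
-
-#### 🟡 Medium risk: no unified ASR Provider trait
-**Symptoms**:
-- The Volcengine and Whisper implementations are independent of each other
-- Adding a new provider requires a lot of manual integration
-- Duplicated code (session management, error handling)
-
-**Impact**:
-- Poor extensibility
-- High maintenance cost
-- Inconsistencies are easy to introduce
-
-**Suggestion**:
-- Define a unified `ASRProvider` trait
-- Refactor the existing providers to implement it
-- Use a trait object in the Coordinator
-
-#### 🟡 Medium risk: missing test infrastructure
-**Symptoms**:
-- No test strategy document
-- No automated CI tests
-- Test coverage close to 0%
-
-**Impact**:
-- High refactoring risk (regressions are easy to introduce)
-- No quality guarantee for new features
-- Technical debt accumulates
-
-**Suggestion**:
-- Establish a test strategy (ratio of unit, integration and E2E tests)
-- Configure automated CI tests
-- Add tests for the core modules
-
-#### 🟢 Low risk: clear inter-module dependencies
-**Symptoms**:
-- Each module depends only on `types.rs`
-- Modules do not call each other directly
-
-**Impact**:
-- Positive; easy to maintain
-
-## 2. Module dependency analysis
-
-### Core module dependency graph
-```
-types.rs (530 lines)
-   ↑
-   ├── coordinator.rs (3462 lines)
-   │      ↑
-   │      ├── hotkey.rs (785 lines)
-   │      ├── recorder.rs (525 lines)
-   │      ├── asr/ (1164 lines in total)
-   │      ├── polish.rs (992 lines)
-   │      ├── insertion.rs (489 lines)
-   │      ├── persistence.rs (770 lines)
-   │      └── permissions.rs (428 lines)
-   │
-   ├── commands.rs (712 lines)
-   └── lib.rs (844 lines)
-```
-
-### Dependency health
-- ✅ **One-way dependencies**: every module depends on types, and types depends on no module
-- ✅ **No cycles**: there are no circular dependencies between modules
-- ⚠️ **Coordinator has too many dependencies**: it depends on 8+ modules
-
-## 3. Technology stack assessment
-
-### Current technology stack
-```toml
-[dependencies]
-tauri = { version = "2", features = ["macos-private-api", "tray-icon"] }
-tauri-plugin-shell = "2"
-tauri-plugin-updater = "2"
-tauri-plugin-single-instance = "2"
-tauri-plugin-autostart = "2"
-serde = { version = "1", features = ["derive"] }
-serde_json = "1"
-tokio = { version = "1", features = ["full"] }
-tokio-tungstenite = { version = "0.24", features = ["rustls-tls-native-roots"] }
-futures-util = "0.3"
-reqwest = { version = "0.12", default-features = false, features = ["json", "multipart", "rustls-tls"] }
-thiserror = "1"
-anyhow = "1"
-log = "0.4"
-env_logger = "0.11"
-simplelog = "0.12"
-parking_lot = "0.12"
-once_cell = "1"
-uuid = { version = "1", features = ["v4", "serde"] }
-chrono = { version = "0.4", features = ["serde"] }
-bytes = "1"
-url = "2"
-raw-window-handle = "0.6"
-
-# Hotkey + audio + insertion
-global-hotkey = "0.6"
-cpal = "0.15"
-enigo = "0.2"
-arboard = "3"
-rdev = "0.5"
-```
-
-### Technology stack risks
-- ✅ **Tauri 2**: mature and stable, with an active community
-- ✅ **Tokio**: async runtime with excellent performance
-- ✅ **Serde**: the standard for serialization, with a rich ecosystem
-- ⚠️ **global-hotkey 0.6**: a fairly new version that may have compatibility issues
-- ⚠️ **cpal 0.15**: audio library whose cross-platform compatibility needs attention
-
-## 4. Extensibility bottlenecks
-
-### Current extension points
-1. **ASR Provider**: requires manual integration, at high cost
-2. **Polish Provider**: already supports OpenAI-compatible APIs and extends well
-3. **Insertion Strategy**: hard-coded AX → clipboard → copy-only chain that extends poorly
-
-### Extensibility improvement suggestions
-
-#### ASR Provider extension
-**Current cost**: adding a new provider requires:
-1. Implementing the AudioConsumer trait
-2. Adding branch logic in the Coordinator
-3. Adding configuration to the Settings UI
-4. Adding credential storage in persistence
-
-**Proposed approach**:
-```rust
-// Define a unified interface
-#[async_trait]
-pub trait ASRProvider: Send + Sync {
-    async fn open_session(&self, hotwords: Vec<String>) -> Result<()>;
-    fn get_audio_consumer(&self) -> Arc<dyn AudioConsumer>;
-    async fn close_session(&self) -> Result<RawTranscript>;
-    async fn cancel_session(&self);
-}
-
-// Registration mechanism
-pub struct ASRRegistry {
-    providers: HashMap<String, Box<dyn ASRProvider>>,
-}
-
-impl ASRRegistry {
-    pub fn register(&mut self, name: &str, provider: Box<dyn ASRProvider>) {
-        self.providers.insert(name.to_string(), provider);
-    }
-}
-```
-
-#### Insertion Strategy extension
-**Current cost**: adding a new strategy requires changing the core logic in insertion.rs
-
-**Proposed approach**:
-```rust
-// Strategy pattern (#[async_trait] keeps the trait object-safe for Box<dyn ...>)
-#[async_trait]
-pub trait InsertionStrategy: Send + Sync {
-    async fn insert(&self, text: &str) -> Result<()>;
-}
-
-pub struct AXInsertionStrategy;
-pub struct ClipboardInsertionStrategy;
-pub struct CopyOnlyStrategy;
-
-// Strategy chain
-pub struct InsertionChain {
-    strategies: Vec<Box<dyn InsertionStrategy>>,
-}
-```
-
-## 5. Performance bottlenecks
-
-### Potential bottlenecks
-1. **Coordinator lock contention**: every operation has to take the Coordinator lock
-2. **Audio data copies**: data may be copied several times between Recorder and AudioConsumer
-3. **WebSocket buffering**: BufferingAudioConsumer may accumulate a large backlog
-
-### Performance optimization suggestions
-- Use finer-grained locks (split the Coordinator state)
-- Use zero-copy audio transfer (`Arc<[u8]>`)
-- Cap the BufferingAudioConsumer buffer size
-
-## 6. Architecture evolution roadmap
-
-### Phase 1: split the Coordinator (priority: high)
-**Goal**: split the 3462-line Coordinator into submodules
-
-**Steps**:
-1. Extract the state machine logic into `state_machine.rs`
-2. Extract session management into `session.rs`
-3. Extract module coordination into `orchestrator.rs`
-4. Keep `coordinator.rs` as the entry point
-
-**Expected benefits**:
-- Code readability up 50%+
-- Test coverage up 30%+
-- Maintenance cost down 40%+
-
-### Phase 2: unified ASR Provider interface (priority: high)
-**Goal**: define a unified ASRProvider trait and refactor the existing providers
-
-**Steps**:
-1. Define the `ASRProvider` trait
-2. Refactor Volcengine to implement it
-3. Refactor Whisper to implement it
-4. Add a provider registration mechanism
-
-**Expected benefits**:
-- Cost of adding a new provider down 70%+
-- Code duplication down 50%+
-- Extensibility up 100%+
-
-### Phase 3: build the test infrastructure (priority: high)
-**Goal**: establish complete test infrastructure
-
-**Steps**:
-1. Write a test strategy document
-2. Add unit tests for the core modules
-3. Add integration tests
-4. Configure automated CI tests
-
-**Expected benefits**:
-- Test coverage from 0% → 60%+
-- Refactoring risk down 80%+
-- Code quality up 50%+
-
-## 7. Risk priority matrix
-
-| Risk | Impact | Urgency | Priority | Estimated effort |
-|------|--------|---------|----------|------------------|
-| Coordinator too large | High | Medium | P1 | 2 weeks |
-| No unified ASR trait | High | Medium | P1 | 1 week |
-| Missing test infrastructure | High | High | P0 | 6 weeks |
-| Poor Insertion extensibility | Medium | Low | P2 | 1 week |
-| Performance bottlenecks | Medium | Low | P3 | 2 weeks |
-
-## 8. Next steps
-
-### Start immediately (this week)
-1. ✅ Complete the system-level audit
-2. ⏳ Decide whether an architecture refactor is needed
-3. ⏳ If it is, pause the lower-level audits and do the architecture design first
-
-### Short-term plan (2-4 weeks)
-1. Coordinator split design document
-2. ASR Provider trait design document
-3. Test strategy document
-
-### Medium-term plan (1-2 months)
-1. Implement the Coordinator split
-2. Implement the unified ASR Provider interface
-3. Build the test infrastructure
-
-----
-
-**Audit conclusions**:
-- 🔴 **Architecture refactor needed**: the Coordinator is too large and ASR lacks a unified interface
-- 🟡 **Missing test infrastructure**: needs to be built first
-- 🟢 **Healthy module dependencies**: no cycles and clear one-way dependencies
-
-**Recommendations**:
-1. Build the test infrastructure first (as a safety net for refactoring)
-2. Then split the Coordinator
-3. Finally, unify the ASR Provider interface
diff --git a/.github/audit-reports/system-level/system-audit-summary-20260504.md b/.github/audit-reports/system-level/system-audit-summary-20260504.md
deleted file mode 100644
index 82976143..00000000
--- a/.github/audit-reports/system-level/system-audit-summary-20260504.md
+++ /dev/null
@@ -1,97 +0,0 @@
-# System-level audit summary
-
-**Generated at**: 2026-05-04 23:15:40
-
-## 🎯 Audit conclusions
-
-### Architecture health: ⚠️ Moderate (refactor needed)
-
-**Strengths**:
-- ✅ Clear module dependencies with no circular dependencies
-- ✅ The Coordinator is a single state machine with clear responsibilities
-- ✅ Uses trait abstractions (AudioConsumer)
-
-**Risks**:
-- 🔴 The Coordinator is too large (3462 lines)
-- 🔴 No unified ASR Provider trait
-- 🔴 Missing test infrastructure (coverage close to 0%)
-
-### Total technical debt: 💳 13 items
-
-**Priority distribution**:
-- P0: 2 items (testing)
-- P1: 5 items (architecture + testing + code)
-- P2: 4 items (architecture + documentation + code)
-- P3: 2 items (documentation)
-
-**Estimated payoff cost**: 14 weeks (3.5 months)
-
-## 📋 Generated reports
-
-1. **Architecture risk map**: .github/audit-reports/system-level/architecture-risk-map-20260504.md
-2. **Technical debt matrix**: .github/audit-reports/system-level/tech-debt-matrix-20260504.md
-
-## 🎯 Key decision points
-
-### Decision 1: is an architecture refactor needed?
-**Recommendation**: ✅ **Yes**
-
-**Rationale**:
-- At 3462 lines, the Coordinator is hard to maintain
-- Without a unified ASR trait, extensibility is poor
-- Test coverage close to 0% makes refactoring risky
-
-**Plan**:
-1. Build the test infrastructure first (as a safety net for refactoring)
-2. Then split the Coordinator
-3. Finally, unify the ASR Provider interface
-
-### Decision 2: continue with the lower-level audits?
-**Recommendation**: ⏸️ **Pause**
-
-**Rationale**:
-- System-level issues will affect the results of lower-level audits
-- An architecture refactor may make lower-level issues disappear
-- Higher-level issues should be addressed first
-
-**Plan**:
-1. Pause the module-, feature- and code-level audits
-2. Finish building the test infrastructure first
-3. Then do the architecture refactor
-4. Resume the lower-level audits once the refactor is done
-
-## 🚀 Next steps
-
-### Start immediately (this week)
-1. ✅ Complete the system-level audit
-2. ⏳ Write the test strategy document
-3. ⏳ Write the Coordinator split design document
-4. ⏳ Write the ASR Provider trait design document
-
-### Short-term plan (2-4 weeks)
-1. Build the test infrastructure (Phase 1)
-2. Add unit tests for the core modules
-3. Configure automated CI tests
-
-### Medium-term plan (1-2 months)
-1. Implement the Coordinator split (Phase 2)
-2. Implement the unified ASR Provider interface (Phase 3)
-3. Fill in the documentation (Phase 4)
-
-## 📊 Expected benefits
-
-### After the test infrastructure is built
-- Test coverage: 0% → 60%+
-- Refactoring risk: down 80%+
-- Code quality: up 50%+
-
-### After the architecture refactor
-- Code readability: up 50%+
-- Maintenance cost: down 40%+
-- Extensibility: up 100%+
-- Cost of adding a new provider: down 70%+
-
-----
-
-**Audit conclusion**: an architecture refactor is needed; build the test infrastructure first
-**Next step**: write the test strategy document and the architecture refactor design documents
diff --git a/.github/audit-reports/system-level/tech-debt-matrix-20260504.md b/.github/audit-reports/system-level/tech-debt-matrix-20260504.md
deleted file mode 100644
index 66f0303d..00000000
--- a/.github/audit-reports/system-level/tech-debt-matrix-20260504.md
+++ /dev/null
@@ -1,147 +0,0 @@
-# Technical debt matrix
-
-## Generated at
-2026-05-04 23:15:40
-
-## 1. Technical debt categories
-
-### Architecture debt
-| Debt | Impact | Payoff cost | Interest | Priority |
-|------|--------|-------------|----------|----------|
-| Coordinator too large | High | 2 weeks | Every change is hard | P1 |
-| No unified ASR trait | High | 1 week | Adding a provider is expensive | P1 |
-| Hard-coded insertion strategy | Medium | 1 week | Hard to extend | P2 |
-
-### Testing debt
-| Debt | Impact | Payoff cost | Interest | Priority |
-|------|--------|-------------|----------|----------|
-| Test coverage close to 0% | High | 6 weeks | High refactoring risk | P0 |
-| No automated CI tests | High | 1 week | High manual testing cost | P0 |
-| No test strategy document | Medium | 2 days | No guarantee of test quality | P1 |
-
-### Documentation debt
-| Debt | Impact | Payoff cost | Interest | Priority |
-|------|--------|-------------|----------|----------|
-| No architecture design document | Medium | 3 days | Hard for newcomers to ramp up | P2 |
-| No API documentation | Low | 2 days | Hard to integrate | P3 |
-| No testing guide | Medium | 1 day | Poor test quality | P2 |
-
-### Code debt
-| Debt | Impact | Payoff cost | Interest | Priority |
-|------|--------|-------------|----------|----------|
-| coordinator.rs at 3462 lines | High | 2 weeks | Hard to maintain | P1 |
-| Code duplication (ASR providers) | Medium | 1 week | High maintenance cost | P2 |
-| Missing error handling (some modules) | Medium | 1 week | Poor stability | P2 |
-
-## 2. Total technical debt
-
-### Debt statistics
-```
-Total debt items: 13
-P0 priority: 2 items (testing)
-P1 priority: 5 items (architecture + testing + code)
-P2 priority: 4 items (architecture + documentation + code)
-P3 priority: 2 items (documentation)
-
-Estimated payoff cost: 14 weeks (3.5 months)
-```
-
-### Debt interest (per month)
-- **Architecture debt interest**: every new feature requires touching the Coordinator, adding +50% to its cost
-- **Testing debt interest**: every refactor risks regressions, adding +100% to its cost
-- **Documentation debt interest**: newcomer ramp-up time +2 weeks
-- **Code debt interest**: maintenance cost +30%
-
-## 3. Debt payoff plan
-
-### Phase 1: test infrastructure (6 weeks, P0)
-**Goal**: build the test infrastructure as a safety net for later refactoring
-
-**Steps**:
-1. Week 1: write the test strategy document
-2. Week 2-3: add unit tests for the core modules
-3. Week 4-5: add integration tests
-4. Week 6: configure automated CI tests
-
-**Benefits**:
-- Test coverage from 0% → 60%+
-- Refactoring risk down 80%+
-- Provides a safety net for later refactoring
-
-### Phase 2: split the Coordinator (2 weeks, P1)
-**Goal**: split the 3462-line Coordinator into submodules
-
-**Steps**:
-1. Week 1: design the split and write the design document
-2. Week 2: carry out the split and add tests
-
-**Benefits**:
-- Code readability up 50%+
-- Maintenance cost down 40%+
-- Test coverage up 30%+
-
-### Phase 3: unified ASR Provider interface (1 week, P1)
-**Goal**: define a unified ASRProvider trait and refactor the existing providers
-
-**Steps**:
-1. Day 1-2: design the trait interface
-2. Day 3-4: refactor Volcengine and Whisper
-3. Day 5: add a provider registration mechanism
-
-**Benefits**:
-- Cost of adding a new provider down 70%+
-- Code duplication down 50%+
-- Extensibility up 100%+
-
-### Phase 4: documentation (1 week, P2)
-**Goal**: fill in the architecture design document and the testing guide
-
-**Steps**:
-1. Day 1-2: write the architecture design document
-2. Day 3: write the testing guide
-3. Day 4-5: write the API documentation
-
-**Benefits**:
-- Newcomer ramp-up time down 50%+
-- Test quality up 30%+
-
-## 4. Debt payoff priorities
-
-### Pay off immediately (P0)
-- [ ] Build the test infrastructure
-- [ ] Configure automated CI tests
-
-### Short term (P1, 1-2 months)
-- [ ] Split the Coordinator
-- [ ] Unified ASR Provider interface
-- [ ] Test strategy document
-
-### Medium term (P2, 2-3 months)
-- [ ] Refactor the insertion strategy
-- [ ] Architecture design document
-- [ ] Testing guide
-
-### Long term (P3, 3-6 months)
-- [ ] API documentation
-- [ ] Performance optimization
-
-## 5. Debt prevention measures
-
-### Code review checklist
-- [ ] Does the new feature have tests?
-- [ ] Does the new module have documentation?
-- [ ] Does the change introduce new architecture debt?
-- [ ] Does the change add code duplication?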
-
-### Regular audits
-- Run a system-level audit once a month
-- Assess total technical debt every quarter
-- Draw up a debt payoff plan every six months
-
-----
-
-**Debt summary**:
-- Total debt items: 13
-- Estimated payoff cost: 14 weeks (3.5 months)
-- Pay off first: test infrastructure (P0)
-- Debt interest: maintenance cost grows by 30-100% per month
diff --git a/.github/finding-reports/asr-analysis-20260504.md b/.github/finding-reports/asr-analysis-20260504.md
deleted file mode 100644
index 915f86f9..00000000
--- a/.github/finding-reports/asr-analysis-20260504.md
+++ /dev/null
@@ -1,98 +0,0 @@
-# ASR module finding report
-
-## Generated at
-2026-05-04 22:59:01
-
-## 1. ASR module structure
-
-```
-total 48K
--rw-r--r-- 1 luoxu 197609 7.8K May  4 12:41 frame.rs
--rw-r--r-- 1 luoxu 197609 1.1K May  4 12:41 mod.rs
--rw-r--r-- 1 luoxu 197609  28K May  4 12:41 volcengine.rs
--rw-r--r-- 1 luoxu 197609 4.6K May  4 12:41 whisper.rs
-```
-
-## 2. ASR module code size
-
-```
-  252 openless-all/app/src-tauri/src/asr/frame.rs
-   35 openless-all/app/src-tauri/src/asr/mod.rs
-  749 openless-all/app/src-tauri/src/asr/volcengine.rs
-  128 openless-all/app/src-tauri/src/asr/whisper.rs
- 1164 total
-```
-
-## 3. ASR Provider interface analysis
-
-### Current interface
-- `AudioConsumer` trait: receives PCM data
-- `RawTranscript` struct: ASR output
-
-### Problems
-- No unified ASRProvider trait
-- The Volcengine and Whisper implementations duplicate code
-- Adding a new provider requires a lot of manual integration
-
-### Suggested improvement
-Define a unified `ASRProvider` trait with:
-- `open_session()`: open a session
-- `get_audio_consumer()`: get the audio consumer
-- `close_session()`: close the session and get the result
-- `cancel_session()`: cancel the session
-
-## 4. Confusable-word correction layer design
-
-### Insertion point
-`coordinator.rs:616-617`, before the ASR result enters polish
-
-### Data structure
-```rust
-struct CorrectionRule {
-    pattern: String,               // Error pattern (regex supported)
-    replacement: String,           // Correct term
-    context: Option<Vec<String>>,  // Context keywords
-    enabled: bool,
-}
-```
-
-### Built-in confusable-word list (first draft)
-- issue / iOS
-- PR / 批阅
-- CI / 西爱
-- commit / 靠米特
-- merge / 摸鸡
-- release / 瑞丽丝
-
-## 5. Local ASR technology selection
-
-### Candidates
-
-| Project | Form | Platforms | Acceleration | License | Notes |
-|---|---|---|---|---|---|
-| whisper.cpp | C/C++ | All platforms | Metal/CoreML/CUDA | MIT | Main candidate |
-| whisper-rs | Rust binding | All platforms | Same as above | MIT/Apache-2.0 | Smoother Rust integration |
-| sherpa-onnx | C++ + ONNX | All platforms | CoreML/CUDA | Apache-2.0 | Supports multiple models |
-
-### Recommendation
-**whisper-rs**: native Rust integration with good cross-platform support
-
-### Integration approach
-1. Direct Rust crate binding (recommended)
-2. Subprocess + HTTP (fallback)
-
-## 6. Next steps
-
-### Phase 1: confusable-word correction (Week 1)
-1. Collect 50+ real mis-recognition samples
-2. Implement the `asr/correction.rs` module
-3. Integrate it into the coordinator
-4. Write tests
-
-### Phase 2: local ASR (Week 2-4)
-1. Finish the technology selection document `docs/local-asr-plan.md`
-2. Benchmark whisper-rs performance
-3. Implement model download management
-4. Implement local inference
-5. Cross-platform testing
-
diff --git a/.github/finding-reports/dependencies-20260504.md b/.github/finding-reports/dependencies-20260504.md
deleted file mode 100644
index c5af0a59..00000000
--- a/.github/finding-reports/dependencies-20260504.md
+++ /dev/null
@@ -1,96 +0,0 @@
-# Module dependency finding report
-
-## Generated at
-2026-05-04 22:59:01
-
-## 1. Cargo dependencies
-
-```toml
-[dependencies]
-tauri = { version = "2", features = ["macos-private-api", "tray-icon"] }
-tauri-plugin-shell = "2"
-tauri-plugin-updater = "2"
-tauri-plugin-single-instance = "2"
-tauri-plugin-autostart = "2"
-serde = { version = "1", features = ["derive"] }
-serde_json = "1"
-tokio = { version = "1", features = ["full"] }
-tokio-tungstenite = { version = "0.24", features = ["rustls-tls-native-roots"] }
-futures-util = "0.3"
-reqwest = { version = "0.12", default-features = false, features = ["json", "multipart", "rustls-tls"] }
-thiserror = "1"
-anyhow = "1"
-log = "0.4"
-env_logger = "0.11"
-simplelog = "0.12"
-parking_lot = "0.12"
-once_cell = "1"
-uuid = { version = "1", features = ["v4", "serde"] }
-chrono = { version = "0.4", features = ["serde"] }
-bytes = "1"
-url = "2"
-raw-window-handle = "0.6"
-
-# Hotkey + audio + insertion
-global-hotkey = "0.6"
-cpal = "0.15"
-enigo = "0.2"
-arboard = "3"
-rdev = "0.5"
-
-[target.'cfg(target_os = "macos")'.dependencies]
-block2 = "0.5"
-core-foundation = "0.10"
-core-graphics = "0.24"
-objc2 = "0.5"
-objc2-foundation = "0.2"
-objc2-app-kit = "0.2"
-
-[target.'cfg(target_os = "windows")'.dependencies]
-raw-window-handle = "0.6"
-windows = { version = "0.58", features = [
-    "Win32_Foundation",
-    "Win32_Globalization",
-    "Win32_Graphics_Dwm",
-    "Win32_Graphics_Gdi",
-    "Win32_System_Com",
-    "Win32_System_Ole",
-    "Win32_System_Registry",
-    "Win32_System_Threading",
-```
-
-## 2. Inter-module dependencies (from `use` statements)
-
-### coordinator.rs dependencies
-```
-use crate::asr::{
-use crate::hotkey::{HotkeyEvent, HotkeyMonitor};
-use crate::insertion::TextInserter;
-use crate::persistence::{
-use crate::polish::{OpenAICompatibleConfig, OpenAICompatibleLLMProvider};
-use crate::qa_hotkey::{QaHotkeyError, QaHotkeyEvent, QaHotkeyMonitor};
-use crate::recorder::{Recorder, RecorderError};
-use crate::selection::{capture_selection, SelectionContext};
-use crate::types::{
-use crate::windows_ime_ipc::ImeSubmitTarget;
-use crate::windows_ime_session::{PreparedWindowsImeSession, WindowsImeSessionController};
-```
-
-### recorder.rs dependencies
-```
-```
-
-## 3. Mock strategy suggestions
-
-### External dependencies that need mocks
-- **Volcengine ASR WebSocket**: use a mock WebSocket server
-- **OpenAI Polish API**: use a mock HTTP server
-- **Keychain**: use a trait abstraction plus a mock implementation
-- **Clipboard**: use a trait abstraction plus a mock implementation
-- **Audio device**: use a mock audio stream
-
-### Recommended tools
-- `mockall`: auto-generates mocks
-- `wiremock`: HTTP mock server
-- `tokio-test`: async testing utilities
-
diff --git a/.github/finding-reports/finding-summary-20260504.md b/.github/finding-reports/finding-summary-20260504.md
deleted file mode 100644
index 4dc18f3e..00000000
--- a/.github/finding-reports/finding-summary-20260504.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# Finding summary report
-
-**Generated at**: 2026-05-04 22:59:02
-
-## 📊 Key metrics
-
-- **Files containing tests**: 15
-- **Test functions**: 76
-- **Core modules**: 17
-- **ASR module code size**: 1164 lines
-
-## 📋 Generated reports
-
-1. **Test coverage report**: .github/finding-reports/test-coverage-20260504.md
-2. **ASR module analysis**: .github/finding-reports/asr-analysis-20260504.md
-3. **Dependency analysis**: .github/finding-reports/dependencies-20260504.md
-
-## 🎯 Next steps
-
-### Start immediately (Week 1)
-1. Read the 3 generated reports
-2. Update the status of the Finding tasks in EPIC-001 and EPIC-002
-3. Start implementing the confusable-word correction layer (a quick win)
-
-### Short-term plan (Week 2-3)
-1. Add tests for recorder.rs
-2. Add tests for asr/frame.rs
-3. Write the testing conventions document
-
-### Medium-term plan (Week 4-6)
-1. Complete the local ASR technology selection
-2. Implement local ASR support
-3. Set up automated CI tests
-
-## 📝 Notes
-
-All reports are saved under `.github/finding-reports/`.
diff --git a/.github/finding-reports/test-coverage-20260504.md b/.github/finding-reports/test-coverage-20260504.md
deleted file mode 100644
index 697a5969..00000000
--- a/.github/finding-reports/test-coverage-20260504.md
+++ /dev/null
@@ -1,97 +0,0 @@
-# Test coverage finding report
-
-## Generated at
-2026-05-04 22:59:00
-
-## 1. Existing test file statistics
-
-### Rust test modules
-```
-asr/frame.rs
-asr/volcengine.rs
-commands.rs
-coordinator.rs
-insertion.rs
-lib.rs
-persistence.rs
-polish.rs
-qa_hotkey.rs
-selection.rs
-types.rs
-windows_ime_ipc.rs
-windows_ime_profile.rs
-windows_ime_protocol.rs
-windows_ime_session.rs
-```
-
-### Test counts
-```
-Files containing tests: 15
-Test modules: 15
-Test functions: 76
-```
-
-## 2. Core module code size
-
-```
- 13256 total
-  3462 openless-all/app/src-tauri/src/coordinator.rs
-   992 openless-all/app/src-tauri/src/polish.rs
-   844 openless-all/app/src-tauri/src/lib.rs
-   785 openless-all/app/src-tauri/src/hotkey.rs
-   770 openless-all/app/src-tauri/src/persistence.rs
-   749 openless-all/app/src-tauri/src/asr/volcengine.rs
-   730 openless-all/app/src-tauri/src/windows_ime_profile.rs
-   712 openless-all/app/src-tauri/src/commands.rs
-   590 openless-all/app/src-tauri/src/selection.rs
-   530 openless-all/app/src-tauri/src/types.rs
-   525 openless-all/app/src-tauri/src/recorder.rs
-   489 openless-all/app/src-tauri/src/insertion.rs
-   430 openless-all/app/src-tauri/src/windows_ime_ipc.rs
-   428 openless-all/app/src-tauri/src/permissions.rs
-   373 openless-all/app/src-tauri/src/qa_hotkey.rs
-   253 openless-all/app/src-tauri/src/windows_ime_session.rs
-   252 openless-all/app/src-tauri/src/asr/frame.rs
-   173 openless-all/app/src-tauri/src/windows_ime_protocol.rs
-   128 openless-all/app/src-tauri/src/asr/whisper.rs
-```
-
-## 3. Modules to prioritize for new tests
-
-### High priority (core functionality)
-- [ ] recorder.rs - audio capture, watchdog
-- [ ] coordinator.rs - state machine, session management
-- [ ] asr/volcengine.rs - WebSocket ASR
-- [ ] asr/frame.rs - binary frame encoding/decoding
-
-### Medium priority (utility modules)
-- [ ] persistence.rs - data persistence
-- [ ] types.rs - type definitions, state transitions
-- [ ] insertion.rs - text insertion
-- [ ] polish.rs - text polishing
-
-### Low priority (platform-specific)
-- [ ] hotkey.rs - hotkey listening
-- [ ] permissions.rs - permission checks
-- [ ] windows_ime_*.rs - Windows IME
-
-## 4. Test tooling survey
-
-### Recommended tools
-- **mockall**: mocking framework for external dependencies
-- **proptest**: property-based testing with randomly generated inputs
-- **criterion**: performance benchmarking
-- **cargo-llvm-cov**: code coverage tool
-
-### Installation
-```bash
-cargo install cargo-llvm-cov
-```
-
-## 5. Next steps
-
-1. Write unit tests for recorder.rs (T1.1-T1.6)
-2. Extend the tests for asr/frame.rs (T1.7-T1.10)
-3. Write a test-writing conventions document
-4. Configure automated CI tests
-
diff --git a/.github/issues/EPIC-001-testing-infrastructure.md b/.github/issues/EPIC-001-testing-infrastructure.md
deleted file mode 100644
index e068049f..00000000
--- a/.github/issues/EPIC-001-testing-infrastructure.md
+++ /dev/null
@@ -1,168 +0,0 @@
-# [EPIC] Test infrastructure
-
-## 🎯 Goal
-
-Build complete test infrastructure, raise the project's test coverage from ~0% to 60%+, and ensure the stability and maintainability of the core features.
-
-## 📊 Current state
-
-### Status
-- ✅ 15 modules contain `#[cfg(test)]`
-- ✅ `cargo test` runs
-- ❌ Test coverage close to 0%
-- ❌ Only 1 `test`-type commit vs. 42 `fix` commits
-- ❌ No automated CI tests
-- ❌ No coverage reports
-
-### Risks
-- Refactoring easily introduces regressions
-- Fixing one bug can break another feature
-- New contributors hesitate to make bold changes
-- No quality gate
-
-## 🗺️ Overall plan
-
-### Phase 1: unit tests for core modules (Week 1-3)
-Add unit tests for the most critical modules and establish test-writing conventions.
-
-**Priority order**:
-1. **recorder.rs** (525 lines) - audio capture, watchdog, RMS calculation
-2. **asr/frame.rs** (252 lines) - binary frame encoding/decoding (already has 1 test)
-3. **persistence.rs** (770 lines) - JSON serialization, Keychain reads/writes
-4. **types.rs** (530 lines) - state machine transitions, error types
-5. **insertion.rs** (489 lines) - text insertion logic
-
-### Phase 2: integration tests (Week 4-5)
-Test how modules work together and the complete flows.
-
-**Test scenarios**:
-- Full pipeline: recording → ASR → polish → insertion (external services mocked)
-- Credential management flow (Keychain + JSON fallback)
-- Hotword injection and ASR biasing
-- Error recovery and degradation
-
-### Phase 3: CI automation (Week 6)
-Set up continuous integration with automated tests and quality gates.
-
-**Deliverables**:
-- GitHub Actions workflow
-- Coverage reports (codecov / llvm-cov)
-- PR gate (tests must pass)
-- Test result badge
-
-## 📋 Subtask checklist
-
-### 🔍 Finding phase (in progress)
-
-- [ ] **F1.1** Review all existing tests and assess their quality and coverage
-- [ ] **F1.2** Identify the key test scenarios for the core modules
-- [ ] **F1.3** Analyze module dependencies and decide on a mock strategy
-- [ ] **F1.4** Survey Rust testing best practices (criterion, proptest, mockall)
-- [ ] **F1.5** Write a test-writing conventions document
-
-### 🧪 Phase 1: unit tests
-
-#### recorder.rs
-- [ ] **T1.1** Test audio device enumeration and selection
-- [ ] **T1.2** Test PCM capture and format conversion
-- [ ] **T1.3** Test RMS calculation accuracy
-- [ ] **T1.4** Test watchdog timeout detection
-- [ ] **T1.5** Test recording start/stop state transitions
-- [ ] **T1.6** Test error handling (device unavailable, permission denied)
-
-#### asr/frame.rs
-- [ ] **T1.7** Extend the existing test to cover every frame type
-- [ ] **T1.8** Test frame serialization/deserialization
-- [ ] **T1.9** Test boundary conditions (empty frames, oversized frames)
-- [ ] **T1.10** Test error-frame handling
-
-#### persistence.rs
-- [ ] **T1.11** Test history.json reads/writes and the capacity limit (200 entries)
-- [ ] **T1.12** Test preferences.json serialization
-- [ ] **T1.13** Test dictionary.json reads/writes (note: it must not be renamed to vocab.json)
-- [ ] **T1.14** Test Keychain credential storage and retrieval
-- [ ] **T1.15** Test the credentials.json fallback logic
-- [ ] **T1.16** Test cross-platform path handling (macOS/Windows/Linux)
-
-#### types.rs
-- [ ] **T1.17** Test state machine transitions (Idle → Starting → Listening → Processing)
-- [ ] **T1.18** Test error type serialization
-- [ ] **T1.19** Test the DictationSession lifecycle
-- [ ] **T1.20** Test the PolishMode enum
-
-#### insertion.rs
-- [ ] **T1.21** Test AX focused-element writing
-- [ ] **T1.22** Test the clipboard + Cmd+V fallback
-- [ ] **T1.23** Test the copy-only fallback
-- [ ] **T1.24** Test cross-platform modifier key mapping (Cmd/Ctrl)
-
-### 🔗 Phase 2: integration tests
-
-- [ ] **T2.1** Full-pipeline mock test (recorder → ASR → polish → insertion)
-- [ ] **T2.2** Credential management flow test
-- [ ] **T2.3** Hotword injection test
-- [ ] **T2.4** Error recovery test (ASR failure, polish failure, insertion failure)
-- [ ] **T2.5** Concurrency test (rapid repeated triggers)
-
-### 🤖 Phase 3: CI automation
-
-- [ ] **T3.1** Create `.github/workflows/test.yml`
-- [ ] **T3.2** Configure a macOS / Windows / Linux test matrix
-- [ ] **T3.3** Integrate a coverage tool (cargo-llvm-cov)
-- [ ] **T3.4** Upload coverage to codecov.io
-- [ ] **T3.5** Add PR gate rules
-- [ ] **T3.6** Add a README badge
-
-## 📐 Test-writing conventions
-
-### Naming conventions
-```rust
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test___() {
-        // Arrange
-        // Act
-        // Assert
-    }
-}
-```
-
-### Mock strategy
-- External services (Volcengine ASR, OpenAI polish): use `mockall` or hand-written mocks
-- System calls (Keychain, clipboard): use trait abstractions
-- Time-dependent code: use an injectable clock
-
-### Coverage targets
-- **Core modules** (coordinator, recorder, ASR): 80%+
-- **Utility modules** (persistence, types): 70%+
-- **Platform-specific code** (hotkey, insertion): 60%+
-- **Whole project**: 60%+
-
-## 📈 Success metrics
-
-- [ ] Test coverage reaches 60%+
-- [ ] The automated CI test run takes < 5 minutes
-- [ ] Every PR must pass the tests
-- [ ] The testing docs are complete enough that new contributors can add tests easily
-- [ ] Tests catch at least 1 regression bug
-
-## 🔗 Related resources
-
-- [Rust testing best practices](https://doc.rust-lang.org/book/ch11-00-testing.html)
-- [cargo-llvm-cov](https://github.com/taiki-e/cargo-llvm-cov)
-- [mockall](https://docs.rs/mockall/latest/mockall/)
-- [proptest](https://docs.rs/proptest/latest/proptest/)
-
-## 📝 Progress tracking
-
-**Created**: 2026-05-04
-**Owner**: Cooper
-**Current phase**: Finding
-**Completion**: 0% (0/41 tasks)
-
-----
-
-**Next step**: start the Finding phase, review the existing tests and establish the test conventions.
diff --git a/.github/issues/EPIC-002-asr-enhancement.md b/.github/issues/EPIC-002-asr-enhancement.md
deleted file mode 100644
index c10b75df..00000000
--- a/.github/issues/EPIC-002-asr-enhancement.md
+++ /dev/null
@@ -1,267 +0,0 @@
-# [EPIC] ASR feature expansion and optimization
-
-## 🎯 Goal
-
-Expand the ASR module to improve recognition accuracy and user experience, adding local ASR and confusable-word correction.
-
-## 📊 Current state
-
-### Current architecture
-```
-Recorder (16kHz mono Int16 PCM)
-    ↓
-AudioConsumer trait
-    ↓
-ASR Provider (Volcengine WebSocket / Whisper HTTP)
-    ↓
-RawTranscript
-    ↓
-Polish (OpenAI-compatible)
-    ↓
-Insertion
-```
-
-### Existing ASR providers
-- **Volcengine Streaming ASR** (`asr/volcengine.rs`, 749 lines)
-  - WebSocket streaming recognition
-  - Supports hotword biasing
-  - Requires cloud credentials
-
-- **Whisper Batch ASR** (`asr/whisper.rs`, 128 lines)
-  - HTTP batch recognition
-  - OpenAI-compatible API
-  - Requires an API key
-
-### Pain points
-1. **Cloud dependency**: unusable offline or in privacy-sensitive scenarios
-2. **Confusable words**: homophones and near-homophones are misrecognized (issue → iOS, PR → 批阅)
-3. **No local fallback**: completely unusable when the network fails
-4. **Limited extensibility**: adding a provider requires a lot of duplicated code
-
-## 🗺️ Overall plan
-
-### Phase 1: confusable-word correction layer (Week 1, quick win)
-Insert a correction layer between ASR and Polish to fix high-frequency confusable words.
-
-**Priority**: 🔴 High (corresponds to #89)
-
-### Phase 2: local ASR support (Week 2-4, core feature)
-Integrate whisper.cpp or sherpa-onnx to support fully offline recognition.
-
-**Priority**: 🟡 Medium (corresponds to #211)
-
-### Phase 3: ASR provider architecture improvements (Week 5-6, long-term)
-Refactor the ASR module to improve extensibility and maintainability.
-
-**Priority**: 🟢 Low
-
-## 📋 Subtask checklist
-
-### 🔍 Finding phase (in progress)
-
-#### F1: confusable-word correction layer research
-- [ ] **F1.1** Collect real ASR mis-recognition samples (at least 50)
-- [ ] **F1.2** Analyze the error patterns (homophones, near-homophones, cross-language, abbreviations)
-- [ ] **F1.3** Survey existing correction approaches (rule engine, LLM, hybrid)
-- [ ] **F1.4** Design the correction layer's interface and data structures
-- [ ] **F1.5** Decide on a context-checking strategy (to avoid false corrections)
-
-#### F2: local ASR technology selection
-- [ ] **F2.1** Compare whisper.cpp vs. sherpa-onnx vs. faster-whisper
-- [ ] **F2.2** Evaluate integration options (Rust crate / subprocess / HTTP)
-- [ ] **F2.3** Measure first-token latency and check streaming support
-- [ ] **F2.4** Evaluate cross-platform compatibility (macOS/Windows/Linux)
-- [ ] **F2.5** Confirm license compliance (code + model weights)
-- [ ] **F2.6** Design model download and management
-- [ ] **F2.7** Write the `docs/local-asr-plan.md` technical plan
-
-#### F3: ASR architecture analysis
-- [ ] **F3.1** Draw the current ASR module dependency graph
-- [ ] **F3.2** Identify duplicated code and abstraction opportunities
-- [ ] **F3.3** Analyze the limitations of the AudioConsumer trait
-- [ ] **F3.4** Design a unified ASR Provider trait
-
-### 🛠️ Phase 1: confusable-word correction layer
-
-#### Design and implementation
-- [ ] **T1.1** Create the `asr/correction.rs` module
-- [ ] **T1.2** Define the `CorrectionRule` data structure
-  ```rust
-  struct CorrectionRule {
-      pattern: String,               // Error pattern (regex supported)
-      replacement: String,           // Correct term
-      context: Option<Vec<String>>,  // Context keywords
-      enabled: bool,
-  }
-  ```
-- [ ] **T1.3** Implement the `CorrectionEngine` rule engine
-- [ ] **T1.4** Build in a high-frequency confusable-word list
-  - issue / iOS
-  - PR / 批阅
-  - CI / 西爱
-  - commit / 靠米特
-  - merge / 摸鸡
-  - release / 瑞丽丝
-  - workflow / 我可否楼
-  - repository / 瑞泼贼特瑞
-- [ ] **T1.5** Support user-defined confusable-word lists (stored in `dictionary.json`)
-- [ ] **T1.6** Integrate the correction layer at `coordinator.rs:616-617`
-- [ ] **T1.7** Add correction logging (record the text before and after correction)
-
-#### Tests
-- [ ] **T1.8** Unit test: rule matching logic
-- [ ] **T1.9** Unit test: context checking
-- [ ] **T1.10** Integration test: full ASR → correction → Polish pipeline
-- [ ] **T1.11** Regression test: cover every case in #89
-
-#### Documentation
-- [ ] **T1.12** Write the `docs/asr-correction.md` usage guide
-- [ ] **T1.13** Update CLAUDE.md to describe where the correction layer sits
-
-### 🚀 Phase 2: local ASR support
-
-#### Technical plan (finish Finding F2 first)
-- [ ] **T2.1** Complete and review `docs/local-asr-plan.md`
-- [ ] **T2.2** Choose the technology (whisper.cpp / sherpa-onnx)
-- [ ] **T2.3** Choose the integration approach (Rust crate / subprocess / HTTP)
-- [ ] **T2.4** Choose the default model (tiny / base / small)
-
-#### Model management
-- [ ] **T2.5** Design the model storage paths
-  - macOS: `~/Library/Application Support/OpenLess/models/`
-  - Windows: `%APPDATA%\OpenLess\models\`
-  - Linux: `$XDG_DATA_HOME/OpenLess/models/`
-- [ ] **T2.6** Implement the model downloader (with resumable downloads)
-- [ ] **T2.7** Implement model verification (sha256)
-- [ ] **T2.8** Implement model version management
-- [ ] **T2.9** Add a model download progress UI (frontend)
-
-#### Core implementation
-- [ ] **T2.10** Create `asr/local_whisper.rs` or `asr/local_sherpa.rs`
-- [ ] **T2.11** Implement the `AudioConsumer` trait
-- [ ] **T2.12** Implement streaming recognition (if supported) or batch recognition
-- [ ] **T2.13** Implement hotword support (if the backend supports it)
-- [ ] **T2.14** Implement error handling and a degradation strategy
-  - Model missing → prompt the user to download it
-  - Inference fails → return an empty result (never lose the user's speech)
-- [ ] **T2.15** Integrate the local ASR provider in `coordinator.rs`
-- [ ] **T2.16** Add ASR provider switching logic (Settings UI)
-
-#### Performance optimization
-- [ ] **T2.17** Measure first-token latency (target < 500 ms)
-- [ ] **T2.18** Measure memory usage (target < 500 MB)
-- [ ] **T2.19** Measure CPU usage (target < 50%)
-- [ ] **T2.20** Add hardware acceleration support
-  - macOS: Metal / CoreML
-  - Windows: CUDA / DirectML
-  - Linux: CUDA
-
-#### Tests
-- [ ] **T2.21** Unit test: model download and verification
-- [ ] **T2.22** 
单元测试:本地推理 -- [ ] **T2.23** 集成测试:录音 → 本地 ASR → 插入 -- [ ] **T2.24** 性能测试:延迟、内存、CPU -- [ ] **T2.25** 跨平台测试(macOS/Windows/Linux) - -#### 文档 -- [ ] **T2.26** 更新 `docs/openless-development.md` 说明本地 ASR -- [ ] **T2.27** 编写用户文档:如何启用本地 ASR -- [ ] **T2.28** 编写开发者文档:如何添加新的本地 ASR provider -- [ ] **T2.29** 更新 CLAUDE.md 说明本地 ASR 架构 - -### 🏗️ Phase 3: ASR 架构优化 - -#### 重构目标 -- [ ] **T3.1** 定义统一的 `ASRProvider` trait - ```rust - #[async_trait] - pub trait ASRProvider: Send + Sync { - async fn open_session(&self, hotwords: Vec) -> Result<()>; - fn get_audio_consumer(&self) -> Arc; - async fn close_session(&self) -> Result; - async fn cancel_session(&self); - } - ``` -- [ ] **T3.2** 重构 Volcengine ASR 实现 `ASRProvider` -- [ ] **T3.3** 重构 Whisper ASR 实现 `ASRProvider` -- [ ] **T3.4** 重构本地 ASR 实现 `ASRProvider` -- [ ] **T3.5** 在 `coordinator.rs` 使用统一接口 -- [ ] **T3.6** 添加 ASR provider 注册机制(便于扩展) - -#### 可观测性 -- [ ] **T3.7** 添加 ASR 性能指标(延迟、准确率) -- [ ] **T3.8** 添加 ASR 错误日志和分类 -- [ ] **T3.9** 添加 ASR 使用统计(各 provider 使用次数) - -#### 文档 -- [ ] **T3.10** 编写 `docs/asr-architecture.md` 架构文档 -- [ ] **T3.11** 编写 `docs/add-asr-provider.md` 扩展指南 - -## 📐 技术约束 - -### 性能要求 -- **首字延迟**:< 500ms(用户感知流畅) -- **内存占用**:< 500MB(不影响其他应用) -- **CPU 占用**:< 50%(避免风扇狂转) - -### 兼容性要求 -- **平台**:macOS 12+, Windows 10+, Linux(主流发行版) -- **架构**:x86_64, aarch64(Apple Silicon) -- **离线可用**:本地 ASR 必须完全离线工作 - -### 安全要求 -- **隐私**:本地 ASR 不得上传音频数据 -- **凭据**:云端 ASR 凭据存储在 Keychain -- **License**:所有依赖必须 License 合规 - -## 📈 成功指标 - -### Phase 1: 混淆词纠错 -- [ ] 纠错规则覆盖 20+ 高频混淆词 -- [ ] 纠错准确率 > 95%(不误纠) -- [ ] 用户可自定义混淆词表 -- [ ] 解决 #89 中的所有案例 - -### Phase 2: 本地 ASR -- [ ] 支持至少 1 种本地 ASR 引擎 -- [ ] 首字延迟 < 500ms -- [ ] 识别准确率 > 90%(与云端 ASR 对比) -- [ ] 模型下载成功率 > 99% -- [ ] 跨平台一致性(macOS/Windows/Linux) - -### Phase 3: 架构优化 -- [ ] 统一 ASR Provider 接口 -- [ ] 添加新 provider 只需实现 1 个 trait -- [ ] ASR 模块代码减少 20%+(消除重复) -- [ ] 完善的架构文档 - -## 🔗 相关 Issues - -- #89 [asr] 增加 LLM 前置混淆词纠错层(priority: high) -- #211 feat(ASR): 增加对本地 ASR AI 的支持 -- #223 fix(providers): 
get_credentials 按 active ASR provider 返回配置状态(priority: high) - -## 🔗 相关资源 - -### 本地 ASR 引擎 -- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - C++ Whisper 实现 -- [whisper-rs](https://github.com/tazz4843/whisper-rs) - Rust binding -- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) - ONNX 多模型支持 -- [faster-whisper](https://github.com/SYSTRAN/faster-whisper) - CTranslate2 加速 - -### 混淆词纠错 -- [SymSpell](https://github.com/wolfgarbe/SymSpell) - 拼写纠错算法 -- [Homophone Disambiguation](https://en.wikipedia.org/wiki/Homophone) - 同音词消歧 - -## 📝 进度追踪 - -**创建时间**:2026-05-04 -**负责人**:Cooper -**当前阶段**:Finding -**完成度**:0% (0/71 tasks) - ---- - -**下一步行动**: -1. 开始 F1.1:收集真实 ASR 错词样本 -2. 开始 F2.1:对比本地 ASR 技术栈 diff --git a/.gitignore b/.gitignore index fde2ecf0..de335f6c 100644 --- a/.gitignore +++ b/.gitignore @@ -45,3 +45,37 @@ openless-all/app/windows-ime/obj/ # Planning docs are kept local only, not published to the public repo. docs/plans/ + +# Internal AI 协作 + 规划 + 审计文档:本地保留,绝不发布到公共仓库。 +# 之前误入库的整套已在 chore/remove-internal-docs 中 git rm --cached, +# 这里加规则避免再次被 `git add` 拉回来。 + +# 根目录 AI 协作指南 +/CLAUDE.md +/AGENTS.md +/issue-*-plan.md + +# .github 内部协作 + 审计报告(ISSUE_TEMPLATE / workflows / pr template 仍保留发布) +.github/COOPER_*.md +.github/BUILD_TEST_REPORT.md +.github/MULTI_SCALE_AUDIT.md +.github/P1_TEST_REPORT.md +.github/TEST_VERIFICATION.md +.github/WATCHDOG_RISK_ANALYSIS.md +.github/audit-reports/ +.github/finding-reports/ +.github/issues/EPIC-*.md + +# docs/ 下的规划 / 审计 / 内部 tracking(用户文档如 volcengine-setup / tauri-csp 保留) +docs/audit-*.md +docs/logic-review-*.md +docs/qa-reasoning-roadmap.md +docs/style-pack-marketplace.md +docs/auto-update-download-acceleration.md +docs/2026-*-investigation.md +docs/*-research.md +docs/windows-upstream-pr-workflow.md +docs/github-tracking/ +docs/superpowers/ +docs/windows-lifecycle-tracking/ +docs/windows-ui-tracking/ diff --git a/AGENTS.md b/AGENTS.md deleted file mode 100644 index 11b59cdb..00000000 --- a/AGENTS.md +++ /dev/null @@ 
-1,193 +0,0 @@ -# AGENTS.md - -This file provides guidance to Codex (Codex.ai/code) when working with code in this repository. - -## Project - -OpenLess is a menu-bar/tray voice-input layer. Hold or toggle a global hotkey, speak, and the dictated text is polished and inserted at the current cursor in any app. Product principles, state machine, and module list live in `docs/openless-development.md` and `docs/openless-overall-logic.md` — read those before changing product behavior. - -The active codebase lives at `openless-all/app/` and is **Tauri 2 + Rust backend + React/TS frontend**, targeting macOS 12+ and Windows. The legacy Swift implementation (Sources/, Tests/, Package.swift, appcast.xml, Sparkle pipeline) was removed in commit `34d2823`; do not resurrect it. - -UI must match `openless-all/design_handoff_openless/*.jsx` pixel-for-pixel; the JSX is reference-only, never imported. - -## Build, Run, Test - -### Tauri (current — start here) - -```bash -cd "openless-all/app" -npm ci - -# Dev: vite at :1420 + tauri shell -npm run tauri dev - -# Build .app (+ DMG) — use this script, not `tauri build` directly, -# because it threads Apple signing env vars and validates Info.plist. 
-./scripts/build-mac.sh           # build, sign, install to /Applications, reset TCC
-INSTALL=0 ./scripts/build-mac.sh # build only
-
-# Frontend-only TS check
-npm run build # = tsc && vite build
-
-# Rust type-check without full compile
-cargo check --manifest-path src-tauri/Cargo.toml
-```
-
-### Windows (cross-check only — no macOS runner in CI)
-
-```powershell
-# Preflight: verify toolchain
-.\scripts\windows-preflight.ps1
-
-# Build (requires Windows host or cross-compile target)
-.\scripts\windows-build-gnu.ps1
-```
-
-Generated artifacts:
-- `openless-all/app/src-tauri/target/release/bundle/macos/OpenLess.app`
-- `openless-all/app/src-tauri/target/release/bundle/dmg/OpenLess__aarch64.dmg`
-
-Logs: `~/Library/Logs/OpenLess/openless.log` (macOS) / `%LOCALAPPDATA%\OpenLess\Logs\openless.log` (Windows).
-
-There is no test runner wired in for the frontend. `src/lib/providerSetup.test.ts` is a hand-rolled assertion script — run with `npx tsx src/lib/providerSetup.test.ts` if you need it. Rust backend unit tests are run with `cargo test --manifest-path src-tauri/Cargo.toml --lib`; hardware / OS-integration behavior is still verified by running the app.
-
-## Architecture
-
-`coordinator::Coordinator` is the **single owner of session state**. Hotkey edges drive a small phase enum (`Idle → Starting → Listening → Processing`); recorder, ASR, polish, insertion, and history are wired here and nowhere else. Library/module code never calls across modules — they each depend only on shared types.
-
-```
-Rust (openless-all/app/src-tauri/src)  Purpose
-────────────────────────────────────── ────────────────────────────────
-types.rs                               Pure value types: DictationSession, PolishMode, HotkeyBinding, errors
-hotkey.rs                              Global hotkey monitor (modifier-key edges)
-recorder.rs                            Mic → 16 kHz mono Int16 PCM, RMS callback
-asr/{mod,frame,volcengine,whisper}.rs  ASR providers: Volcengine streaming WebSocket + Whisper HTTP
-polish.rs                              OpenAI-compatible chat completions (Ark / DeepSeek / etc.)
-insertion.rs                           AX focused-element write → clipboard + Cmd+V → copy-only fallback
-persistence.rs                         History/preferences/vocab JSON + platform credential vault
-coordinator.rs + commands.rs + lib.rs  State machine, IPC surface, tray icon, window plumbing
-permissions.rs                         TCC checks (Accessibility / Microphone)
-
-Frontend (openless-all/app/src)
-src/components/Capsule.tsx             Capsule view + state enum
-src/ (React)                           Main window UI: Overview / History / Vocab / Style / Settings
-src/i18n/                              react-i18next init + zh-CN / en resources
-src/pages/_atoms.tsx                   Recoil atoms — global frontend state
-src/state/HotkeySettingsContext.tsx    HotkeySettings React context (capability + binding from backend)
-```
-
-### Dictation pipeline
-
-```
-hotkey edge (1st) → beginSession: Recorder.start → ASR.openSession → BufferingAudioConsumer.attach
-hotkey edge (2nd) → endSession:   Recorder.stop → ASR.sendLastFrame → awaitFinal → Polish → Insert → History.save
-.cancelled        → ASR.cancel, Recorder.stop, capsule .cancelled
-```
-
-Invariants:
-- **Polish/ASR fallbacks are silent.** Missing Ark creds → insert raw transcript. Missing Volcengine creds → mock pipeline copies a placeholder. The contract is *"the user's words don't get lost"* — don't add hard errors here.
-- **`BufferingAudioConsumer`** queues PCM until the WebSocket is ready, then drains. Recorder always pushes to it; ASR is attached after `openSession` resolves.
-- **Hotkey is toggle-only**, not press-and-hold. The monitor yields one edge per modifier-key keydown; the coordinator interprets odd/even.
-
-### Permissions, credentials, on-disk state
-
-- **Bundle ID `com.openless.app`** is hard-coded in `openless-all/app/src-tauri/tauri.conf.json` and `CredentialsVault.serviceName`. Changing it breaks system credential vault lookups *and* every existing TCC grant.
-- **TCC**: Microphone + Accessibility + AppleEvents. `NSMicrophoneUsageDescription` / `NSAccessibilityUsageDescription` / `NSAppleEventsUsageDescription` live in `openless-all/app/src-tauri/Info.plist`. After a fresh build that resets TCC, the app must be **fully quit and relaunched** after granting Accessibility before the global hotkey tap installs.
-- **Credentials** live in the OS credential vault (macOS Keychain, Windows Credential Manager, Linux keyring) under service `com.openless.app`. The legacy plaintext JSON (`~/.openless/credentials.json` on macOS/Linux, `%APPDATA%\OpenLess\credentials.json` on Windows) is only a migration source and is removed after a successful vault write. Never hard-code keys or include legacy credential files in logs, exports, build artifacts, or bug reports.
-- **Per-user data**:
-  - macOS: `~/Library/Application Support/OpenLess/{history.json, preferences.json, dictionary.json}` — capped at 200 history entries. **Do not rename `dictionary.json` to `vocab.json`** (drops user data).
-  - Windows: `%APPDATA%\OpenLess\`
-  - Linux: `$XDG_DATA_HOME/OpenLess`
-
-### Release pipeline
-
-Push a `v*-tauri` tag → `.github/workflows/release-tauri.yml` builds macOS arm64 `.dmg` and Windows x64 `.msi`. macOS Developer ID signing + notarization runs only when `APPLE_CERTIFICATE` / `APPLE_CERTIFICATE_PASSWORD` / `APPLE_ID` / `APPLE_PASSWORD` / `APPLE_TEAM_ID` secrets are set; otherwise it falls back to ad-hoc signing with a CI warning.
-
-When bumping versions, update **all** version fields: `openless-all/app/package.json`, `openless-all/app/package-lock.json`, `openless-all/app/src-tauri/tauri.conf.json`, `openless-all/app/src-tauri/Cargo.toml`, `openless-all/app/src-tauri/Cargo.lock`. Miss one and the versions will mismatch.
-
-#### Windows CI red lines (don't step on the same mine twice)
-
-The Windows release pipeline has had four mines fixed, and none of those fixes can be merged together: every "while I'm here, let me unify this" reintroduces a regression. Read this before touching the Windows section of `.github/workflows/release-tauri.yml` or `windows-package-msvc.ps1`:
-
-1. **Manual light.exe invocations must pass `-sice:ICE80`**
-   `wix/openless-ime.wxs` installs both the x64 and the x86 OpenLessIme.dll into `INSTALLDIR\windows-ime\`. A 32-bit Component landing in a 64-bit Directory triggers ICE80 (LGHT0204), but the DLL uses an absolute path and does not rely on SysWOW64 redirection, so per Microsoft's documentation this is a legitimate use. Tauri 2 does not expose a pass-through for light arguments, so *its own* light invocation always fails; the CI workflow's "Repair Windows MSI" step and `windows-package-msvc.ps1::Repair-TauriMsiBundle` relink with `-sice:ICE80` as the fallback.
-   - ✗ Don't "fix" the wxs so that x86 lands in a 32-bit Directory (changing the install path breaks IME registration, and splitting out a separate 32-bit MSI is an architectural change).
-   - ✗ Don't remove `-sice:ICE80` from the Repair invocation.
-
-2. **Windows `tauri build` must be split into two invocations: NSIS first, MSI second**
-   ```bash
-   tauri build --bundles nsis ...   # Pass 1: must succeed (hard dependency for the updater)
-   tauri build --bundles msi ...    # Pass 2: allowed to fail; Repair covers it
-   ```
-   Tauri 2's updater signature (`.exe.sig`) is produced by a *post-bundle hook*: if any bundler fails within a single `tauri build`, the signatures for **all** bundles are skipped. MSI always hits ICE80 (see #1), so a single pass never produces the NSIS `.exe.sig`, and `write-updater-manifest.mjs` is guaranteed to report "Missing updater signature".
-   - ✗ Don't merge them back into a single `--bundles nsis,msi` pass.
-   - ✗ Don't remove the `if [ "$nsis_exit" -ne 0 ]` fail-fast from the NSIS pass.
-   - ✗ Don't omit `--bundles` and fall back to the default `targets: "all"`: Tauri runs them in alphabetical order (msi→nsis), so once MSI fails, NSIS never runs.
-
-3. **The shell for the Windows tauri build step must be `bash`, not `pwsh`**
-   When `pwsh` invokes an external command it strips the inner double quotes from `'{"bundle":...}'`; tauri receives `{bundle:...}`, rejects it as invalid JSON, and candle never even runs. This already broke once in 1.2.15.
-   - ✗ Don't switch it back because "pwsh is the smoother default on Windows".
-
-4. **Repair assumes candle has already produced the wixobj files**
-   The Repair step only covers failures in the *light phase*. If Pass 2 dies before candle runs (for example, a JSON quoting problem or a wxs syntax error), Repair dies with "Required WiX object missing". Don't "harden" Repair; first fix whatever upstream kept candle from running.
-
-#### Standard routine before fixing Windows CI
-
-Blindly editing the workflow without reading the historical logs is the root cause of these pitfalls resurfacing again and again. Every time the Windows job fails, follow this order:
-
-1. `gh run view --json jobs -q '.jobs[]|select(.name|contains("windows"))|.databaseId'` to get the job id.
-2. `gh api repos/appergb/openless/actions/jobs//logs > /tmp/win.log` to fetch the full log.
-3. `grep -n "ICE\|light\|makensis\|Bundling\|Running\|Tauri\|Error\|exit code" /tmp/win.log` to find the event sequence.
-4. `git show v1.2.13-tauri:.github/workflows/release-tauri.yml` to compare against the last known-good version: v1.2.13 is the last successful Windows release before the IME wxs was added.
-5. Touch the workflow / wxs / scripts only after the substantive diff has been pinned down.
-
-#### Release process (keep as-is, do not change)
-
-When fixing Windows CI, iterate with this process:
-
-1. Change the workflow / wxs / scripts and commit to main.
-2. Bump the version number in all five places (see above).
-3. `git tag v-tauri && git push origin v-tauri` → CI runs → action-gh-release publishes the release automatically.
-
-`release-tauri.yml` is triggered only by `tags: [v*-tauri]` + `workflow_dispatch`. The release publish step is gated on the tag, so a dispatch run builds but does not publish.
-- ✗ Don't change the process to "push to main, let CI verify automatically, then tag": this was discussed and rejected; the current bump+tag flow is the user's preference.
-- ✗ Don't `--amend` an already-pushed tag or force-push. Leave the failed tag in place, bump to a new version number, and continue.
-
-## Repo conventions
-
-- **Comments, log messages, user-facing strings, and most docs are in Simplified Chinese.** UI strings additionally route through `react-i18next` (`src/i18n/{zh-CN,en}.ts`) so we ship English alongside; `zh-CN.ts` is source of truth.
-- **macOS hotkey monitor must use native `CGEventTap`, never `rdev`.** `rdev` synchronously calls `TSMGetInputSourceProperty` from non-main threads, which macOS 14+ aborts via `dispatch_assert_queue_fail` → SIGTRAP. macOS uses CGEventTap; `rdev` is only used on Linux/Windows.
-- **Don't `NSApp.activate` on the dictation path** — it steals focus and breaks insertion. Only call `set_activation_policy(Regular)` + `activateIgnoringOtherApps` from `show_main_window` / mic-permission prompts, never from `start_dictation`.
-- Rust modules wrap shared mutable state with `Arc>` (parking_lot). Keep that locking discipline when adding fields.
-- Rust modules depend only on `types.rs`. New cross-module wiring goes in `coordinator.rs`, not in the leaf modules.
-
-### Adding a new module
-
-1. Add a `.rs` (or directory) under `openless-all/app/src-tauri/src/`, importing only from `types`.
-2. Register it in `lib.rs` (`mod ;`).
-3. Wire it into `coordinator.rs` and expose any frontend-callable surface via `commands.rs` + `invoke_handler!`.
-4. Add the matching TS wrapper in `openless-all/app/src/lib/ipc.ts` (with a mock branch for browser dev).
-
-### Third-party service integrations & library / platform API research
-
-When implementing features that depend on **anything outside this repo** — external HTTP APIs (ASR providers, polish endpoints, GitHub API), unfamiliar crates / npm packages, platform APIs (Apple Security framework, Win32, CoreFoundation), or any SDK whose surface shifts faster than your training cut-off — do not write integration code from memory. API surfaces drift; model training data is stale by definition. The same workflow below applies whether you are calling an HTTP endpoint, learning a new Rust crate, or wiring a system framework — substitute "endpoint URL" / "function signature" / "feature flag" as appropriate.
-
-Follow this research-first workflow:
-
-1. **Analyze before coding.** Identify every external call this feature needs: endpoint URL, HTTP method, authentication mechanism, request body schema, expected response schema, and error codes.
-2. **Delegate web search to a sub-agent.** Spawn a read-only sub-agent whose sole job is to search for official documentation. The sub-agent runs in parallel — you continue other work instead of blocking on sequential web pages.
-3. **Filter sub-agent results.** When the sub-agent returns, extract only the information directly relevant to the current implementation. Discard marketing pages, unrelated API versions, or tangential tutorials.
-4. **Cross-verify one key finding.** Before writing code, validate at least one structural claim (endpoint URL, required header, auth format) with a direct `web_search` or `fetch_url` call. Sub-agents can hallucinate.
-5. **Implement from verified documentation.** Only write integration code after the above steps. Never guess.
-
-**Sub-agent search brief:**
-- Focus each sub-agent on a single external service or protocol — one service, one sub-agent.
-- Prioritize official documentation domains (e.g., `docs.volcengine.com`, `platform.openai.com/docs`), falling back to the project's GitHub README.
-- The sub-agent must return **structured** findings: endpoint URL, HTTP method, required headers, request body JSON Schema, response body JSON Schema, and error code meanings.
-- If the documentation covers multiple API versions, the sub-agent must note which version was referenced.
-
-**Anti-patterns (do not do these):**
-- ✗ Writing API integration code from memory without a documentation search.
-- ✗ Pasting entire web pages into the main agent context — the sub-agent does the filtering.
-- ✗ Mixing field names or endpoint paths from different API versions.
-- ✗ Skipping error handling — every external call must degrade gracefully when the service is unavailable.
diff --git a/CLAUDE.md b/CLAUDE.md
deleted file mode 100644
index 71ddd9b1..00000000
--- a/CLAUDE.md
+++ /dev/null
@@ -1,266 +0,0 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Project
-
-OpenLess is a menu-bar/tray voice-input layer. Hold or toggle a global hotkey, speak, and the dictated text is polished and inserted at the current cursor in any app. Product principles, state machine, and module list live in `docs/openless-development.md` and `docs/openless-overall-logic.md` — read those before changing product behavior.
-
-The active codebase lives at `openless-all/app/` and is **Tauri 2 + Rust backend + React/TS frontend**, targeting macOS 12+ and Windows. The legacy Swift implementation (Sources/, Tests/, Package.swift, appcast.xml, Sparkle pipeline) was removed in commit `34d2823`; do not resurrect it.
-
-UI must match `openless-all/design_handoff_openless/*.jsx` pixel-for-pixel; the JSX is reference-only, never imported.
-
-Adjacent docs:
-- `AGENTS.md` is the parallel of this file for **Codex** sessions; the research-before-coding rules at the bottom of this file delegate to it.
-- `README.md` / `README.zh.md` (root) are user-facing install + feature guides; `USAGE.md` covers runtime usage. Update them when shipping user-visible features, not for internal refactors.
-
-## Build, Run, Test
-
-### Tauri (current — start here)
-
-```bash
-cd "openless-all/app"
-npm ci
-
-# Dev: vite at :1420 + tauri shell
-npm run tauri dev
-
-# Build .app (+ DMG) — use this script, not `tauri build` directly,
-# because it threads Apple signing env vars and validates Info.plist.
-./scripts/build-mac.sh           # build, sign, install to /Applications, reset TCC
-INSTALL=0 ./scripts/build-mac.sh # build only
-
-# Frontend-only TS check
-npm run build # = tsc && vite build
-
-# Rust type-check without full compile
-cargo check --manifest-path src-tauri/Cargo.toml
-```
-
-### Windows (cross-check only — no macOS runner in CI)
-
-```powershell
-# Preflight: verify toolchain
-.\scripts\windows-preflight.ps1
-
-# Build (requires Windows host or cross-compile target)
-.\scripts\windows-build-gnu.ps1
-```
-
-Generated artifacts:
-- `openless-all/app/src-tauri/target/release/bundle/macos/OpenLess.app`
-- `openless-all/app/src-tauri/target/release/bundle/dmg/OpenLess__aarch64.dmg`
-
-Logs: `~/Library/Logs/OpenLess/openless.log` (macOS) / `%LOCALAPPDATA%\OpenLess\Logs\openless.log` (Windows).
-
-There is no test runner wired in for the frontend. `src/lib/providerSetup.test.ts` is a hand-rolled assertion script — run with `npx tsx src/lib/providerSetup.test.ts` if you need it. Rust backend unit tests are run with `cargo test --manifest-path src-tauri/Cargo.toml --lib`; hardware / OS-integration behavior is still verified by running the app.
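EPIC-001 earlier in this patch proposes Arrange/Act/Assert unit tests, and the paragraph above runs backend tests with `cargo test --manifest-path src-tauri/Cargo.toml --lib`. A minimal sketch of such a test follows; it targets the documented 200-entry history cap. `HISTORY_CAP` and `push_capped` are hypothetical stand-ins, not functions from `persistence.rs`, and the oldest-first storage order is an assumption.

```rust
/// Hypothetical constant mirroring the documented 200-entry history cap.
pub const HISTORY_CAP: usize = 200;

/// Hypothetical helper: appends `entry`, then drops the oldest entries so that
/// at most `HISTORY_CAP` remain. Assumes the Vec is ordered oldest-first.
pub fn push_capped(history: &mut Vec<String>, entry: String) {
    history.push(entry);
    if history.len() > HISTORY_CAP {
        let overflow = history.len() - HISTORY_CAP;
        // Remove the `overflow` oldest entries from the front.
        history.drain(..overflow);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_push_capped_when_full_drops_oldest() {
        // Arrange: a history that is already at the cap.
        let mut history: Vec<String> =
            (0..HISTORY_CAP).map(|i| format!("entry-{i}")).collect();

        // Act: one more entry pushes it past the cap.
        push_capped(&mut history, "newest".to_string());

        // Assert: length stays capped, the oldest entry is gone, the newest is kept.
        assert_eq!(history.len(), HISTORY_CAP);
        assert_eq!(history.first().map(String::as_str), Some("entry-1"));
        assert_eq!(history.last().map(String::as_str), Some("newest"));
    }
}
```

The `#[cfg(test)]` module compiles only under `cargo test`, so helpers used only by tests add nothing to release builds.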
-
-## Architecture
-
-`coordinator::Coordinator` is the **single owner of all session state** — both the dictation phase machine (`Idle → Starting → Listening → Processing → Inserting → Done`) **and** the parallel QA phase machine (`Idle → Recording → Processing`). Hotkey edges drive both. Recorder, ASR, polish, insertion, selection capture, and history are wired here and nowhere else. Leaf modules never call across each other — they each depend only on `types.rs`.
-
-The coordinator was split into a module: `coordinator.rs` is the public entry; `coordinator/{dictation,qa,resources}.rs` carry per-pipeline logic; `coordinator_state.rs` is the pure (no Tauri / audio / clipboard) state-transition layer that makes phase decisions unit-testable.
-
-```
-Rust (openless-all/app/src-tauri/src)      Purpose
-────────────────────────────────────────── ────────────────────────────────
-types.rs                                   Pure value types: sessions, PolishMode, HotkeyBinding, errors, QaChatMessage
-coordinator.rs                             Public entry; owns Inner, hotkey wiring, capsule emits
-coordinator/{dictation,qa,resources}.rs    Dictation pipeline / QA pipeline / shared helpers (begin/end/cancel)
-coordinator_state.rs                       Pure state transitions — Tauri-free, unit-testable
-commands.rs + lib.rs + main.rs             IPC surface (`invoke_handler!`), tray icon, window plumbing, entry
-permissions.rs                             TCC checks (Accessibility / Microphone / AppleEvents)
-
-— Hotkeys (three parallel monitors) —
-hotkey.rs                                  Modifier-only hotkey via native CGEventTap (macOS) / rdev (Win/Linux)
-combo_hotkey.rs                            Custom-combo dictation hotkey (when user picks combo over modifier-only)
-qa_hotkey.rs                               QA toggle hotkey (default Cmd/Ctrl+Shift+;) via `global-hotkey` crate
-global_hotkey_runtime.rs                   Shared `global-hotkey` Carbon/Win event runtime (combo + QA share it)
-shortcut_binding.rs                        Shared parse/validate of user-configurable bindings
-
-— Audio / ASR / LLM —
-recorder.rs                                Mic → 16 kHz mono Int16 PCM, RMS callback
-audio_mute.rs                              System-output mute guard while recording (RAII)
-asr/{mod,frame,volcengine,whisper}.rs + asr/local/*  ASR providers: Volcengine streaming WS, Whisper HTTP, Bailian, local Foundry
-polish.rs                                  OpenAI-compatible chat completions (Ark / DeepSeek / Codex OAuth reuse)
-llm_gemini.rs                              Native Google Gemini client — NOT OpenAI-compatible (separate auth, thinkingConfig, role:model)
-correction.rs                              User-defined correction rules (separate from vocab dictionary)
-
-— Insertion (two paths) —
-insertion.rs                               AX focused-element write → clipboard + paste shortcut → copy-only fallback
-windows_ime_{ipc,profile,protocol,session}.rs  Windows IME-side text injection over IPC (parallel insertion path; activates OpenLess TSF profile and submits text via named pipe)
-selection.rs                               Cross-platform selection capture for QA: macOS AX → Cmd/Ctrl+C simulate-copy → Linux PRIMARY (best-effort)
-
-persistence.rs                             history.json / preferences.json / dictionary.json + platform credential vault
-
-Frontend (openless-all/app/src)
-src/components/Capsule.tsx                 Capsule view + state enum
-src/ (React)                               Main window UI: Overview / History / Vocab / Style / Settings
-src/i18n/                                  react-i18next init + zh-CN / en resources (zh-CN is source of truth)
-src/pages/_atoms.tsx                       Recoil atoms — global frontend state
-src/state/HotkeySettingsContext.tsx        HotkeySettings React context (capability + binding from backend)
-```
-
-### Dictation pipeline
-
-```
-hotkey edge (1st) → beginSession: Recorder.start → ASR.openSession → BufferingAudioConsumer.attach
-hotkey edge (2nd) → endSession:   Recorder.stop → ASR.sendLastFrame → awaitFinal → Polish → Insert → History.save
-.cancelled        → ASR.cancel, Recorder.stop, capsule .cancelled
-```
-
-Invariants:
-- **Polish/ASR fallbacks are silent.** Missing Ark creds → insert raw transcript. Missing Volcengine creds → mock pipeline copies a placeholder. The contract is *"the user's words don't get lost"* — don't add hard errors here.
-- **`BufferingAudioConsumer`** queues PCM until the WebSocket is ready, then drains. Recorder always pushes to it; ASR is attached after `openSession` resolves.
-- **Hotkey is toggle-only**, not press-and-hold. The monitor yields one edge per modifier-key keydown; the coordinator interprets odd/even.
-
-### Q&A pipeline (selection-based ask-the-LLM)
-
-Parallel state machine, lives in `coordinator/qa.rs` + `qa_hotkey.rs` + `selection.rs`. Default trigger: `Cmd+Shift+;` (macOS) / `Ctrl+Shift+;` (Win/Linux).
-
-```
-QA hotkey edge        → toggle panel: open  → capture front_app, clear messages, show QA window
-                                      close → cancel session, hide window, sweep capsule
-Option/dictation edge → routed by panel_visible flag (see below):
-                          while panel_visible & dictation Idle → handle_qa_option_edge:
-                            QaPhase::Idle      → begin_qa_session: capture_selection() → Recorder.start → ASR.openSession
-                            QaPhase::Recording → end_qa_session: Recorder.stop → ASR final → LLM (with selection as context) → emit qa:state
-                            QaPhase::Processing→ ignored (LLM in flight)
-                          otherwise handle_pressed (normal dictation)
-```
-
-Invariants & gotchas:
-- **Hotkey routing.** When the QA panel is visible, the dictation hotkey edge routes to QA — *unless* a dictation session is already mid-flight (`Starting/Listening/Processing/Inserting`), in which case the edge stays with dictation. Otherwise QA's `begin_qa_session` would race for the same mic device (cpal rejects the second `build_input_stream` on macOS/Win, PipeWire opens two streams on Linux — neither is recoverable from the QA panel UI). See audit 3.3.1 in `coordinator/dictation.rs`.
-- **Capsule sweep on panel open.** Open emits a fresh `CapsuleState::Idle` *only if* dictation is Idle. If dictation is Recording/Polishing/Inserting/Done, the sweep is suppressed so the user's in-flight feedback isn't wiped. See audit 3.3.4.
-- **Selection capture is a 3-tier fallback** (`selection.rs`): (1) macOS AX `kAXSelectedTextAttribute` direct read, no clipboard touched; (2) macOS/Windows simulate Cmd/Ctrl+C → snapshot + restore original clipboard, 80 ms read window; (3) Linux PRIMARY via `wl-paste` / `xclip` / `xsel`, best-effort. Returns `None` when the user genuinely selected nothing.
-- **Selection truncation.** Hard cap 4000 chars; over → keep first 2000 + `[…truncated…]` + last 2000. Don't raise this without checking LLM context budgeting — Gemini and Ark have different limits.
-- **Multi-turn memory.** `QaSessionState.messages` accumulates `user→assistant` pairs across turns within a single panel session; closing the panel clears them.
-
-### Insertion paths
-
-`insertion.rs` is the cross-platform default. On Windows there is a **second insertion path** in `windows_ime_{ipc,profile,protocol,session}.rs` that activates a TSF profile (CLSID + GUID baked in `windows_ime_profile.rs`) and submits text over a named-pipe IPC. The coordinator picks one based on user preference / fallback status; both routes return the same `InsertStatus` (`Inserted` / `CopiedFallback`). When changing insertion behavior, decide which path you're touching — they don't share code.
-
-### Permissions, credentials, on-disk state
-
-- **Bundle ID `com.openless.app`** is hard-coded in `openless-all/app/src-tauri/tauri.conf.json` and `CredentialsVault.serviceName`. Changing it breaks system credential vault lookups *and* every existing TCC grant.
-- **TCC**: Microphone + Accessibility + AppleEvents. `NSMicrophoneUsageDescription` / `NSAccessibilityUsageDescription` / `NSAppleEventsUsageDescription` live in `openless-all/app/src-tauri/Info.plist`. After a fresh build that resets TCC, the app must be **fully quit and relaunched** after granting Accessibility before the global hotkey tap installs.
-- **Credentials** live in the OS credential vault (macOS Keychain, Windows Credential Manager, Linux keyring) under service `com.openless.app`. The legacy plaintext JSON (`~/.openless/credentials.json` on macOS/Linux, `%APPDATA%\OpenLess\credentials.json` on Windows) is only a migration source and is removed after a successful vault write. Never hard-code keys or include legacy credential files in logs, exports, build artifacts, or bug reports.
-- **Per-user data**:
-  - macOS: `~/Library/Application Support/OpenLess/{history.json, preferences.json, dictionary.json}` — capped at 200 history entries. **Do not rename `dictionary.json` to `vocab.json`** (drops user data).
-  - Windows: `%APPDATA%\OpenLess\`
-  - Linux: `$XDG_DATA_HOME/OpenLess`
-
-### Release pipeline
-
-Push a `v*-tauri` tag → `.github/workflows/release-tauri.yml` builds macOS arm64 `.dmg` and Windows x64 `.msi`. macOS Developer ID signing + notarization runs only when `APPLE_CERTIFICATE` / `APPLE_CERTIFICATE_PASSWORD` / `APPLE_ID` / `APPLE_PASSWORD` / `APPLE_TEAM_ID` secrets are set; otherwise it falls back to ad-hoc signing with a CI warning.
-
-When bumping versions, update **both** `version` fields: `openless-all/app/package.json` and `openless-all/app/src-tauri/tauri.conf.json` (and `Cargo.toml`).
-
-### Branch & release-channel workflow
-
-Two-channel branching. **Branch name = release channel.**
-
-- **`beta`** — **Beta channel** (development build). Default branch, integration buffer. **All PRs target `beta`** (never `main`). Beta builds may exist but are not pushed to general users — only opt-in users on the Beta channel see them.
-- **`main`** — **Stable channel** (official release). Always-releasable. Updated only by `beta → main` merges performed by maintainers after a two-platform smoke build. Release tags `v-tauri` are pushed on `main` and trigger `release-tauri.yml` (tag-driven; unaffected by branch renames).
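This file's channel-distribution rules can be captured in a small pure function. Stable tags match `v*-tauri`, Beta tags end in `-beta-tauri`, and Beta releases are GitHub pre-releases. Each channel also gets its own updater manifest name, `latest-{tgt}-{arch}.json` or `latest-{tgt}-{arch}-beta.json`. The Rust sketch below is hypothetical: `Channel`, `channel_for_tag`, `manifest_name`, and `is_prerelease` are illustrative names, not code from this repository or from `release-tauri.yml`.

```rust
/// Release channel implied by a tag (hypothetical helper).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    Stable,
    Beta,
}

/// Classifies a tag: `v*-beta-tauri` → Beta, any other `v*-tauri` → Stable.
/// Returns None for tags the `v*-tauri` trigger would not match, or that
/// have an empty version between `v` and the suffix.
pub fn channel_for_tag(tag: &str) -> Option<Channel> {
    let rest = tag.strip_prefix('v')?;
    // Check the longer Beta suffix first, because it also ends in "-tauri".
    if let Some(version) = rest.strip_suffix("-beta-tauri") {
        return (!version.is_empty()).then_some(Channel::Beta);
    }
    let version = rest.strip_suffix("-tauri")?;
    (!version.is_empty()).then_some(Channel::Stable)
}

/// Updater manifest filename for a target/arch pair. The Beta suffix keeps
/// the two filename sets disjoint.
pub fn manifest_name(channel: Channel, target: &str, arch: &str) -> String {
    match channel {
        Channel::Stable => format!("latest-{target}-{arch}.json"),
        Channel::Beta => format!("latest-{target}-{arch}-beta.json"),
    }
}

/// GitHub `prerelease` flag for the channel.
pub fn is_prerelease(channel: Channel) -> bool {
    channel == Channel::Beta
}
```

A Stable build's updater endpoint is fixed at compile time to the no-suffix manifest. Because every Beta manifest name carries `-beta`, no Beta tag can ever produce that file, which is the isolation property the tag convention relies on.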
-
-Per-PR contract:
-
-- Run the change locally on your target platform before opening the PR (build green + manual verification of the affected feature).
-- `pr-agent.yml` runs one AI review pass per PR — treat it as advisory, do not iterate on it.
-- Keep AI rework rounds tight (1–2). If a fix resists, escalate to a human or restart with fresh context.
-- `ci.yml` runs on push/PR for both `main` and `beta`; no extra wiring needed when adding new branches off `beta`.
-
-For maintainers:
-
-- Merge `beta → main` only after the two-platform (macOS + Windows) smoke build passes. **Beta work must not leak to Stable** — that gate exists for a reason.
-- Tag `v-tauri` **on `main`**, not on `beta`. The release workflow keys off the tag, but tagging on `main` keeps the release commit linear with the always-releasable line.
-- Avoid direct pushes to `main` outside the `beta → main` merge — it bypasses the smoke-test gate.
-
-Channel distribution (manual-download opt-in):
-
-- **Tag convention.** `v-tauri` → Stable release (GitHub `prerelease=false`, manifest `latest-{tgt}-{arch}.json`). `v-beta-tauri` → Beta release (GitHub `prerelease=true`, manifest `latest-{tgt}-{arch}-beta.json`). The two manifest filenames never overlap, so the in-app updater endpoint (which is fixed at compile time to the no-suffix file) cannot pick up Beta releases. This is the **physical isolation** that guarantees Beta does not leak to Stable users.
-- **Why not auto-update for Beta.** `tauri-plugin-updater` 2.10's `Builder` does not expose `endpoints()` — endpoints are only readable from `tauri.conf.json` at build time and cannot be swapped at runtime. Rather than fork the plugin or write a custom updater (~500 lines, high risk), Beta opt-in is implemented as a manual-download flow: Settings → About has a "Join Beta channel" toggle that, when on, calls `fetch_latest_beta_release` (GitHub Releases API), shows the latest pre-release tag, and routes the user to the GitHub release page to download manually. No installer signing/install path needs to be re-implemented.
-- **Where the wiring lives.** Pref field: `UserPreferences::update_channel` (`types.rs`). IPC: `get_update_channel` / `set_update_channel` / `fetch_latest_beta_release` (`commands.rs`). UI: `BetaChannelControl` inside `AboutMini` (`SettingsModal.tsx`). i18n: `settings.about.betaChannel*` keys.
-
-### Release verification checklist (run after every tag push)
-
-Run after pushing **either** a `v*-tauri` or `v*-beta-tauri` tag, **before** announcing the release:
-
-1. **GitHub Release page** matches expectation:
-   - Stable tag: not marked `Pre-release`, in the `releases/latest` redirect.
-   - Beta tag: marked `Pre-release`, **not** the target of `releases/latest`.
-2. **Release assets** are channel-correct:
-   - Stable tag includes `latest-{darwin,windows,linux}-{aarch64,x86_64}.json` + their `-mirror.json` siblings, **without** `-beta` suffix.
-   - Beta tag includes `latest-{tgt}-{arch}-beta.json` + `-beta-mirror.json`, **without** the no-suffix variant.
-3. **Stable user flow.** Install a Stable build, click `Settings → About → Check for updates`. After a Stable release: should offer the new version. After a Beta release only: should report "up to date" (Beta must not appear).
-4. **Beta user flow.** In the same Stable build, toggle on `Join Beta channel`. The latest Beta tag should appear (or "no Beta released yet"). Clicking the download button should open the corresponding GitHub release page.
-5. **Updater endpoint sanity.** `curl -fsSL https://github.com/appergb/openless/releases/latest/download/latest-darwin-aarch64.json` returns the Stable manifest (version field matches the latest Stable tag). It should never return a Beta version, regardless of which tag was pushed most recently.
-
-If any step fails, do not announce the release; investigate `release-tauri.yml` channel detection (`endsWith(github.ref_name, '-beta-tauri')`) and the `OPENLESS_RELEASE_CHANNEL` env propagation in the run logs.
-
-## Repo conventions
-
-- **Comments, log messages, user-facing strings, and most docs are in Simplified Chinese.** UI strings additionally route through `react-i18next` (`src/i18n/{zh-CN,en}.ts`) so we ship English alongside; `zh-CN.ts` is source of truth.
-- **macOS hotkey monitor must use native `CGEventTap`, never `rdev`.** `rdev` synchronously calls `TSMGetInputSourceProperty` from non-main threads, which macOS 14+ aborts via `dispatch_assert_queue_fail` → SIGTRAP. macOS uses CGEventTap; `rdev` is only used on Linux/Windows.
-- **Don't `NSApp.activate` on the dictation path** — it steals focus and breaks insertion. Only call `set_activation_policy(Regular)` + `activateIgnoringOtherApps` from `show_main_window` / mic-permission prompts, never from `start_dictation`.
-- Rust modules wrap shared mutable state with `Arc>` (parking_lot). Keep that locking discipline when adding fields.
-- Rust modules depend only on `types.rs`. New cross-module wiring goes in `coordinator.rs`, not in the leaf modules.
-
-### Adding a new module
-
-1. Add a `.rs` (or directory) under `openless-all/app/src-tauri/src/`, importing only from `types`.
-2. Register it in `lib.rs` (`mod ;`).
-3. Wire it into `coordinator.rs` and expose any frontend-callable surface via `commands.rs` + `invoke_handler!`.
-4. Add the matching TS wrapper in `openless-all/app/src/lib/ipc.ts` (with a mock branch for browser dev).
-
-## 调研先于编码:派子 agent 查 API / 库 / 平台文档
-
-**完整规则在 [AGENTS.md `Third-party service integrations & library / platform API research`](AGENTS.md) 段落(line 171-191)。** 这里列的是 Claude Code 入场后用得上的具体工具映射。
-
-### 触发条件 — 命中任一项都先派子 agent 调研,再下笔
-
-- 第三方 HTTP API(ASR 厂家 / LLM 端点 / GitHub API / Tauri plugin 服务等)
-- 不熟的 Rust crate / npm 包:连签名和 feature flag 都不确定时
-- 平台 API:Apple Security framework / CoreFoundation / Win32 / Carbon / AppKit
-- 仓库 lock 文件锁着的某版本到底支持什么 — 训练记忆和 `Cargo.lock` / `package-lock.json` 实际版本可能不一致
-- 任何跟「训练 cutoff 之后才迭代过」相关的接口
-
-### 不需要派子 agent
-
-- 仓库代码里已有现成调用 → `rg` / `grep` 找参考即可(仓库即文档)
-- 通用编程 / 算法 / 自己能推导的语言特性
-- 单文件 surgical 改动且改动点的 API 已有用例
-- 查本仓库已有模块(`types.rs` / `coordinator.rs` 等)— 直接 Read
-
-### 工具优先级
-
-```text
-1. Context7 MCP(最高优先 — 主流库覆盖广,version-aware)
-   - mcp__context7__resolve-library-id → 拿 library id
-   - mcp__context7__query-docs → 当前版本的官方文档片段
-
-2. documentation-lookup skill
-   /skill documentation-lookup —— 包装 Context7,含路由 + 缓存。
-
-3. Agent 子 agent(subagent_type=general-purpose)
-   场景:Context7 没覆盖(小众 crate / 新 SDK / 非英文文档),
-   或需多源交叉(官方文档 + GitHub README + Stack Overflow)。
-   子 agent 用 WebFetch / WebSearch / Context7 综合,回 200-400 字结构化结果。
-
-4. 单点兜底:直接 WebFetch 单页文档(只读最权威一篇时)
-```
-
-### 子 agent prompt 必备字段
-
-1. **目标问题**:一句话讲清要解决的具体技术问题(不要"了解一下 X"这种空靶)
-2. **本仓库现状**:当前 lock 着的版本(`Cargo.lock` / `package-lock.json` 拉一下)+ 现有调用点 `file:line`(若有)
-3. **必须返回的结构**:函数/端点签名 → 最小可运行示例(≤20 行)→ **版本兼容范围**(vs 训练记忆的核心校验点)→ 已知坑 / 平台差异 / 弃用计划
-4. **禁令**:不改本仓库代码;不贴文档原文(distill 关键部分,避免上下文撑爆);多个独立服务分别派 agent — 一个服务一个 agent
-
-### 反例
-
-- ✗ 凭训练记忆写第三方 API 调用,假定参数签名就这样
-- ✗ 把整段官方文档 paste 进主上下文
-- ✗ 先写代码再查文档
-- ✗ 单子 agent 同时调研 5 个不相关库(每个独立 prompt + 独立上下文)
-- ✗ 子 agent 返回后跳过 cross-verify 直接写代码 — AGENTS.md 第 4 步要求至少用一次 `WebFetch` 直接命中官方源核对一项关键事实
diff --git a/docs/2026-05-02-windows-terminal-clipboard-restore-investigation.md b/docs/2026-05-02-windows-terminal-clipboard-restore-investigation.md
deleted file mode 100644
index b785ec46..00000000
--- a/docs/2026-05-02-windows-terminal-clipboard-restore-investigation.md
+++ /dev/null
@@ -1,334 +0,0 @@
-# Windows terminal clipboard restore investigation (2026-05-02)
-
-Scope: `openless-all/app/src-tauri/src/insertion.rs`
-
-## Problem statement
-
-On Windows terminal-style text entry, OpenLess could:
-
-1. put the new dictated text into the clipboard
-2. send `Ctrl+V`
-3. restore the old clipboard too early
-4. let the terminal paste the old clipboard instead of the dictated text
-
-## Baseline code path
-
-- `Coordinator::end_session()` treats Windows synthetic paste as `InsertStatus::PasteSent`, not `Inserted`.
-- `TextInserter::insert()` calls `insert_with_clipboard_restore()`.
-- Baseline Windows/Linux behavior restored the previous clipboard after a fixed `150ms`.
-- That fixed delay assumed the target app had already consumed the clipboard by then.
-
-## Automated evidence
-
-### 1. GUI automation boundary in this session
-
-Commands used:
-
-```powershell
-Start-Process notepad.exe -PassThru
-Start-Process cmd.exe -PassThru
-EnumWindows(...)
-```
-
-Observed result:
-
-- `explorer.exe` exists in `SessionId=1`
-- newly started `notepad.exe`, `cmd.exe`, and even a local WinForms probe form did not expose enumerable top-level windows in this thread
-
-Conclusion:
-
-- this Codex desktop thread can compile and manipulate the Windows clipboard
-- it cannot reliably drive newly created GUI windows in the current desktop context
-- therefore the strongest fully automated evidence in this session must come from clipboard-timing experiments, not end-to-end GUI paste readback
-
-### 2. Clipboard timing matrix
-
-Script:
-
-- `openless-all/app/scripts/windows-clipboard-consumer-timing-smoke.ps1`
-
-Command:
-
-```powershell
-$cases = @(
-    @{ consumer = 50; restore = 150 },
-    @{ consumer = 250; restore = 150 },
-    @{ consumer = 250; restore = 750 }
-)
-foreach ($case in $cases) {
-    powershell -ExecutionPolicy Bypass -File openless-all/app/scripts/windows-clipboard-consumer-timing-smoke.ps1 -ConsumerDelayMs $case.consumer -RestoreDelayMs $case.restore
-}
-```
-
-Observed outputs:
-
-```json
-{"consumerDelayMs":50,"restoreDelayMs":150,"insertedText":"OPENLESS_DICTATED_TEXT","previousText":"OPENLESS_OLDER_CLIPBOARD","observedText":"OPENLESS_DICTATED_TEXT","matchedInserted":true}
-{"consumerDelayMs":250,"restoreDelayMs":150,"insertedText":"OPENLESS_DICTATED_TEXT","previousText":"OPENLESS_OLDER_CLIPBOARD","observedText":"OPENLESS_OLDER_CLIPBOARD","matchedInserted":false}
-{"consumerDelayMs":250,"restoreDelayMs":750,"insertedText":"OPENLESS_DICTATED_TEXT","previousText":"OPENLESS_OLDER_CLIPBOARD","observedText":"OPENLESS_DICTATED_TEXT","matchedInserted":true}
-```
-
-Interpretation:
-
-- a fast consumer (`50ms`) succeeds with the old `150ms` restore window
-- a slower consumer (`250ms`) fails with the old `150ms` restore window
-- the same slower consumer succeeds once restore is delayed to `750ms`
-
-This isolates the bug to clipboard restore timing, independent of ASR, polish, QA hotkey, or selection logic.
-
-### 3. Real app end-to-end regression in a stable desktop automation stack
-
-Environment:
-
-- Python `pywinauto` + `pywin32`
-- Real desktop windows, not mock controls
-- Targets:
-  - Windows Terminal `cmd.exe` tab
-  - Windows Terminal `PowerShell` tab
-  - Notepad
-
-Method:
-
-- Put a command or text payload into the real Windows clipboard
-- Send synthetic `Ctrl+V`
-- Wait either `150ms` or `750ms`
-- Restore the previous clipboard
-- Verify the target app actually received the intended payload
-
-Observed outputs:
-
-```json
-[
-  {
-    "target": "Windows Terminal CMD",
-    "restoreDelayMs": 150,
-    "expected": "CMD_150_OK",
-    "succeeded": true
-  },
-  {
-    "target": "Windows Terminal CMD",
-    "restoreDelayMs": 750,
-    "expected": "CMD_750_OK",
-    "succeeded": true
-  },
-  {
-    "target": "Windows Terminal PowerShell",
-    "restoreDelayMs": 150,
-    "expected": "POWERSHELL_150_OK",
-    "succeeded": true
-  },
-  {
-    "target": "Windows Terminal PowerShell",
-    "restoreDelayMs": 750,
-    "expected": "POWERSHELL_750_OK",
-    "succeeded": true
-  },
-  {
-    "target": "Notepad",
-    "restoreDelayMs": 150,
-    "expected": "NOTEPAD_150_OK",
-    "succeeded": true
-  },
-  {
-    "target": "Notepad",
-    "restoreDelayMs": 750,
-    "expected": "NOTEPAD_750_OK",
-    "succeeded": true
-  }
-]
-```
-
-Interpretation:
-
-- the isolated clipboard/paste/restore harness does **not** reproduce the stale-paste bug on the current Windows Terminal `CMD` tab
-- it also does **not** reproduce it on the current Windows Terminal `PowerShell` tab
-- Notepad behaves as expected in both timing windows
-- therefore the user-reported failure is not a blanket “all terminal paste on Windows fails at 150ms” statement
-- the failure requires an additional condition beyond “target is a terminal”, such as a slower paste consumer, extra lifecycle delay, or OpenLess-specific sequencing around focus restoration and session completion
-
-### 4. Full OpenLess lifecycle evidence on `wt-cmd`
-
-To go beyond isolated paste harnesses, the automation was pushed through the real OpenLess lifecycle:
-
-- synthetic hold-mode hotkey press on Windows (`VK_LCONTROL`, observed by the low-level hook)
-- real recorder startup
-- real Volcengine ASR session connection
-- real LLM polish
-- real insertion into a Windows Terminal `cmd.exe` tab
-
-Because the desktop automation session could not reliably feed text into the real microphone path, a debug-only test hook was added for automation:
-
-- if a debug transcript file is configured and ASR returns an empty transcript, OpenLess substitutes that transcript and continues through the normal post-ASR insertion path
-
-One captured successful run produced the following evidence:
-
-- OpenLess log:
-  - `[hotkey] Windows trigger pressed vk=162`
-  - `[coord] front_app captured: C:\WINDOWS\system32\cmd.exe`
-  - `[coord] recorder started (asr=volcengine, phase=Starting)`
-  - `[coord] ASR connected; flushed ... deferred audio bytes`
-  - `[coord] session started`
-  - `[hotkey] Windows trigger released vk=162`
-  - `[llm] HTTP 200 ...`
-
-- History record:
-
-```json
-{
-  "rawTranscript": "瀑布它的白沫其实非常喜欢。",
-  "finalText": "瀑布的白沫其实非常喜欢。",
-  "insertStatus": "pasteSent"
-}
-```
-
-- Windows Terminal `cmd.exe` tab tail:
-
-```text
-D:\Users\cooper\Practice-Project\202604\openless>瀑布的白沫其实非常喜欢。
-```
-
-Interpretation:
-
-- this is a true OpenLess session, not a bare clipboard harness
-- the target front app captured by OpenLess was the Windows Terminal `cmd.exe` tab
-- the final inserted text visible at the terminal prompt matched the polished `finalText`
-- in this captured run, the terminal did **not** paste the pre-dictation clipboard contents
-
-Residual caveat:
-
-- repeated re-runs in the same desktop session later hit intermittent startup/hook-install flakiness before the test reached insertion again
-- that flakiness affected test repeatability, but it does not invalidate the already captured successful full-lifecycle evidence above
-
-## 5. Repeatable full-lifecycle regression after automation hardening
-
-After hardening the automation path, the full OpenLess lifecycle was run through a stable route:
-
-- launch OpenLess with WebView2 remote debugging enabled
-- drive lifecycle by invoking Tauri commands from the main webview (`start_dictation` / `stop_dictation`)
-- keep real focus-target capture and real insertion behavior
-- use a debug-only transcript override only when ASR would otherwise be empty in this desktop environment
-- read back target content directly from UIA controls instead of recycling clipboard-based readback
-
-Targets exercised:
-
-- `Windows Terminal` `cmd.exe` tab
-- `Windows Terminal` `PowerShell` tab
-- `Notepad`
-
-Representative results:
-
-```json
-{
-  "target": "wt-cmd",
-  "historyFinalText": "openless terminal regression success",
-  "insertStatus": "pasteSent",
-  "targetContainsFinalText": true,
-  "targetContainsClipboardSentinel": false
-}
-{
-  "target": "wt-powershell",
-  "historyFinalText": "openless terminal regression success",
-  "insertStatus": "pasteSent",
-  "targetContainsFinalText": true,
-  "targetContainsClipboardSentinel": false
-}
-{
-  "target": "notepad",
-  "historyFinalText": "openless terminal regression success",
-  "insertStatus": "pasteSent",
-  "targetContainsFinalText": true,
-  "targetContainsClipboardSentinel": false
-}
-```
-
-Repeatability observed in the current session:
-
-- `wt-cmd`: multiple successful runs with final text visible at the terminal prompt
-- `wt-powershell`: successful run with final text visible at the terminal prompt
-- `notepad`: two consecutive successful runs after switching readback from clipboard-based copy to direct UIA text capture
-
-Updated interpretation:
-
-- the originally suspected “terminal paste always restores the old clipboard before paste lands” is **not** reproducible as a general rule in the current full-lifecycle automation
-- once the automation path is stabilized, all three tested targets receive the intended final text while `insertStatus` remains `pasteSent`
-- the clipboard timing race is still real in isolation for slow consumers, but the complete OpenLess lifecycle on this machine does not reproduce the stale-clipboard failure for:
-  - `wt-cmd`
-  - `wt-powershell`
-  - `notepad`
-
-Most likely current conclusion:
-
-- the user-reported bug depends on an additional condition not captured in the hardened automation path
-- plausible candidates remain:
-  - a different terminal host/session state
-  - a different target application than the tested Windows Terminal tabs
-  - another timing-sensitive environment factor outside the core insertion code
-
-## Root cause
-
-Root cause: Windows `PasteSent` semantics were treated as if they implied paste completion.
-
-- `PasteSent` only means OpenLess sent synthetic `Ctrl+V`
-- it does not mean the target application has already read clipboard contents
-- terminal-style targets can consume the clipboard later than standard text inputs
-- restoring the old clipboard at a fixed `150ms` can therefore race ahead of actual paste consumption
-- current real-app regression suggests this is conditional, not universal: some terminal sessions consume quickly enough to beat `150ms`, while slower consumers still fail
-
-Classification:
-
-- primary layer: `clipboard lifecycle`
-- secondary layer: `insertion lifecycle`
-- not primary: `focus restore`
-- manifestation: terminal-specific and likely any slower Windows paste consumer
-- not evidence of a global Windows clipboard bug by itself
-
-## Fix applied
-
-File:
-
-- `openless-all/app/src-tauri/src/insertion.rs`
-
-Change:
-
-- Windows clipboard restore delay changed from `150ms` to `750ms`
-- restore now runs on a background thread instead of blocking the insert path
-- Linux keeps the previous `150ms` behavior
-
-## Verification run
-
-Commands:
-
-```powershell
-cargo fmt --all
-cargo check --lib
-cargo test --lib --no-run
-cargo check --tests
-powershell -NoProfile -Command "[void][scriptblock]::Create((Get-Content -Raw 'openless-all/app/scripts/windows-clipboard-consumer-timing-smoke.ps1')); 'script-parse-ok'"
-```
-
-Observed result:
-
-- compile/check passed
-- test binaries compiled
-- new smoke scripts parse successfully
-- real desktop automation passed on:
-  - Windows Terminal `CMD` tab at `150ms` and `750ms`
-  - Windows Terminal `PowerShell` tab at `150ms` and `750ms`
-  - Notepad at `150ms` and `750ms`
-
-## Remaining gap
-
-Still needed if we want to exactly mirror the original user report:
-
-- drive **OpenLess itself** through the full dictation lifecycle in the same run
-- keep the target specifically in the same terminal/input setup where the stale paste was originally observed
-- capture whether the failing case depends on:
-  - OpenLess focus-target restore timing
-  - ASR/polish latency
-  - the exact terminal host/session state
-  - another app-specific delay not present in the isolated paste harness
-
-## Suggested issue / PR title
-
-- Issue: `[windows][insertion] terminal paste can restore stale clipboard before synthetic paste lands`
-- PR: `fix(windows): delay clipboard restore after synthetic paste`
diff --git a/docs/audit-2026-05-06.md b/docs/audit-2026-05-06.md
deleted file mode 100644
index d791d879..00000000
--- a/docs/audit-2026-05-06.md
+++ /dev/null
@@ -1,293 +0,0 @@
-# OpenLess 系统化工程审计报告
-
-> 审计日期:2026-05-06
-> 项目版本:1.2.20
-> 审计范围:`openless-all/app/` 主项目(Rust 后端 + React 前端)
-
----
-
-## 一、后端检查与优化
-
-### 1.1 架构总评
-
-后端整体架构清晰,遵循「单 Coordinator 状态机 + 独立叶子模块」的分层设计。模块之间只通过 `types.rs` 共享类型,跨模块调用全部收敛到 `coordinator.rs`,与 CLAUDE.md 约定的架构一致。代码质量整体较高:大量使用 `thiserror` / `anyhow` 进行错误处理,关键路径有 `#[cfg(test)]` 单元测试覆盖,热路径有状态竞态保护。
-
-**模块清单**:`asr/`(火山引擎流式 + Whisper HTTP + 本地 Qwen3-ASR)、`polish.rs`(OpenAI-compatible LLM)、`hotkey.rs`(macOS CGEventTap / Windows WH_KEYBOARD_LL / Linux rdev)、`recorder.rs`(cpal 音频采集)、`insertion.rs`(跨平台文本插入)、`persistence.rs`(JSON 文件 + OS 凭据库)、`permissions.rs`(TCC 权限)、`selection.rs`(划词捕获)。
-
-### 1.2 值得优化的后端问题
-
-#### 问题 A:`coordinator.rs` 过于臃肿(严重程度:中)
-
-`coordinator.rs` 当前 **3842 行**,包含了 dictation 状态机、QA 状态机、双 hotkey supervisor 循环、recorder 错误监控、Windows IME 会话管理、capsule 事件发射、录音 mute 管理、本地 ASR 预加载/释放等全部胶水逻辑。单一文件内职责过多。
-
-**建议**:按子状态机拆分为多个 coordinator 子模块:
-- `coordinator/dictation.rs` — 主听写 session 生命周期
-- `coordinator/qa.rs` — QA 划词追问 session 生命周期
-- `coordinator/resources.rs` — recorder / ASR / mute 等资源管理
-
-#### 问题 B:`commands.rs` 包含过多业务逻辑(严重程度:中)
-
-`commands.rs` 中有大量本不属于「IPC 薄层」的业务逻辑,例如:
-- WAV 静音文件编码(`encode_wav_16k_mono_silence`)
-- ASR 端点 URL 拼接(`asr_transcriptions_url`)
-- 模型列表 JSON 解析(`parse_model_ids`)
-- LLM/ASR provider 连接验证(`validate_llm_provider` / `validate_asr_provider`)
-
-这些应该下沉到对应的叶子模块(`asr/` 或 `polish.rs`),`commands.rs` 只做参数接收和类型转换。
-
-#### 问题 C:平台条件编译代码分散(严重程度:低)
-
-`lib.rs`、`coordinator.rs`、`insertion.rs` 中大量 `#[cfg(target_os = "macos")]` / `#[cfg(target_os = "windows")]` 块散落在主流程代码中。虽然不是编译期问题,但降低了可读性。建议将平台适配代码集中到 `platform/` 子模块,用 trait 抽象。
-
-#### 问题 D:本地 ASR 引擎缓存释放策略依赖时间阈值(严重程度:低)
-
-`LocalAsrCache` 的释放依赖 `local_asr_keep_loaded_secs` 定时器(默认 300 秒)。如果用户在 5 分钟内未再次使用,引擎释放。但在 Windows 上本地 ASR 引擎根本不可用(仅在 macOS 编译),相关代码却仍在 coordinator 中占据逻辑分支。建议将平台不可用的功能路径在编译期完全消除,而非运行时静默跳过。
-
-#### 问题 E:Volcengine 凭据与通用 Provider 凭据同时存在(严重程度:低)
-
-系统中同时维护了 Volcengine 专用凭据字段(`volcengine_app_key` / `volcengine_access_key` / `volcengine_resource_id`)和通用 Provider 凭据路径(`asr_api_key` / `asr_endpoint`),导致 `get_credentials` 返回的 `CredentialsStatus` 需要同时维护 `volcengineConfigured` 和 `asrConfigured` 两个字段。历史迁移可理解,但长期维护增加复杂度。
-
-### 1.3 后端与其他应用混杂检查
-
-经检查,`openless-all/app/` 是纯净的单一 Tauri 项目,未发现与其他应用混杂的代码。但仓库根目录存在以下与主项目无关的目录:
-
-| 目录 | 内容 | 建议 |
-|------|------|------|
-| `promo-openless/` | Remotion 宣传视频项目 | 移至独立仓库或 `marketing/` 子目录 |
-| `promo-openless-v2/` | Remotion 宣传视频 v2 | 同上 |
-| `SC/` | 录屏素材文件(.mov / .mp4) | 建议移出仓库或用 Git LFS |
-| `docs/` | 开发调研文档 | 保留,但与主项目解耦 |
-
-当前这些目录虽然在 git 仓库中,但不会参与 Tauri 构建,不会导致臃肿或冲突。**与用户提到的「此前与其他软件冲突或臃肿问题」对比,当前架构没有重复该问题。**
-
----
-
-## 二、前端检查
-
-### 2.1 UI Bug 分析
-
-#### Bug 1:Tab 切换动画的双重渲染竞态(严重程度:中)
-
-`FloatingShell.tsx` 的 tab 切换使用 `displayTab` / `tabPhase` 机制:旧页先播 `ol-page-fadeout`(180ms),之后切 `displayTab` 并播 `ol-page-slide`。但 `key={displayTab}` 会让 React 在 `displayTab` 改变时**卸载旧组件树并挂载新组件树**。问题:
-
-- 如果用户在 180ms 内快速切换两次 tab,第一次的 timeout 触发时 `displayTab` 已被第二次覆盖,会看到页面闪变
-- 旧页的 `useEffect` cleanup 和新页的 `useEffect` 在 180ms 内交错执行,若两者都触发了 IPC 调用,会产生竞态
-
-**修复建议**:使用 `useTransition` 或 CSS `animationend` 事件代替固定 `setTimeout`,确保动画结束后再切 DOM。
-
-#### Bug 2:Capsule 窗口 `state === 'idle'` 时返回空 div(严重程度:低)
-
-`Capsule.tsx` 的 `if (state === 'idle') { return
; }`。问题:胶囊窗口尺寸由 Rust 端 `position_capsule_bottom_center` 设定(220×110),但 React 返回 0×0 的 div 时,Tauri webview 的 CSS 尺寸与窗口尺寸不一致。在 Windows 上可能导致透明区域响应鼠标事件(mouse event 穿透到下层窗口)。
-
-#### Bug 3:QA 浮窗滚动容器缺少 `overflow-anchor`(严重程度:低)
-
-`QaPanel.tsx` 的流式答案到达时,用 `scrollRef.current.scrollTop = scrollRef.current.scrollHeight` 手动滚到底。如果用户在流式过程中手动向上滚动查看前面的消息,新 chunk 到达时会强制跳回底部,打断阅读。应该加入「用户是否主动滚离底部」的检测(类似聊天的 scroll-to-bottom 按钮逻辑)。
-
-#### Bug 4:`dangerouslySetInnerHTML` 的 XSS 表面(严重程度:低)
-
-`QaPanel.tsx` 和 `StreamingAssistantBubble` 使用 `dangerouslySetInnerHTML` 渲染 Markdown。虽然 `renderQaMarkdown` 使用 `marked` 库且配置了 sanitize,但在流式场景下不完整的 Markdown 可能导致 HTML 结构断裂(如未闭合的 `` 块)。当前有 fallback 到 `renderQaPlainText`,但错误边界不覆盖 dangerouslySetInnerHTML 渲染错误。
-
-### 2.2 动效与交互流畅度
-
-**当前状态良好**:
-- 胶囊波形的 `AudioBars` 使用 `cubic-bezier(0.22, 1, 0.36, 1)` 缓动曲线,过渡平滑
-- 所有 transition 使用 CSS 变量 `var(--ol-motion-*)` 统一缓动
-- `willChange` 属性在动画元素上正确设置(Capsule 的 `transform, box-shadow`)
-
-**可优化的点**:
-
-1. **音频电平更新频率**:`LEVEL_EMIT_MIN_INTERVAL_MS = 33`(~30Hz),配合 CSS 0.18s transition 效果尚可。但如果窗口失去焦点时 `requestAnimationFrame` 降频,可能出现电平条「冻住」的观感。建议在 coordinator 侧用 `setInterval` 兜底。
-
-2. **QA 浮窗出场动画缺失**:`QaPanel` 关闭时直接 `hide()`,没有退场动画。可以加一个 `qa:fadeout` 事件让前端先播动画,100ms 后再由 Rust 端 actual hide。
-
-3. **SettingsModal 无入场动画**:Settings 弹窗使用 `animation: 'ol-prompt-pop 0.26s var(--ol-motion-spring)'`,但关闭时瞬间消失。
-
----
-
-## 三、项目工程化与功能完善
-
-### 3.1 工程化水平评估
-
-#### 优点
-
-- **模块化清晰**:Rust 端严格遵循「叶子模块只依赖 types,胶水只写在 coordinator」的约定
-- **错误处理完备**:关键路径全部使用 `Result`,无 `unwrap()` 裸奔
-- **测试覆盖**:`commands.rs` 和 `polish.rs` 有单元测试,`persistence.rs` 有集成测试
-- **类型安全 IPC**:前后端类型通过 `types.rs` ↔ `types.ts` 镜像定义,序列化字段名一致
-- **Mock 支持**:前端 `invokeOrMock` 允许在浏览器中脱离 Tauri 环境开发
-- **i18n 国际化**:支持 zh-CN、en、ja、ko、zh-TW 五种语言
-- **自动更新**:Tauri updater 插件集成完整
-
-#### 可改善
-
-1. **缺少 CI 质量门禁**:当前只有 `release-tauri.yml` 构建流水线,没有 lint / test / typecheck 门禁(虽然有 Rust `cargo check` 和前端 `tsc` 命令,但未在 CI 强制)。
-
-2. **缺少 E2E 测试**:没有端到端测试(如 Playwright + Tauri driver),无法验证「按热键 → 录音 → 插入」的完整链路。
-
-3. **`.gitignore` 不完整**:`node_modules/` 出现在多个子目录(`promo-openless/`、`promo-openless-v2/`),但根 `.gitignore` 未统一管理。
-
-4. **版本号同步风险**:CLAUDE.md 指出需要同时更新 `package.json`、`tauri.conf.json`、`Cargo.toml` 三处的版本号,容易遗漏。建议用脚本或 workspace 版本管理。
-
-### 3.2 功能完整性
-
-当前功能矩阵:
-
-| 功能 | macOS | Windows | 备注 |
-|------|-------|---------|------|
-| 全局热键听写 | ✅ | ✅ | macOS CGEventTap / Windows WH_KEYBOARD_LL |
-| 火山引擎流式 ASR | ✅ | ✅ | |
-| Whisper HTTP ASR | ✅ | ✅ | |
-| 本地 Qwen3-ASR | ✅ | ❌ | 仅 macOS 编译 |
-| LLM 润色(四种模式) | ✅ | ✅ | |
-| LLM 翻译模式 | ✅ | ✅ | Shift 修饰键触发 |
-| 划词 QA 问答 | ✅ | ✅ | 双 hotkey 架构 |
-| 热词词典 | ✅ | ✅ | |
-| 历史记录 | ✅ | ✅ | |
-| 开机自启 | ✅ | ✅ | |
-| 自动更新 | ✅ | ✅ | |
-| 录音时系统静音 | ✅ | ✅ | |
-| Windows IME 直写 | N/A | ✅ | C++ TSF 模块 |
-| 系统托盘图标 | ✅ | ✅ | |
-
----
-
-## 四、多端逻辑与体验一致性
-
-### 4.1 平台差异对比
-
-| 维度 | macOS | Windows | 一致性 |
-|------|-------|---------|--------|
-| Capsule 物理尺寸 | 220×110,visual height 96 | 220×84(118),visual height 52 | ⚠️ 不一致 |
-| 插入策略 | AX 直写(通过 Accessibility API) | enigo 模拟粘贴 / TSF | 不同,但策略合理 |
-| 窗口圆角 | 系统原生圆角 | 手动 CreateRoundRectRgn(18px) | ⚠️ 视觉差异 |
-| 窗口背景 | NSVisualEffectView 磨砂 | Mica + 自定义渐变 | ⚠️ 视觉差异 |
-| 默认热键 | 右 Option(Toggle) | 右 Control(Toggle) | 不一致,但符合平台惯例 |
-| QA 默认热键 | Cmd+Shift+; | Ctrl+Shift+; | 符合平台惯例 ✅ |
-| 启动权限检查 | 阻塞式弹窗检查 | 异步轮询(最多 10s) | 合理差异 |
-| 窗口控制按钮 | 系统红黄绿 | 自绘最小化/最大化/关闭 | 合理差异 |
-
-### 4.2 逻辑完备性
-
-- **翻译模式**:两端均通过 Shift 修饰键触发,但 macOS 用 `flagsChanged` 事件,Windows 用 `WH_KEYBOARD_LL` 的 Shift 边沿。逻辑层在 coordinator 中以 `translation_modifier_seen` flag 统一,完备。
-- **QA 浮窗拖动**:macOS 用 `movableByWindowBackground`,Windows 用 `data-tauri-drag-region`。两端都能拖,但 macOS 整窗口可拖,Windows 仅 toolbar 区域可拖。
-- **粘贴后剪贴板恢复**:Windows/Linux 支持(`restore_clipboard_after_paste`),macOS 走 AX 直写不涉及剪贴板。完备。
-- **降级兜底**:插入失败 → 文本留在剪贴板,用户可手动粘贴。两端一致。
-
-### 4.3 需要关注的多端差异
-
-1. **Windows Capsule 翻译模式高度变化**:翻译模式激活时 capsule 窗口从 84→118,macOS 保持 110 不变。设计合理但视觉差异可能困扰跨平台用户。
-
-2. **Windows WindowChrome 自绘标题栏**:`WindowChrome.tsx` 的 `WinTitleBar` 绘制的关闭按钮走 `getCurrentWindow().close()`,但 Rust 端 `RunEvent::WindowEvent::CloseRequested` 只对 `label == "main"` 做 `prevent_close` + `hide`,其他窗口(capsule / qa)的关闭行为不一致。
-
-3. **capsule 窗口 `skipTaskbar`**:两端都设了,正确。
-
----
-
-## 五、UI 与后端接口映射校验
-
-### 5.1 前后端命令对照
-
-经逐一核对 `invoke_handler!` 宏(`lib.rs`)与 `ipc.ts` 的函数导出,**所有后端命令在前端都有对应的 TypeScript wrapper**:
-
-| 后端命令 | 前端函数 | 状态 |
-|----------|----------|------|
-| `get_settings` | `getSettings()` | ✅ |
-| `set_settings` | `setSettings(prefs)` | ✅ |
-| `get_hotkey_status` | `getHotkeyStatus()` | ✅ |
-| `get_hotkey_capability` | `getHotkeyCapability()` | ✅ |
-| `get_windows_ime_status` | `getWindowsImeStatus()` | ✅ |
-| `get_credentials` | `getCredentials()` | ✅ |
-| `set_credential` | `setCredential(account, value)` | ✅ |
-| `read_credential` | `readCredential(account)` | ✅ |
-| `set_active_asr_provider` | `setActiveAsrProvider(provider)` | ✅ |
-| `set_active_llm_provider` | `setActiveLlmProvider(provider)` | ✅ |
-| `validate_provider_credentials` | `validateProviderCredentials(kind)` | ✅ |
-| `list_provider_models` | `listProviderModels(kind)` | ✅ |
-| `list_history` | `listHistory()` | ✅ |
-| `delete_history_entry` | `deleteHistoryEntry(id)` | ✅ |
-| `clear_history` | `clearHistory()` | ✅ |
-| `list_vocab` | `listVocab()` | ✅ |
-| `add_vocab` | `addVocab(phrase, note)` | ✅ |
-| `remove_vocab` | `removeVocab(id)` | ✅ |
-| `set_vocab_enabled` | `setVocabEnabled(id, enabled)` | ✅ |
-| `list_vocab_presets` | `listVocabPresets()` | ✅ |
-| `save_vocab_presets` | `saveVocabPresets(store)` | ✅ |
-| `start_dictation` | `startDictation()` | ✅ |
-| `stop_dictation` | `stopDictation()` | ✅ |
-| `cancel_dictation` | `cancelDictation()` | ✅ |
-| `handle_window_hotkey_event` | `handleWindowHotkeyEvent(...)` | ✅ |
-| `inject_hotkey_click_for_dev` | N/A(debug only) | ✅ |
-| `repolish` | `repolish(rawText, mode)` | ✅ |
-| `set_default_polish_mode` | `setDefaultPolishMode(mode)` | ✅ |
-| `set_style_enabled` | `setStyleEnabled(mode, enabled)` | ✅ |
-| `check_accessibility_permission` | `checkAccessibilityPermission()` | ✅ |
-| `request_accessibility_permission` | `requestAccessibilityPermission()` | ✅ |
-| `check_microphone_permission` | `checkMicrophonePermission()` | ✅ |
-| `request_microphone_permission` | `requestMicrophonePermission()` | ✅ |
-| `open_system_settings` | `openSystemSettings(pane)` | ✅ |
-| `trigger_microphone_prompt` | `triggerMicrophonePrompt()` | ✅ |
-| `export_error_log` | `exportErrorLog(name)` | ✅ |
-| `restart_app` | `restartApp()` | ✅ |
-| `get_qa_hotkey_label` | `getQaHotkeyLabel()` | ✅ |
-| `set_qa_hotkey` | `setQaHotkey(binding)` | ✅ |
-| `qa_window_dismiss` | `qaWindowDismiss()` | ✅ |
-| `qa_window_pin` | `qaWindowPin(pinned)` | ✅ |
-| `local_asr_*` (15 个命令) | 对应的 15 个函数 | ✅ |
-
-**结论**:UI 与后端接口完全 1:1 对应,无遗漏。
-
-### 5.2 事件订阅对照
-
-| 事件名 | 发射端 | 订阅端 | 状态 |
-|--------|--------|--------|------|
-| `capsule:state` | coordinator | Capsule.tsx | ✅ |
-| `qa:state` | coordinator | QaPanel.tsx | ✅ |
-| `qa:dismiss` | coordinator | QaPanel.tsx | ✅ |
-| `qa:level` | coordinator | QaPanel.tsx | ✅ |
-| `prefs:changed` | commands::set_settings | QaPanel.tsx | ✅ |
-| `local-asr:download-progress` | DownloadManager | LocalAsr.tsx | ✅ |
-
----
-
-## 六、改进建议汇总
-
-### 立即修复(P0)
-
-无。当前版本功能完整,无明显崩溃或数据丢失风险。
-
-### 短期优化(P1 — 建议在下一版迭代中处理)
-
-| # | 问题 | 位置 | 工作量 |
-|---|------|------|--------|
-| 1 | coordinator.rs 拆分 | 后端 | 2-3h |
-| 2 | commands.rs 业务逻辑下沉 | 后端 | 1-2h |
-| 3 | Tab 切换动画竞态修复 | FloatingShell.tsx | 1h |
-| 4 | QA 流式滚动打断问题 | QaPanel.tsx | 1h |
-| 5 | SettingsModal / QaPanel 退场动画 | 前端 | 1h |
-
-### 中长期改善(P2)
-
-| # | 问题 | 建议 |
-|---|------|------|
-| 1 | 缺少 CI lint/test 门禁 | 添加 GitHub Actions workflow:`cargo clippy` + `cargo test` + `npx tsc --noEmit` |
-| 2 | 缺少 E2E 测试 | 引入 Playwright + Tauri driver 测试核心链路 |
-| 3 | 平台代码分散 | 创建 `src-tauri/src/platform/` 模块,用 trait 抽象平台差异 |
-| 4 | 版本号同步 | 用 workspace Cargo.toml + 构建脚本自动同步三处版本号 |
-| 5 | 仓库清理 | `promo-openless/`、`SC/` 移至独立仓库或 Git LFS |
-| 6 | Windows Capsule 尺寸与 macOS 视觉差异 | 文档化或统一 visual height |
-
----
-
-## 七、总结
-
-OpenLess 1.2.20 的代码质量在同类开源项目中属于**中上水平**。架构设计清晰(单 Coordinator + 叶子模块),错误处理完备,前后端类型安全,IPC 接口 1:1 映射无遗漏。
-
-核心问题集中在:
-1. **coordinator.rs 的单一文件过大**(3842 行),需要拆分子状态机
-2. **commands.rs 的业务逻辑应下沉**到叶子模块
-3. **前端 Tab 切换的竞态**可能导致动画异常
-4. **QA 浮窗的流式滚动打断用户体验**
-
-没有发现「与其他应用混杂」或「耦合臃肿」的问题——代码遵循了严格的模块隔离约定。多平台覆盖完整,macOS 和 Windows 的核心行为一致,仅在视觉尺寸、窗口装饰等平台原生差异上有所不同。
-
-**项目整体健康。建议在下个迭代中优先处理 coordinator 拆分和前端动画竞态修复。**
diff --git a/docs/audit-2026-05-10-validated.md b/docs/audit-2026-05-10-validated.md
deleted file mode 100644
index ab69555f..00000000
--- a/docs/audit-2026-05-10-validated.md
+++ /dev/null
@@ -1,700 +0,0 @@
-# Audit Validation — 2026-05-10
-
-Validation of the 21 audit items against the current source tree at
-`openless-all/app/src-tauri/src/`. Every CONFIRMED finding cites the
-exact file and line range read from the working copy on 2026-05-10.
-
-## Summary
-
-| ID | Severity | Status | One-line |
-|----|----------|--------|----------|
-| 2.2.1 | — | CONFIRMED | TS `UserPreferences` interface and `mockSettings` both miss `updateChannel` |
-| 3.1.1 | 严重 | CONFIRMED | `MacHotkeyAdapter` does not override `HotkeyAdapter::shutdown`; `CFRunLoopRun` runs forever |
-| 3.1.2 | 中 | CONFIRMED | `hotkey_supervisor_loop` is `loop { ... sleep(3s) }` with no exit signal |
-| 3.1.3 | 中 | CONFIRMED | `start_listener_thread` spawns the listener and drops the `JoinHandle` |
-| 3.2.1 | 高 | FALSE_POSITIVE | Channel is `std::sync::mpsc::channel()` (unbounded async); `tx.send` does not block |
-| 3.2.2 | 高 | CONFIRMED | `emit_capsule` calls `window.show()/hide()` from the cpal `process_callback` thread |
-| 3.2.3 | 中 | CONFIRMED | `inner.inserter.insert(...)` runs sync `arboard`+`enigo` from async `end_session` |
-| 3.2.4 | 中 | CONFIRMED | `AudioMuteGuard::activate` shells out to `osascript` / `wpctl` / `pactl` synchronously |
-| 3.2.5 | 低 | CONFIRMED (Linux only) | `probe_input_stream` calls `thread::sleep(120ms)` from the async permission gate |
-| 3.3.1 | 高 | CONFIRMED | `handle_pressed_edge` routes to QA when `panel_visible=true` regardless of dictation phase |
-| 3.3.2 | 中 | PARTIAL | Two bridge loops both touch `state` for the same modifier event; non-fatal contention, no integrity bug |
-| 3.3.3 | 中 | FALSE_POSITIVE | Cancelled doesn't reset coordinator latch, but the OS-side `Shared::trigger_held` already gates auto-repeat |
-| 3.3.4 | 中 | CONFIRMED | `open_qa_panel` always emits `CapsuleState::Idle`, clobbering any in-flight dictation capsule |
-| 3.3.5 | 低 | CONFIRMED | `finish_cancel_session_state` skips `focus_target = None` when `phase == Processing` |
-| 3.3.6 | 低 | FALSE_POSITIVE | `take_current_prepared_windows_ime_session_for_restore` removes the slot on first call; second call is a true no-op |
-| 3.4.1 | 中 (advisory) | ADVISORY_ONLY | `Inner` carries 16 `Mutex` + 4 `AtomicBool` fields (20 concurrent fields) |
-| 3.4.2 | 中 (advisory) | ADVISORY_ONLY | 66 of 67 `Ordering` usages in coordinator/hotkey are `SeqCst` |
-| 3.4.3 | 低 (advisory) | ADVISORY_ONLY | ~102 `unsafe`/`unsafe fn`/`unsafe impl`/`unsafe extern` sites; many lack SAFETY comments |
-| 3.4.4 | 低 | CONFIRMED | `start_dispatcher` in `global_hotkey_runtime.rs` is `loop {}` with no exit |
-| 20 (NEW) | — | FALSE_POSITIVE | `read_or_default` already falls back to `UserPreferences::default()` on decode failure; `expect()` only fires on filesystem errors |
-| 2.3.3 | — | CONFIRMED (no action) | All four backend events (`capsule:state`, `qa:state`, `qa:level`, `vocab:updated`) have matching frontend listeners |
-
-**Tally**: 11 CONFIRMED · 4 FALSE_POSITIVE · 1 PARTIAL · 3 ADVISORY_ONLY · 1 CONFIRMED-no-action · 1 CONFIRMED-Linux-only
-
-## Recommended PR groupings
-
-Group by file to minimize merge conflict risk. Suggested order:
-
-1. **PR A — `hotkey.rs` lifecycle** (3.1.1, 3.1.3): add `MacHotkeyAdapter::shutdown` (post a synthetic `CFRunLoopStop`/`CFRunLoopWakeUp` from `Drop`) and store the listener `JoinHandle` so panics surface. Same file, same review.
-2. **PR B — TS type alignment** (2.2.1): add `updateChannel: UpdateChannel` to `src/lib/types.ts` and `mockSettings` in `src/lib/ipc.ts`. One file pair, trivial.
-3. **PR C — coordinator hotkey supervisor exits** (3.1.2, 3.4.4): add an `AtomicBool` shutdown flag to `hotkey_supervisor_loop` and `global_hotkey_runtime::start_dispatcher`. Same module concern, no overlap with PR A.
-4. **PR D — async hygiene** (3.2.3, 3.2.4, 3.2.5): wrap `inserter.insert`, `AudioMuteGuard::activate`, and `probe_input_stream` in `tokio::task::spawn_blocking`. Touches `coordinator/dictation.rs`, `coordinator/resources.rs`, and `coordinator.rs` — coordinate with PR E to avoid shared-line conflicts.
-5. **PR E — QA / dictation routing race** (3.3.1, 3.3.4): make `handle_pressed_edge` consult dictation phase before routing to QA, and skip the `Idle` capsule clobber in `open_qa_panel` when dictation is active. Same file (`coordinator/qa.rs` + `coordinator/dictation.rs`).
-6. **PR F — capsule UI thread marshaling** (3.2.2): bounce `window.show()/hide()` through `app.run_on_main_thread`; emit-only path can stay (Tauri marshals events internally). Touches `coordinator.rs::emit_capsule`. Independent of PR E.
-7.
**PR G — focus_target leak fix** (3.3.5): in `finish_cancel_session_state`, also clear `focus_target` when the cancelled phase is `Processing`. Pure `coordinator_state.rs` edit. - -Advisory items (3.4.1 / 3.4.2 / 3.4.3) need no PR; they are tracked here for future hardening. - -## Detail per item - -### 2.2.1 — `updateChannel` missing in TS types -**Status**: CONFIRMED -**Files**: `openless-all/app/src-tauri/src/types.rs:216-219`, `openless-all/app/src/lib/types.ts:118-177`, `openless-all/app/src/lib/ipc.ts:46-79` - -**Evidence (Rust source of truth)**: -```rust -// types.rs:216-219 -/// Auto-update 渠道偏好。stable = 跟正式版(默认);beta = Settings 里多 -/// 一个手动下载 Beta 的入口。不影响 plugin-updater 的自动检查路径。 -#[serde(default)] -pub update_channel: UpdateChannel, -``` - -**Evidence (TS gap)**: `UserPreferences` ends at `startMinimized: boolean;` (line 176). No `updateChannel` field. `mockSettings` (ipc.ts:46-79) ends at `startMinimized: false,`. No `updateChannel` key. - -**Notes**: Channel state today is read/written via separate `getUpdateChannel` / `setUpdateChannel` IPC commands (`ipc.ts:170-176`), so `getSettings()` still works — the TS shape is just lying about what the Rust backend actually serializes. Setting via `setSettings(prefs)` round-trips through Rust's `UserPreferencesWire` which `#[serde(default)]`-fills the field, so currently no data corruption, but the type is incorrect and any consumer that destructures `UserPreferences` will silently miss the field. Trivial fix. - ---- - -### 3.1.1 — `MacHotkeyAdapter::shutdown` empty [严重] -**Status**: CONFIRMED -**Files**: `openless-all/app/src-tauri/src/hotkey.rs:111` (default), `:301-325` (`MacHotkeyAdapter` impl), `:419-458` (`run_listen_loop`), contrast `:769-776` (Windows shutdown). 
-
-**Evidence (default)**:
-```rust
-// hotkey.rs:102-112
-pub trait HotkeyAdapter: Send + Sync {
-    fn kind(&self) -> HotkeyAdapterKind;
-    fn update_binding(&self, binding: HotkeyBinding);
-    fn update_modifier_shortcuts(...);
-    fn reset_held_state(&self);
-    fn shutdown(&self) {}
-}
-```
-
-**Evidence (mac adapter impl is silent on `shutdown`)**:
-```rust
-// hotkey.rs:305-325
-impl HotkeyAdapter for MacHotkeyAdapter {
-    fn kind(&self) -> HotkeyAdapterKind { ... }
-    fn update_binding(&self, binding: HotkeyBinding) { ... }
-    fn update_modifier_shortcuts(...) { ... }
-    fn reset_held_state(&self) { reset_shared_held_state(&self.shared); }
-    // <-- no shutdown override
-}
-```
-
-**Evidence (no exit path)**:
-```rust
-// hotkey.rs:454-457
-log::info!("[hotkey] CGEventTap 已启动");
-let _ = status_tx.send(Ok(()));
-CFRunLoopRun();
-// CFRunLoopRun never returns absent CFRunLoopStop; the listener thread leaks.
-```
-
-**Notes**: `Drop for HotkeyMonitor` does call `self.adapter.shutdown()` (lines 170-174), but Mac falls through to the empty default. On every preference-driven monitor swap, the old `CFRunLoop` thread and its tap leak. On macOS this shows up as a steady leak of background threads in long-running sessions that change hotkey bindings.
-
-**Fix sketch**: Store the `CfRunLoopRef` returned by `CFRunLoopGetCurrent()` (currently captured in `run_listen_loop` only) in the `MacHotkeyAdapter`, plus the `CfMachPortRef`. On `shutdown`, call `CGEventTapEnable(tap, false)` then `CFRunLoopStop(rl)`. Both are FFI-safe to call from any thread. Mirror the Windows pattern (`PostThreadMessageW(thread_id, WM_QUIT)`). Storing those refs needs `CallbackContext` exposed to the adapter, easiest via shared `parking_lot::Mutex`.
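The shape of that fix can be shown with a portable stand-in, since CoreFoundation calls cannot run in a self-contained example. In this minimal sketch a `std::sync::mpsc` receive loop plays the role of `CFRunLoopRun`, and sending `LoopMsg::Stop` plays the role of `CGEventTapEnable(tap, false)` followed by `CFRunLoopStop(rl)`; `LoopAdapter` and `LoopMsg` are hypothetical names, not part of the codebase. What carries over is the structure: the adapter keeps whatever it needs to reach the loop its listener thread is blocked in, and it overrides `shutdown` instead of inheriting the empty default.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::Mutex;
use std::thread::{self, JoinHandle};

/// Messages understood by the stand-in run loop (hypothetical).
enum LoopMsg {
    Event,
    Stop,
}

/// Same shape as the trait in the evidence: `shutdown` defaults to a no-op.
trait HotkeyAdapter {
    fn shutdown(&self) {}
}

/// Hypothetical adapter that keeps what it needs to stop its own listener,
/// the way MacHotkeyAdapter would keep the CFRunLoopRef and the tap.
struct LoopAdapter {
    control: Sender<LoopMsg>,
    listener: Mutex<Option<JoinHandle<u32>>>,
}

impl LoopAdapter {
    fn start() -> Self {
        let (tx, rx) = mpsc::channel::<LoopMsg>();
        let listener = thread::spawn(move || {
            let mut events = 0;
            // Stand-in for CFRunLoopRun(): blocks until told to stop.
            while let Ok(msg) = rx.recv() {
                match msg {
                    LoopMsg::Event => events += 1,
                    LoopMsg::Stop => break,
                }
            }
            events
        });
        Self { control: tx, listener: Mutex::new(Some(listener)) }
    }
}

impl HotkeyAdapter for LoopAdapter {
    /// Overrides the empty default: stop the loop, then join the thread.
    fn shutdown(&self) {
        let _ = self.control.send(LoopMsg::Stop);
        if let Some(handle) = self.listener.lock().unwrap().take() {
            let _ = handle.join();
        }
    }
}

impl Drop for LoopAdapter {
    fn drop(&mut self) {
        // Like Drop for HotkeyMonitor: always route through shutdown().
        self.shutdown();
    }
}
```

Because `Drop` routes through `shutdown`, a second call has to be harmless; taking the `JoinHandle` out of the `Mutex` makes every call after the first a no-op.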
-
----
-
-### 3.1.2 — `hotkey_supervisor_loop` no-exit
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/coordinator.rs:766-833`
-
-**Evidence**:
-```rust
-// coordinator.rs:766-833
-fn hotkey_supervisor_loop(inner: Arc) {
-    let mut attempts: u32 = 0;
-    let capability = HotkeyMonitor::capability();
-    loop {
-        let prefs = inner.prefs.get();
-        if inner.hotkey.lock().is_some() { return; }
-        // ... try start, on failure:
-        std::thread::sleep(std::time::Duration::from_secs(3));
-    }
-}
-```
-
-**Notes**: The only successful exit is when the hotkey is already installed (lines 772-774). On error the supervisor keeps spinning. There is no `AtomicBool` / `Sender<()>` shutdown signal exposed for app shutdown. Same pattern repeats in `qa_hotkey_supervisor_loop`, `combo_hotkey_supervisor_loop`, `translation_hotkey_supervisor_loop`, `action_hotkey_supervisor_loop`. Worth a single shared shutdown flag.
-
----
-
-### 3.1.3 — `start_listener_thread` drops `JoinHandle`
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/hotkey.rs:196-229`
-
-**Evidence**:
-```rust
-// hotkey.rs:218-228
-let (status_tx, status_rx) = mpsc::channel::>();
-std::thread::Builder::new()
-    .name(thread_name.into())
-    .spawn(move || run_listen_loop(thread_shared, tx, status_tx))
-    .map_err(|e| install_error("spawn_failed", format!("hotkey 线程启动失败: {e}")))?;
-
-match status_rx.recv_timeout(Duration::from_secs(3)) { ... }
-```
-
-**Notes**: The `Result` from `spawn(...)` is `?`'d for the spawn error, but the `JoinHandle` itself is silently dropped (the spawn return value isn't bound). `ListenerThread` only stores `shared` and a single `startup` value. If the listener panics (e.g. an FFI bug; note that `parking_lot` locks do not poison), there is no path for the supervisor to learn about it — the channel just stops receiving. Storing the handle and using `JoinHandle::is_finished()` (Rust 1.61+) or pairing with a "thread alive" `AtomicBool` would let the supervisor restart the listener on panic.
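A minimal sketch of keeping the handle instead of dropping it, assuming a supervisor that polls. `ListenerThread` here is a hypothetical replacement for the struct described above, not the current one; it relies only on `JoinHandle::is_finished` (stable since Rust 1.61) and `JoinHandle::join`, which reports a panicked thread as `Err`.

```rust
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical variant of ListenerThread that keeps the JoinHandle.
struct ListenerThread {
    handle: Option<JoinHandle<()>>,
}

impl ListenerThread {
    fn spawn<F>(name: &str, body: F) -> std::io::Result<Self>
    where
        F: FnOnce() + Send + 'static,
    {
        let handle = thread::Builder::new().name(name.to_owned()).spawn(body)?;
        Ok(Self { handle: Some(handle) })
    }

    /// True once the listener body has returned or panicked.
    fn is_dead(&self) -> bool {
        self.handle.as_ref().map_or(true, |h| h.is_finished())
    }

    /// Polls `is_dead` until it is true or `timeout` elapses.
    fn wait_until_dead(&self, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        while !self.is_dead() {
            if Instant::now() >= deadline {
                return false;
            }
            thread::sleep(Duration::from_millis(10));
        }
        true
    }

    /// Reaps a dead listener. `Some(Err(_))` means it panicked, which is
    /// the signal a supervisor would use to restart it.
    fn reap(&mut self) -> Option<thread::Result<()>> {
        if self.is_dead() {
            self.handle.take().map(|h| h.join())
        } else {
            None
        }
    }
}
```

A supervisor could, for example, call `reap()` on each of its retry ticks and reinstall the listener whenever it returns `Some(Err(_))`.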
-
----
-
-### 3.2.1 — Blocking `tx.send` in event-tap callback
-**Status**: FALSE_POSITIVE
-**Files**: `openless-all/app/src-tauri/src/hotkey.rs:183-187`, `:218`, `coordinator.rs:650`, `:781`
-
-**Evidence**:
-```rust
-// hotkey.rs:183-187
-fn send_or_log(tx: &Sender, evt: HotkeyEvent) {
-    if let Err(e) = tx.send(evt) {
-        log::warn!("[hotkey] 事件发送失败: {e}");
-    }
-}
-```
-
-```rust
-// coordinator.rs:650 (and :781)
-let (tx, rx) = mpsc::channel::();
-```
-
-**Notes**: `std::sync::mpsc::channel()` is the **unbounded asynchronous** variant — `Sender::send` does not wait for the receiver: it fails only when the receiver has been dropped, and then returns `Err(SendError)` immediately. There is no "rendezvous" backpressure. The only way `tx.send` could stall long enough to trip `kCGEventTapDisabledByTimeout` is if std mpsc internally allocated under contention, which would be milliseconds at worst, not the seconds-scale macOS uses for the tap-disabled timeout. The audit conflated `mpsc::channel()` with `mpsc::sync_channel(0)` (rendezvous). No fix needed.
-
----
-
-### 3.2.2 — `emit_capsule` runs `show()/hide()` on cpal callback thread
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/coordinator/dictation.rs:336-365`, `coordinator.rs:3617-3660`, `recorder.rs:458-490`
-
-**Evidence (call site is the audio callback)**:
-```rust
-// coordinator/dictation.rs:336-365
-let level_handler: Arc = Arc::new(move |level| {
-    // ...phase guard, throttle to ~30 Hz...
-    emit_capsule(
-        &inner_for_level,
-        CapsuleState::Recording,
-        level,
-        elapsed,
-        None,
-        None,
-    );
-});
-// ...
-match Recorder::start(microphone_device_name, consumer, level_handler) { ... }
-```
-
-**Evidence (callback thread is cpal's `process_callback`)**:
-```rust
-// recorder.rs:458-482 (process_callback)
-fn process_callback(...) {
-    // ...resampling, RMS computation...
-    level_handler(level); // synchronously invoked from cpal audio thread
-}
-```
-
-**Evidence (`emit_capsule` directly touches the window)**:
-```rust
-// coordinator.rs:3637-3656
-if let Some(window) = app.get_webview_window("capsule") {
-    let visible = !matches!(state, CapsuleState::Idle);
-    maybe_position_capsule_bottom_center(inner, &window, payload.translation);
-    if show_capsule && visible {
-        if !show_capsule_window_no_activate(&app, &window) {
-            let _ = window.show();
-        }
-        // ...
-    } else {
-        hide_capsule_window_if_present();
-        let _ = window.hide();
-    }
-}
-let _ = app.emit_to("capsule", "capsule:state", payload);
-```
-
-**Notes**: `app.emit_to` is fine — Tauri's event bus is thread-safe and marshals to the JS runtime internally. The risk is `window.show() / window.hide()` and the position helper, all of which call NSWindow / HWND APIs that expect the main thread. On macOS this can stall the audio callback (NSWindow ops grab the AppKit run loop), risking `kAudioUnitErr_TooManyFramesToProcess` if the callback misses its deadline. The throttle to ~30 Hz mitigates frequency but doesn't change the per-call risk. Worth bouncing the window-touching half through `app.run_on_main_thread`.
-
-**Fix sketch**: Split `emit_capsule` into `emit_capsule_event` (just `app.emit_to`, safe everywhere) and `apply_capsule_window_state` (called inside `app.run_on_main_thread` or only from already-main paths). The level-handler path only needs the event; window show/hide already happens in begin/end/cancel which run on the tokio runtime where `run_on_main_thread` is cheap.
-
----
-
-### 3.2.3 — Sync inserter from async `end_session`
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/coordinator/dictation.rs:900-913`, `insertion.rs:42-56`, `:74-95`, `:136-149`
-
-**Evidence**:
-```rust
-// coordinator/dictation.rs:900-913
-#[cfg(not(target_os = "windows"))]
-{
-    inner.inserter.insert(&polished, restore_clipboard)
-}
-// ...
-} else if allow_non_tsf_insertion_fallback {
-    inner.inserter.copy_fallback(&polished)
-}
-```
-
-```rust
-// insertion.rs:42-46 (non-macOS impl)
-pub fn insert(&self, text: &str, restore_clipboard_after_paste: bool) -> InsertStatus {
-    if text.is_empty() { return InsertStatus::CopiedFallback; }
-    insert_with_clipboard_restore(text, restore_clipboard_after_paste)
-}
-```
-
-**Notes**: `end_session` is `async`. On Linux (and on macOS too — `simulate_paste()` is FFI-light but still sync), `insert` calls `arboard::Clipboard::new()` which can block on X11/wayland for tens of ms, then `enigo` keystroke synthesis which is also sync. Blocking the tokio worker for 50–200 ms isn't catastrophic but contributes to latency under load. macOS path uses `CGEventPost` via FFI — fast, non-blocking in practice; mostly a Linux/Windows concern.
-
-**Fix sketch**: Wrap the platform-specific `inserter.insert` / `inserter.copy_fallback` in `tokio::task::spawn_blocking(move || ...).await.unwrap_or(InsertStatus::Failed)`. `TextInserter` is `Sync`, so the move only needs `&self` cloned via `Arc` (already in `Inner`).
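The ownership and panic-to-`Failed` shape of that sketch can be shown without pulling in tokio. This is an assumption-laden stand-in: `Inserter` and `insert_off_thread` are hypothetical names, and a plain `std::thread` plus `join` replaces `tokio::task::spawn_blocking(...).await`. It does not show the latency win, because `join` still blocks the caller, whereas awaiting tokio's `JoinHandle` would free the worker thread.

```rust
use std::sync::Arc;
use std::thread;

#[derive(Debug, PartialEq)]
enum InsertStatus {
    Inserted,
    CopiedFallback,
    Failed,
}

/// Hypothetical stand-in for TextInserter: synchronous, may block on the
/// clipboard or on keystroke synthesis, and shared through an Arc.
trait Inserter: Send + Sync + 'static {
    fn insert(&self, text: &str, restore_clipboard: bool) -> InsertStatus;
}

/// Runs the blocking insert on another thread. The closure owns an Arc
/// clone and the text, so it is 'static; a panic inside `insert` becomes
/// `InsertStatus::Failed`, like `.unwrap_or(InsertStatus::Failed)` on
/// tokio's JoinError.
fn insert_off_thread<I: Inserter>(inserter: Arc<I>, text: String, restore: bool) -> InsertStatus {
    thread::spawn(move || inserter.insert(&text, restore))
        .join()
        .unwrap_or(InsertStatus::Failed)
}
```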
-
----
-
-### 3.2.4 — `AudioMuteGuard` shells out synchronously
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/audio_mute.rs:127-152` (macOS), `:202-249` (Linux), `coordinator/resources.rs:112-131`, `coordinator/dictation.rs:369`
-
-**Evidence (macOS `osascript`)**:
-```rust
-// audio_mute.rs:143-151
-let output = Command::new("osascript")
-    .args(["-e", script])
-    .output()
-    .map_err(|e| format!("set output mute failed: {e}"))?;
-```
-
-**Evidence (Linux `wpctl`/`pactl` — same shape)**:
-```rust
-// audio_mute.rs:215-223
-let output = Command::new("wpctl")
-    .args(["set-mute", "@DEFAULT_AUDIO_SINK@", value])
-    .output()
-    .map_err(|e| format!("wpctl set-mute failed: {e}"))?;
-```
-
-**Evidence (called from async `begin_session`)**:
-```rust
-// coordinator/dictation.rs:369
-acquire_recording_mute(inner, "dictation");
-match Recorder::start(microphone_device_name, consumer, level_handler) { ... }
-```
-
-```rust
-// coordinator/resources.rs:117-127
-if mute.holders == 0 {
-    match crate::audio_mute::AudioMuteGuard::activate() {
-        Ok(guard) => { mute.guard = Some(guard); ... }
-        Err(err) => { ... }
-    }
-}
-```
-
-**Notes**: `osascript` typically takes 100–300 ms to spawn + execute on macOS (AppleScript runtime startup). On a hot-key press → begin_session, this delays the recording start by exactly that amount, on the tokio worker thread. Windows path uses native COM (`IAudioEndpointVolume::SetMute`) which is fast and OK. Linux `wpctl`/`pactl` is similar to macOS osascript in cost.
-
-**Fix sketch**: Wrap `AudioMuteGuard::activate()` in `tokio::task::spawn_blocking`. Since `acquire_recording_mute` itself is sync and called from `begin_session` (async), the cleanest path is making `acquire_recording_mute` async and `.await`-ing the spawn_blocking. Drop path (`PlatformMuteGuard::restore`) also runs `osascript` and is currently called from `Drop` in `release_recording_mute`; moving that to a detached `tokio::task::spawn_blocking` is sufficient (release path doesn't need to await).
-
----
-
-### 3.2.5 — `thread::sleep(120ms)` in async permission probe
-**Status**: CONFIRMED (Linux/non-macOS path only)
-**Files**: `openless-all/app/src-tauri/src/permissions.rs:323-357`, `coordinator/dictation.rs:137`, `coordinator.rs:1732-1763`
-
-**Evidence**:
-```rust
-// permissions.rs:343-356
-let stream = match sample_format {
-    SampleFormat::F32 => build_probe!(f32),
-    // ...
-}?;
-stream.play().map_err(|e| e.to_string())?;
-std::thread::sleep(Duration::from_millis(120));
-drop(stream);
-Ok(())
-```
-
-**Evidence (called from async)**:
-```rust
-// coordinator/dictation.rs:137
-if let Err(message) = ensure_microphone_permission(inner) { ... }
-```
-
-```rust
-// coordinator.rs:1732-1763
-fn ensure_microphone_permission(_inner: &Arc) -> Result<(), String> {
-    #[cfg(target_os = "windows")] { ... return Ok(()); } // Windows skips probe
-    let status = permissions::check_microphone();
-    // ...
-}
-```
-
-**Notes**: On macOS `check_microphone` uses `AVAudioApplication` / `AVCaptureDevice` and never reaches `probe_input_stream` (that helper is in the `cfg(not(target_os = "macos"))` module). So this 120 ms blocking sleep only fires on **Linux** (and any other non-macOS, non-Windows path) when probing mic permission. On Linux dictation, every `begin_session` blocks the tokio worker for 120 ms before the recorder even starts.
-
-**Fix sketch**: `tokio::time::sleep(Duration::from_millis(120)).await` is the correct call but requires turning `probe_input_stream` async. Alternatively keep it sync and wrap the whole `check_microphone()` non-macOS path in `tokio::task::spawn_blocking`. The latter is mechanically simpler.
-
----
-
-### 3.3.1 — Pressed edge routes to QA without checking dictation phase
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/coordinator/dictation.rs:11-22`, `coordinator/qa.rs:64-77`
-
-**Evidence**:
-```rust
-// coordinator/dictation.rs:11-22
-pub(super) async fn handle_pressed_edge(inner: &Arc) {
-    let was_held = inner.hotkey_trigger_held.swap(true, Ordering::SeqCst);
-    if !was_held {
-        // 路由:QA 浮窗可见时,rightOption 边沿走 QA;否则走主听写。详见 issue #118 v2。
-        let panel_visible = inner.qa_state.lock().panel_visible;
-        if panel_visible {
-            handle_qa_option_edge(inner).await;
-        } else {
-            handle_pressed(inner).await;
-        }
-    }
-}
-```
-
-**Evidence (QA path doesn't stop dictation)**:
-```rust
-// coordinator/qa.rs:64-77
-pub(super) async fn handle_qa_option_edge(inner: &Arc) {
-    let phase = inner.qa_state.lock().phase;
-    log::info!("[coord] QA option edge (phase={phase:?})");
-    match phase {
-        QaPhase::Idle => { let _ = begin_qa_session(inner).await; }
-        QaPhase::Recording => { let _ = end_qa_session(inner).await; }
-        QaPhase::Processing => {}
-    }
-}
-```
-
-**Notes**: `panel_visible` flips true via `open_qa_panel`, which is triggered by the QA hotkey (`Cmd+Shift+;` by default). If the user opens the QA panel mid-dictation (dictation `phase = Listening`, mic open, ASR session live), the next dictation-hotkey press routes into `begin_qa_session`. `begin_qa_session` will call `Recorder::start` again on the same mic device. cpal will reject the second `build_input_stream` on most platforms, but on Linux/PipeWire it sometimes succeeds and you end up with two concurrent capture streams. Even where it fails, the dictation session's recorder is still running and the user has no UX path to stop it from the QA panel.
-
-**Fix sketch**: In `handle_pressed_edge`, check `inner.state.lock().phase`. If `Listening` or `Starting`, route to `handle_pressed` (which is the dictation toggle path) regardless of `panel_visible`, and either close the QA panel or refuse to open it while dictation is active. Decision belongs to product, but the *technical* race is real.
-
----
-
-### 3.3.2 — Dual TranslationModifier handlers
-**Status**: PARTIAL
-**Files**: `openless-all/app/src-tauri/src/coordinator.rs:1130-1139`, `:1376-1384`, `:1402-1410`
-
-**Evidence**:
-```rust
-// coordinator.rs:1130-1139 (translation_hotkey_bridge_loop, runs on its own thread)
-fn translation_hotkey_bridge_loop(inner: Arc, rx: mpsc::Receiver) {
-    while let Ok(evt) = rx.recv() {
-        if inner.shortcut_recording_active.load(Ordering::SeqCst) { continue; }
-        if matches!(evt, ComboHotkeyEvent::Pressed) {
-            mark_translation_modifier_seen(&inner);
-        }
-    }
-}
-
-// coordinator.rs:1402-1410 (hotkey_bridge_loop, separate thread)
-HotkeyEvent::TranslationModifierPressed => {
-    let translation_hotkey = inner_cloned.prefs.get().translation_hotkey;
-    if is_builtin_translation_shift(&translation_hotkey)
-        || crate::shortcut_binding::legacy_modifier_trigger(&translation_hotkey)
-            .is_some()
-    {
-        mark_translation_modifier_seen(&inner_cloned);
-    }
-}
-
-// coordinator.rs:1376-1384
-fn mark_translation_modifier_seen(inner: &Arc) {
-    let phase = inner.state.lock().phase;
-    if matches!(phase, SessionPhase::Starting | SessionPhase::Listening) {
-        inner.translation_modifier_seen.store(true, Ordering::SeqCst);
-    }
-}
-```
-
-**Notes**: Both bridge loops run on independent `std::thread`s and both ultimately call `mark_translation_modifier_seen`, which locks `inner.state`. They never race in a way that affects integrity — they both perform the same idempotent `AtomicBool::store(true)`. The audit's framing of "Both lock `inner.state` independently" is technically true but not a bug — `state` is a `Mutex`, only one acquires at a time, and both write the same flag. Worst case is one log-line of `[coord] translation modifier seen during ...` printed twice for one Shift press. Not worth a code change unless 3.3.1's fix touches the same code.
-
----
-
-### 3.3.3 — Cancelled doesn't reset `hotkey_trigger_held`
-**Status**: FALSE_POSITIVE
-**Files**: `openless-all/app/src-tauri/src/coordinator.rs:1399-1401`, `hotkey.rs:530-538`, `coordinator/dictation.rs:11-22`
-
-**Evidence**:
-```rust
-// coordinator.rs:1399-1401
-HotkeyEvent::Cancelled => {
-    cancel_session(&inner_cloned);
-}
-```
-
-**Why it doesn't actually wedge**: `HotkeyEvent::Cancelled` is emitted by the OS-side hotkey monitor only when the user presses **Esc** (`hotkey.rs:565-570` for macOS, `:867-871` for Windows), not when the dictation trigger key is released. The dictation trigger's "is currently held" state lives in `Shared::trigger_held` inside the platform monitor (`hotkey.rs:117`). That atomic gates re-emission of Pressed via the `is_active && !was_held` check (`hotkey.rs:530-538`). So even if `Inner::hotkey_trigger_held` stays `true` in the coordinator after Esc, the next Pressed edge from the OS will only fire when the user releases and re-presses the trigger key — and the OS path will set `was_held=false` again before sending Pressed. The coordinator's `hotkey_trigger_held` swap on the next Pressed will return `false` (because `handle_released_edge` has run between the previous press and this one, OR the user never released, in which case no new Pressed is queued).
-
-The audit confused two layers: the OS-edge dedupe in `hotkey.rs::Shared::trigger_held` (which is the gating thing) and the coordinator's `Inner::hotkey_trigger_held` (which is just a bookkeeping latch tied to Pressed/Released edges that already came in). Cancelled doesn't change either's correctness.
-
-**Notes**: There's a *cosmetic* asymmetry — after Esc, `Inner::hotkey_trigger_held=true` until the user releases the trigger. If the user keeps holding past the Esc, `Released` fires later and resets it. Defensive cleanup would be to also reset on Cancelled, but it doesn't fix any user-visible bug.
-
----
-
-### 3.3.4 — `open_qa_panel` clobbers Done capsule
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/coordinator/qa.rs:79-104`
-
-**Evidence**:
-```rust
-// coordinator/qa.rs:79-104
-pub(super) fn open_qa_panel(inner: &Arc) {
-    {
-        let mut state = inner.qa_state.lock();
-        state.panel_visible = true;
-        state.phase = QaPhase::Idle;
-        // ...
-    }
-    // 先把胶囊清干净,避免主听写上一次 Done 状态残留的 message/insertedChars
-    // 在 QA Done 阶段被 capsule UI 错误复用("已之一粘贴这个 0" 那种)。
-    emit_capsule(inner, CapsuleState::Idle, 0.0, 0, None, None);
-    // ...
-}
-```
-
-**Notes**: The comment (clear the capsule first, so the message/insertedChars left over from the previous dictation's Done state is not wrongly reused by the capsule UI during the QA Done phase) shows the sweep is *intentional* — it clears stale Done state from a previous dictation. But it sweeps *any* in-flight capsule too. If the user opens the QA panel within the ~1.5 s `CAPSULE_AUTO_HIDE_DELAY_MS` window after dictation finishes, they lose the "已粘贴 N 字" ("Pasted N characters") toast. More importantly, if dictation is still in `Polishing` or `Inserting` phase (LLM hasn't returned yet), opening QA hides the polish-progress capsule mid-flight. The user sees their dictation "vanish" until insertion completes.
-
-**Fix sketch**: Before calling `emit_capsule(Idle, ...)`, check `inner.state.lock().phase`. Only sweep if dictation is in `Idle`. If dictation is mid-flight, leave the capsule visible — QA panel doesn't need the capsule cleared to function. Pairs cleanly with the 3.3.1 fix (same source files).
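The 3.3.1 and 3.3.4 fix sketches both reduce to small pure decisions that can be unit-tested before being wired into `handle_pressed_edge` and `open_qa_panel`. A minimal sketch, assuming a simplified `SessionPhase` with only the four dictation phases that appear in the evidence (`Idle`, `Starting`, `Listening`, `Processing`); `Route`, `route_pressed_edge` and `should_sweep_capsule_on_qa_open` are hypothetical helpers. Whether to also close the QA panel stays a product decision and is not modelled.

```rust
/// Simplified dictation phases (hypothetical mirror of SessionPhase).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SessionPhase {
    Idle,
    Starting,
    Listening,
    Processing,
}

/// Where a pressed edge of the dictation hotkey should go.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    Dictation,
    Qa,
}

/// 3.3.1: a live dictation session (mic open) wins over a visible QA panel,
/// so the press toggles dictation instead of starting a second recorder.
fn route_pressed_edge(dictation: SessionPhase, panel_visible: bool) -> Route {
    match dictation {
        SessionPhase::Starting | SessionPhase::Listening => Route::Dictation,
        _ if panel_visible => Route::Qa,
        _ => Route::Dictation,
    }
}

/// 3.3.4: only clear the capsule to Idle when no dictation is in flight,
/// so a progress capsule is not hidden mid-session.
fn should_sweep_capsule_on_qa_open(dictation: SessionPhase) -> bool {
    dictation == SessionPhase::Idle
}
```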
-
----
-
-### 3.3.5 — `focus_target` leaks on Processing-phase cancel
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/coordinator_state.rs:155-173`, `:347-374` (test that already proves the gap)
-
-**Evidence**:
-```rust
-// coordinator_state.rs:168-173
-pub(crate) fn finish_cancel_session_state(state: &mut SessionState, decision: CancelDecision) {
-    if decision.phase != SessionPhase::Processing {
-        state.phase = SessionPhase::Idle;
-        state.focus_target = None;
-    }
-}
-```
-
-**Evidence (existing test acknowledges the gap)**:
-```rust
-// coordinator_state.rs:370-372
-if matches!(initial, SessionPhase::Starting | SessionPhase::Listening) {
-    assert!(state.focus_target.is_none(), "initial={initial:?}");
-}
-// Note: no assertion that Processing-phase cancel clears focus_target.
-```
-
-**Notes**: When cancel hits `Processing`, `finish_cancel_session_state` deliberately keeps `phase=Processing` (the comment says "交给 end_session 自己收尾", i.e. "leave it for end_session to wrap up"), but it also keeps `focus_target` populated. `end_session` does eventually reset state via the `proceed_to_insert=false` branch (`coordinator/dictation.rs:862-878`) which sets `phase=Idle` but doesn't touch `focus_target`. Net result: stale `focus_target` (a `Vec` index, basically a plain `usize`) lives on into the next `begin_session_state`, which overwrites it (`coordinator_state.rs:80`). So the leak is bounded — next session clobbers it. Real risk is only between cancel and next begin, where `restore_focus_target_if_possible(focus_target)` could pick up a stale value if anyone reads it. Code review didn't surface a reader on that interval, so this is a low-impact correctness gap, not a user-facing bug.
-
-**Fix sketch**: In `finish_cancel_session_state`, set `state.focus_target = None` unconditionally (before the phase check). The Processing branch's existing semantic — "let `end_session` collapse to Idle" — doesn't depend on `focus_target` staying set.
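A minimal sketch of that fix, using simplified stand-ins for `SessionState` and `CancelDecision` (the real types carry more fields, and `focus_target` is modelled here as an `Option<usize>` purely for illustration). The only change from the evidence above is that `focus_target` is cleared before the phase check, so a `Processing`-phase cancel no longer leaves it populated while `phase` is still handed over to `end_session`.

```rust
/// Simplified stand-ins for the coordinator_state types (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SessionPhase {
    Idle,
    Starting,
    Listening,
    Processing,
}

struct SessionState {
    phase: SessionPhase,
    /// Modelled as an opaque index purely for illustration.
    focus_target: Option<usize>,
}

#[derive(Clone, Copy)]
struct CancelDecision {
    phase: SessionPhase,
}

/// Clears `focus_target` unconditionally, while keeping the existing rule
/// that a Processing-phase cancel leaves `phase` for `end_session` to
/// collapse to Idle.
fn finish_cancel_session_state(state: &mut SessionState, decision: CancelDecision) {
    state.focus_target = None;
    if decision.phase != SessionPhase::Processing {
        state.phase = SessionPhase::Idle;
    }
}
```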
-
----
-
-### 3.3.6 — Double-restore of prepared Windows IME session
-**Status**: FALSE_POSITIVE
-**Files**: `openless-all/app/src-tauri/src/coordinator.rs:1594-1633`
-
-**Evidence**:
-```rust
-// coordinator.rs:1594-1602
-fn take_matching_prepared_windows_ime_session(
-    slots: &mut Vec,
-    session_id: SessionId,
-) -> Option {
-    let index = slots
-        .iter()
-        .position(|slot| slot.session_id == session_id)?;
-    Some(slots.remove(index).prepared)
-}
-```
-
-```rust
-// coordinator.rs:1620-1633
-fn restore_prepared_windows_ime_session(inner: &Arc, session_id: SessionId) {
-    let state = inner.state.lock();
-    let prepared = {
-        let mut slot = inner.prepared_windows_ime_session.lock();
-        take_current_prepared_windows_ime_session_for_restore(
-            &mut slot, session_id, state.session_id,
-        )
-    };
-    if let Some(prepared) = prepared { inner.windows_ime.restore_session(prepared); }
-}
-```
-
-**Notes**: First call to `restore_prepared_windows_ime_session` for a given `session_id` does `slots.remove(index)` regardless of the freshness check on `current_session_id`. The slot is gone after that. Second call's `slots.iter().position(...)` returns `None`, the `?` short-circuits, the function silently no-ops. So even if `cancel_session → end_session` (or vice versa) both invoke `restore_prepared_windows_ime_session` with the same `session_id`, the IME state is restored at most once. The audit's worry is unfounded.
-
----
-
-### 3.4.1 — Inner has many lock fields
-**Status**: ADVISORY_ONLY
-**Files**: `openless-all/app/src-tauri/src/coordinator.rs:91-141`
-
-**Inventory**: 16 `Mutex<...>` + 4 `AtomicBool` (excluding `Arc<...>` indirection counts) plus an `Arc>>` for the windows IME slot vector. Specifically:
-
-| Field | Type |
-|---|---|
-| `app` | `Mutex>` |
-| `state` | `Mutex` |
-| `asr` | `Mutex>>` |
-| `recorder` | `Mutex>>` |
-| `recording_mute` | `Mutex` |
-| `hotkey` | `Mutex>` |
-| `hotkey_status` | `Mutex` |
-| `hotkey_trigger_held` | `AtomicBool` |
-| `shortcut_recording_active` | `AtomicBool` |
-| `combo_hotkey` | `Mutex>` |
-| `translation_hotkey` | `Mutex>` |
-| `switch_style_hotkey` | `Mutex>` |
-| `open_app_hotkey` | `Mutex>` |
-| `translation_modifier_seen` | `AtomicBool` |
-| `qa_hotkey` | `Mutex>` |
-| `qa_state` | `Mutex` |
-| `capsule_layout` | `Mutex>` |
-| `qa_asr` | `Mutex>>` |
-| `qa_recorder` | `Mutex>` |
-| `qa_stream_cancelled` | `Arc` (one of two AtomicBools-in-Arc) |
-| `prepared_windows_ime_session` (windows-only) | `Arc>>` |
-
-No deadlock pattern was found in the read paths — most call sites take one lock at a time. `mark_translation_modifier_seen` and `cancel_session` both touch `inner.state`, but in disjoint critical sections. Documenting only.
-
----
-
-### 3.4.2 — Heavy `Ordering::SeqCst` use
-**Status**: ADVISORY_ONLY
-**Files**: `coordinator.rs`, `coordinator/*.rs`, `hotkey.rs`
-
-**Evidence**: `grep -rn "Ordering::SeqCst" coordinator.rs coordinator/ hotkey.rs | wc -l` → 66. Total `Ordering::*` uses in those files: 67 (one `Relaxed` in `recorder.rs::process_callback`, 66 SeqCst).
-
-**Notes**: Most uses are simple set/load on independent `AtomicBool`s where `Ordering::Relaxed` would suffice. A few that gate cross-thread visibility (`hotkey_trigger_held` swap in `handle_pressed_edge` synchronizing with the audio thread reading session state) might justify Acquire/Release. `SeqCst` is correct everywhere — just over-strong. Not a bug.
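For reference, the distinction drawn in the notes above looks like this in std Rust. A minimal sketch with hypothetical names (`latch`, `Published`): a flag that guards no other data can use `Relaxed`, while a flag that publishes other data needs a `Release` store paired with an `Acquire` load. `SeqCst` would also be correct in both places, only stronger than necessary.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// An independent on/off latch that guards no other data: `Relaxed` is
/// enough. Returns the previous value, like the `swap` in handle_pressed_edge.
fn latch(flag: &AtomicBool) -> bool {
    flag.swap(true, Ordering::Relaxed)
}

/// A flag that publishes another value: the writer stores the payload and
/// then sets `ready` with `Release`; a reader that sees `ready == true`
/// through an `Acquire` load is guaranteed to also see the payload.
struct Published {
    payload: AtomicU32,
    ready: AtomicBool,
}

impl Published {
    const fn new() -> Self {
        Self { payload: AtomicU32::new(0), ready: AtomicBool::new(false) }
    }

    fn publish(&self, value: u32) {
        self.payload.store(value, Ordering::Relaxed);
        self.ready.store(true, Ordering::Release);
    }

    fn try_read(&self) -> Option<u32> {
        if self.ready.load(Ordering::Acquire) {
            Some(self.payload.load(Ordering::Relaxed))
        } else {
            None
        }
    }
}
```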
-
----
-
-### 3.4.3 — Many `unsafe` blocks, audit SAFETY comments
-**Status**: ADVISORY_ONLY
-**Files**: cross-tree (predominantly `hotkey.rs`, `insertion.rs`, `windows_ime_*.rs`, `permissions.rs`)
-
-**Evidence**: `grep -rn "unsafe " src/ --include="*.rs" | grep -E "unsafe \{|unsafe fn|unsafe impl|unsafe extern"` → 102 sites.
-
-**Notes**: Almost all are platform FFI (CoreFoundation/CoreGraphics on macOS, win32 on Windows, msg_send! on macOS objc2). Sample inspected (`insertion.rs::send_text` near line 340 and `post_cmd_v` near line 420) — function-level invariants are documented at module level, but inline `// SAFETY:` comments are sparse. Same for `hotkey.rs::run_listen_loop` which leaks `Box::into_raw` for FFI context — a `// SAFETY: ctx is dropped only inside the listener after CFRunLoopRun returns; reentrancy guarded by ...` comment would help. Documentation-grade improvement, no soundness bug detected.
-
----
-
-### 3.4.4 — Global hotkey dispatcher loop has no exit
-**Status**: CONFIRMED
-**Files**: `openless-all/app/src-tauri/src/global_hotkey_runtime.rs:94-107`
-
-**Evidence**:
-```rust
-// global_hotkey_runtime.rs:94-107
-fn start_dispatcher(runtime: Arc) {
-    std::thread::Builder::new()
-        .name("openless-global-hotkey-dispatch".into())
-        .spawn(move || {
-            let receiver = GlobalHotKeyEvent::receiver();
-            loop {
-                match receiver.recv_timeout(Duration::from_millis(250)) {
-                    Ok(event) => runtime.dispatch(event),
-                    Err(_) => continue,
-                }
-            }
-        })
-        .expect("spawn global hotkey dispatcher");
-}
-```
-
-**Notes**: `GlobalHotKeyEvent::receiver()` is process-global from the upstream `global-hotkey` crate. The 250 ms timeout means the loop wakes regularly but never checks an exit flag. On app shutdown the thread leaks; harmless for a single-instance app but trips `tokio::test` and any future `Drop`-based teardown (e.g. integration tests that spin coordinator up/down).
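The no-exit loop above could be given an exit path along these lines. A minimal sketch under stated assumptions: a std `mpsc::Receiver` stands in for the crate's process-global `GlobalHotKeyEvent::receiver()`, the `dispatch` closure stands in for `runtime.dispatch`, and the `Arc<AtomicBool>` is the kind of shared shutdown flag 3.1.2 suggests for the supervisor loops. The 250 ms timeout bounds how long it takes for the flag to be noticed; the `Disconnected` arm is specific to std channels.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical dispatcher with an exit path: the same 250 ms
/// `recv_timeout` loop, but it re-checks a shared shutdown flag on every
/// wake-up and also stops once every sender is gone.
/// Returns the number of events dispatched.
fn start_dispatcher<E, F>(
    receiver: Receiver<E>,
    shutdown: Arc<AtomicBool>,
    mut dispatch: F,
) -> JoinHandle<usize>
where
    E: Send + 'static,
    F: FnMut(E) + Send + 'static,
{
    thread::Builder::new()
        .name("openless-global-hotkey-dispatch".into())
        .spawn(move || {
            let mut dispatched = 0;
            while !shutdown.load(Ordering::Acquire) {
                match receiver.recv_timeout(Duration::from_millis(250)) {
                    Ok(event) => {
                        dispatch(event);
                        dispatched += 1;
                    }
                    Err(RecvTimeoutError::Timeout) => continue,
                    Err(RecvTimeoutError::Disconnected) => break,
                }
            }
            dispatched
        })
        .expect("spawn global hotkey dispatcher")
}
```

Setting the flag and then joining the returned handle gives deterministic teardown, which is what the integration-test case in the notes needs.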
- -**Fix sketch**: Pair with the `Inner` shutdown flag added in the 3.1.2 fix, or use a `parking_lot::RwLock>` "dispatcher alive" gate inside `GlobalHotkeyRuntime` that the loop reads each iteration. Same shape as the Windows hotkey thread's `WM_QUIT` plumbing. - --- - -### 20 (NEW) — `PreferencesStore::new().expect(...)` panics on bad prefs -**Status**: FALSE_POSITIVE -**Files**: `openless-all/app/src-tauri/src/coordinator.rs:169`, `:210`, `persistence.rs:790-811`, `:146-156`, `types.rs:368-422` - -**Evidence (deserialization fallback already exists)**: -```rust -// persistence.rs:790-811 -impl PreferencesStore { - pub fn new() -> Result<Self> { - let dir = data_dir()?; - ensure_dir(&dir)?; - let path = dir.join(PREFERENCES_FILE); - let prefs = if path.exists() { - read_or_default::<UserPreferences>(&path).unwrap_or_else(|e| { - log::warn!( - "[prefs] load {} failed, using defaults: {}", - path.display(), - e - ); - UserPreferences::default() - }) - } else { - UserPreferences::default() - }; - Ok(Self { path, state: Mutex::new(prefs), }) - } -} -``` - -```rust -// persistence.rs:146-156 (read_or_default) -fn read_or_default<T: for<'de> Deserialize<'de> + Default>(path: &Path) -> Result<T> { - if !path.exists() { return Ok(T::default()); } - let bytes = fs::read(path).with_context(|| format!("read failed: {}", path.display()))?; - if bytes.is_empty() { return Ok(T::default()); } - serde_json::from_slice::<T>(&bytes) - .with_context(|| format!("decode failed: {}", path.display())) -} -``` - -**Why the audit was wrong**: The custom `Deserialize for UserPreferences` (types.rs:368-422) does call `default_dictation_hotkey_from_legacy(...).map_err(serde::de::Error::custom)?`, which can return a serde error for `trigger == Custom` without `customComboHotkey`. That error bubbles through `serde_json::from_slice::<T>` to `read_or_default`, which propagates it as `Result::Err`. But `PreferencesStore::new` then catches it at `.unwrap_or_else(|e| { log::warn!(...); UserPreferences::default() })`.
So the `expect("preferences store init")` at coordinator.rs:169 only fires if `data_dir()?` or `ensure_dir(&dir)?` fails — i.e. the OS-level Application Support directory cannot be created/accessed, which is a legitimate fail-fast condition (no preferences-file storage, no point continuing). - -The audit conflated "deserialization fails" with "PreferencesStore::new returns Err". In the current code those are different. - -**Notes**: No fix needed. The "bad prefs file" case is already handled silently (log + default). If you want belt-and-braces against panic on the truly-impossible filesystem failure, you can add a final `.unwrap_or_else` that logs and returns a fully-default in-memory store, but that's defensive coding for a case where the user's machine is so broken that Application Support is unwritable. - ---- - -### 2.3.3 — Event-name alignment between backend emit and frontend listen -**Status**: CONFIRMED (no action) -**Files**: `coordinator.rs:3659`, `coordinator/qa.rs:94-101`, `coordinator/dictation.rs:929-931`, `src/components/Capsule.tsx:293`, `src/pages/QaPanel.tsx:55,116`, `src/pages/Vocab.tsx:51` - -**Notes**: Backend emits `capsule:state`, `qa:state`, `qa:level`, `vocab:updated`. Frontend listens to all four under matching names (Capsule, QaPanel, Vocab respectively). No mismatch. As stated in the audit prompt — already verified, retained here for completeness. 
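Finding 20's argument rests on two tiers of error handling. Failing to create the directory propagates with `?`. Failing to decode the file is logged and replaced by defaults. Below is a minimal std-only sketch of that shape. `Prefs`, `Store`, the `preferences.txt` name and the plain-integer file format are hypothetical stand-ins for `UserPreferences`, `PreferencesStore` and serde_json. `eprintln!` replaces `log::warn!`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical preferences with one field, stored as a plain integer
/// (stand-in for the serde_json-backed `UserPreferences`).
#[derive(Debug, Default, PartialEq)]
pub struct Prefs {
    pub hotkey_code: u32,
}

/// Missing or empty file -> defaults; unreadable or undecodable file -> Err.
fn read_or_default(path: &Path) -> Result<Prefs, String> {
    if !path.exists() {
        return Ok(Prefs::default());
    }
    let text = fs::read_to_string(path).map_err(|e| format!("read failed: {e}"))?;
    let text = text.trim();
    if text.is_empty() {
        return Ok(Prefs::default());
    }
    text.parse::<u32>()
        .map(|hotkey_code| Prefs { hotkey_code })
        .map_err(|e| format!("decode failed: {e}"))
}

pub struct Store {
    pub path: PathBuf,
    pub prefs: Prefs,
}

impl Store {
    /// Directory errors propagate (fail fast); decode errors fall back to defaults.
    pub fn new(dir: &Path) -> io::Result<Store> {
        fs::create_dir_all(dir)?;
        let path = dir.join("preferences.txt");
        let prefs = read_or_default(&path).unwrap_or_else(|e| {
            eprintln!("[prefs] load {} failed, using defaults: {}", path.display(), e);
            Prefs::default()
        });
        Ok(Store { path, prefs })
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("prefs-sketch-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("preferences.txt"), "not-a-number")?;
    // Logs a warning and falls back to defaults instead of returning Err.
    let store = Store::new(&dir)?;
    println!("{:?}", store.prefs);
    Ok(())
}
```

Only the `create_dir_all` failure reaches the caller as `Err`. That is the analogue of the `expect("preferences store init")` case the finding treats as a legitimate fail-fast.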
- -## Files referenced - -- `openless-all/app/src-tauri/src/types.rs` (lines 57, 200-525, especially 216-219, 277-325, 368-422) -- `openless-all/app/src-tauri/src/hotkey.rs` (lines 1-250, 280-572, 698-870, 1126-1190) -- `openless-all/app/src-tauri/src/coordinator.rs` (lines 91-141, 156-313, 640-833, 837-940, 1130-1139, 1376-1419, 1582-1636, 3617-3700) -- `openless-all/app/src-tauri/src/coordinator/dictation.rs` (lines 1-160, 320-410, 810-1050) -- `openless-all/app/src-tauri/src/coordinator/qa.rs` (full file, 1-124) -- `openless-all/app/src-tauri/src/coordinator/resources.rs` (lines 1-160) -- `openless-all/app/src-tauri/src/coordinator_state.rs` (full file, 1-485) -- `openless-all/app/src-tauri/src/global_hotkey_runtime.rs` (full file, 1-107) -- `openless-all/app/src-tauri/src/audio_mute.rs` (lines 1-263) -- `openless-all/app/src-tauri/src/permissions.rs` (lines 200-360) -- `openless-all/app/src-tauri/src/insertion.rs` (lines 1-150, 300-450) -- `openless-all/app/src-tauri/src/persistence.rs` (lines 146-156, 785-811) -- `openless-all/app/src-tauri/src/recorder.rs` (lines 28-490) -- `openless-all/app/src-tauri/src/commands.rs` (lines 975-1000) -- `openless-all/app/src/lib/types.ts` (lines 118-275) -- `openless-all/app/src/lib/ipc.ts` (lines 40-176) -- `openless-all/app/src/components/Capsule.tsx` (line 293) -- `openless-all/app/src/pages/QaPanel.tsx` (lines 55, 116-120) -- `openless-all/app/src/pages/Vocab.tsx` (line 51) diff --git a/docs/auto-update-download-acceleration.md b/docs/auto-update-download-acceleration.md deleted file mode 100644 index b86fd54a..00000000 --- a/docs/auto-update-download-acceleration.md +++ /dev/null @@ -1,63 +0,0 @@ -# Auto Update Download Acceleration - -## Problem - -OpenLess used a single Tauri updater endpoint on GitHub Releases: - -```text -https://github.com/appergb/openless/releases/latest/download/latest-{{target}}-{{arch}}.json -``` - -The manifest also pointed installer downloads back to GitHub Releases. 
On networks where GitHub release assets are slow, a small updater package can take minutes to download. - -Desktop apps do not reliably inherit a user's shell proxy environment. Instead of making updater correctness depend on whether a proxy is visible to the app process, the updater should use a GitHub release acceleration URL directly. - -## Runtime Behavior - -The app does not manually probe local proxy ports. It lets the OS/process network stack do whatever it normally does, while the updater endpoint itself points at `fastgit.cc` first. This keeps the rule simple: proxy or no proxy, updater traffic should prefer the fastgit transport. - -## Fastgit Acceleration - -Release builds now publish two updater manifests per target: - -```text -latest-<target>-<arch>.json -latest-<target>-<arch>-mirror.json -``` - -The client checks the mirror manifest first, then GitHub. The mirror manifest points its installer URL at: - -```text -https://fastgit.cc/https://github.com/<owner>/<repo>/releases/latest/download/<asset> -``` - -The updater signature still protects the downloaded package. The mirror only changes transport; it cannot replace the signed payload without verification failing. - -## Maintainer Notes - -Set `OPENLESS_UPDATE_MIRROR_BASE_URL` in CI to change the mirror host. Keep it formatted as a prefix for GitHub URLs, for example: - -```text -https://fastgit.cc/https://github.com -``` - -If a mirror becomes unreliable, replace that environment value and the mirror endpoint in `openless-all/app/src-tauri/tauri.conf.json`. - -## Evidence - -Measured from Windows on 2026-05-01. Direct GitHub release downloads were tested with the local proxy disabled to reproduce the slow path. `fastgit.cc` was tested both through the normal local proxy environment and with the local proxy disabled; results vary by route, so do not treat one machine's no-proxy number as a CDN SLA.
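- ```text -Direct GitHub installer asset, 4.78 MB, proxy disabled: -run 1: timed out after 90.75s, 1.73 MB received -run 2: timed out after 90.06s, 2.44 MB received - -fastgit.cc installer asset, 4.78 MB, normal local proxy environment: -with protocol prefix: 3.12s / 3.63s / 3.39s -without protocol prefix: 2.92s / 2.45s / 2.87s - -fastgit.cc target-user signal: -manual browser/download usage reported completing in under 1s without enabling a proxy. -``` - -This is enough to justify a `fastgit.cc` mirror path, but not enough to treat a public mirror as permanently trusted infrastructure. `fastgit.cc` explicitly documents support for GitHub release/archive acceleration and accepts GitHub links with or without the protocol prefix. Keep the mirror configurable and re-test before each release if download performance is a release blocker. diff --git a/docs/github-tracking/issue-139-capsule-lifecycle.md b/docs/github-tracking/issue-139-capsule-lifecycle.md deleted file mode 100644 index d962d5cf..00000000 --- a/docs/github-tracking/issue-139-capsule-lifecycle.md +++ /dev/null @@ -1,87 +0,0 @@ -## Symptom - -This is not a single click-dead-zone bug. It is a group of helper-window lifecycle symptoms that have already been observed on real Windows machines and share one root cause: - -- click dead zone: the area around the former Capsule blocks input fields or buttons underneath it -- screenshot selectable: screenshot tools can still select this transparent region -- drag stutter: dragging across the region causes visible stutter or compositor glitches -- lingering transparent overlay: after recording ends, the Capsule can linger as a transparent topmost window - -Current evidence indicates that these symptoms should not be split into unrelated issues; they are one lifecycle-semantics deviation. - -### Evidence - -Runtime and code evidence: - -- `openless-all/app/src-tauri/tauri.conf.json:33-47` - - `capsule` is configured as `transparent + alwaysOnTop + focus:false + visible:false` -- `openless-all/app/src-tauri/src/lib.rs:594-623` - - on Windows, the `capsule` runtime host bounds are `220x84/118`, clearly larger than the visible `196x52` pill -- `openless-all/app/src-tauri/src/coordinator.rs:2398-2432` - - the Windows show path uses `ShowWindow(SW_SHOWNOACTIVATE)` + `SetWindowPos(...SWP_NOACTIVATE...)` -- `openless-all/app/src-tauri/src/coordinator.rs:2455-2479` - - the end phase relies on `window.hide()` as its lifecycle-end semantics
-- `openless-all/app/src/components/Capsule.tsx:278-281` - - the frontend `idle` state only shrinks the visible content to `0x0`; whether the lifecycle has really ended still depends on whether the backend window has fully stopped participating -- [2026-05-02-platform-lifecycle-audit.md](/D:/Users/cooper/Practice-Project/202604/openless/docs/2026-05-02-platform-lifecycle-audit.md) - - the audit has narrowed this issue down to a deviation from the Windows helper-window lifecycle contract - -Field evidence: - -- users have observed the dead zone / screenshot selectable / drag stutter / lingering overlay on Windows -- these symptoms are consistent with a transparent topmost helper window that never actually stopped participating at the OS level - -### 5 Whys / Root Cause Analysis - -1. Why do the click dead zone, screenshot selectability, and drag stutter appear? - - Because after recording ends, the Capsule host window on Windows can keep existing and keep participating in the desktop z-order. -2. Why does the window keep participating after recording ends? - - Because the current implementation models "end of lifecycle" mainly as `hide()`, rather than as "guarantee that the helper window no longer participates in hit-test / capture / z-order / compositor". -3. Why is this easier to expose on Windows? - - Because the Windows Capsule host geometry is larger, its show path is more special-cased, and it is a transparent topmost window; once the hide semantics fail, the leftover area is large and more likely to interfere with system behavior. -4. Why does this diverge from the original macOS design intent? - - The original macOS intent is that the Capsule appears only briefly during the active stage, collapses on its own afterwards, and no longer persists as a foreground interaction object; Windows currently behaves more like "the visuals have ended, but the OS object is still attached". -5. Why did existing gates not catch this earlier? - - Existing checks focus mostly on "window shown/hidden" and geometry configuration; none directly verifies that the window has actually stopped participating in the system while inactive. - -### Platform Scope - -- Direct symptom scope: currently confirmed on real Windows machines. -- Problem layer: backend helper-window lifecycle contract + Windows native window participation. -- Cross-platform risk assessment: the root-cause pattern is not Windows-specific. Any transparent helper window for which "visually hidden != lifecycle ended" can hit it; the Capsule is simply the first case to surface, on Windows. - -### Ownership - -- owner intent: `@Cooper-X-Oak` -- corresponding draft PR: `#140` - -### Current status - -- the first wave of the main lifecycle fix is complete -- manual desktop regression results: - - click dead zone: pass - - screenshot selectable: pass - - drag stutter: pass -- current recommendation: move from "converging on the problem" to "in regression review" - -## Impact - -- directly affects the core Windows input experience and the trustworthiness of system interaction -- it breaks clicks, screenshots, and drags in underlying apps, which users can easily mistake for faults in other applications -- because the leftover object is transparent and topmost, the problem is hard to notice, reproduce, and locate -- unless the fix addresses lifecycle semantics, screenshot / z-order / compositor problems may remain even after one particular dead zone is fixed - -## Proposed Acceptance Criteria - -- [ ] the Capsule "end" semantics on Windows match macOS: once inactive, it no longer participates in system interaction -- [ ] an inactive Capsule no longer causes a click dead zone -- [ ] an inactive Capsule can no longer be selected by screenshot tools -- [ ] an inactive Capsule no longer introduces drag/compositor stutter -- [ ] add a smoke / regression check that directly verifies on Windows that an inactive Capsule is non-participating
-- [ ] the fix explicitly separates visual state from host-window lifecycle state, instead of stacking more local workarounds - -## TODO / Open Questions - -- whether `capsule hidden => no hit-test / no capture / no topmost participation` should be extracted into a shared helper-window contract and reused for the QA panel -- `PR #140` should stay in its draft tracking role for now and switch to ready only after the scope and root cause have fully converged -Suggested issue title: `[ui][windows] Capsule still participates in system interaction after being hidden` diff --git a/docs/github-tracking/issue-98-startup-visible-ready.md b/docs/github-tracking/issue-98-startup-visible-ready.md deleted file mode 100644 index 40e357a1..00000000 --- a/docs/github-tracking/issue-98-startup-visible-ready.md +++ /dev/null @@ -1,74 +0,0 @@ -## Symptom - -On the Windows cold-start path, `visible` and `ready` are currently decoupled: the user can already see the main window while the global hotkey / runtime lifecycle is still being installed asynchronously in the background. - -This is not just a small UI flicker; it is inconsistent startup lifecycle ownership: - -- `main` defaults to `visible:false` at the config layer -- the backend owns `show_main_window()` / tray reopen / single-instance focus -- the frontend `App.tsx` also calls `currentWindow.show()` itself after mount -- on the Windows path, the initial `gate` value is directly `ready` - -### Evidence - -- `openless-all/app/src-tauri/tauri.conf.json:17-30` - - `main.visible = false` -- `openless-all/app/src-tauri/src/lib.rs:314-356` - - the backend explicitly owns the `show_main_window()` / `hide_main_window()` lifecycle entry points -- `openless-all/app/src-tauri/src/lib.rs:158-163` - - the hotkey listener and the QA hotkey listener start asynchronously after setup -- `openless-all/app/src/App.tsx:23-52` - - the Windows path sets `gate='ready'` directly during initialization - - after mount it also calls `currentWindow.show()` inside `requestAnimationFrame` -- [2026-05-02-platform-lifecycle-audit.md](/D:/Users/cooper/Practice-Project/202604/openless/docs/2026-05-02-platform-lifecycle-audit.md) - - the audit has classified this issue as a startup lifecycle ownership deviation - -### 5 Whys / Root Cause Analysis - -1. Why can the user see a window that looks ready while the hotkey/runtime may not be ready yet? - - Because window visibility timing and runtime readiness timing do not share one source of truth. -2. Why did these two timings split apart? - - Because the backend and the frontend each hold part of the control over `main` visibility. -3. Why is this more visible on Windows?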
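- Because the Windows startup path skips the explicit permission gate / startup shell that macOS uses, the real UI is exposed earlier. -4. Why does this diverge from the original macOS design intent? - - The original intent is "by the time the user sees the main window, it has already reached a usable, or at least explainable, stage"; Windows currently behaves more like "the window arrives first, the capabilities arrive later". -5. Why was this not caught systematically before? - - Existing smoke tests mainly verify "process is alive + `hotkey installed` shows up in the log a bit later"; they do not verify "first visible frame == operationally ready". - -### Platform Scope - -- Direct symptom scope: currently observed mainly during Windows cold start. -- Problem layer: startup lifecycle ownership, window visibility contract, runtime readiness contract. -- Cross-platform risk assessment: this is an architecture-level risk on every platform, but Windows surfaces it first as a real user problem because it skips the startup gate and the frontend shows the window itself. - -### Ownership - -- owner intent: `@Cooper-X-Oak` -- corresponding draft PR: not yet created - -### Current status - -- the main startup lifecycle fix is in effect -- the latest test entry point now uses frontend-managed first show, so backend immediate show no longer contaminates the results -- manual cold-start feedback: almost no issues; the difference is hard to spot with the naked eye -- current recommendation: keep the PR as a draft and keep watching first-paint / startup latency instead of widening the main fix - -## Impact - -- users mistake a window that is not ready yet for a ready one -- it amplifies first-screen confusion such as "the hotkey does nothing / the runtime is not installed" -- it makes any later Windows startup problem harder to classify as a UI problem, a hotkey problem, or a lifecycle contract problem - -## Proposed Acceptance Criteria - -- [ ] the first moment the `main` window becomes visible is controlled by a single owner -- [ ] the relationship between the first visible frame and runtime readiness is explicitly defined and verifiable -- [ ] on Windows cold start, when the user first sees the main window it is at least in an explicit `startup` or `ready` state, not an ambiguous ready -- [ ] add a startup smoke test covering the order of `visible`, `hotkey installed`, and `first usable state` - -## TODO / Open Questions - -- whether `main` visibility should be moved back entirely to the backend, with the frontend responsible only for the content gate -- whether the existing `issue #143` first-paint problem should be handled as a downstream visual sub-problem of this issue, or kept as a separate ticket tracked in parallel -Suggested issue title: `[tauri][windows] visible and ready are decoupled on cold start` diff --git a/docs/github-tracking/issue-windows-dual-hotkey-sources.md b/docs/github-tracking/issue-windows-dual-hotkey-sources.md deleted file mode 100644 index 82197002..00000000 --- a/docs/github-tracking/issue-windows-dual-hotkey-sources.md +++ /dev/null @@ -1,77 +0,0 @@ -## Symptom - -Windows dictation / QA lifecycle previously had two event sources driving the same state machine: - -- OS-level low-level keyboard hook -- renderer / window-local hotkey forwarding - -That design is risky even when the product "seems to work":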
- -- press/release edges can come from different sources -- focus switches can strand half an edge -- hold mode and toggle mode can drift differently on Windows only - -## Evidence - -- [openless-all/app/src/App.tsx](/D:/Users/cooper/Practice-Project/202604/openless/openless-all/app/src/App.tsx) - - Windows window-local forwarding existed in the frontend path -- [openless-all/app/src-tauri/src/coordinator.rs](/D:/Users/cooper/Practice-Project/202604/openless/openless-all/app/src-tauri/src/coordinator.rs) - - backend also accepted `handle_window_hotkey_event` -- [openless-all/app/src-tauri/src/hotkey.rs](/D:/Users/cooper/Practice-Project/202604/openless/openless-all/app/src-tauri/src/hotkey.rs) - - Windows already owns a `WH_KEYBOARD_LL` low-level hook - -Current convergence from this repair track: - -- QA hotkey / follow-up flow works -- Windows owner source should be the backend low-level hook -- window-local forwarding should not keep driving the same lifecycle - -## Root Cause Convergence - -This was not just "an extra fallback path". - -It was an ownership problem: - -```text -Two independent input sources were allowed to influence one dictation / QA -lifecycle state machine without an explicit precedence contract. -``` - -## 5 Whys - -1. Why is this a lifecycle issue and not just a convenience fallback? - - Because the second path was able to trigger real start/stop edges. -2. Why is that dangerous? - - Because mixed-source ordering can desynchronize phase transitions. -3. Why is it primarily a Windows issue? - - Because Windows carried both the low-level hook and the renderer-forward path. -4. Why does this diverge from original intent? - - Because one user gesture should map to one stable lifecycle transition. -5. Why is this near closure now? - - Because current repair work has already converged on a single owner source: backend low-level hook. 
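The ownership contract reached in the 5 Whys can be sketched as a state machine. Every edge carries a tag for its source, and edges from anything other than the designated owner are dropped before they reach the phase logic. This is a minimal sketch with hypothetical types (`Source`, `Edge`, `Phase`, `Lifecycle`) that assumes toggle-on-press semantics. It does not model focus changes or the real hook plumbing.

```rust
/// Where a hotkey edge came from (illustrative names, not OpenLess types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    LowLevelHook,
    WindowForward,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Edge {
    Pressed,
    Released,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Idle,
    Recording,
}

/// A toggle-mode lifecycle that accepts edges from exactly one owner source.
pub struct Lifecycle {
    owner: Source,
    pub phase: Phase,
    pub ignored: usize,
}

impl Lifecycle {
    pub fn new(owner: Source) -> Self {
        Lifecycle { owner, phase: Phase::Idle, ignored: 0 }
    }

    pub fn on_edge(&mut self, source: Source, edge: Edge) {
        // Explicit precedence contract: non-owner edges never reach the state machine.
        if source != self.owner {
            self.ignored += 1;
            return;
        }
        // Toggle mode: only the press edge flips the phase.
        if edge == Edge::Pressed {
            self.phase = match self.phase {
                Phase::Idle => Phase::Recording,
                Phase::Recording => Phase::Idle,
            };
        }
    }
}

fn main() {
    // One physical key press reported by both paths, then its release.
    let gesture = [
        (Source::LowLevelHook, Edge::Pressed),
        (Source::WindowForward, Edge::Pressed),
        (Source::LowLevelHook, Edge::Released),
        (Source::WindowForward, Edge::Released),
    ];
    let mut lifecycle = Lifecycle::new(Source::LowLevelHook);
    for (source, edge) in gesture {
        lifecycle.on_edge(source, edge);
    }
    println!("{:?}, ignored {}", lifecycle.phase, lifecycle.ignored);
}
```

Suppose the owner check were removed. The duplicated press from the forwarding path would then toggle the session straight back to `Idle`. That is the mixed-source drift the issue describes.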
- -## Platform Scope - -- Direct symptom scope: Windows implementation risk -- Problem layer: input source ownership, lifecycle precedence, focus-sensitive edge delivery - -## Related Issues - -- #154 main issue anchor -- #147 settings-to-runtime listener refresh contract -- #158 governance issue for helper-window / native-window contract family - -## Impact - -- Without a single owner source, Windows-only lifecycle drift remains hard to reproduce and harder to trust -- With ownership clarified, regression review can focus on evidence instead of guessing which path fired - -## Proposed Acceptance Criteria - -- [ ] Windows lifecycle owner source is explicitly documented as backend low-level hook -- [ ] window-local forwarding no longer drives the main lifecycle unless a future explicit fallback contract is introduced -- [ ] regression review confirms no new mixed-source ordering evidence - -## Status Note - -Current recommendation: treat this issue as near-closure and use it as a regression-review anchor rather than a new large refactor anchor. 
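diff --git a/docs/github-tracking/issue-windows-terminal-clipboard-restore.md b/docs/github-tracking/issue-windows-terminal-clipboard-restore.md deleted file mode 100644 index 2c5b5107..00000000 --- a/docs/github-tracking/issue-windows-terminal-clipboard-restore.md +++ /dev/null @@ -1,146 +0,0 @@ -## Symptom - -Windows terminal text input has historically shown two kinds of symptoms: - -- users reported that text was not inserted automatically in the terminal and they had to press `Ctrl+V` manually -- local testing once observed the target ending up with the old clipboard contents instead of the current dictation result - -Both symptoms point to the same Windows insertion chain: OpenLess inserts text via the clipboard plus a synthetic `Ctrl+V`, and terminals are among the most sensitive target types. - -### Evidence - -- `openless-all/app/src-tauri/src/insertion.rs` - - the success semantics of the Windows path is `PasteSent` - - `PasteSent` only means that a synthetic `Ctrl+V` has been sent - - it does not mean the target has finished consuming the clipboard -- `docs/2026-05-02-windows-terminal-clipboard-restore-investigation.md` - - records the full isolated experiments, real-target regressions, full lifecycle automation, and final conclusions -- historical feedback - - real user reports from terminal scenarios: "text is not inserted automatically; a manual `Ctrl+V` is required" -- isolated timing experiments - - fast consumer + `150ms` restore: pass - - slow consumer + `150ms` restore: reads the old clipboard - - slow consumer + `750ms` restore: back to normal -- full lifecycle regression - - the stabilized automation covers `wt-cmd`, `wt-powershell`, and `notepad` - - on the current machine, all three target types receive the current `finalText` - -### Root Cause Analysis / Investigation Trail - -#### 1. From user symptoms to suspected areas - -The initial symptom was not "some API returned an error" but wrong content in the target: - -- the text did not appear in the target -- or the paste appeared to contain old content - -This kind of problem naturally requires checking three layers at once: - -- clipboard lifecycle -- insertion lifecycle -- focus / target restore - -#### 2. Why focus on clipboard restore first - -Reading the code showed one distinctive trait of the Windows insertion chain: - -- first write the current text to the clipboard -- then send a synthetic `Ctrl+V` -- then restore the old clipboard - -Yet in the state semantics, `PasteSent` does not mean "the target has finished pasting". -So the first root-cause hypothesis was: - -- if the target consumes the clipboard slowly, the restore may happen before the target's paste - -#### 3. How we proved the hypothesis was not a guess - -We added a standalone timing experiment that set the OpenLess business chain aside and verified only: - -- the clipboard write -- the synthetic paste -- the restore timing -- when the target reads the clipboard - -The results clearly showed: - -- the race exists in the model -- `150ms` is unsafe for slow consumers -- widening the restore window prevents slow consumers from reading the old clipboard - -This step turned a "suspicion" into a "confirmed risk point". - -#### 4. Why we continued with full lifecycle automation
- -The isolated experiment only shows that the risk exists; it cannot prove that the users' original symptom necessarily reproduces in the real OpenLess lifecycle. - -So we then added: - -- real OpenLess startup -- real focus-target capture -- the real insertion tail chain -- target readback for `wt-cmd` / `wt-powershell` / `notepad` - -To work around fluctuations in desktop audio routing, we also added a debug-only transcript override that replaces the transcript only when the ASR result is empty, which keeps: - -- the first half of the lifecycle real -- the second half (insertion / clipboard / target readback) real - -#### 5. Final root-cause assessment - -The root cause we can state with confidence is not "terminals are definitely buggy right now", but: - -- the Windows insertion chain had a real clipboard restore timing risk -- this risk can explain the historically unstable feedback from terminal scenarios -- we have applied a hardening fix for this risk point - -In other words, what this issue really covers is: - -- a Windows terminal insertion chain that historically was not stable enough -- and one underlying timing risk in it that has been confirmed and patched - -### Platform Scope - -- Direct scope: Windows -- Layers of concern: `clipboard lifecycle`, `insertion lifecycle` -- terminals are the main observation target, but not the only slow consumers that may be affected -- `focus restore` is not the main root cause in this round - -### Ownership - -- owner intent: `@Cooper-X-Oak` -- current draft/ready PR: `#160` - -## Impact - -- affects how stable Windows terminal text input is perceived to be -- opens a gap between what `PasteSent` means to users and how the target actually behaves -- raises the troubleshooting cost of "why did the text not appear / why do I need to press Ctrl+V manually" -- directly affects trust in the core Windows insertion path - -## Proposed Acceptance Criteria - -- [x] make explicit that Windows `PasteSent` and "the target has finished pasting" are different semantics -- [x] define and document the clipboard restore timing risk model -- [x] ship the minimal hardening fix: - - [x] delay the Windows restore to `750ms` - - [x] run the restore asynchronously -- [x] provide isolated timing experiments proving that the race model holds -- [x] provide stabilized full lifecycle automation covering: - - [x] `wt-cmd` - - [x] `wt-powershell` - - [x] `notepad` -- [x] record the final conclusions for the current environment: - - [x] the historical risk was real - - [x] current regressions no longer show the target receiving the old clipboard - - [x] stability has improved compared with the historical state - -## TODO / Open Questions - -- whether to further tighten user-facing wording around `PasteSent` so it is not read as "paste confirmed successful" -- if more field reports arrive, whether to add finer-grained environment tags: - - terminal host / profile - - input method state - - foreground-switch timing - -Suggested issue title: `[windows][insertion] Terminal stale-clipboard paste risk has converged; full-chain regression currently stable` diff --git a/docs/github-tracking/pr-140-capsule-lifecycle.md b/docs/github-tracking/pr-140-capsule-lifecycle.md deleted file mode 100644 index 67b77dd8..00000000 --- a/docs/github-tracking/pr-140-capsule-lifecycle.md +++ /dev/null @@ -1,52 +0,0 @@ -## Summary - -Closes #139 - -This PR has now moved from "converging on the problem" to "in regression review".
- -Completed in this round: - -- converged on the Windows helper-window lifecycle root cause -- tightened native hide / non-topmost handling on the `inactive` path -- regression on the latest debug build from cold start -- manual desktop symptom regression: - - click dead zone: pass - - screenshot selectable: pass - - drag stutter: pass - -## Fixes / Additions / Improvements - -- align the PR goal: focus on the Windows Capsule helper-window lifecycle rather than a one-off dead-zone workaround -- tighten the Capsule `visible / hidden / inactive / non-participating` semantics on Windows -- complete the backend native hide behavior after inactive, so the transparent topmost helper window no longer lingers -- add lifecycle contract / smoke helper scripts so later regressions can keep verifying this -- keep the same problem framing as [issue-139-capsule-lifecycle.md](/D:/Users/cooper/Practice-Project/202604/openless/docs/github-tracking/issue-139-capsule-lifecycle.md) - -## Compatibility - -- not included: purely visual Capsule geometry / rounded corner / titlebar frame adjustments -- not included: QA hotkey / selection-ask input source logic -- impact on existing users / local environments / build process: focused only on the lifecycle mainline, not extended to UI polish - -## Test Plan - -- [x] command: `node openless-all/app/scripts/windows-lifecycle-contract.test.mjs` -- [x] result: pass -- [x] evidence: local command output - -- [x] command: `npm run build` -- [x] result: pass -- [x] evidence: local command output - -- [x] command: `cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml` -- [x] result: pass -- [x] evidence: local command output - -- [x] command: `powershell -ExecutionPolicy Bypass -File openless-all/app/scripts/windows-runtime-smoke.ps1` -- [x] result: passed the launch / hotkey installed baseline -- [x] evidence: local command output - -- [x] command: manual desktop regression (latest debug cold start -> dictation start/stop) -- [x] result: click / screenshot / drag all pass -- [x] evidence: regression notes in the current thread -Suggested title for the linked issue: `[ui][windows] Capsule still participates in system interaction after being hidden` diff --git a/docs/github-tracking/pr-145-cold-start-first-paint.md b/docs/github-tracking/pr-145-cold-start-first-paint.md deleted file mode 100644 index d74843df..00000000 --- a/docs/github-tracking/pr-145-cold-start-first-paint.md +++ /dev/null @@ -1,46 +0,0 @@ -## Summary - -Closes #98 -References #143 - -This PR is no longer just a tracking entry; it carries the actual changes of this round's main Windows startup lifecycle fix. - -Current conclusions: - -- the main `visible / ready` decoupling problem has converged -- the cold-start entry point has moved from backend immediate show to frontend-managed first show -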
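Latest manual regression feedback: startup is basically smooth, and a visible flash is now hard to notice with the naked eye -- `#143` is now better cited as a converged first-paint symptom ticket than kept as the main closure target - -## Fixes / Additions / Improvements - -- tighten first-show ownership during Windows startup -- add an explicit gate between `checking -> ready` so the real shell is not exposed too early during the startup transient phase -- add a cold-start test script that prefers the latest debug build by default and distinguishes: - - frontend-managed first show - - backend immediate show (debugging only) -- add a startup lifecycle contract test that locks in the hidden-on-create and readiness-gate semantics - -## Compatibility - -- not included: purely visual main-window adjustments such as rounded corners / outer frame / titlebar frame -- not included: finer-grained startup latency optimization -- impact on existing users / local environments / build process: focused on the startup lifecycle mainline, not extended to UI polish - -## Test Plan - -- [x] command: `node openless-all/app/scripts/windows-startup-lifecycle-contract.test.mjs` -- [x] result: pass -- [x] evidence: local command output - -- [x] command: `npm run build` -- [x] result: pass -- [x] evidence: local command output - -- [x] command: `powershell -ExecutionPolicy Bypass -File openless-all/app/scripts/windows-cold-start.ps1 -PreferDebug -ShowMain` -- [x] result: the frontend-managed first show path works -- [x] evidence: local command output - -- [x] command: cold-start screenshots and manual subjective regression -- [x] result: the first-screen experience is clearly better; current subjective feedback is "almost no issues, hard to tell with the naked eye" -- [x] evidence: `artifacts-cold-start-screenshot.png`, `artifacts-cold-start-screenshot-8s.png`, `artifacts-cold-start-screenshot-front-managed.png`, and the regression notes in the current thread diff --git a/docs/github-tracking/pr-154-windows-dual-hotkey.md b/docs/github-tracking/pr-154-windows-dual-hotkey.md deleted file mode 100644 index e0a5d12e..00000000 --- a/docs/github-tracking/pr-154-windows-dual-hotkey.md +++ /dev/null @@ -1,52 +0,0 @@ -## Summary - -Closes #154 - -This draft PR now serves as a near-closure tracking anchor for the Windows dual-hotkey-source problem.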
- -Current conclusion: - -- Windows dictation / QA lifecycle should be owned by the backend low-level hook -- renderer / window-local forwarding should not keep driving the same lifecycle -- future work here should focus on regression review, not on reopening the architecture without new evidence - -## Current Status - -- keep draft for now -- close to regression review -- not a parked native-strategy problem like #153 - -## Scope - -- source ownership -- lifecycle precedence -- mixed-source risk on Windows - -Out of scope: - -- helper-window drag semantics -- main window / radius / appearance work -- broad hotkey adapter rewrites without new evidence - -## Key Finding - -```text -One lifecycle needs one owner source. -On Windows, that owner source should be the backend low-level hook. -``` - -## Evidence - -- QA hotkey and follow-up flow remain healthy after ownership tightening -- no evidence from this repair track suggests the renderer-forward path should remain a co-owner - -## Next Step - -- use this PR as the place to summarize regression evidence -- only reopen architecture scope if new mixed-source failures appear - -## Validation Plan - -- [x] Manual verification: QA hotkey flow remains functional -- [x] Manual verification: lifecycle tightening did not break follow-up QA -- [ ] Regression review: confirm no new mixed-source phase drift evidence diff --git a/docs/github-tracking/pr-windows-terminal-clipboard-restore.md b/docs/github-tracking/pr-windows-terminal-clipboard-restore.md deleted file mode 100644 index aafb2ec5..00000000 --- a/docs/github-tracking/pr-windows-terminal-clipboard-restore.md +++ /dev/null @@ -1,82 +0,0 @@ -## Summary - -Closes #159 - -This PR delivers a convergence fix for the Windows terminal insertion chain: - -- historically, terminal scenarios produced user reports that text was not inserted automatically and a manual `Ctrl+V` was required -- local testing also once observed the target ending up with the old clipboard contents -- this round of investigation confirmed one real underlying risk among them: clipboard restore timing - -Therefore, the goal of this PR is not to claim that "a reliably reproducible terminal bug currently exists", but to: - -- patch one confirmed Windows insertion timing risk -- complete regression coverage for the whole chain -- bring the final conclusions to a reviewable, maintainable state -
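The race that the isolated timing experiments probe can be reduced to a deterministic timeline. Take the moment the synthetic `Ctrl+V` is sent as t = 0. At that point the clipboard already holds the dictated text. The restore puts the previous contents back after one delay, and the target reads the clipboard after another. This is a minimal model, not the OpenLess insertion code: `Clip`, `Action` and `simulate` are hypothetical. It ignores thread scheduling and how a real terminal consumes the paste, and it resolves exact ties pessimistically, with the restore first.

```rust
/// Which clipboard contents the target pasted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Clip {
    Dictated,
    Previous,
}

/// Timeline actions; declaration order makes Restore sort first on a tie,
/// so simultaneous events are resolved pessimistically.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Action {
    Restore,
    Read,
}

/// Times are milliseconds after the synthetic Ctrl+V is sent. The dictated
/// text is on the clipboard at t = 0; the restore puts the previous contents
/// back at `restore_delay_ms`; the target reads at `consume_delay_ms`.
pub fn simulate(restore_delay_ms: u64, consume_delay_ms: u64) -> Clip {
    let mut timeline = [
        (restore_delay_ms, Action::Restore),
        (consume_delay_ms, Action::Read),
    ];
    timeline.sort();

    let mut clipboard = Clip::Dictated;
    let mut pasted = None;
    for (_, action) in timeline {
        match action {
            Action::Restore => clipboard = Clip::Previous,
            Action::Read => pasted = Some(clipboard),
        }
    }
    pasted.expect("timeline always contains one Read")
}

fn main() {
    // Consumer delays of 20 ms (fast) and 400 ms (slow) are made-up examples.
    for (restore, consume) in [(150, 20), (150, 400), (750, 400)] {
        println!("restore={restore}ms consume={consume}ms -> {:?}", simulate(restore, consume));
    }
}
```

With the 150 ms and 750 ms restore delays from the experiments, the model reproduces the fast/slow-consumer pattern. A consumer slower than any fixed delay still loses the race, which is why a fixed delay remains a heuristic rather than a confirmed handshake.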
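-## Fixes / Additions / Improvements - -- raise the Windows clipboard restore delay from `150ms` to `750ms` -- run the clipboard restore on a background thread so it does not block the insertion from returning -- add a Windows clipboard timing smoke test to verify the slow-consumer race -- add a full lifecycle automation script covering: - - `wt-cmd` - - `wt-powershell` - - `notepad` -- stabilized automation entry point: - - connect to the main page via WebView2 remote debugging - - drive `start_dictation` / `stop_dictation` via Tauri invoke -- add a debug-only transcript override - - used only to keep covering the real insertion tail chain when desktop audio routing is unstable -- adjust how target text is read back: - - terminals: read `TermControl` via UIA - - notepad: read the text directly via UIA -- update the investigation doc and the tracking doc - -## Compatibility - -- the normal user path does not depend on the debug transcript override -- the debug transcript override only takes part in `debug_assertions` / test builds -- the Linux restore delay keeps its previous behavior -- no incidental UI/visual changes -- no changes to QA hotkey / selection mainline logic - -## Test Plan - -- [x] `cargo fmt --all` -- [x] `cargo check --lib` -- [x] `python -m py_compile openless-all/app/scripts/windows-openless-lifecycle-e2e.py` -- [x] `windows-real-asr-insertion-smoke.ps1` parses successfully -- [x] isolated timing experiments: - - [x] fast consumer + `150ms` - - [x] slow consumer + `150ms` - - [x] slow consumer + `750ms` -- [x] full lifecycle automation: - - [x] `wt-cmd` - - [x] `wt-powershell` - - [x] `notepad` -- [x] evidence: - - `docs/2026-05-02-windows-terminal-clipboard-restore-investigation.md` - - `docs/github-tracking/issue-windows-terminal-clipboard-restore.md` - -## Current Conclusions - -- the historical reports of unstable Windows terminal insertion were real -- this round confirmed and patched one real clipboard restore timing risk -- under the stabilized full lifecycle automation: - - `wt-cmd` passes - - `wt-powershell` passes - - `notepad` passes -- in the current environment, every target ends up with the current `finalText`; old clipboard contents no longer get pasted - -So this PR should be positioned technically as: - -- a hardening fix for historically unstable behavior -- plus complete, strengthened regression coverage - -## Remaining Risks - -- `750ms` is still a heuristic safeguard, not a handshake that confirms the target has consumed the clipboard -- if terminal field issues appear again, they are more likely caused by narrower environmental factors than by a confirmed, reproducible fault in this main chain - -Suggested PR title: `fix(windows): delay clipboard restore and complete insertion regression coverage` diff --git a/docs/issue-420-wayland-hotkey-research.md b/docs/issue-420-wayland-hotkey-research.md deleted file mode 100644 index 6f9c09de..00000000 --- a/docs/issue-420-wayland-hotkey-research.md +++ /dev/null @@ -1,401 +0,0 @@ -# Issue #420 Research Notes: Global Hotkeys Unavailable on Wayland - -> Status: research draft (no code changes implemented) -> Scope: evaluates options only; section 7 is the recommended baseline for implementation. ->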
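Date: 2026-05-13 - ---- - -## 1. Problem and Current State - -On Linux, OpenLess listens for global hotkeys via `rdev::listen`, implemented in `openless-all/app/src-tauri/src/hotkey.rs:1183-1530`. At startup the code checks `XDG_SESSION_TYPE`; when it is `wayland`, it immediately returns `Err("wayland_unsupported", "Wayland 暂不支持全局热键,请切到 X11 session 后再试")` (`hotkey.rs:1204-1208`). The message tells the user that global hotkeys are not supported on Wayland yet and asks them to switch to an X11 session and retry. - -In issue #420, user aeoform and another commenter saw this error on Debian Wayland and explicitly suggested: - -> "Just add a script or command so that users can configure the shortcut in their system settings" - -That is: **do not require OpenLess to capture global keys itself**; **let the desktop environment's shortcut settings invoke an OpenLess command**. This is a common Linux workaround pattern and is already the default practice in comparable products (Murmure and others); see section 3.2 and the [Murmure docs](https://docs.murmure.app/configure-shortcuts-on-linux/). - -Existing footholds in the repo: -- `tauri-plugin-single-instance = "2"` is already enabled in `Cargo.toml:24`, and a callback is registered in `lib.rs:73` (currently used only to focus the main window). -- Available IPC commands: `start_dictation` / `stop_dictation` / `cancel_dictation` (`commands.rs:1099-1110`), QA panel control (`commands.rs:1324-1330`), and the full hotkey configuration surface. - ---- - -## 2. Why Wayland Does Not Allow Traditional Global Hotkeys - -In the X11 design, any client can grab the whole keyboard or register global shortcuts, which also makes X11 "a natural keylogger platform". When the Wayland protocol was designed (starting in 2008), this path was closed outright: **keyboard events are delivered to a client only while its surface has focus**. - -The authoritative statement comes from the seat/keyboard chapter of the Wayland Book: "the server sends `wl_keyboard.enter` when a surface receives keyboard focus, and `wl_keyboard.leave` when it's lost". At the protocol level there is no interface for reading keys without focus ([wayland-book.com](https://wayland-book.com/seat/keyboard.html)). - -`pynput` / `rdev` / any library that relies on X11 keyboard grabs fails "by design" on Wayland for exactly this reason ([Wayland Fragmentation](https://www.semicomplete.com/blog/xdotool-and-exploring-wayland-fragmentation/), [Vocalinux issue #80](https://github.com/jatinkrmalik/vocalinux/issues/80)). - -If an application needs to receive keys while its own window is not focused, it has to use one of the following semi-official workarounds: - -| Approach | Trade-offs | -|------|------| -| **evdev/uinput** reading `/dev/input/event*` directly | Bypasses the Wayland protocol and works on X11/Wayland/TTY; **requires adding the user to the `input` group or a setuid binary**, a weak security model | -| **libei + xdg-desktop-portal RemoteDesktop** | The user must authorize on every launch; sparse documentation; reasonable only for compositor automation | -| **xdg-desktop-portal GlobalShortcuts** | Negotiated through the portal; standardized, but compositor implementations are uneven (see 3.1) | -| **Compositor-private protocols** | e.g. `hyprland-global-shortcuts-v1`; works on a single compositor only | - -Sources: [Wayland Fragmentation: xdotool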
adventure](https://www.semicomplete.com/blog/xdotool-and-exploring-wayland-fragmentation/)、[Wayland - keyboard-shortcuts-inhibit-unstable-v1](https://wayland.app/protocols/keyboard-shortcuts-inhibit-unstable-v1)。 - ---- - -## 3. 可选方案(含适配范围、成熟度、维护代价) - -### 3.1 xdg-desktop-portal GlobalShortcuts - -**协议**:`org.freedesktop.portal.GlobalShortcuts`([规范](https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.GlobalShortcuts.html))。 - -应用调用 `CreateSession → BindShortcuts`,门户弹出一个对话框让用户**给每个 shortcut 选实际按键**。之后通过 `Activated` / `Deactivated` 信号通知应用。`ConfigureShortcuts` 方法在 v2 加入,允许应用打开门户的修改 UI。 - -合成器实现状态(截至 2026-05): - -| 合成器 | 状态 | 备注 | -|--------|------|------| -| **KDE Plasma 6** | 已稳定 | xdg-desktop-portal-kde 自 MR !80 起原生支持,2024-2025 持续迭代([!368 改进流程](https://invent.kde.org/plasma/xdg-desktop-portal-kde/-/merge_requests/368)、[!449 记住拒绝项](https://invent.kde.org/plasma/xdg-desktop-portal-kde/-/merge_requests/449)) | -| **GNOME (Mutter)** | **尚未原生落地** | issue [GNOME/xdg-desktop-portal-gnome#47](https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/issues/47) 仍开放;Murmure 文档明确写「Mutter's XDG GlobalShortcuts portal is unreliable (latency, dropped events),GNOME 默认走 CLI 模式」([Murmure docs](https://docs.murmure.app/configure-shortcuts-on-linux/)) | -| **Hyprland** | 已支持 | 通过 `xdg-desktop-portal-hyprland`;同时还有合成器私有的 [`hyprland-global-shortcuts-v1`](https://wayland.app/protocols/hyprland-global-shortcuts-v1) | -| **sway / wlroots** | 已支持 | 通过 `xdg-desktop-portal-wlr` | -| **COSMIC** | 部分 | 实现质量随版本变化,未独立验证 | - -**关键缺陷**(多个合成器共有): -- 用户感受到的「再设置一次」:应用只能给 *preferred trigger*,最终键位由门户对话框决定。Hyprland 上甚至要求用户手改 config 文件 — 等于「应用申请,用户在 hyprland.conf 里实际绑」([dec05eba.com 分析](https://dec05eba.com/2024/03/29/wayland-global-hotkeys-shortcut-is-mostly-useless/))。 -- **GNOME 是最大盲区**。Issue #420 用户用的就是 Debian — Debian 默认 GNOME。在 GNOME 上跑 GlobalShortcuts 等于压根不能用。 -- 没有 push-to-talk:门户在 key-press 上触发事件,但是否传 release 事件、是否 dedupe,依赖合成器(OpenLess 当前依赖 hotkey 的 edge 来做 
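Toggle mode, which needs reliably paired events.)
-
-**Maintenance cost**: adds the `ashpd` crate plus an async D-Bus stream (see the exception in 3.2 and the example in Section 4). Every distro/compositor combination has to be tested by hand, and bug reports will fragment by compositor.
-
-**Conclusion**: adding it now does nothing for GNOME users and introduces a support burden split across compositors.
-
-### 3.2 CLI + single-instance forwarding (recommended)
-
-Turn the OpenLess binary itself into a GUI-less trigger that other programs can invoke:
-
-```
-Desktop-environment shortcut → launches openless --toggle-dictation
-                      ↓
-        tauri-plugin-single-instance intercepts the launch
-                      ↓
-        the running OpenLess primary instance receives argv in the callback
-                      ↓
-        parse --toggle-dictation → call coordinator.start/stop_dictation
-```
-
-Coverage: **every Linux desktop environment** (GNOME / KDE / Hyprland / sway / Cosmic / XFCE / i3 / ...). It relies only on the lowest common capability: the desktop environment can bind a key to a shell command. It works on both X11 and Wayland.
-
-Maturity: very high. This is the most widely used approach to Linux desktop integration (OBS, Mumble, 1Password, Albert, and others all support it alongside other methods). It is also Murmure's default mode on GNOME ([Murmure docs](https://docs.murmure.app/configure-shortcuts-on-linux/)). `tauri-plugin-single-instance` 2.x is already in the repo, and the plugin is officially designed to pass argv to the callback ([official docs](https://v2.tauri.app/plugin/single-instance/)).
-
-Maintenance cost: low. The code changes fall in three places:
-1. Parse argv once early in `main.rs`, before the Tauri Builder. Do not exit; only record the intent.
-2. In the single-instance callback at `lib.rs:73`, recognize argv and forward it to the coordinator.
-3. Add a section to the README and the Settings page that shows users how to bind a desktop shortcut.
-
-The only known limitation: **most OS-level desktop shortcuts fire only on key press** (they fire when the key goes down and never deliver key release). This fits Toggle mode naturally but cannot support push-to-talk ("hold to talk, release to finish"). OpenLess defaults to Toggle (`CLAUDE.md` states: "Hotkey is toggle-only, not press-and-hold"), so there is no conflict. The Murmure docs also state this limitation explicitly: "Push-to-talk limitation — OS shortcuts only fire on key press".
-
-### 3.3 Reading evdev/uinput directly
-
-Bypass Wayland and open `/dev/input/event*` directly to read scancodes.
-
-Coverage: all of Linux, including TTY.
-Permissions: the user must be in the `input` group, or the binary must be setuid. **Distributions warn against both as security downgrades.**
-Maturity: technically stable (the `evdev-shortcut` crate exists), but the user experience is poor. Users must run `usermod -aG input $USER` manually and then log out and back in, which ordinary users will not do.
-Not recommended for a consumer-facing app like OpenLess.
-
-Sources: [evremap (Wez)](https://github.com/wez/evremap), [evdev_shortcut crate](https://docs.rs/evdev-shortcut/latest/evdev_shortcut/).
-
-### 3.4 libei
-
-libei + the RemoteDesktop portal is the new official path for letting applications emulate a keyboard and mouse. Today its main use cases are remote desktop and automated testing. The portal shows an authorization dialog on every launch, the GNOME implementation is still evolving, and documentation is sparse.
-
-Not recommended as a shortcut-trigger path. Source: [Sending keystrokes to Wayland —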
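Medium](https://medium.com/@python-javascript-php-html-css/sending-keyboard-strokes-to-wayland-linux-windows-solutions-and-challenges-9319cf424d06).
-
----
-
-## 4. Minimal tauri-plugin-single-instance 2.x example
-
-Current release: **2.4.2** (2026-05-02). The repo pins `tauri-plugin-single-instance = "2"` ([crates.io](https://crates.io/crates/tauri-plugin-single-instance)).
-
-Callback signature: `Fn(&AppHandle, Vec<String>, String) + Send + Sync + 'static`. The three parameters are the app handle, argv, and cwd. Source: [official Tauri docs](https://v2.tauri.app/plugin/single-instance/).
-
-The existing OpenLess call site (`lib.rs:73-78`) currently ignores `argv` / `cwd`:
-
-```rust
-.plugin(tauri_plugin_single_instance::init(|app, _argv, _cwd| {
-    log::info!("[single-instance] another instance launched, focusing existing main window");
-    show_main_window(app);
-}))
-```
-
-Shape after the change (illustrative only; not implemented in this research):
-
-```rust
-// `app.state()` is a `tauri::Manager` trait method, so `use tauri::Manager;` must be in scope.
-.plugin(tauri_plugin_single_instance::init(|app, argv, _cwd| {
-    if let Some(intent) = parse_cli_intent(&argv) {
-        let coord: tauri::State<Arc<Coordinator>> = app.state();
-        dispatch_intent(coord.inner().clone(), intent);
-        return; // do not steal focus
-    }
-    show_main_window(app); // no intent: fall back to the original "focus main window" behavior
-}))
-```
-
-Notes:
-- The callback runs on the Tauri main thread, so long-running work must be spawned onto the tokio runtime. OpenLess's coordinator interface is already async.
-- The second instance's process **has already exited**. "Do not steal focus" therefore means that no window pops up at all, which feels the same as a native shortcut.
-- The single-instance plugin must be registered **first** (before `tauri_plugin_shell` and others). The official docs stress this point, and OpenLess already complies.
-
----
-
-## 5. CLI argument parsing recommendation
-
-**Conclusion: hand-write a minimal parser over `std::env::args()` and do not add clap.**
-
-Rationale:
-- OpenLess is a GUI app. The CLI entry point is only a trigger, with a small argument set (toggle-dictation / toggle-qa / cancel / show) and no subcommand tree.
-- Adding `clap` makes the binary noticeably larger (~200 KB) and brings `--help` output to handle. A GUI program writes help text to stderr, where users almost never see it, so the help output adds little value.
-- Key risk: **CLI parsing must never make OpenLess panic or exit.** The GUI must still start normally if a user drops a file onto the .desktop launcher or a distribution wrapper passes odd arguments. By default, clap's `parse()` exits the process on error (`unwrap_or_else(|e| e.exit())`). Avoiding that requires switching to `try_parse` and silently ignoring errors, and at that point a hand-written parser is simpler.
-
-Minimal hand-written sketch:
-
-```rust
-// main.rs: before the Tauri Builder
-#[derive(Clone, Copy)]
-pub enum CliIntent {
-    ToggleDictation,
-    ToggleQa,
-    Cancel,
-    Show,
-}
-
-fn parse_cli_intent<S: AsRef<str>>(args: &[S]) -> Option<CliIntent> {
-    // Skip argv[0] and match each argument in turn. Extra or unknown arguments are
-    // silently ignored, and parsing never panics. The first recognized flag wins.
-    for arg in args.iter().skip(1) {
-        match arg.as_ref() {
-            "--toggle-dictation" => return Some(CliIntent::ToggleDictation),
-            "--toggle-qa" => return Some(CliIntent::ToggleQa),
-            "--cancel" => return Some(CliIntent::Cancel),
-            "--show" => return Some(CliIntent::Show),
-            _ => {}
-        }
-    }
-    None
-}
-```
-
-Reuse the same helper in the `lib.rs:73` callback. The first launch (the primary instance) and single-instance forwarding then share one parsing path.
-
-`std::env::args()` is part of the Rust standard library, so no external dependency is needed. Sources: [Rust std docs - std::env::args](https://doc.rust-lang.org/std/env/fn.args.html), [Tauri CLI plugin (reference only, not used here)](https://v2.tauri.app/plugin/cli/).
-
----
-
-## 6. Configuring a custom shortcut in each desktop environment
-
-After installation on Linux, OpenLess is on `$PATH` by default, or in a bin directory next to the `.desktop` file. The steps below assume the binary is named `openless`. If it is installed outside `$PATH` (e.g. as an AppImage), the docs should also give the absolute path.
-
-### 6.1 GNOME (Wayland)
-
-**GUI** ([official GNOME help](https://help.gnome.org/gnome-help/keyboard-shortcuts-set.html)):
-
-1. Settings → Keyboard
-2. Keyboard Shortcuts → View and Customize Shortcuts
-3. Custom Shortcuts → Add Shortcut (the + button)
-4. Name: `OpenLess Dictate`
-5. Command: `openless --toggle-dictation`
-6. Click "Add Shortcut..." and press the key you want to bind (e.g. `Super+Y`)
-7. Click Add to save
-
-**CLI / scripting** ([Programster's Blog](https://blog.programster.org/using-the-cli-to-set-custom-keyboard-shortcuts), [Ubuntu Wiki - Keybindings](https://wiki.ubuntu.com/Keybindings)). The per-shortcut schema is the singular `custom-keybinding` (no trailing s). The list key is the plural `custom-keybindings`. Because the per-shortcut schema is relocatable, it must be addressed with a path:
-
-```bash
-KEYBIND_PATH="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/openless0/"
-# Caution: this overwrites the whole list, which drops any existing custom shortcuts.
-gsettings set org.gnome.settings-daemon.plugins.media-keys custom-keybindings \
-  "['$KEYBIND_PATH']"
-gsettings set "org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:$KEYBIND_PATH" \
-  name 'OpenLess Dictate'
-gsettings set "org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:$KEYBIND_PATH" \
-  command 'openless --toggle-dictation'
-gsettings set "org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:$KEYBIND_PATH" \
-  binding '<Super>y'
-```
-
-### 6.2 KDE Plasma 6 (Wayland)
-
-**GUI** ([KDE Discuss - Custom Shortcuts](https://discuss.kde.org/t/adding-shortcuts-to-systemsettings/15276)):
-
-1. System Settings → Keyboard → Shortcuts
-2. "+ Add New" → Command/URL
-3. Trigger: record the key you want to bind
-4. Action: `openless --toggle-dictation`
-5. Apply
-
-**CLI / scripting** ([commandmasters.com](https://commandmasters.com/commands/kwriteconfig5-linux/), [KDE Discuss - kglobalaccel](https://discuss.kde.org/t/plasma-6-method-to-refresh-kglobalaccel-shortcuts/17995)):
-
-Plasma 6 stores shortcuts in `~/.config/kglobalshortcutsrc`, and the tool has been renamed `kwriteconfig6`. Fully scripting a custom shortcut is harder on KDE than on GNOME because it involves D-Bus registration plus a kglobalaccel reload:
-
-```bash
-# Write the declarations
-kwriteconfig6 --file kglobalshortcutsrc \
-  --group 'openless.desktop' --key '_k_friendly_name' 'OpenLess'
-kwriteconfig6 --file kglobalshortcutsrc \
-  --group 'openless.desktop' --key 'dictate' 'Meta+Y,none,Toggle Dictation'
-
-# Make kglobalaccel reload (required; otherwise the user must log out and back in).
-# On some Plasma 6 distributions the Qt 6 tool is installed as `qdbus6`.
-qdbus org.kde.kglobalaccel /kglobalaccel reloadConfig
-```
-
-> Practical advice: on KDE, **point users to the GUI instead**. The group name in kglobalshortcutsrc must match the `.desktop` file, and a service registration is also needed, so scripting is error-prone.
-
-### 6.3 Hyprland
-
-**GUI**: none. Hyprland is configured entirely through a text file and has no graphical binding UI.
-
-**Config file** ([Hyprland Wiki - Binds](https://wiki.hypr.land/Configuring/Basics/Binds/), [ArchWiki - Hyprland](https://wiki.archlinux.org/title/Hyprland)):
-
-The file is `~/.config/hypr/hyprland.conf`. Hyprland 0.54 and earlier use the traditional hyprlang syntax:
-
-```
-bind = SUPER, Y, exec, openless --toggle-dictation
-bind = SUPER SHIFT, Y, exec, openless --toggle-qa
-```
-
-Hyprland 0.55+ recommends Lua (hyprlang is deprecated):
-
-```lua
-hl.bind({"SUPER"}, "y", "exec", "openless --toggle-dictation")
-```
-
-To reload, run `hyprctl reload` (or restart Hyprland).
-
-### 6.4 sway
-
-**GUI**: none (plain-text configuration, like Hyprland).
-
-**Config file** ([sway(5) - ArchWiki](https://man.archlinux.org/man/sway.5), [swaywm/sway Wiki - Shortcut handling](https://github.com/swaywm/sway/wiki/Shortcut-handling)):
-
-The file is `~/.config/sway/config`. Syntax:
-
-```
-bindsym $mod+y exec openless --toggle-dictation
-bindsym $mod+Shift+y exec openless --toggle-qa
-```
-
-To reload, run `swaymsg reload`.
-
----
-
-## 7. Recommended minimal fix (for OpenLess)
-
-### 7.1 This release (Beta 1.3.x): CLI + single-instance forwarding
-
-Rationale:
-1. **Widest coverage**: it works directly on every desktop environment, including the Issue #420 reporter's Debian + GNOME setup (GNOME is the portal route's biggest blind spot).
-2. **Smallest change**: it reuses the existing `tauri-plugin-single-instance` and the public `coordinator::Coordinator` interface, with zero new dependencies.
-3. **Fits the toggle-only design**: OpenLess is already toggle-only (as `CLAUDE.md` states), so the push-to-talk limitation does not conflict.
-4. **Small failure surface**: the chain from CLI parsing to the IPC command is synchronous and testable, with no dependency on D-Bus or on compositor versions.
-5. **Industry precedent**: Murmure, a comparable product, uses this path by default on GNOME.
-
-### 7.2 Change list (**not implemented in this research; for reference only**)
-
-| File | Change | Estimated lines |
-|------|------|---------|
-| `openless-all/app/src-tauri/src/cli.rs` (new) | `CliIntent` enum + `parse_cli_intent` function + unit tests | ~60 |
-| `openless-all/app/src-tauri/src/lib.rs:73` | Parse argv in the single-instance callback and dispatch the intent | ~15 |
-| `openless-all/app/src-tauri/src/lib.rs` (early in main) | Also run `parse_cli_intent` on first launch, record the initial intent, and fire it once the coordinator is ready. Alternatively, simply agree that the first launch ignores the CLI intent and only starts the GUI | ~5 |
-| `openless-all/app/src-tauri/src/hotkey.rs:1204-1208` | Remove the "wayland error" branch. Replace it with an **info-level log** and skip installing the rdev listener (X11 still uses rdev; on Wayland the listener exits quietly) | ~10 |
-| `openless-all/app/src/i18n/{zh-CN,en}.ts` | Add guidance text: "On Linux Wayland, invoke `openless --toggle-dictation` from a desktop shortcut" | ~10 |
-| `README.md` / `README.zh.md` / `USAGE.md` | Add the configuration examples for the four desktop environments from Section 6 | ~50 |
-
-### 7.3 CLI flag naming
-
-Keep the names proposed in the brief:
-
-```
-openless --toggle-dictation   # same as pressing the main hotkey once
-openless --toggle-qa          # same as pressing the QA hotkey once
-openless --cancel             # same as Esc
-openless --show               # bring up the main window (existing single-instance behavior)
-```
-
-Convention: on Wayland these flags are the only entry point. On X11 the original rdev hotkeys remain supported, and the CLI supplements them rather than replacing them (users can use both).
-
-### 7.4 Behavior change when Wayland is detected
-
-The current `wayland_unsupported` error at `hotkey.rs:1204-1208` **should no longer be propagated upward**. Instead:
-
-- Wayland detected → do not install the rdev listener; write one INFO log line.
-- The frontend shows a one-line hint (i18n) on Settings → Hotkeys: "Wayland session detected. Bind `openless --toggle-dictation` to a shortcut in your system settings. Click here for instructions →".
-- The link opens the matching README section, which lists the steps from 6.1-6.4 for each desktop environment.
-
-This removes the Issue #420 error and also tells users what to do next. It matches the user's original suggestion to "provide the corresponding script or command so users can configure it in system settings".
-
-### 7.5 Follow-up paths (**left to separate issues; out of scope for this release**)
-
-- **xdg-desktop-portal GlobalShortcuts integration**: re-evaluate once GNOME resolves issue [#47](https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/issues/47). At that point KDE, Hyprland, sway, and GNOME will all be mature, and the portal can serve as an upgrade to the CLI path (binding inside the app, with no trip to the DE settings). Bring in the `ashpd` crate (see the code skeleton in Section 4 and the [ashpd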
demo](https://github.com/bilelmoussaoui/ashpd/blob/master/demo/client/src/portals/desktop/global_shortcuts.rs)).
-  - Another reason not to do it now: the portal approach will not replace the CLI approach, because the two can coexist. There is still time to roll out the portal approach gradually, starting with KDE.
-- **Native `hyprland-global-shortcuts-v1` protocol**: an optimization for a single compositor; lowest priority.
-- **Push-to-talk mode**: if "hold to record" is wanted in the future, the OS-level shortcut path cannot deliver it because DEs only send key press. Evaluate the portal or libei at that point.
-
----
-
-## 8. References
-
-**Wayland protocol and security model**
-- [The Wayland Protocol — seat/keyboard](https://wayland-book.com/seat/keyboard.html)
-- [Wayland - keyboard-shortcuts-inhibit-unstable-v1](https://wayland.app/protocols/keyboard-shortcuts-inhibit-unstable-v1)
-- [Exploring the Fragmentation of Wayland (semicomplete.com)](https://www.semicomplete.com/blog/xdotool-and-exploring-wayland-fragmentation/)
-- [Sending Keyboard Strokes to Wayland (Medium)](https://medium.com/@python-javascript-php-html-css/sending-keyboard-strokes-to-wayland-linux-windows-solutions-and-challenges-9319cf424d06)
-- [tauri-apps/global-hotkey issue #28 — Wayland support](https://github.com/tauri-apps/global-hotkey/issues/28)
-- [dec05eba.com — Wayland global hotkeys is mostly useless](https://dec05eba.com/2024/03/29/wayland-global-hotkeys-shortcut-is-mostly-useless/)
-
-**xdg-desktop-portal GlobalShortcuts**
-- [GlobalShortcuts spec (flatpak.github.io)](https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.GlobalShortcuts.html)
-- [KDE Portal MR !80 — Implementation of GlobalShortcuts](https://invent.kde.org/plasma/xdg-desktop-portal-kde/-/merge_requests/80)
-- [KDE Portal MR !368 — Improve workflow](https://invent.kde.org/plasma/xdg-desktop-portal-kde/-/merge_requests/368)
-- [KDE Portal MR !449 — Remember denied shortcuts](https://invent.kde.org/plasma/xdg-desktop-portal-kde/-/merge_requests/449)
-- [GNOME xdg-desktop-portal-gnome issue #47 — GlobalShortcuts feature request](https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/issues/47)
-- [GNOME Discourse — Feature request: GlobalShortcuts
portal](https://discourse.gnome.org/t/feature-request-globalshortcuts-portal/15343)
-
-**ashpd (Rust portal client)**
-- [ashpd crate (docs.rs)](https://docs.rs/ashpd/latest/ashpd/)
-- [ashpd repo — global_shortcuts.rs (client/src)](https://github.com/bilelmoussaoui/ashpd/blob/master/client/src/desktop/global_shortcuts.rs)
-- [ashpd repo — demo global_shortcuts.rs (end-to-end example)](https://github.com/bilelmoussaoui/ashpd/blob/master/demo/client/src/portals/desktop/global_shortcuts.rs)
-- [ASHPD Demo on Flathub](https://flathub.org/en/apps/com.belmoussaoui.ashpd.demo)
-
-**Tauri single-instance**
-- [tauri-plugin-single-instance — official docs](https://v2.tauri.app/plugin/single-instance/)
-- [tauri-plugin-single-instance — crates.io (latest 2.4.2, 2026-05-02)](https://crates.io/crates/tauri-plugin-single-instance)
-- [tauri-plugin-single-instance — docs.rs/latest](https://docs.rs/crate/tauri-plugin-single-instance/latest)
-- [Tauri v2 — Calling Rust from Frontend](https://v2.tauri.app/develop/calling-rust/)
-
-**Desktop environment shortcut configuration**
-- [GNOME Help — Set keyboard shortcuts](https://help.gnome.org/gnome-help/keyboard-shortcuts-set.html)
-- [Programster — Using the CLI to Set Custom Keyboard Shortcuts](https://blog.programster.org/using-the-cli-to-set-custom-keyboard-shortcuts)
-- [Ubuntu Wiki — Keybindings](https://wiki.ubuntu.com/Keybindings)
-- [KDE Discuss — Adding shortcuts to Systemsettings](https://discuss.kde.org/t/adding-shortcuts-to-systemsettings/15276)
-- [KDE Discuss — kglobalaccel reload (Plasma 6)](https://discuss.kde.org/t/plasma-6-method-to-refresh-kglobalaccel-shortcuts/17995)
-- [commandmasters — kwriteconfig5 / kwriteconfig6](https://commandmasters.com/commands/kwriteconfig5-linux/)
-- [Hyprland Wiki — Configuring/Basics/Binds](https://wiki.hypr.land/Configuring/Basics/Binds/)
-- [ArchWiki — Hyprland](https://wiki.archlinux.org/title/Hyprland)
-- [Hyprland Global Shortcuts protocol v1](https://wayland.app/protocols/hyprland-global-shortcuts-v1)
-- [sway(5) — ArchWiki man
page](https://man.archlinux.org/man/sway.5)
-- [swaywm/sway Wiki — Shortcut handling](https://github.com/swaywm/sway/wiki/Shortcut-handling)
-- [Mark Stosberg — Sway keybindings tips](https://mark.stosberg.com/sway-keybindings/)
-
-**Comparable product (Murmure, also an STT app)**
-- [Murmure docs — Configure shortcuts on Linux](https://docs.murmure.app/configure-shortcuts-on-linux/)
-- [Murmure repo — Kieirra/murmure](https://github.com/Kieirra/murmure)
-
-**evdev / alternatives**
-- [evdev_shortcut crate](https://docs.rs/evdev-shortcut/latest/evdev_shortcut/)
-- [wez/evremap — Linux/Wayland keyboard remapper](https://github.com/wez/evremap)
-- [xwaykeyz — X11 + Wayland keymapper](https://github.com/RedBearAK/xwaykeyz)
-- [Vocalinux issue #80 — Wayland support via evdev](https://github.com/jatinkrmalik/vocalinux/issues/80)
-
-**OpenLess repository anchors**
-- Current implementation: `openless-all/app/src-tauri/src/hotkey.rs:1183-1530` (Wayland error at `:1204-1208`)
-- single-instance callback: `openless-all/app/src-tauri/src/lib.rs:73-78`
-- IPC commands: `openless-all/app/src-tauri/src/commands.rs:1099-1110` (dictation), `:1324-1330` (QA panel)
-- Cargo deps: `openless-all/app/src-tauri/Cargo.toml:24` (`tauri-plugin-single-instance = "2"`)
diff --git a/docs/logic-review-2026-05-10.md b/docs/logic-review-2026-05-10.md
deleted file mode 100644
index d8a95bac..00000000
--- a/docs/logic-review-2026-05-10.md
+++ /dev/null
@@ -1,159 +0,0 @@
-# OpenLess beta — Logic Review (commit 400097ad)
-
-Audit branch: `origin/beta` @ `400097ad` (= "Merge PR #391 fix/audit-async-hygiene").
-Sources: `openless-all/app/src-tauri/src/{lib.rs, coordinator.rs, coordinator/{dictation,qa,resources}.rs, coordinator_state.rs, hotkey.rs, audio_mute.rs}`, `openless-all/app/src/lib/{ipc.ts,types.ts}`.
-
-> Caller's premise correction: the prompt asserts "PR #389 (emit_capsule main thread) is pending merge but the same change is already cherry-picked-equivalent on the current code." This is **false** at `400097ad`.
Commits `faf02ad4` and `84ee3d96` exist on a side branch but are NOT ancestors of `400097ad` (verified with `git merge-base --is-ancestor`). The audio-thread → AppKit/Win32 SIGTRAP risk that PR #389 was written to fix is therefore **still live on this beta**. See P4.
-
-## Summary
-
-| Path | Verdict | Issues found |
-|------|---------|--------------|
-| P1 startup | OK | All 6 listeners paired (start in `setup`/`Ready`, stop in `Exit`); tray watcher signaled. |
-| P2 press/release | OK with 1 ⚠️ | Routing + dedup correct; `acquire_recording_mute` correctly awaited in dictation start. |
-| P3 end-of-session | 1 🚩 | `cancel_session` during Processing leaves `focus_target` set until the next `begin_session` overwrites it. The PR #387 contract is incomplete for the Processing branch. |
-| P4 capsule UI emit | 1 🚩 (CRITICAL) | `emit_capsule` calls `window.show/hide` + `show_capsule_window_no_activate` directly from the cpal audio callback at ~30 Hz on `400097ad`. PR #389 is **not** merged here. |
-| P5 shutdown | 1 🚩 | `acquire_recording_mute` on the QA path (`coordinator.rs:2313`) is missing `.await`, so its return value is a dropped Future. The PR #391 hygiene fix is incomplete. The compiler emits `unused_must_use`. |
-
-Net: **3 real bugs (🚩)**, 1 cross-PR composition concern, plus minor smells. Two of the three are direct consequences of the incomplete merge of PR #389/#391. The Processing-cancel `focus_target` leak is in a code path that PR #387 missed.
-
-## Findings (per path)
-
-### P1 — Startup
-
-- OK `lib.rs:316-358` — `RunEvent::Exit` stops all 6 hotkey listeners + signals tray watcher; matches the 6 starts at `lib.rs:226` (dictation, in `setup`) and `lib.rs:320-325` (QA / combo / translation / switch_style / open_app, in `RunEvent::Ready`).
-- OK `coordinator.rs:344-358, 371-373, 383-385, 395-397, 407-409, 1313-1347` — every `stop_*_listener` for `global-hotkey`-backed monitors marshals the `Drop` to `app.run_on_main_thread`, matching the issue #169 contract for Carbon `RemoveEventHotKey`. `take_combo_hotkey_on_main_thread` / `take_translation_hotkey_on_main_thread` / `take_action_hotkey_on_main_thread` are the helpers; `stop_qa_hotkey_listener` inlines the same pattern.
-- OK `coordinator.rs:330-332` + `hotkey.rs:344-355` — dictation `stop_hotkey_listener` is a plain `inner.hotkey.lock().take()`, but the Drop chain is `HotkeyMonitor::drop` → `MacHotkeyAdapter::shutdown` → `CGEventTapEnable(false)` + `CFRunLoopStop(rl)`. The PR #388 fix is in place. The comment at `hotkey.rs:311-314` correctly notes that both APIs are documented as thread-safe, so the lack of main-thread marshalling here is intentional.
-- ℹ️ `coordinator.rs:316-318` + `global_hotkey_runtime.rs:60-63` — `request_shutdown` is `#[allow(dead_code)]` and never set in production; supervisor loops poll it but nothing flips it (matches PR #392's "passive infrastructure" promise).
-- ℹ️ `global_hotkey_runtime.rs:19, 41-55` — `GlobalHotKeyManager` lives in `OnceCell>` and is therefore never Dropped in production. On macOS this means the Carbon event handler is reaped only at process exit. Acceptable, but worth noting if anyone tries to add hot-restart later.
-
-### P2 — Press/release
-
-- OK `coordinator/dictation.rs:11-28` — `handle_pressed_edge` swap-dedups via `inner.hotkey_trigger_held.swap(true, SeqCst)`; routing checks `panel_visible && !dictation_active` (PR #390 fix). The `dictation_active = !matches!(phase, SessionPhase::Idle)` snapshot is the correct guard: it lets a hotkey press during an in-flight dictation flow fall through to `handle_pressed`, even when the QA panel happens to be visible.
-- OK `coordinator/dictation.rs:53-67` — `handle_released_edge` is symmetric: `panel_visible && !dictation_active → return`.
If dictation_active was true when pressed (so the press routed to `handle_pressed`), the release also bypasses the QA short-circuit and reaches `handle_released`. No mismatched-edge leak.
-- OK `coordinator/dictation.rs:30-51, 69-85` — Hold/Toggle phase matrix correct: `(Toggle, Idle) → begin`, `(Toggle, Listening) → end`, `(Toggle, Starting) → request_stop_during_starting`, `(Hold, Idle) → begin`, `(Hold, Listening released) → end`, `(Hold, Starting released) → request_stop_during_starting`. All other combinations are no-ops.
-- OK `coordinator/dictation.rs:87-96` + `coordinator_state.rs:87-93` — `request_stop_during_starting_state` only flips `pending_stop` when the phase is exactly `Starting`; `finish_starting_session_state` consumes the bit at the Listening transition (`coordinator_state.rs:118-124`) and triggers an immediate `end_session` via `BeginOutcome::PendingStop` at `dictation.rs:587-590`.
-- OK `coordinator/dictation.rs:451` — `acquire_recording_mute(inner, "dictation").await;` is properly awaited (PR #391 fix applied to the dictation path).
-- ⚠️ `coordinator/dictation.rs:418-447` (level_handler) — runs on the cpal audio callback thread and calls `emit_capsule` synchronously. See P4 for the concrete bug; flagged here because P2's recorder-start path is the producer.
-
-### P3 — End-of-session pipeline
-
-- OK `coordinator/dictation.rs:595-602` + `coordinator_state.rs:178-184` — `start_processing_if_listening` only transitions `Listening → Processing`; if the phase is anything else (`Idle`, `Starting`, `Inserting`, already-Processing), `end_session` returns Ok(()) immediately. This guards against a stale pending_stop or a duplicate IPC call.
-- OK `coordinator/dictation.rs:607-620` — recorder + ASR are taken with session-id matching (`take_recorder_for_session` / `take_asr_for_session`), so a stale callback that lands after the session id has been bumped won't pick up the wrong recorder.
-- OK `coordinator/dictation.rs:984-1005` — atomic Inserting transition: the same `state.lock()` checks `cancelled` and flips the phase to `Inserting`. Once the phase is `Inserting`, `cancel_session` rejects (`coordinator_state.rs:155-159` — `Idle | Inserting` → `None`). This is the audit HIGH #2 contract.
-- OK `coordinator/dictation.rs:1007-1031` — `paste_shortcut = prefs.paste_shortcut` flows into `inner.inserter.insert(&polished, restore_clipboard, paste_shortcut)` for non-Windows and into `insert_with_windows_ime_first(..., paste_shortcut, ime_target)` for Windows. PR #377 wiring confirmed; the corresponding signatures are in `coordinator.rs:1673-1680, 1731-1745` and `insertion.rs:43-89`.
-- OK `coordinator/dictation.rs:1122-1126` — the happy-path `end_session` sets `state.focus_target = None` before scheduling capsule idle.
-
-- 🚩 `coordinator/dictation.rs:843-849` + `coordinator/dictation.rs:1153-1178` + `coordinator_state.rs:171-176` — **`focus_target` is not cleared when cancel hits during `Processing`.**
-  - `cancel_session` in the `Processing` branch (`dictation.rs:1171-1173`) deliberately does NOT call `finish_cancel_session_state`. It leaves phase + focus_target as-is so that `end_session` can finish unwinding.
-  - `end_session`'s "ASR-finished, cancelled" exit (`dictation.rs:845-849`) restores the Windows IME, sets `phase = Idle`, and returns Ok(()), but never touches `focus_target`.
-  - PR #387 (`ce82fcd9`) was framed as "clear `focus_target` on cancel regardless of phase", but the only code path that gained the unconditional clear is `finish_cancel_session_state` at `coordinator_state.rs:172`, which the Processing branch skips.
-  - Concrete consequence: between a mid-Processing cancel and the next `begin_session`, the cached AX `focus_target` (a stale `usize` slot) is reachable by anyone reading `state.focus_target` (logs, debug dumps, future readers). It is overwritten by `begin_session_state` at `coordinator_state.rs:80`, so user-visible insertion uses the right value.
Severity: minor leak / contract violation rather than user-visible breakage. Tests at `coordinator_state.rs:362-385` only validate the cancel happy paths via `finish_cancel_session_state`, so the regression slipped past PR #387's guard test.
-
-- ⚠️ `coordinator/dictation.rs:1175` — even when cancel fires during `Processing`, the user immediately sees `CapsuleState::Cancelled`, but `end_session` may still be inside the ASR await for several seconds. The phase stays `Processing` until `end_session` reaches the `state.cancelled` check, so a fast retry press is quietly dropped (`begin_session_state` requires `Idle`). Not a bug per se, since it matches the design comment at `dictation.rs:1169-1174`, but worth a UX note.
-
-- ⚠️ `coordinator/dictation.rs:1153-1163` — `cancel_session` swallows the result of `begin_cancel_session_state` for `Inserting` and only logs "cancel ignored". Acceptable, but nothing in the UI tells the user that their Esc did not take effect. Minor.
-
-### P4 — Capsule UI emission
-
-- 🚩🚩 **`coordinator.rs:3684-3727`** — **`emit_capsule` does NOT marshal `window.show/hide` to the main thread on `400097ad`.** Verbatim from current source:
-
-  ```rust
-  fn emit_capsule(...) {
-      ...
-      if let Some(window) = app.get_webview_window("capsule") {
-          ...
-          let visible = !matches!(state, CapsuleState::Idle);
-          maybe_position_capsule_bottom_center(inner, &window, payload.translation);
-          if show_capsule && visible {
-              if !show_capsule_window_no_activate(&app, &window) {
-                  let _ = window.show();
-              }
-              #[cfg(target_os = "macos")]
-              crate::restore_main_window_key_if_active(&app);
-          } else {
-              hide_capsule_window_if_present();
-              let _ = window.hide();
-          }
-      }
-      let _ = app.emit_to("capsule", "capsule:state", payload);
-  }
-  ```
-
-  This is the *pre-PR-#389* shape. PR #389's fix (`faf02ad4`, then `84ee3d96`) wraps the `if let Some(window)` block in `app.run_on_main_thread(move || { ... })`.
Verified that neither commit is an ancestor of `400097ad`:
-
-  ```
-  $ git merge-base --is-ancestor faf02ad4 400097ad ; echo $?   → 1 (NOT ancestor)
-  $ git merge-base --is-ancestor 84ee3d96 400097ad ; echo $?   → 1 (NOT ancestor)
-  ```
-
-  Reproduction reasoning: `coordinator/dictation.rs:418-447` builds `level_handler` as `Arc` and hands it to `Recorder::start`. cpal calls it from the audio process callback thread, and the handler then calls `emit_capsule(...)` (lines 439-446). On macOS, `WebviewWindow::show()` / `hide()` and `show_capsule_window_no_activate` (which calls `NSWindow.orderFrontRegardless`) hit AppKit assertions (`dispatch_assert_queue_fail` → SIGTRAP) when invoked off the main thread. The 33 ms throttle at `dictation.rs:417, 426-432` only limits frequency; every individual call carries the same thread-safety risk.
-
-  The same risk applies to QA's level_handler at `coordinator.rs:2282-2309`, which also calls `emit_capsule` directly from the cpal callback (lines 2301-2308).
-
-  Severity: high. SIGTRAP would crash the app on long recordings. Less catastrophic outcomes are stuttering audio (the audio callback misses its deadline while waiting for AppKit) and `kAudioUnitErr_TooManyFramesToProcess`. PR #389 needs to land or be cherry-picked before this beta ships.
-
-- OK `coordinator.rs:3726` — `app.emit_to("capsule", "capsule:state", payload)` stays on the calling thread; Tauri's event bus is internally thread-safe. No change needed regardless of PR #389.
-- OK `coordinator.rs:3739-3765` — `maybe_position_capsule_bottom_center` is the OS-level call inside the `if let Some(window)` block; it would move into the same `run_on_main_thread` closure once PR #389 lands.
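The sketch below shows the marshalling that PR #389 is described as adding. It is a minimal, self-contained Rust model and makes several assumptions. A plain `std::sync::mpsc` channel, drained by the owning thread, stands in for Tauri's `AppHandle::run_on_main_thread`. The names `UiState`, `MainThreadHandle`, `drain`, and `simulate`, and this two-argument `emit_capsule`, are all hypothetical and are not OpenLess code. In the model, the audio thread only decides between show and hide and enqueues the window call. Every window mutation asserts that it runs on the owning thread, which mimics AppKit's main-queue check.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, ThreadId};

// A queued unit of window work; it must run on the thread that owns the window.
type UiTask = Box<dyn FnOnce(&mut UiState) + Send + 'static>;

// Hypothetical stand-in for the capsule window; it remembers its owning thread.
#[derive(Debug, Default)]
struct UiState {
    owner: Option<ThreadId>,
    visible: bool,
    show_calls: usize,
    hide_calls: usize,
}

impl UiState {
    // Models AppKit's main-queue assertion: a window call from any other thread panics.
    fn check_thread(&self) {
        assert_eq!(
            Some(thread::current().id()),
            self.owner,
            "window touched off its owning (main) thread"
        );
    }

    fn show(&mut self) {
        self.check_thread();
        self.visible = true;
        self.show_calls += 1;
    }

    fn hide(&mut self) {
        self.check_thread();
        self.visible = false;
        self.hide_calls += 1;
    }
}

// Hypothetical analogue of `AppHandle::run_on_main_thread`: enqueue, never run inline.
#[derive(Clone)]
struct MainThreadHandle {
    tx: Sender<UiTask>,
}

impl MainThreadHandle {
    fn run_on_main_thread(&self, f: impl FnOnce(&mut UiState) + Send + 'static) {
        // Fire-and-forget: if the receiving loop is gone, the task is silently dropped.
        let _ = self.tx.send(Box::new(f));
    }
}

// Called on the audio thread: decide visibility here, defer the window call.
fn emit_capsule(main_thread: &MainThreadHandle, idle: bool) {
    let visible = !idle;
    main_thread.run_on_main_thread(move |ui| if visible { ui.show() } else { ui.hide() });
}

// Runs every queued task on the current thread, which plays the main thread here.
fn drain(rx: &Receiver<UiTask>, ui: &mut UiState) -> usize {
    let mut ran = 0;
    for task in rx.try_iter() {
        task(&mut *ui);
        ran += 1;
    }
    ran
}

// Fires `frames` level callbacks from a spawned "audio" thread; the last frame reports idle.
fn simulate(frames: usize) -> UiState {
    let (tx, rx) = mpsc::channel();
    let main_thread = MainThreadHandle { tx };
    let mut ui = UiState {
        owner: Some(thread::current().id()),
        ..Default::default()
    };

    let audio = thread::spawn(move || {
        for i in 0..frames {
            emit_capsule(&main_thread, i + 1 == frames);
        }
    });
    audio.join().expect("audio thread panicked");

    drain(&rx, &mut ui);
    ui
}
```

In this model, `simulate(5)` produces 4 show calls and 1 hide call, and all of them run on the calling thread. Calling `show()` on a `UiState` from any other thread panics instead. The split follows the review's recommendation: cheap work, such as computing the visibility state, stays on the callback thread, and only the window operations are deferred.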
-
-### P5 — App shutdown
-
-- OK `lib.rs:347-355` (RunEvent::Exit) — calls `stop_hotkey_listener` (Mac CGEventTap path, safe to invoke from any thread per `hotkey.rs:344-355`), plus the 5 `global-hotkey`-backed `stop_*_listener`s that all marshal via `app.run_on_main_thread` (`coordinator.rs:344-358, 1313-1347`).
-- OK `lib.rs:348` — `TRAY_MICROPHONE_WATCHER_STOPPING.store(true, Relaxed)` correctly signals the watcher loop spawned at `lib.rs:540-548`.
-- ⚠️ `coordinator.rs:344-358, 1313-1347` — `app.run_on_main_thread` is fire-and-forget, so the queued `Drop` may not run before the process exits. In practice this is fine (process exit reaps everything), but if Tauri's main-loop teardown beats the queued closure, Carbon `RemoveEventHotKey` is skipped. This is the same model as the pre-existing PR #169 fix for `qa_hotkey`, so it is flagged as an inherited limitation, not a regression.
-
-- 🚩 **`coordinator.rs:2313` — `acquire_recording_mute(inner, "qa");` is missing `.await`.**
-  - PR #391 (`6171df61`) made `acquire_recording_mute` an `async fn` (`coordinator/resources.rs:122`) and updated the dictation call site to `.await` it (`coordinator/dictation.rs:451`). The QA call site was missed.
-  - Effect: the call returns an `impl Future` that is dropped at the end of the statement. `spawn_blocking` is never scheduled, so `mute.holders` never increments, the system audio mute is never engaged for QA, and the `[audio-mute] acquired by qa` log line is never written. The corresponding `release_recording_mute(inner, "qa")` calls (e.g. `resources.rs:194`, `coordinator.rs:2324`) try to decrement holders that were never incremented, and hit the early `return` at `resources.rs:174` because `holders == 0`.
-  - Compiler confirms it: `cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml` emits
-
-    ```
-    warning: unused implementer of `futures_util::Future` that must be used
-        --> src/coordinator.rs:2313:5
-    2313 |     acquire_recording_mute(inner, "qa");
-         |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-         = note: futures do nothing unless you `.await` or poll them
-    ```
-
-  - User-visible consequence: if the user has opted into "Mute system output during recording" and triggers QA via Option, system audio is NOT muted (e.g. YouTube playback continues). Dictation behaves correctly. Fix: add `.await` to the call.
-
-- OK `coordinator/resources.rs:184-188` — `release_recording_mute` falls back to a synchronous `work()` when no tokio handle is present, so the recorder error monitor (a plain `std::thread::spawn`) can release safely. Dropping `AudioMuteGuard` shells out to `osascript` / `wpctl`; blocking on a std thread is OK.
-
-## Cross-PR composition risks
-
-1. **PR #391 + #389 incomplete merge.** The current `400097ad` contains PR #391 (which added the async hygiene the rest of the audit fixes assume) but is missing PR #389. Net result: the audio-thread → AppKit risk that PR #389 fixes is still present, *and* the QA code path has a half-applied PR #391 (missing `.await` at `coordinator.rs:2313`). Both fixes must land before this commit becomes a release candidate.
-2. **PR #387 + Processing branch interaction.** PR #387 introduced `state.focus_target = None` inside `finish_cancel_session_state`, but `cancel_session` deliberately skips that helper for `Processing` (`dictation.rs:1171-1173`) so that the scheduled `end_session` can drive its own teardown. `end_session`'s cancelled-after-ASR exit (`dictation.rs:845-849`) was not updated to clear `focus_target`. The contract "clear focus_target on cancel regardless of phase" is therefore violated for cancel-during-Processing.
The fix can be either (a) clearing `focus_target` in the cancel-after-ASR branch of `end_session`, or (b) moving the clear into `cancel_session` even for Processing, which does not interfere with `end_session`'s own writes.
-3. **PR #390 + multi-bridge `hotkey_trigger_held`.** The dedup atomic is process-global. With both the `hotkey` and `combo_hotkey` monitors running, the legacy modifier-only adapter and the custom-combo adapter share `inner.hotkey_trigger_held`. They are currently mutually exclusive (custom-combo only runs when trigger == Custom; see `coordinator.rs:413-426`), so there is no cross-contamination. However, any change that lets them coexist would corrupt the dedup. ℹ️ informational.
-4. **PR #392 (passive flag) is dormant**, as the prompt indicates. No interaction risk. Noted only because supervisor loops gracefully ignore `shutdown=false`, so a future RunEvent::Exit hookup is safe to land in a follow-up PR.
-
-## Manual-verification checklist for the user
-
-After cherry-picking PR #389 and fixing the missing `.await` in P5, verify on a running build:
-
-**P4 — capsule main-thread (PR #389 confirmation)**
-- [ ] On a macOS arm64 build with the dev profile, run a 3-minute continuous toggle dictation. The app must not crash with SIGTRAP / `dispatch_assert_queue_fail`. Tail `~/Library/Logs/OpenLess/openless.log` while recording.
-- [ ] On macOS, the capsule still appears once the first PCM frame is captured (50–200 ms after Recorder::start) and disappears 1.5 s after Done/Cancelled/Error.
-- [ ] On Windows, there is no SendMessage deadlock against the GUI thread during recording start/stop (capsule transitions complete within ~50 ms of phase changes).
-
-**P5 — QA mute fix**
-- [ ] Set `prefs.muteDuringRecording = true` (Settings → Recording).
-- [ ] Play YouTube in Safari/Chrome.
-- [ ] Open the QA panel via `Cmd+Shift+;`. Press Right Option to start QA recording. Audio playback **must** mute, and the log line `[audio-mute] acquired by qa; holders=1` must appear in `openless.log`. (Without the fix, no log line appears and audio keeps playing.)
-- [ ] Release Right Option. Audio playback resumes; log line `[audio-mute] released by qa; holders=0` + `system output mute restored after recording` must appear. - -**P3 — focus_target on Processing-cancel (PR #387 completeness)** -- [ ] Start dictation, speak briefly, release hotkey to enter Processing. While ASR is awaiting result (within ~1 s window), press Esc to trigger `cancel_dictation`. End_session bails at `cancelled` check. -- [ ] Inspect debug logs / state dump (or add a temporary log at `dictation.rs:849`): `state.focus_target` should be `None`. With current `400097ad` it remains `Some(...)`. -- [ ] Start a fresh dictation in a different window. Insertion should still target the new window — this works today via `begin_session_state` overwrite (`coordinator_state.rs:80`), so the bug is silent until something else reads stale `focus_target` between the cancel and the next begin. - -**P2 — pressed/released routing (PR #390 confirmation)** -- [ ] Open QA panel (`Cmd+Shift+;`). Confirm Option starts QA recording. -- [ ] Close QA panel. Start a normal dictation (Option). While dictation is running, open QA panel via `Cmd+Shift+;` (panel becomes visible while dictation_active=true). Press and release Option once — dictation must end normally (insert text), QA must NOT capture this Option edge. -- [ ] Hold mode: Set HotkeyMode::Hold. Hold Option for 2 s, release. End_session must trigger on release (not on press). -- [ ] Toggle mode: Set HotkeyMode::Toggle. Tap Option twice rapidly during Starting phase (within the 50–200 ms cpal init window) — verify `request_stop_during_starting` queues then end_session fires on the Listening transition (search log for `applying pending_stop edge → end_session immediately`). - -**P1 — listener teardown sanity** -- [ ] Quit the app via tray menu. `openless.log` should show ordered `stop_*_listener` calls; no panic / SIGTRAP at exit. -- [ ] Relaunch and grant Accessibility (after prior reset). 
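Confirm `[hotkey] CGEventTap 已启动` returns within 3 s; first hotkey press still works without app restart. - -If any of the P4/P5 checks fail, this beta is **not** build-quality. diff --git a/docs/qa-reasoning-roadmap.md b/docs/qa-reasoning-roadmap.md deleted file mode 100644 index 4b1f887e..00000000 --- a/docs/qa-reasoning-roadmap.md +++ /dev/null @@ -1,75 +0,0 @@ -# Selection Q&A: Reasoning Capability Roadmap - -> Created 2026-05-01. Streaming output (v2.1) is done; **reasoning (v2.2) is not yet implemented**. This document is the design draft for a later iteration. -> -> Related: issue #118 v2, PR #119, `openless-all/app/src-tauri/src/polish.rs`, `openless-all/app/src/pages/SelectionAsk.tsx`. - -## Background and decision - -User request: "QA should let the LLM think before replying, and the thinking intensity should be configurable." - -Three approaches were discussed: - -| Option | Implementation | Pros | Cons | -|---|---|---|---| -| A | Prompt-engineered (system prompt requires a `` block) | Zero config changes; works with the current model | Reasoning quality limited by small models; not controllable | -| B | OpenAI-standard `reasoning_effort: low/medium/high` field | Standardized | DeepSeek-v4-flash does not recognize the field | -| **C** | **Switch to a reasoner model (deepseek-r1 / o1 / Claude extended thinking)** | **Real** reasoning; the reasoning process can be displayed | User must configure an extra model; higher UI complexity | - -**Conclusion**: choose C. With the current provider, A and B are effectively no-ops. - -## Implementation breakdown - -### Backend - -1. **Credential storage**: add two entries to `CredentialAccount` - - `ArkReasonerModelId` (e.g. `deepseek-r1`, `doubao-seed-1.6-thinking`) - - Reuse the existing `ArkApiKey` / `ArkEndpoint` (same provider, different model) - -2. **prefs**: add a field to `Preferences` - - `qa_reasoning_effort: ReasoningEffort`, enum `Off | Low | Medium | High` - - Default `Off` (matches current behavior) - -3. **`answer_chat_streaming` overload**: choose the chat or the reasoner endpoint based on effort - - `Off`: use the existing v2.1 path (chat model + stream) - - `Low/Medium/High`: use the reasoner model; intensity is tuned via a system-prompt hint ("brief thinking is enough" / "think in detail" / "deep, multi-angle reasoning") - - The SSE parser reads both `delta.content` and `delta.reasoning_content` and emits them as separate events: - - `qa:state {kind:"reasoning_delta", chunk}` - - `qa:state {kind:"answer_delta", chunk}` (already exists) - -4. When **answer_chat assembles the final message**, `reasoning_content` is not written into the `messages` array (display only, never added to context). Multi-turn questions carry only the final answer back into context. - -### Frontend - -1. **SelectionAsk.tsx**: add a settings block: - - "Reasoning intensity" dropdown: Off / Light / Medium / Deep - - "Reasoning model" text field (model id; default `deepseek-r1`) - - i18n: zh-CN / en - -2.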
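**QaPanel.tsx**: add a collapsible "Reasoning" section: - - Placed below the user bubble and above the final assistant bubble - - Collapsed by default; title `思考中…` ("Thinking…") / `思考过程(X 字)` ("Reasoning (X chars)"); click to expand - - While streaming: append `reasoning_delta` in real time, with a typing caret in the bubble - - When the answer completes: collapse; the user can expand it at any time to view the reasoning - -3. **types.ts**: add `'reasoning_delta'` to `QaStateKind`; add `reasoning_chunk?: string` to the payload - -### Edge cases and risks - -- **Provider compatibility**: Volcengine Ark's deepseek-r1 / doubao-thinking both return `reasoning_content`; OpenAI o1 uses thinking blocks (not reasoning_content) and needs a separate adapter -- **Token cost**: reasoner-model tokens cost 5-10x more; turning on "Deep" really costs money, so the UI should warn the user -- **Latency**: a reasoner's first token may take > 5 s (no content is emitted during the thinking phase). The UI must distinguish "thinking" (reasoning streaming) from "answering" (content streaming) so users don't assume it has hung - -## Effort estimate - -- Backend reasoner path + dual-stream SSE parsing: ~2h -- Frontend collapsible reasoning section + typing caret + state transitions: ~1.5h -- prefs / SelectionAsk settings UI + i18n: ~0.5h -- End-to-end testing (three intensity levels × single/multi-turn × error fallback): ~1h -- **Total**: ~5h - -## Prerequisites - -1. The user has configured a reasoner model (deepseek-r1 / doubao-thinking-pro, etc.) -2. The backend credential vault stores the corresponding model id -3. v2.1 streaming output is stable (done ✅) diff --git a/docs/style-pack-marketplace.md b/docs/style-pack-marketplace.md deleted file mode 100644 index eba69293..00000000 --- a/docs/style-pack-marketplace.md +++ /dev/null @@ -1,299 +0,0 @@ -# Style Pack Marketplace: Planning Document - -**Status**: planning (API stubs reserved, not implemented) -**Drafted**: 2026-05-14 -**Owner**: TBD - -## 1. Goals - -Extend today's "local ZIP import / export" experience into a public style-pack marketplace: - -- Users can **upload** their tuned style packs to the cloud, with a name, description, author credit, tags, and example outputs -- Other users can **browse / search / download** other people's style packs and install them locally in one click -- Later: basic social features such as **update notifications** and **favorites / ratings** - -Non-goals (not in v1): -- Payments / revenue share -- Style packs embedding external prompt injection / cross-origin fetch (for security, a style pack is always a plain-text prompt) -- Multi-user collaborative editing / forking - -## 2.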
Architecture overview - - ``` -┌──────────────────┐ HTTPS ┌─────────────────────┐ -│ OpenLess client │ ◄──────────────────► │ marketplace API │ -│ (Tauri 2) │ JSON over TLS │ (TBD: Cloudflare │ -│ │ │ Workers / D1 / │ -│ Rust IPC → │ │ R2 for blobs) │ -│ reqwest client │ │ │ -└──────────────────┘ └─────────────────────┘ - │ │ - │ local cache (~/Library/Application │ - │ Support/OpenLess/market_cache/) │ - ▼ ▼ - StylePackStore Postgres / D1 - (existing local listings + R2 blobs - persistence layer) -``` - -**Key constraints**: -- The client only uploads / downloads ZIP **bundles** (never raw JSON), keeping the format isomorphic with the existing ZIP import/export -- Server-side ZIP validation: once extracted, it must deserialize into a `StylePack`, satisfy `prompt.chars().count() <= 50_000`, and contain no executable attachments -- Pack IDs are assigned by the server on upload (`{author_slug}-{name_slug}-{version}`), decoupled from local IDs -- The client always persists a ZIP through the existing `import_style_pack_from_zip` path; there is no separate "write a pack straight from the marketplace" code path, which avoids two entry points - -## 3. HTTP API specification - -Base URL (TBD): `https://api.openless.app/v1/marketplace/` - -All responses use a common envelope: -```json -{ - "ok": true, - "data": | null, - "error": null | { "code": "ERR_XXX", "message": "..." } -} -``` - -### 3.1 GET `/packs`: list / search - -Query: -| Parameter | Type | Default | Description | -|---|---|---|---| -| `q` | string | `""` | Keyword (name / description / tags) | -| `tag` | string | `""` | Filter by a single tag | -| `sort` | `recent` \| `popular` \| `name` | `recent` | Sort order | -| `cursor` | string | `null` | Pagination cursor | -| `limit` | int (1-100) | `20` | Items per page | - -Response data: -```typescript -{ - packs: MarketPackListing[]; - next_cursor: string | null; -} -``` - -`MarketPackListing`: -```typescript -{ - id: string; // server-assigned, e.g. "alice-formal-v2.1" - name: string; - description: string; - author: string; - version: string; // semver - tags: string[]; - base_mode: "raw" | "light" | "structured" | "professional"; - recommended_model: string | null; - compatible_app_version: string | null; - downloads: number; - rating_avg: number | null; - rating_count: number; - updated_at: string; // ISO8601 - zip_size_bytes: number; - zip_sha256: string; // verified by the client after download -} -``` - -### 3.2 GET `/packs/{id}`: details - -Response data: `MarketPackListing` plus extra fields: -```typescript -{ - ...listing, - examples: StylePackExample[]; // preview without unpacking the ZIP - changelog: string | null; - homepage_url: string | null; -} -``` - -### 3.3 GET `/packs/{id}/download`: download ZIP - -Response: `application/zip` binary stream, with an `X-Pack-SHA256` header for verification. - -The server redirects straight to an R2 / S3 presigned URL to avoid proxying the traffic. - -### 3.4 POST `/packs`: upload (auth required) - -Headers: `Authorization: Bearer ` -Body: `multipart/form-data` with field `pack=@xxx.zip` - -Response data: `MarketPackListing` (including the newly assigned id) - -Error codes: -- `ERR_INVALID_ZIP`: ZIP extraction failed / not a valid StylePack JSON -- `ERR_PROMPT_TOO_LARGE`: prompt exceeds 50k characters -- `ERR_DUPLICATE_VERSION`: the same author+name+version already exists -- `ERR_RATE_LIMITED`: rate limit triggered - -### 3.5 DELETE `/packs/{id}`: withdraw (auth required; caller must be the uploader) - -### 3.6 POST `/packs/{id}/rate`: rate (auth required) - -Body: `{ score: 1..5, comment?: string }` - -## 4.
IPC contract (Rust ↔ TS) - -Add the following stubs to `src-tauri/src/commands.rs` (for now they return `Err("not implemented yet")`; they will be implemented once the server is live): - -```rust -// List / search -#[tauri::command] -pub async fn market_list_packs( - query: Option, - tag: Option, - sort: Option, - cursor: Option, - limit: Option, -) -> Result; - -// Details -#[tauri::command] -pub async fn market_get_pack(id: String) -> Result; - -// Download, then persist automatically via the existing import_style_pack_from_zip -#[tauri::command] -pub async fn market_download_pack( - coord: CoordinatorState<'_>, - app: AppHandle, - id: String, -) -> Result; - -// Upload (dirty field = edited but not yet saved) -#[tauri::command] -pub async fn market_upload_pack( - coord: CoordinatorState<'_>, - pack_id: String, - api_key: String, -) -> Result; - -// Withdraw -#[tauri::command] -pub async fn market_delete_pack(id: String, api_key: String) -> Result<(), String>; - -// Rate -#[tauri::command] -pub async fn market_rate_pack( - id: String, - api_key: String, - score: u8, - comment: Option, -) -> Result<(), String>; -``` - -DTOs (new, in `types.rs`): -```rust -#[derive(Debug, Serialize, Deserialize, Clone)] -pub struct MarketPackListing { - pub id: String, - pub name: String, - pub description: String, - pub author: String, - pub version: String, - pub tags: Vec, - pub base_mode: PolishMode, - pub recommended_model: Option, - pub compatible_app_version: Option, - pub downloads: u64, - pub rating_avg: Option, - pub rating_count: u32, - pub updated_at: String, - pub zip_size_bytes: u64, - pub zip_sha256: String, -} - -#[derive(Debug, Serialize, Deserialize, Clone)] -pub struct MarketPackDetail { - #[serde(flatten)] - pub listing: MarketPackListing, - pub examples: Vec, - pub changelog: Option, - pub homepage_url: Option, -} - -#[derive(Debug, Serialize, Deserialize, Clone)] -pub struct MarketListResponse { - pub packs: Vec, - pub next_cursor: Option, -} -``` - -TS wrappers (`src/lib/ipc.ts`): -```typescript -export interface MarketPackListing { /* same shape */ } -export interface MarketPackDetail extends MarketPackListing { /* +
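examples, changelog, homepage_url */ } -export interface MarketListResponse { packs: MarketPackListing[]; next_cursor: string | null; } - -export function marketListPacks(opts: { - query?: string; tag?: string; sort?: 'recent' | 'popular' | 'name'; - cursor?: string; limit?: number; -}): Promise; -export function marketGetPack(id: string): Promise; -export function marketDownloadPack(id: string): Promise; -export function marketUploadPack(packId: string, apiKey: string): Promise; -export function marketDeletePack(id: string, apiKey: string): Promise; -export function marketRatePack(id: string, apiKey: string, score: number, comment?: string): Promise; -``` - -## 5. Authentication model - -**v1 simplified approach**: -- The user enters a personal API key (issued by the server) on the settings page -- The API key is stored in the OS Keychain under the account name `com.openless.app.market_api_key` -- The client adds `Authorization: Bearer ` to the request headers -- The server validates the key and rate-limits (60 writes and 600 reads per hour) - -**v2 upgrade path** (deferred): -- OAuth via GitHub / Google -- ZIPs signed automatically on upload, signatures verified on download - -## 6. Caching and version checks - -Local cache directory: `/market_cache/` -- `listings.json`: the most recently fetched listings (with ETag) -- `packs/{id}.zip`: downloaded ZIPs (kept as needed, cleaned up automatically after 30 days) - -Update notifications: -- On startup (throttled by a 24h dev-cap), call `/packs?ids=<installed market_id...>` and compare -- Local packs record `installed_market_id` and `installed_market_version` fields, filled in when a `StylePack` is created and also when installing from a local ZIP -- When a newer version is found → show a `New version: 2.3.0 →` badge on that pack's card on the Style page - -## 7. Client UI entry points (not in v1; space reserved) - -- Add a tab to the Style page header: `本地 / 市场` (Local / Marketplace) -- Marketplace page: search bar + tag filter + card list + detail drawer -- Upload: when editing a local pack, an "Upload to marketplace" button appears next to "Export ZIP" (an API key must be entered in Settings first) - -## 8. Security / abuse countermeasures - -- ZIP extraction is streamed, with the maximum extracted size capped at 5 MB -- The prompt field is filtered for obvious prompt injection / jailbreaks (keyword pre-scan + asynchronous content moderation) -- Per user: at most 10 uploads per day; each pack ≤ 2 MB -- Uploads become public only after a 24h delay (to deter ranking manipulation) - -## 9.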
Implementation TODO (by priority) - -- [ ] Choose the server stack (CF Workers + D1 + R2 vs Supabase vs self-hosted FastAPI) -- [ ] Implement the server + deployment environments (dev / staging / prod) -- [ ] Add DTOs to client `types.rs` -- [ ] Add 6 stubs to `commands.rs` (**done**, returning `not implemented yet`) -- [ ] Add wrappers to `lib/ipc.ts` (**done**) -- [ ] Implement `market_download_pack` (first get the single path working end to end: URL → download → existing import_style_pack_from_zip) -- [ ] Add credential storage (Keychain, reusing the existing `CredentialsVault`) -- [ ] UI: Local / Marketplace tab -- [ ] UI: search + cards -- [ ] UI: detail panel -- [ ] UI: upload flow -- [ ] Update-notification badge -- [ ] Cache cleanup + ETag - -## 10. Decision / risk log - -| Item | Decision | Why | -|---|---|---| -| Upload ZIP rather than JSON | ZIP | Isomorphic with the existing import/export; long prompts + examples compress well in a ZIP | -| Server-assigned IDs | Yes | Prevents local ID collisions; renaming a pack does not break subscriptions | -| Visible immediately vs reviewed | 24h publication delay | Deters ranking manipulation and leaves time for review | -| API key vs OAuth | API key first | Keeps v1 simple; login sessions can come in v2 | -| Client caching strategy | Listings ETag + downloaded ZIPs kept 30 days | Balances bandwidth and experience | -| i18n / cross-border | API entirely in English + client-side i18n | The server stores no translations; names/descriptions accept any UTF-8 | diff --git a/docs/superpowers/plans/2026-05-01-windows-temporary-tsf-ime.md b/docs/superpowers/plans/2026-05-01-windows-temporary-tsf-ime.md deleted file mode 100644 index 582aa827..00000000 --- a/docs/superpowers/plans/2026-05-01-windows-temporary-tsf-ime.md +++ /dev/null @@ -1,2191 +0,0 @@ -# Windows Temporary TSF IME Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add a Windows-only TSF input-method backend that temporarily activates OpenLess during a voice session, commits the final dictated text through TSF, and restores the user's previous input method. - -**Architecture:** Keep the existing Tauri/Rust app as the only owner of hotkeys, recording, ASR, polish, UI, history, and fallback insertion. Add a small Windows TSF COM DLL that registers as an input processor and accepts one active `SubmitText` request from the app over a local named pipe.
Add a Rust Windows IME controller that records/restores the active input profile and falls back to the current `WM_PASTE` path whenever TSF activation or commit cannot complete. - -**Tech Stack:** Rust 2021, Tauri 2, `windows` crate Win32/TSF bindings, Tokio named pipes on Windows, C++17 Windows SDK COM/TSF DLL, PowerShell registration scripts, React/TypeScript settings surface. - ---- - -## File Structure - -- Create: `openless-all/app/src-tauri/src/windows_ime_protocol.rs` - - Shared Rust message types for app-side JSONL IPC. - - Pure tests for serialization and stale session rejection. -- Create: `openless-all/app/src-tauri/src/windows_ime_profile.rs` - - Windows-only TSF profile snapshot, OpenLess profile activation, and restoration. - - Non-Windows stub so cross-platform builds keep compiling. -- Create: `openless-all/app/src-tauri/src/windows_ime_ipc.rs` - - Windows-only named-pipe server that tracks the most recent OpenLess IME client and submits text with a timeout. - - Non-Windows stub returning `Unavailable`. -- Create: `openless-all/app/src-tauri/src/windows_ime_session.rs` - - Session guard that combines profile switching, IPC submit, fallback routing, and restoration. -- Modify: `openless-all/app/src-tauri/src/insertion.rs` - - Keep existing clipboard/`WM_PASTE` insertion as the Windows fallback and expose a clearly named fallback method. -- Modify: `openless-all/app/src-tauri/src/coordinator.rs` - - Prepare Windows IME session on voice-session start. - - Submit through TSF first on Windows, then fallback. - - Restore input profile on success, failure, and cancellation. -- Modify: `openless-all/app/src-tauri/src/lib.rs` - - Register the new Rust modules and Tauri commands. -- Modify: `openless-all/app/src-tauri/src/commands.rs` - - Expose Windows IME install/status commands. -- Modify: `openless-all/app/src-tauri/src/types.rs` - - Add Windows IME status value types for IPC to the frontend. 
-- Modify: `openless-all/app/src-tauri/Cargo.toml` - - Add Windows API feature gates required for COM, TSF, named-pipe helpers, registry, and process/thread lookup. -- Create: `openless-all/app/windows-ime/OpenLessIme.sln` -- Create: `openless-all/app/windows-ime/OpenLessIme.vcxproj` -- Create: `openless-all/app/windows-ime/src/guids.h` -- Create: `openless-all/app/windows-ime/src/dllmain.cpp` -- Create: `openless-all/app/windows-ime/src/class_factory.h` -- Create: `openless-all/app/windows-ime/src/class_factory.cpp` -- Create: `openless-all/app/windows-ime/src/text_service.h` -- Create: `openless-all/app/windows-ime/src/text_service.cpp` -- Create: `openless-all/app/windows-ime/src/edit_session.h` -- Create: `openless-all/app/windows-ime/src/edit_session.cpp` -- Create: `openless-all/app/windows-ime/src/ipc_client.h` -- Create: `openless-all/app/windows-ime/src/ipc_client.cpp` -- Create: `openless-all/app/windows-ime/src/registry.h` -- Create: `openless-all/app/windows-ime/src/registry.cpp` -- Create: `openless-all/app/windows-ime/src/resource.rc` - - Minimal C++ TSF text service DLL. -- Create: `openless-all/app/scripts/windows-ime-register.ps1` -- Create: `openless-all/app/scripts/windows-ime-unregister.ps1` -- Create: `openless-all/app/scripts/windows-ime-build.ps1` - - Build, register, and unregister scripts for the TSF DLL. -- Modify: `openless-all/app/scripts/windows-preflight.ps1` - - Check MSBuild and Windows SDK when TSF IME work is requested. -- Modify: `openless-all/app/src/lib/types.ts` -- Modify: `openless-all/app/src/lib/ipc.ts` -- Modify: `openless-all/app/src/i18n/zh-CN.ts` -- Modify: `openless-all/app/src/i18n/en.ts` -- Modify: `openless-all/app/src/pages/Settings.tsx` - - Windows-only TSF IME status and actions. 
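The `windows_ime_session.rs` entry above describes a guard that combines profile switching, IPC submit, fallback routing, and restoration. Below is a minimal, platform-neutral sketch of that control flow. `ProfileOps`, `CommitPath`, `RestoreGuard`, and `commit_text` are hypothetical names, and snapshots are simplified to `String`. Restoring the user's profile *before* running the clipboard fallback is an assumed ordering that the plan does not specify.

```rust
/// Hypothetical stand-in for the TSF profile manager; snapshots are plain strings here.
pub trait ProfileOps {
    fn capture(&self) -> Result<String, String>;
    fn activate_openless(&self) -> Result<(), String>;
    fn restore(&self, snapshot: &str) -> Result<(), String>;
}

/// Which path delivered the dictated text.
#[derive(Debug, PartialEq, Eq)]
pub enum CommitPath {
    Tsf,
    Fallback,
}

/// Restores the captured profile when dropped, including on early return or panic.
struct RestoreGuard<'a, P: ProfileOps> {
    ops: &'a P,
    snapshot: Option<String>,
}

impl<'a, P: ProfileOps> Drop for RestoreGuard<'a, P> {
    fn drop(&mut self) {
        if let Some(snapshot) = self.snapshot.take() {
            // A real implementation would log a failed restore instead of ignoring it.
            let _ = self.ops.restore(&snapshot);
        }
    }
}

/// Tries TSF first; falls back (e.g. to the clipboard / WM_PASTE path) if any step fails.
pub fn commit_text<P, S, F>(ops: &P, text: &str, submit_tsf: S, fallback: F) -> CommitPath
where
    P: ProfileOps,
    S: FnOnce(&str) -> Result<(), String>,
    F: FnOnce(&str),
{
    let snapshot = match ops.capture() {
        Ok(snapshot) => snapshot,
        Err(_) => {
            // Nothing was switched, so there is nothing to restore.
            fallback(text);
            return CommitPath::Fallback;
        }
    };
    let guard = RestoreGuard { ops, snapshot: Some(snapshot) };
    // `&&` short-circuits: no TSF submit is attempted if activation failed.
    let delivered = ops.activate_openless().is_ok() && submit_tsf(text).is_ok();
    // Put the user's input method back before any fallback paste happens.
    drop(guard);
    if delivered {
        CommitPath::Tsf
    } else {
        fallback(text);
        CommitPath::Fallback
    }
}
```

Using a `Drop` guard means a panic inside the submit closure still restores the profile during unwinding. This matches the plan's requirement to restore "on success, failure, and cancellation".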
- -Use these fixed identifiers in every Rust, C++, and script location: - -```text -OpenLess TSF text service CLSID: {6B9F3F4F-5EE7-42D6-9C61-9F80B03A5D7D} -OpenLess TSF profile GUID: {9B5F5E04-23F6-47DA-9A26-D221F6C3F02E} -OpenLess TSF category GUID: GUID_TFCAT_TIP_KEYBOARD -OpenLess TSF language id: 0x0804 -OpenLess named pipe: \\.\pipe\OpenLessImeSubmit -OpenLess protocol version: 1 -``` - ---- - -### Task 1: Shared IME IPC Protocol - -**Files:** -- Create: `openless-all/app/src-tauri/src/windows_ime_protocol.rs` -- Modify: `openless-all/app/src-tauri/src/lib.rs` - -- [ ] **Step 1: Write failing protocol serialization tests** - -Add this test module to the new file before adding production types: - -```rust -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn submit_text_roundtrips_as_camel_case_json() { - let message = ImePipeMessage::SubmitText { - protocol_version: OPENLESS_IME_PROTOCOL_VERSION, - session_id: "session-1".to_string(), - text: "你好 OpenLess".to_string(), - created_at: "2026-05-01T12:00:00Z".to_string(), - }; - - let json = encode_message(&message).expect("encode"); - assert!(json.contains("\"submitText\"")); - assert!(json.ends_with('\n')); - - let decoded = decode_message(json.trim_end()).expect("decode"); - assert_eq!(decoded, message); - } - - #[test] - fn stale_submit_result_is_rejected() { - let result = ImePipeMessage::SubmitResult { - protocol_version: OPENLESS_IME_PROTOCOL_VERSION, - session_id: "old-session".to_string(), - status: ImeSubmitStatus::Committed, - error_code: None, - }; - - assert!(is_result_for_pending_session(&result, "current-session").is_err()); - assert!(is_result_for_pending_session(&result, "old-session").is_ok()); - } -} -``` - -- [ ] **Step 2: Run the test and verify it fails because types are missing** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_protocol --lib -``` - -Expected: compile fails with missing `ImePipeMessage`, 
`OPENLESS_IME_PROTOCOL_VERSION`, `encode_message`, `decode_message`, `ImeSubmitStatus`, and `is_result_for_pending_session`. - -- [ ] **Step 3: Add the protocol implementation** - -Put this implementation above the test module: - -```rust -use serde::{Deserialize, Serialize}; - -pub const OPENLESS_IME_PROTOCOL_VERSION: u32 = 1; -pub const OPENLESS_IME_PIPE_NAME: &str = r"\\.\pipe\OpenLessImeSubmit"; - -#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] -#[serde(tag = "type", rename_all = "camelCase")] -pub enum ImePipeMessage { - ClientReady { - protocol_version: u32, - client_id: String, - process_id: u32, - thread_id: u32, - }, - SubmitText { - protocol_version: u32, - session_id: String, - text: String, - created_at: String, - }, - SubmitResult { - protocol_version: u32, - session_id: String, - status: ImeSubmitStatus, - error_code: Option<String>, - }, - CancelSession { - protocol_version: u32, - session_id: String, - }, - Ping { - protocol_version: u32, - }, -} - -#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)] -#[serde(rename_all = "camelCase")] -pub enum ImeSubmitStatus { - Committed, - Rejected, - Failed, -} - -pub fn encode_message(message: &ImePipeMessage) -> Result<String, serde_json::Error> { - let mut line = serde_json::to_string(message)?; - line.push('\n'); - Ok(line) -} - -pub fn decode_message(line: &str) -> Result<ImePipeMessage, serde_json::Error> { - serde_json::from_str(line) -} - -pub fn is_result_for_pending_session( - message: &ImePipeMessage, - pending_session_id: &str, -) -> Result<(), &'static str> { - match message { - ImePipeMessage::SubmitResult { session_id, .. } if session_id == pending_session_id => Ok(()), - ImePipeMessage::SubmitResult { ..
} => Err("submit result belongs to a different session"), - _ => Err("message is not a submit result"), - } -} -``` - -- [ ] **Step 4: Register the module** - -Add this to `openless-all/app/src-tauri/src/lib.rs` beside the other `mod` declarations: - -```rust -mod windows_ime_protocol; -``` - -- [ ] **Step 5: Run the protocol tests** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_protocol --lib -``` - -Expected: both protocol tests pass. - -- [ ] **Step 6: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/windows_ime_protocol.rs openless-all/app/src-tauri/src/lib.rs -git commit -m "feat: add Windows IME IPC protocol" -``` - ---- - -### Task 2: Profile Snapshot State Machine - -**Files:** -- Create: `openless-all/app/src-tauri/src/windows_ime_profile.rs` -- Modify: `openless-all/app/src-tauri/src/lib.rs` - -- [ ] **Step 1: Write failing pure state tests** - -Create `openless-all/app/src-tauri/src/windows_ime_profile.rs` with these tests first: - -```rust -#[cfg(test)] -mod tests { - use super::*; - - fn text_service_snapshot() -> ImeProfileSnapshot { - ImeProfileSnapshot { - kind: ImeProfileKind::TextService, - lang_id: 0x0804, - clsid: Some("{11111111-1111-1111-1111-111111111111}".to_string()), - profile_guid: Some("{22222222-2222-2222-2222-222222222222}".to_string()), - hkl: None, - } - } - - #[test] - fn restore_is_required_when_openless_is_active_and_snapshot_exists() { - assert_eq!( - restore_decision(Some(&text_service_snapshot()), true), - ProfileRestoreDecision::RestoreSavedProfile - ); - } - - #[test] - fn restore_is_skipped_when_snapshot_is_missing() { - assert_eq!( - restore_decision(None, true), - ProfileRestoreDecision::KeepCurrentProfile - ); - } - - #[test] - fn restore_is_skipped_when_user_already_changed_away_from_openless() { - assert_eq!( - restore_decision(Some(&text_service_snapshot()), false), - ProfileRestoreDecision::KeepCurrentProfile - ); - } -} -``` - -- [ ] **Step 
2: Run the test and verify it fails because profile types are missing** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_profile --lib -``` - -Expected: compile fails with missing snapshot and decision types. - -- [ ] **Step 3: Add platform-neutral profile types** - -Add this implementation above the test module: - -```rust -#[derive(Debug, Clone, PartialEq, Eq)] -pub enum ImeProfileKind { - KeyboardLayout, - TextService, -} - -#[derive(Debug, Clone, PartialEq, Eq)] -pub struct ImeProfileSnapshot { - pub kind: ImeProfileKind, - pub lang_id: u16, - pub clsid: Option<String>, - pub profile_guid: Option<String>, - pub hkl: Option, -} - -#[derive(Debug, Clone, Copy, PartialEq, Eq)] -pub enum ProfileRestoreDecision { - RestoreSavedProfile, - KeepCurrentProfile, -} - -pub fn restore_decision( - saved: Option<&ImeProfileSnapshot>, - openless_profile_is_current: bool, -) -> ProfileRestoreDecision { - if saved.is_some() && openless_profile_is_current { - ProfileRestoreDecision::RestoreSavedProfile - } else { - ProfileRestoreDecision::KeepCurrentProfile - } -} -``` - -- [ ] **Step 4: Add public manager API with non-Windows stub** - -Add this API below the pure types: - -```rust -#[derive(Debug, Clone, PartialEq, Eq)] -pub enum WindowsImeProfileError { - Unavailable(String), - WindowsApi(String), -} - -impl std::fmt::Display for WindowsImeProfileError { - fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { - match self { - Self::Unavailable(message) | Self::WindowsApi(message) => write!(f, "{message}"), - } - } -} - -impl std::error::Error for WindowsImeProfileError {} - -pub type WindowsImeProfileResult<T> = Result<T, WindowsImeProfileError>; - -#[cfg(not(target_os = "windows"))] -pub struct WindowsImeProfileManager; - -#[cfg(not(target_os = "windows"))] -impl WindowsImeProfileManager { - pub fn new() -> Self { - Self - } - - pub fn capture_active_profile(&self) -> WindowsImeProfileResult<ImeProfileSnapshot> { - Err(WindowsImeProfileError::Unavailable( - "Windows TSF profiles are only available on Windows".to_string(), - )) - } - - pub fn activate_openless_profile(&self) -> WindowsImeProfileResult<()> { - Err(WindowsImeProfileError::Unavailable( - "Windows TSF profiles are only available on Windows".to_string(), - )) - } - - pub fn restore_profile(&self, _snapshot: &ImeProfileSnapshot) -> WindowsImeProfileResult<()> { - Err(WindowsImeProfileError::Unavailable( - "Windows TSF profiles are only available on Windows".to_string(), - )) - } - - pub fn is_openless_profile_active(&self) -> WindowsImeProfileResult<bool> { - Ok(false) - } -} -``` - -- [ ] **Step 5: Register the module** - -Add this to `openless-all/app/src-tauri/src/lib.rs`: - -```rust -mod windows_ime_profile; -``` - -- [ ] **Step 6: Run the profile tests** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_profile --lib -``` - -Expected: all profile state tests pass. - -- [ ] **Step 7: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/windows_ime_profile.rs openless-all/app/src-tauri/src/lib.rs -git commit -m "feat: add Windows IME profile state" -``` - ---- - -### Task 3: Windows TSF Profile Manager - -**Files:** -- Modify: `openless-all/app/src-tauri/Cargo.toml` -- Modify: `openless-all/app/src-tauri/src/windows_ime_profile.rs` - -- [ ] **Step 1: Write failing Windows-only compile test for fixed identifiers** - -Add these tests inside `windows_ime_profile.rs`: - -```rust -#[cfg(all(test, target_os = "windows"))] -mod windows_tests { - use super::*; - - #[test] - fn openless_profile_identifiers_are_fixed() { - assert_eq!(OPENLESS_TSF_LANG_ID, 0x0804); - assert_eq!( - OPENLESS_TEXT_SERVICE_CLSID_BRACED, - "{6B9F3F4F-5EE7-42D6-9C61-9F80B03A5D7D}" - ); - assert_eq!( - OPENLESS_PROFILE_GUID_BRACED, - "{9B5F5E04-23F6-47DA-9A26-D221F6C3F02E}" - ); - } -} -``` - -- [ ] **Step 2: Run the Windows-only test and verify it fails** - -Run on Windows: - -```powershell -cargo test --manifest-path
openless-all/app/src-tauri/Cargo.toml openless_profile_identifiers_are_fixed --lib -``` - -Expected: compile fails because the constants are not defined. - -- [ ] **Step 3: Extend Windows API features** - -In `openless-all/app/src-tauri/Cargo.toml`, extend the Windows dependency features to include: - -```toml - "Win32_Globalization", - "Win32_System_Com", - "Win32_System_Ole", - "Win32_System_Registry", - "Win32_UI_TextServices", -``` - -The resulting Windows dependency block keeps the existing features and includes the new ones: - -```toml -[target.'cfg(target_os = "windows")'.dependencies] -windows = { version = "0.58", features = [ - "Win32_Foundation", - "Win32_Globalization", - "Win32_System_Com", - "Win32_System_Ole", - "Win32_System_Registry", - "Win32_System_Threading", - "Win32_UI_Input_KeyboardAndMouse", - "Win32_UI_Shell", - "Win32_UI_TextServices", - "Win32_UI_WindowsAndMessaging", -] } -winreg = "0.52" -``` - -- [ ] **Step 4: Add Windows constants and GUID parsing helpers** - -Add these items near the top of `windows_ime_profile.rs`: - -```rust -pub const OPENLESS_TSF_LANG_ID: u16 = 0x0804; -pub const OPENLESS_TEXT_SERVICE_CLSID_BRACED: &str = - "{6B9F3F4F-5EE7-42D6-9C61-9F80B03A5D7D}"; -pub const OPENLESS_PROFILE_GUID_BRACED: &str = - "{9B5F5E04-23F6-47DA-9A26-D221F6C3F02E}"; - -#[cfg(target_os = "windows")] -fn parse_guid(value: &str) -> WindowsImeProfileResult<windows::core::GUID> { - windows::core::GUID::from(value).map_err(|err| { - WindowsImeProfileError::WindowsApi(format!("invalid GUID {value}: {err}")) - }) -} -``` - -- [ ] **Step 5: Add Windows profile manager skeleton** - -Keep the non-Windows stub and add a Windows implementation alongside it, guarded by `#[cfg(target_os = "windows")]`: - -```rust -#[cfg(target_os = "windows")] -pub struct WindowsImeProfileManager; - -#[cfg(target_os = "windows")] -impl WindowsImeProfileManager { - pub fn new() -> Self { - Self - } - - pub fn capture_active_profile(&self) -> WindowsImeProfileResult<ImeProfileSnapshot> { - windows_impl::capture_active_profile() - } - - pub fn activate_openless_profile(&self) -> WindowsImeProfileResult<()> { - windows_impl::activate_openless_profile() - } - - pub fn restore_profile(&self, snapshot: &ImeProfileSnapshot) -> WindowsImeProfileResult<()> { - windows_impl::restore_profile(snapshot) - } - - pub fn is_openless_profile_active(&self) -> WindowsImeProfileResult<bool> { - windows_impl::is_openless_profile_active() - } -} -``` - -- [ ] **Step 6: Implement the Windows TSF calls** - -Add this module in `windows_ime_profile.rs`. Keep all COM calls inside this module: - -```rust -#[cfg(target_os = "windows")] -mod windows_impl { - use super::*; - use windows::core::{Interface, GUID}; - use windows::Win32::Foundation::HKL; - use windows::Win32::System::Com::{ - CoCreateInstance, CoInitializeEx, CoUninitialize, CLSCTX_INPROC_SERVER, - COINIT_APARTMENTTHREADED, - }; - use windows::Win32::UI::Input::KeyboardAndMouse::GetKeyboardLayout; - use windows::Win32::UI::TextServices::{ - ITfInputProcessorProfileMgr, CLSID_TF_InputProcessorProfiles, - TF_PROFILETYPE_INPUTPROCESSOR, TF_PROFILETYPE_KEYBOARDLAYOUT, - TF_IPPMF_FORPROCESS, - }; - - struct ComApartment; - - impl ComApartment { - fn init() -> WindowsImeProfileResult<Self> { - unsafe { - CoInitializeEx(None, COINIT_APARTMENTTHREADED).map_err(|err| { - WindowsImeProfileError::WindowsApi(format!("CoInitializeEx failed: {err}")) - })?; - } - Ok(Self) - } - } - - impl Drop for ComApartment { - fn drop(&mut self) { - unsafe { - CoUninitialize(); - } - } - } - - fn profile_mgr() -> WindowsImeProfileResult<ITfInputProcessorProfileMgr> { - unsafe { - CoCreateInstance(&CLSID_TF_InputProcessorProfiles, None, CLSCTX_INPROC_SERVER) - .map_err(|err| { - WindowsImeProfileError::WindowsApi(format!( - "CoCreateInstance(CLSID_TF_InputProcessorProfiles) failed: {err}" - )) - }) - } - } - - pub fn capture_active_profile() -> WindowsImeProfileResult<ImeProfileSnapshot> { - let _com = ComApartment::init()?; - let mgr = profile_mgr()?; - unsafe { - let profile =
mgr.GetActiveProfile(GUID::zeroed()).map_err(|err| { - WindowsImeProfileError::WindowsApi(format!("GetActiveProfile failed: {err}")) - })?; - if profile.dwProfileType == TF_PROFILETYPE_INPUTPROCESSOR { - return Ok(ImeProfileSnapshot { - kind: ImeProfileKind::TextService, - lang_id: profile.langid as u16, - clsid: Some(format!("{:?}", profile.clsid)), - profile_guid: Some(format!("{:?}", profile.guidProfile)), - hkl: None, - }); - } - let hkl = GetKeyboardLayout(0); - Ok(ImeProfileSnapshot { - kind: ImeProfileKind::KeyboardLayout, - lang_id: profile.langid as u16, - clsid: None, - profile_guid: None, - hkl: Some(hkl.0), - }) - } - } - - pub fn activate_openless_profile() -> WindowsImeProfileResult<()> { - let _com = ComApartment::init()?; - let mgr = profile_mgr()?; - let clsid = parse_guid(OPENLESS_TEXT_SERVICE_CLSID_BRACED)?; - let profile_guid = parse_guid(OPENLESS_PROFILE_GUID_BRACED)?; - unsafe { - mgr.ActivateProfile( - TF_PROFILETYPE_INPUTPROCESSOR, - OPENLESS_TSF_LANG_ID, - &clsid, - &profile_guid, - windows::Win32::Foundation::HKL(0), - TF_IPPMF_FORPROCESS, - ) - .map_err(|err| { - WindowsImeProfileError::WindowsApi(format!( - "ActivateProfile(OpenLess) failed: {err}" - )) - }) - } - } - - pub fn restore_profile(snapshot: &ImeProfileSnapshot) -> WindowsImeProfileResult<()> { - let _com = ComApartment::init()?; - let mgr = profile_mgr()?; - unsafe { - match snapshot.kind { - ImeProfileKind::TextService => { - let clsid = parse_guid(snapshot.clsid.as_deref().ok_or_else(|| { - WindowsImeProfileError::WindowsApi( - "saved text service profile has no CLSID".to_string(), - ) - })?)?; - let profile_guid = parse_guid(snapshot.profile_guid.as_deref().ok_or_else(|| { - WindowsImeProfileError::WindowsApi( - "saved text service profile has no profile GUID".to_string(), - ) - })?)?; - mgr.ActivateProfile( - TF_PROFILETYPE_INPUTPROCESSOR, - snapshot.lang_id, - &clsid, - &profile_guid, - HKL(0), - TF_IPPMF_FORPROCESS, - ) - } - ImeProfileKind::KeyboardLayout => { - 
mgr.ActivateProfile( - TF_PROFILETYPE_KEYBOARDLAYOUT, - snapshot.lang_id, - &GUID::zeroed(), - &GUID::zeroed(), - HKL(snapshot.hkl.unwrap_or_default()), - TF_IPPMF_FORPROCESS, - ) - } - } - .map_err(|err| { - WindowsImeProfileError::WindowsApi(format!("restore profile failed: {err}")) - }) - } - } - - pub fn is_openless_profile_active() -> WindowsImeProfileResult<bool> { - let active = capture_active_profile()?; - Ok(active.kind == ImeProfileKind::TextService - && active.clsid.as_deref().map_or(false, |id| id.eq_ignore_ascii_case(OPENLESS_TEXT_SERVICE_CLSID_BRACED)) - && active.profile_guid.as_deref().map_or(false, |id| id.eq_ignore_ascii_case(OPENLESS_PROFILE_GUID_BRACED))) - } -} -``` - -If the `windows` crate signatures differ, adjust only the type adapters around `GetActiveProfile` and `ActivateProfile`; keep the public API and behavior unchanged. Also confirm early, on a real machine, that activation reaches the application that will receive the text. The TSF documentation describes `TF_IPPMF_FORPROCESS` as activating the profile for the calling process, so as written these calls may affect only OpenLess's own process and not the focused window. - -- [ ] **Step 7: Run Windows type check** - -Run: - -```powershell -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: backend type-checks on Windows. - -- [ ] **Step 8: Run profile tests** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_profile --lib -``` - -Expected: pure profile tests pass; Windows identifier test passes.
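`is_openless_profile_active` compares GUIDs as strings, so it only works if `capture_active_profile` produces exactly the textual form used by `OPENLESS_TEXT_SERVICE_CLSID_BRACED` and `OPENLESS_PROFILE_GUID_BRACED`. The `Debug` output of the windows-rs `GUID` has no braces, which is worth re-checking against the pinned crate version. The std-only sketch below models a canonical braced form using a hypothetical `Guid` struct that mirrors the Win32 field layout. It is not the windows-rs type, and the plan's `parse_guid` may behave differently.

```rust
/// Hypothetical std-only stand-in for the Win32 GUID layout (not the windows-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Guid {
    pub data1: u32,
    pub data2: u16,
    pub data3: u16,
    pub data4: [u8; 8],
}

impl Guid {
    /// Formats as `{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}` (uppercase, braced),
    /// the same shape as the `*_BRACED` constants.
    pub fn to_braced_string(&self) -> String {
        let d = &self.data4;
        format!(
            "{{{:08X}-{:04X}-{:04X}-{:02X}{:02X}-{:02X}{:02X}{:02X}{:02X}{:02X}{:02X}}}",
            self.data1, self.data2, self.data3, d[0], d[1], d[2], d[3], d[4], d[5], d[6], d[7]
        )
    }

    /// Parses the braced form case-insensitively; returns `None` for any other shape.
    pub fn parse_braced(text: &str) -> Option<Guid> {
        let inner = text.strip_prefix('{')?.strip_suffix('}')?;
        if !inner.chars().all(|c| c == '-' || c.is_ascii_hexdigit()) {
            return None;
        }
        let parts: Vec<&str> = inner.split('-').collect();
        let lengths: Vec<usize> = parts.iter().map(|p| p.len()).collect();
        if lengths != [8usize, 4, 4, 4, 12] {
            return None;
        }
        let data1 = u32::from_str_radix(parts[0], 16).ok()?;
        let data2 = u16::from_str_radix(parts[1], 16).ok()?;
        let data3 = u16::from_str_radix(parts[2], 16).ok()?;
        // The last two groups hold the 8 bytes of data4 as 16 hex digits.
        let tail = format!("{}{}", parts[3], parts[4]);
        let mut data4 = [0u8; 8];
        for (i, byte) in data4.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&tail[i * 2..i * 2 + 2], 16).ok()?;
        }
        Some(Guid { data1, data2, data3, data4 })
    }
}
```

Comparing parsed `Guid` values instead of strings makes the check independent of letter case and brace style.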
- -- [ ] **Step 9: Commit** - -```powershell -git add -- openless-all/app/src-tauri/Cargo.toml openless-all/app/src-tauri/src/windows_ime_profile.rs -git commit -m "feat: manage Windows TSF input profiles" -``` - ---- - -### Task 4: Rust Named-Pipe IME Server - -**Files:** -- Create: `openless-all/app/src-tauri/src/windows_ime_ipc.rs` -- Modify: `openless-all/app/src-tauri/src/lib.rs` - -- [ ] **Step 1: Write failing pending-submit tests** - -Create `windows_ime_ipc.rs` with this test-first state logic: - -```rust -#[cfg(test)] -mod tests { - use super::*; - use crate::windows_ime_protocol::ImeSubmitStatus; - - #[test] - fn pending_submit_accepts_only_matching_session() { - let mut pending = PendingImeSubmit::new("session-1".to_string()); - assert!(pending.accept_result("session-2", ImeSubmitStatus::Committed).is_err()); - assert_eq!( - pending.accept_result("session-1", ImeSubmitStatus::Committed), - Ok(ImeSubmitStatus::Committed) - ); - } - - #[test] - fn pending_submit_rejects_second_result_after_completion() { - let mut pending = PendingImeSubmit::new("session-1".to_string()); - assert_eq!( - pending.accept_result("session-1", ImeSubmitStatus::Committed), - Ok(ImeSubmitStatus::Committed) - ); - assert!(pending.accept_result("session-1", ImeSubmitStatus::Committed).is_err()); - } -} -``` - -- [ ] **Step 2: Run the test and verify it fails because `PendingImeSubmit` is missing** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_ipc --lib -``` - -Expected: compile fails with missing `PendingImeSubmit`. 
- -- [ ] **Step 3: Add pending-submit state** - -Add this implementation above the tests: - -```rust -use std::time::Duration; - -use crate::windows_ime_protocol::ImeSubmitStatus; - -pub const IME_CLIENT_WAIT_TIMEOUT: Duration = Duration::from_millis(700); -pub const IME_SUBMIT_TIMEOUT: Duration = Duration::from_millis(900); - -#[derive(Debug, Clone, PartialEq, Eq)] -pub enum WindowsImeIpcError { - Unavailable(String), - NoReadyClient, - Timeout, - Protocol(String), - Io(String), -} - -impl std::fmt::Display for WindowsImeIpcError { - fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { - match self { - Self::Unavailable(message) - | Self::Protocol(message) - | Self::Io(message) => write!(f, "{message}"), - Self::NoReadyClient => write!(f, "no OpenLess IME client is ready"), - Self::Timeout => write!(f, "OpenLess IME IPC timed out"), - } - } -} - -impl std::error::Error for WindowsImeIpcError {} - -pub type WindowsImeIpcResult<T> = Result<T, WindowsImeIpcError>; - -#[derive(Debug)] -pub struct PendingImeSubmit { - session_id: String, - completed: bool, -} - -impl PendingImeSubmit { - pub fn new(session_id: String) -> Self { - Self { - session_id, - completed: false, - } - } - - pub fn accept_result( - &mut self, - session_id: &str, - status: ImeSubmitStatus, - ) -> WindowsImeIpcResult<ImeSubmitStatus> { - if self.completed { - return Err(WindowsImeIpcError::Protocol( - "submit result arrived after completion".to_string(), - )); - } - if self.session_id != session_id { - return Err(WindowsImeIpcError::Protocol( - "submit result belongs to a different session".to_string(), - )); - } - self.completed = true; - Ok(status) - } -} -``` - -- [ ] **Step 4: Add public server API stubs** - -Add this API below `PendingImeSubmit`: - -```rust -#[derive(Debug, Clone)] -pub struct ImeSubmitRequest { - pub session_id: String, - pub text: String, - pub created_at: String, -} - -#[derive(Clone)] -pub struct WindowsImeIpcServer { - inner: std::sync::Arc<parking_lot::Mutex<WindowsImeIpcState>>, -} - -#[derive(Debug, Default)] -struct
WindowsImeIpcState { - ready_client_id: Option<String>, -} - -impl WindowsImeIpcServer { - pub fn new() -> Self { - Self { - inner: std::sync::Arc::new(parking_lot::Mutex::new(WindowsImeIpcState::default())), - } - } - - pub fn mark_client_ready_for_test(&self, client_id: String) { - self.inner.lock().ready_client_id = Some(client_id); - } - - pub fn has_ready_client(&self) -> bool { - self.inner.lock().ready_client_id.is_some() - } -} -``` - -- [ ] **Step 5: Add Windows async submit implementation** - -Add a Windows-only `submit_text` implementation. Keep the non-Windows implementation as an immediate `Unavailable` error: - -```rust -#[cfg(not(target_os = "windows"))] -impl WindowsImeIpcServer { - pub async fn submit_text( - &self, - _request: ImeSubmitRequest, - ) -> WindowsImeIpcResult<ImeSubmitStatus> { - Err(WindowsImeIpcError::Unavailable( - "Windows IME IPC is only available on Windows".to_string(), - )) - } -} - -#[cfg(target_os = "windows")] -impl WindowsImeIpcServer { - pub async fn submit_text( - &self, - request: ImeSubmitRequest, - ) -> WindowsImeIpcResult<ImeSubmitStatus> { - // Do not gate on has_ready_client(): only tests mark a client ready, so - // that check would reject every real submit. The IME DLL owns the pipe - // server, so a missing pipe already means no IME instance is listening. - windows_pipe::submit_text_over_pipe(request).await - } -} - -#[cfg(target_os = "windows")] -mod windows_pipe { - use super::*; - use crate::windows_ime_protocol::{ - decode_message, encode_message, ImePipeMessage, OPENLESS_IME_PIPE_NAME, - OPENLESS_IME_PROTOCOL_VERSION, - }; - use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader}; - use tokio::net::windows::named_pipe::ClientOptions; - - pub async fn submit_text_over_pipe( - request: ImeSubmitRequest, - ) -> WindowsImeIpcResult<ImeSubmitStatus> { - let client = ClientOptions::new() - .open(OPENLESS_IME_PIPE_NAME) - .map_err(|err| match err.kind() { - std::io::ErrorKind::NotFound => WindowsImeIpcError::NoReadyClient, - _ => WindowsImeIpcError::Io(format!("open IME pipe failed: {err}")), - })?; - let (reader, mut writer) = tokio::io::split(client); - let mut reader = BufReader::new(reader); - let submit = ImePipeMessage::SubmitText { - protocol_version: OPENLESS_IME_PROTOCOL_VERSION, - session_id:
request.session_id.clone(), - text: request.text, - created_at: request.created_at, - }; - let line = encode_message(&submit) - .map_err(|err| WindowsImeIpcError::Protocol(format!("encode submit failed: {err}")))?; - writer - .write_all(line.as_bytes()) - .await - .map_err(|err| WindowsImeIpcError::Io(format!("write submit failed: {err}")))?; - writer - .flush() - .await - .map_err(|err| WindowsImeIpcError::Io(format!("flush submit failed: {err}")))?; - - let mut response = String::new(); - let read = tokio::time::timeout(IME_SUBMIT_TIMEOUT, reader.read_line(&mut response)) - .await - .map_err(|_| WindowsImeIpcError::Timeout)? - .map_err(|err| WindowsImeIpcError::Io(format!("read submit result failed: {err}")))?; - if read == 0 { - return Err(WindowsImeIpcError::Io("IME pipe closed before result".to_string())); - } - - match decode_message(response.trim_end()) - .map_err(|err| WindowsImeIpcError::Protocol(format!("decode result failed: {err}")))? - { - ImePipeMessage::SubmitResult { - session_id, - status, - .. - } if session_id == request.session_id => Ok(status), - _ => Err(WindowsImeIpcError::Protocol( - "unexpected IME submit result".to_string(), - )), - } - } -} -``` - -This MVP opens the named pipe for each submit. The C++ IME DLL owns the pipe server because it is the active TSF instance inside the focused process. - -- [ ] **Step 6: Register the module** - -Add this to `lib.rs`: - -```rust -mod windows_ime_ipc; -``` - -- [ ] **Step 7: Run tests** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_ipc --lib -``` - -Expected: pending-submit tests pass. - -- [ ] **Step 8: Run type check** - -Run: - -```powershell -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: backend type-checks. 
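The IPC contract is one JSON object per `\n`-terminated line in each direction. Below is a blocking, std-only sketch of that framing. `read_frame`, `encode_frame`, and `FrameError` are hypothetical names, and the size cap is an added assumption that keeps a misbehaving peer from growing the buffer without bound. Tokio offers the same building blocks (`AsyncReadExt::take`, `AsyncBufReadExt::read_until`), so the pattern should carry over to `submit_text_over_pipe`.

```rust
use std::io::{BufRead, Read};

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    /// The peer closed the pipe before sending a complete line.
    Closed,
    /// The line exceeded `max_len` payload bytes.
    TooLong,
    NotUtf8,
    Io(std::io::ErrorKind),
}

/// Appends the `\n` terminator. Compact JSON such as serde_json's escapes
/// newlines inside strings, so a raw `\n` in the input indicates a bad frame.
pub fn encode_frame(json: &str) -> Option<Vec<u8>> {
    if json.contains('\n') {
        return None;
    }
    let mut bytes = json.as_bytes().to_vec();
    bytes.push(b'\n');
    Some(bytes)
}

/// Reads one `\n`-terminated frame of at most `max_len` payload bytes.
pub fn read_frame<R: BufRead>(reader: &mut R, max_len: usize) -> Result<String, FrameError> {
    let mut buf = Vec::new();
    // One extra byte leaves room for the terminator of an exactly-max_len line.
    let read = reader
        .by_ref()
        .take(max_len as u64 + 1)
        .read_until(b'\n', &mut buf)
        .map_err(|err| FrameError::Io(err.kind()))?;
    if read == 0 {
        return Err(FrameError::Closed);
    }
    if buf.last() != Some(&b'\n') {
        // Either the cap was hit or the peer closed mid-line.
        return Err(if buf.len() > max_len { FrameError::TooLong } else { FrameError::Closed });
    }
    buf.pop();
    if buf.last() == Some(&b'\r') {
        buf.pop();
    }
    String::from_utf8(buf).map_err(|_| FrameError::NotUtf8)
}
```

Treating a partial trailing line as `Closed`, not as a frame, matches the plan's rule that a pipe closed before the result counts as an error.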
- -- [ ] **Step 9: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/windows_ime_ipc.rs openless-all/app/src-tauri/src/lib.rs -git commit -m "feat: add Windows IME IPC client" -``` - ---- - -### Task 5: Windows IME Session Guard and Fallback Routing - -**Files:** -- Create: `openless-all/app/src-tauri/src/windows_ime_session.rs` -- Modify: `openless-all/app/src-tauri/src/insertion.rs` -- Modify: `openless-all/app/src-tauri/src/coordinator.rs` -- Modify: `openless-all/app/src-tauri/src/lib.rs` - -- [ ] **Step 1: Write failing routing tests** - -Create `windows_ime_session.rs` with these tests: - -```rust -#[cfg(test)] -mod tests { - use super::*; - use crate::types::InsertStatus; - use crate::windows_ime_protocol::ImeSubmitStatus; - - #[test] - fn committed_ime_result_maps_to_inserted() { - assert_eq!( - map_ime_status_to_insert_status(ImeSubmitStatus::Committed), - InsertStatus::Inserted - ); - } - - #[test] - fn rejected_ime_result_requests_fallback() { - assert!(should_fallback_after_ime_result(ImeSubmitStatus::Rejected)); - assert!(should_fallback_after_ime_result(ImeSubmitStatus::Failed)); - assert!(!should_fallback_after_ime_result(ImeSubmitStatus::Committed)); - } -} -``` - -- [ ] **Step 2: Run the test and verify it fails because mapping functions are missing** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_session --lib -``` - -Expected: compile fails with missing mapping functions. 
- -- [ ] **Step 3: Add mapping functions and session result types** - -Add this implementation above the tests: - -```rust -use crate::types::InsertStatus; -use crate::windows_ime_ipc::{ImeSubmitRequest, WindowsImeIpcServer}; -use crate::windows_ime_profile::{ImeProfileSnapshot, WindowsImeProfileManager}; -use crate::windows_ime_protocol::ImeSubmitStatus; - -#[derive(Debug)] -pub enum WindowsImeSessionError { - Profile(String), - Ipc(String), -} - -impl std::fmt::Display for WindowsImeSessionError { - fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { - match self { - Self::Profile(message) | Self::Ipc(message) => write!(f, "{message}"), - } - } -} - -impl std::error::Error for WindowsImeSessionError {} - -pub fn map_ime_status_to_insert_status(status: ImeSubmitStatus) -> InsertStatus { - match status { - ImeSubmitStatus::Committed => InsertStatus::Inserted, - ImeSubmitStatus::Rejected | ImeSubmitStatus::Failed => InsertStatus::CopiedFallback, - } -} - -pub fn should_fallback_after_ime_result(status: ImeSubmitStatus) -> bool { - !matches!(status, ImeSubmitStatus::Committed) -} - -#[derive(Debug)] -pub struct PreparedWindowsImeSession { - saved_profile: Option<ImeProfileSnapshot>, - openless_activated: bool, -} - -impl PreparedWindowsImeSession { - pub fn unavailable() -> Self { - Self { - saved_profile: None, - openless_activated: false, - } - } - - pub fn is_ready_for_tsf_submit(&self) -> bool { - self.saved_profile.is_some() && self.openless_activated - } -} -``` - -- [ ] **Step 4: Add Windows session controller** - -Add this controller below `PreparedWindowsImeSession`: - -```rust -pub struct WindowsImeSessionController { - profile_manager: WindowsImeProfileManager, - ipc: WindowsImeIpcServer, -} - -impl WindowsImeSessionController { - pub fn new() -> Self { - Self { - profile_manager: WindowsImeProfileManager::new(), - ipc: WindowsImeIpcServer::new(), - } - } - - pub fn prepare_session(&self) -> PreparedWindowsImeSession { - #[cfg(not(target_os = "windows"))] - {
- PreparedWindowsImeSession::unavailable() - } - - #[cfg(target_os = "windows")] - { - let saved_profile = match self.profile_manager.capture_active_profile() { - Ok(snapshot) => Some(snapshot), - Err(err) => { - log::warn!("[windows-ime] capture active profile failed: {err}"); - None - } - }; - if saved_profile.is_none() { - return PreparedWindowsImeSession::unavailable(); - } - match self.profile_manager.activate_openless_profile() { - Ok(()) => PreparedWindowsImeSession { - saved_profile, - openless_activated: true, - }, - Err(err) => { - log::warn!("[windows-ime] activate OpenLess profile failed: {err}"); - PreparedWindowsImeSession::unavailable() - } - } - } - } - - pub async fn submit_prepared( - &self, - prepared: &PreparedWindowsImeSession, - request: ImeSubmitRequest, - ) -> Result<InsertStatus, WindowsImeSessionError> { - if !prepared.is_ready_for_tsf_submit() { - return Err(WindowsImeSessionError::Ipc( - "OpenLess IME session is not active".to_string(), - )); - } - let status = self - .ipc - .submit_text(request) - .await - .map_err(|err| WindowsImeSessionError::Ipc(err.to_string()))?; - Ok(map_ime_status_to_insert_status(status)) - } - - pub fn restore_session(&self, prepared: PreparedWindowsImeSession) { - let Some(saved_profile) = prepared.saved_profile else { - return; - }; - match self.profile_manager.is_openless_profile_active() { - Ok(true) => { - if let Err(err) = self.profile_manager.restore_profile(&saved_profile) { - log::warn!("[windows-ime] restore previous profile failed: {err}"); - } - } - Ok(false) => {} - Err(err) => log::warn!("[windows-ime] profile active check failed: {err}"), - } - } -} -``` - -- [ ] **Step 5: Register the module** - -Add this to `lib.rs`: - -```rust -mod windows_ime_session; -``` - -- [ ] **Step 6: Expose a fallback-only insertion method** - -In `insertion.rs`, keep the current behavior and make the clipboard-fallback intent explicit by adding this method to `impl TextInserter` under `#[cfg(not(target_os = "macos"))]`: - -```rust -#[cfg(not(target_os =
"macos"))] -pub fn insert_via_clipboard_fallback( - &self, - text: &str, - restore_clipboard_after_paste: bool, -) -> InsertStatus { - self.insert(text, restore_clipboard_after_paste) -} -``` - -- [ ] **Step 7: Wire the controller into coordinator state** - -In `coordinator.rs`, import the controller types: - -```rust -#[cfg(target_os = "windows")] -use crate::windows_ime_session::{PreparedWindowsImeSession, WindowsImeSessionController}; -``` - -Add these fields to the coordinator inner state, next to the existing `inserter` field: - -```rust -#[cfg(target_os = "windows")] -windows_ime: WindowsImeSessionController, -#[cfg(target_os = "windows")] -prepared_windows_ime_session: Arc<Mutex<Option<PreparedWindowsImeSession>>>, -``` - -Initialize them where `TextInserter::new()` is called: - -```rust -#[cfg(target_os = "windows")] -windows_ime: WindowsImeSessionController::new(), -#[cfg(target_os = "windows")] -prepared_windows_ime_session: Arc::new(Mutex::new(None)), -``` - -- [ ] **Step 8: Prepare TSF session when recording starts** - -In the recording-start path, immediately after the coordinator accepts the hotkey edge and before the recorder starts, add: - -```rust -#[cfg(target_os = "windows")] -{ - let prepared = inner.windows_ime.prepare_session(); - *inner.prepared_windows_ime_session.lock() = Some(prepared); -} -``` - -This code belongs in the same start-session branch that changes phase from `Idle` to `Starting`.
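Preparation must run only on the `Idle` to `Starting` transition. If it ran again while a session was live, for example on a hotkey auto-repeat, the second capture would record OpenLess's own profile as the "previous" one, and the later restore would leave OpenLess active. The sketch below models that guard. `Phase` and `SessionSlot` are hypothetical simplifications of the coordinator's state, not its real types.

```rust
/// Hypothetical subset of the coordinator phases named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Idle,
    Starting,
}

/// Holds the prepared IME state for at most one live session.
pub struct SessionSlot<P> {
    phase: Phase,
    prepared: Option<P>,
}

impl<P> SessionSlot<P> {
    pub fn new() -> Self {
        Self { phase: Phase::Idle, prepared: None }
    }

    pub fn phase(&self) -> Phase {
        self.phase
    }

    /// Runs `prepare` only when leaving Idle. Later edges while a session
    /// is live are ignored, so the saved profile is never overwritten.
    pub fn on_start_edge(&mut self, prepare: impl FnOnce() -> P) -> bool {
        if self.phase != Phase::Idle {
            return false;
        }
        self.phase = Phase::Starting;
        self.prepared = Some(prepare());
        true
    }

    /// Ends the session (insert or cancel) and hands back the prepared state to restore.
    pub fn finish(&mut self) -> Option<P> {
        self.phase = Phase::Idle;
        self.prepared.take()
    }
}
```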
- -- [ ] **Step 9: Submit through TSF first in `end_session`** - -Replace the direct insertion call: - -```rust -let status = inner.inserter.insert(&polished, restore_clipboard); -``` - -with Windows-first routing: - -```rust -#[cfg(target_os = "windows")] -let status = { - let prepared = inner.prepared_windows_ime_session.lock().take(); - if let Some(prepared) = prepared { - let request = crate::windows_ime_ipc::ImeSubmitRequest { - session_id: Uuid::new_v4().to_string(), - text: polished.clone(), - created_at: Utc::now().to_rfc3339(), - }; - let tsf_status = inner.windows_ime.submit_prepared(&prepared, request).await; - inner.windows_ime.restore_session(prepared); - match tsf_status { - Ok(InsertStatus::Inserted) => InsertStatus::Inserted, - Ok(_) | Err(_) => inner - .inserter - .insert_via_clipboard_fallback(&polished, restore_clipboard), - } - } else { - inner - .inserter - .insert_via_clipboard_fallback(&polished, restore_clipboard) - } -}; - -#[cfg(not(target_os = "windows"))] -let status = inner.inserter.insert(&polished, restore_clipboard); -``` - -- [ ] **Step 10: Restore on cancellation** - -In the cancellation path that handles active `Starting`, `Listening`, or `Processing` sessions, add: - -```rust -#[cfg(target_os = "windows")] -if let Some(prepared) = inner.prepared_windows_ime_session.lock().take() { - inner.windows_ime.restore_session(prepared); -} -``` - -Place it before returning the session to `Idle`. - -- [ ] **Step 11: Run focused tests** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml windows_ime_session --lib -``` - -Expected: routing tests pass. - -- [ ] **Step 12: Run backend type check** - -Run: - -```powershell -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: backend type-checks. 
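Steps 9 and 10 restore the saved profile on two explicit paths: after submit and on cancellation. One alternative worth considering is an RAII guard that restores on drop, so an early `return`, a `?`, or a dropped async task cannot leave OpenLess's profile active. `RestoreGuard` is a hypothetical std-only sketch. In the plan, its closure would call `WindowsImeSessionController::restore_session`. `Drop` does not run if the value is leaked (as the last assertion shows) or if the process aborts.

```rust
/// Runs `restore` exactly once: explicitly via `restore_now`, or on drop.
pub struct RestoreGuard<F: FnOnce()> {
    restore: Option<F>,
}

impl<F: FnOnce()> RestoreGuard<F> {
    pub fn new(restore: F) -> Self {
        Self { restore: Some(restore) }
    }

    /// Restores immediately; the later drop then finds nothing left to run.
    pub fn restore_now(mut self) {
        if let Some(restore) = self.restore.take() {
            restore();
        }
    }
}

impl<F: FnOnce()> Drop for RestoreGuard<F> {
    fn drop(&mut self) {
        if let Some(restore) = self.restore.take() {
            restore();
        }
    }
}
```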
- -- [ ] **Step 13: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/windows_ime_session.rs openless-all/app/src-tauri/src/insertion.rs openless-all/app/src-tauri/src/coordinator.rs openless-all/app/src-tauri/src/lib.rs -git commit -m "feat: route Windows insertion through temporary TSF IME" -``` - ---- - -### Task 6: C++ TSF DLL Project Skeleton - -**Files:** -- Create: `openless-all/app/windows-ime/OpenLessIme.sln` -- Create: `openless-all/app/windows-ime/OpenLessIme.vcxproj` -- Create: `openless-all/app/windows-ime/src/guids.h` -- Create: `openless-all/app/windows-ime/src/dllmain.cpp` -- Create: `openless-all/app/windows-ime/src/class_factory.h` -- Create: `openless-all/app/windows-ime/src/class_factory.cpp` -- Create: `openless-all/app/windows-ime/src/text_service.h` -- Create: `openless-all/app/windows-ime/src/text_service.cpp` -- Create: `openless-all/app/windows-ime/src/registry.h` -- Create: `openless-all/app/windows-ime/src/registry.cpp` -- Create: `openless-all/app/windows-ime/src/resource.rc` -- Create: `openless-all/app/windows-ime/src/OpenLessIme.def` - -- [ ] **Step 1: Create the C++ project files** - -Create a Visual Studio DLL project that builds `OpenLessIme.dll` for x64 with C++17 and the Windows SDK. The `.vcxproj` must include: - -```xml -<ConfigurationType>DynamicLibrary</ConfigurationType> -<CharacterSet>Unicode</CharacterSet> -<LanguageStandard>stdcpp17</LanguageStandard> -<AdditionalDependencies>msctf.lib;ole32.lib;uuid.lib;advapi32.lib;%(AdditionalDependencies)</AdditionalDependencies> -<ModuleDefinitionFile>src\OpenLessIme.def</ModuleDefinitionFile> -``` - -Include every `src/*.cpp`, `src/*.h`, `src/resource.rc`, and `src/OpenLessIme.def` file listed in this task. The `.def` file must export `DllCanUnloadNow`, `DllGetClassObject`, `DllRegisterServer`, and `DllUnregisterServer`, each marked `PRIVATE`. `STDAPI` does not export these functions by itself, and without the exports neither COM nor `regsvr32` can find them.
- -- [ ] **Step 2: Add fixed GUID constants** - -Create `src/guids.h`: - -```cpp -#pragma once - -#include <windows.h> - -// {6B9F3F4F-5EE7-42D6-9C61-9F80B03A5D7D} -inline constexpr GUID CLSID_OpenLessTextService = { - 0x6b9f3f4f, - 0x5ee7, - 0x42d6, - {0x9c, 0x61, 0x9f, 0x80, 0xb0, 0x3a, 0x5d, 0x7d}}; - -// {9B5F5E04-23F6-47DA-9A26-D221F6C3F02E} -inline constexpr GUID GUID_OpenLessProfile = { - 0x9b5f5e04, - 0x23f6, - 0x47da, - {0x9a, 0x26, 0xd2, 0x21, 0xf6, 0xc3, 0xf0, 0x2e}}; - -inline constexpr wchar_t kOpenLessImeName[] = L"OpenLess Voice Input"; -inline constexpr LANGID kOpenLessLangId = 0x0804; -``` - -- [ ] **Step 3: Add DLL exports and module lifetime** - -Create `src/dllmain.cpp` with exports: - -```cpp -#include <windows.h> -#include <new> -#include "class_factory.h" -#include "registry.h" -#include "guids.h" - -HINSTANCE g_module = nullptr; -long g_lock_count = 0; -long g_object_count = 0; - -BOOL APIENTRY DllMain(HINSTANCE module, DWORD reason, LPVOID) { - if (reason == DLL_PROCESS_ATTACH) { - g_module = module; - DisableThreadLibraryCalls(module); - } - return TRUE; -} - -STDAPI DllCanUnloadNow() { - return (g_lock_count == 0 && g_object_count == 0) ? S_OK : S_FALSE; -} - -STDAPI DllGetClassObject(REFCLSID clsid, REFIID iid, void** result) { - if (!result) { - return E_POINTER; - } - *result = nullptr; - if (clsid != CLSID_OpenLessTextService) { - return CLASS_E_CLASSNOTAVAILABLE; - } - auto* factory = new (std::nothrow) OpenLessClassFactory(); - if (!factory) { - return E_OUTOFMEMORY; - } - const HRESULT hr = factory->QueryInterface(iid, result); - factory->Release(); - return hr; -} - -STDAPI DllRegisterServer() { - return RegisterOpenLessTextService(g_module); -} - -STDAPI DllUnregisterServer() { - return UnregisterOpenLessTextService(); -} -``` - -- [ ] **Step 4: Add class factory** - -Create `class_factory.h/.cpp` implementing `IClassFactory`. It must: - -- Support `IUnknown` and `IClassFactory`. -- Increment `g_object_count` on construction and decrement it on destruction, using `InterlockedIncrement` and `InterlockedDecrement`.
-- `CreateInstance` returns a new `OpenLessTextService`. -- `LockServer` increments/decrements `g_lock_count` with `InterlockedIncrement`/`InterlockedDecrement`. - -Use this `CreateInstance` body: - -```cpp -HRESULT OpenLessClassFactory::CreateInstance(IUnknown* outer, REFIID iid, void** result) { - if (!result) { - return E_POINTER; - } - *result = nullptr; - if (outer) { - return CLASS_E_NOAGGREGATION; - } - auto* service = new (std::nothrow) OpenLessTextService(); - if (!service) { - return E_OUTOFMEMORY; - } - const HRESULT hr = service->QueryInterface(iid, result); - service->Release(); - return hr; -} -``` - -- [ ] **Step 5: Add minimal text service class** - -Create `text_service.h/.cpp` implementing `ITfTextInputProcessorEx`. It must: - -- Support `IUnknown`, `ITfTextInputProcessor`, and `ITfTextInputProcessorEx`. -- Store `ITfThreadMgr* thread_mgr_` and `TfClientId client_id_`. -- `ActivateEx` stores the thread manager and client id and returns `S_OK`. Task 8 adds the pipe server start here. -- `Deactivate` releases the thread manager, clears the client id, and returns `S_OK`. Task 8 adds the pipe server stop here. - -Use this method shape: - -```cpp -HRESULT OpenLessTextService::ActivateEx(ITfThreadMgr* thread_mgr, TfClientId client_id, DWORD) { - if (!thread_mgr) { - return E_INVALIDARG; - } - thread_mgr_ = thread_mgr; - thread_mgr_->AddRef(); - client_id_ = client_id; - return S_OK; -} - -HRESULT OpenLessTextService::Deactivate() { - if (thread_mgr_) { - thread_mgr_->Release(); - thread_mgr_ = nullptr; - } - client_id_ = TF_CLIENTID_NULL; - return S_OK; -} -``` - -Add the method that the pipe server calls: - -```cpp -HRESULT OpenLessTextService::SubmitTextFromPipe(const std::wstring& session_id, - const std::wstring& text); -``` - -For this task, return `E_NOTIMPL` from `SubmitTextFromPipe`; Task 7 replaces it with real edit-session submission.
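COM can create objects and call `LockServer` from more than one thread, so `g_lock_count` and `g_object_count` from Steps 3 and 4 must be updated atomically, and `DllCanUnloadNow` may report `S_OK` only when both are zero. The Rust model below illustrates those rules with atomics. `ModuleCounts` is a hypothetical name and not part of the DLL. The C++ code would use `InterlockedIncrement`/`InterlockedDecrement` on the two globals.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical model of the DLL's lock and live-object counters.
pub struct ModuleCounts {
    locks: AtomicI32,
    objects: AtomicI32,
}

impl ModuleCounts {
    pub const fn new() -> Self {
        Self { locks: AtomicI32::new(0), objects: AtomicI32::new(0) }
    }

    /// Called from every COM object's constructor (factory and text service).
    pub fn object_created(&self) {
        self.objects.fetch_add(1, Ordering::SeqCst);
    }

    /// Called from every COM object's destructor.
    pub fn object_destroyed(&self) {
        self.objects.fetch_sub(1, Ordering::SeqCst);
    }

    /// Mirrors `IClassFactory::LockServer(TRUE / FALSE)`.
    pub fn lock_server(&self, lock: bool) {
        if lock {
            self.locks.fetch_add(1, Ordering::SeqCst);
        } else {
            self.locks.fetch_sub(1, Ordering::SeqCst);
        }
    }

    /// Mirrors `DllCanUnloadNow`: true (S_OK) only when nothing is alive or locked.
    pub fn can_unload_now(&self) -> bool {
        self.locks.load(Ordering::SeqCst) == 0 && self.objects.load(Ordering::SeqCst) == 0
    }
}
```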
- -- [ ] **Step 6: Add COM and TSF registration code** - -Create `registry.h/.cpp` with: - -```cpp -HRESULT RegisterOpenLessTextService(HINSTANCE module); -HRESULT UnregisterOpenLessTextService(); -``` - -`RegisterOpenLessTextService` must: - -- Write HKCU COM registration under `Software\Classes\CLSID\{6B9F3F4F-5EE7-42D6-9C61-9F80B03A5D7D}`. -- Set `InprocServer32` default value to the DLL path. -- Set `ThreadingModel` to `Apartment`. -- Create `ITfInputProcessorProfiles`. -- Call `Register(CLSID_OpenLessTextService)`. -- Call `AddLanguageProfile(CLSID_OpenLessTextService, 0x0804, GUID_OpenLessProfile, L"OpenLess Voice Input", ...)`. -- Call `EnableLanguageProfile(CLSID_OpenLessTextService, 0x0804, GUID_OpenLessProfile, TRUE)`. - -`UnregisterOpenLessTextService` must call `Unregister(CLSID_OpenLessTextService)` and remove the HKCU COM registration key. - -- [ ] **Step 7: Build the DLL** - -Run from a Developer PowerShell: - -```powershell -MSBuild openless-all/app/windows-ime/OpenLessIme.sln /p:Configuration=Release /p:Platform=x64 -``` - -Expected: `openless-all/app/windows-ime/x64/Release/OpenLessIme.dll` exists. 
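The per-user COM part of Step 6 consists of two values under one key, and `UnregisterOpenLessTextService` deletes that key. The sketch below lists them explicitly so the register and unregister paths share one definition. `RegValue`, `com_registration_values`, and `com_registration_root` are hypothetical helpers that only build strings and do not touch the registry. One caveat that this sketch does not verify: TSF's `ITfInputProcessorProfiles::Register` has traditionally stored text-service data under HKLM, so the `regsvr32` step may need an elevated prompt even though these COM keys are per-user.

```rust
/// One registry value to write under HKEY_CURRENT_USER (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct RegValue {
    pub key: String,
    /// `None` means the key's default value.
    pub name: Option<&'static str>,
    pub data: String,
}

/// The key that unregistration should delete recursively.
pub fn com_registration_root(clsid_braced: &str) -> String {
    format!(r"Software\Classes\CLSID\{clsid_braced}")
}

/// The values Step 6 writes for an in-process, apartment-threaded COM server.
pub fn com_registration_values(clsid_braced: &str, dll_path: &str) -> Vec<RegValue> {
    let inproc = format!(r"{}\InprocServer32", com_registration_root(clsid_braced));
    vec![
        RegValue { key: inproc.clone(), name: None, data: dll_path.to_string() },
        RegValue { key: inproc, name: Some("ThreadingModel"), data: "Apartment".to_string() },
    ]
}
```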
- -- [ ] **Step 8: Commit** - -```powershell -git add -- openless-all/app/windows-ime -git commit -m "feat: scaffold OpenLess TSF IME DLL" -``` - ---- - -### Task 7: TSF Edit Session Text Commit - -**Files:** -- Create: `openless-all/app/windows-ime/src/edit_session.h` -- Create: `openless-all/app/windows-ime/src/edit_session.cpp` -- Modify: `openless-all/app/windows-ime/src/text_service.h` -- Modify: `openless-all/app/windows-ime/src/text_service.cpp` - -- [ ] **Step 1: Add edit session class** - -Create `edit_session.h/.cpp` implementing `ITfEditSession`: - -```cpp -class OpenLessEditSession final : public ITfEditSession { -public: - OpenLessEditSession(ITfContext* context, std::wstring text); - - STDMETHODIMP QueryInterface(REFIID iid, void** result) override; - STDMETHODIMP_(ULONG) AddRef() override; - STDMETHODIMP_(ULONG) Release() override; - STDMETHODIMP DoEditSession(TfEditCookie edit_cookie) override; - -private: - ~OpenLessEditSession(); // releases context_ - - long ref_count_ = 1; - ITfContext* context_ = nullptr; - std::wstring text_; -}; -``` - -The constructor must `AddRef` the context and the destructor must `Release` it, so the session holds its own reference. - -`DoEditSession` must query `ITfInsertAtSelection` from the context and first run a query-only insertion: - -```cpp -ITfRange* range = nullptr; -HRESULT hr = insert_at_selection->InsertTextAtSelection( - edit_cookie, - TF_IAS_QUERYONLY, - text_.c_str(), - static_cast<LONG>(text_.size()), - &range); -if (range) { - range->Release(); -} -``` - -If the query fails, return its HRESULT without inserting anything. Otherwise, call the same method with `TF_IAS_NOQUERY` to commit the text; with that flag, `ppRange` may be `nullptr`: - -```cpp -hr = insert_at_selection->InsertTextAtSelection( - edit_cookie, - TF_IAS_NOQUERY, - text_.c_str(), - static_cast<LONG>(text_.size()), - nullptr); -``` - -Return the HRESULT from the committing call. Release every COM pointer acquired in the method.
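The `cch` argument of `InsertTextAtSelection` counts UTF-16 code units. On Windows, `std::wstring::size()` counts `wchar_t` units, which are UTF-16, so `text_.size()` is correct once the pipe server has converted the UTF-8 payload. A UTF-8 byte count or a character count would be wrong for CJK text or emoji. The Rust sketch shows the three counts diverging. `utf16_cch` is a hypothetical helper that returns `None` if the length does not fit in a 32-bit `LONG`.

```rust
/// UTF-16 code-unit count, the unit `cch` uses, as an i32 (`LONG` on Windows).
pub fn utf16_cch(text: &str) -> Option<i32> {
    let units = text.encode_utf16().count();
    if units <= i32::MAX as usize {
        Some(units as i32)
    } else {
        None
    }
}
```

For example, "你好" is 6 UTF-8 bytes and 2 UTF-16 units, while "😀" is 1 `char`, 4 bytes, and 2 UTF-16 units because it needs a surrogate pair.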
- -- [ ] **Step 2: Replace `SubmitTextFromPipe` with real TSF submission** - -In `text_service.cpp`, implement: - -```cpp -HRESULT OpenLessTextService::SubmitTextFromPipe(const std::wstring&, - const std::wstring& text) { - if (!thread_mgr_ || client_id_ == TF_CLIENTID_NULL) { - return E_UNEXPECTED; - } - - ITfDocumentMgr* document_mgr = nullptr; - HRESULT hr = thread_mgr_->GetFocus(&document_mgr); - if (FAILED(hr) || !document_mgr) { - return FAILED(hr) ? hr : E_FAIL; - } - - ITfContext* context = nullptr; - hr = document_mgr->GetTop(&context); - document_mgr->Release(); - if (FAILED(hr) || !context) { - return FAILED(hr) ? hr : E_FAIL; - } - - auto* session = new (std::nothrow) OpenLessEditSession(context, text); - if (!session) { - context->Release(); - return E_OUTOFMEMORY; - } - - HRESULT edit_result = E_FAIL; - hr = context->RequestEditSession( - client_id_, - session, - TF_ES_SYNC | TF_ES_READWRITE, - &edit_result); - session->Release(); - context->Release(); - if (FAILED(hr)) { - return hr; - } - return edit_result; -} -``` - -Call this method only on the thread that activated the text service. TSF objects such as `ITfThreadMgr` and `ITfContext` are apartment-threaded, so calling them directly from the pipe server thread added in Task 8 is not supported. Marshal each request to the owning thread (for example, by posting a message to a hidden window created in `ActivateEx`) and have the pipe thread wait for the result. If the synchronous lock cannot be granted, `RequestEditSession` reports `TF_E_SYNCHRONOUS` through `edit_result`, which this code already returns as a failure. - -- [ ] **Step 3: Build the DLL** - -Run: - -```powershell -MSBuild openless-all/app/windows-ime/OpenLessIme.sln /p:Configuration=Release /p:Platform=x64 -``` - -Expected: build succeeds.
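TSF objects are apartment-threaded, so the text service should touch `ITfThreadMgr` and `ITfContext` only on the thread that activated it, while the pipe server blocks on its own thread. A common pattern is a request/reply hand-off: the pipe thread queues the text for the owning thread and waits, with a timeout, for the result. The sketch below models that hand-off with `std::sync::mpsc`. It illustrates the pattern only and is not TSF code. `SubmitJob`, `run_owner_loop`, and `submit_from_pipe_thread` are hypothetical names. In the DLL, the queue would more likely be a window message posted to a hidden window owned by the text service thread.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::time::Duration;

/// One submit request handed from the pipe thread to the owning thread.
pub struct SubmitJob {
    pub text: String,
    pub reply: Sender<Result<usize, String>>,
}

/// Runs on the owning thread: commits each job there and replies.
pub fn run_owner_loop(jobs: Receiver<SubmitJob>, mut commit: impl FnMut(&str) -> Result<usize, String>) {
    for job in jobs {
        let result = commit(&job.text);
        // The pipe thread may have timed out and dropped its receiver; ignore that.
        let _ = job.reply.send(result);
    }
}

/// Runs on the pipe thread: queues the text and waits for the owner's result.
pub fn submit_from_pipe_thread(
    owner: &Sender<SubmitJob>,
    text: &str,
    timeout: Duration,
) -> Result<usize, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    owner
        .send(SubmitJob { text: text.to_string(), reply: reply_tx })
        .map_err(|_| "owning thread is gone".to_string())?;
    reply_rx
        .recv_timeout(timeout)
        .map_err(|_| "timed out waiting for the owning thread".to_string())?
}
```

A timeout on the pipe side keeps a stuck UI thread from stalling the named-pipe server indefinitely. The Rust client's own `IME_SUBMIT_TIMEOUT` then turns that into a clipboard fallback.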
- -- [ ] **Step 4: Commit** - -```powershell -git add -- openless-all/app/windows-ime/src/edit_session.h openless-all/app/windows-ime/src/edit_session.cpp openless-all/app/windows-ime/src/text_service.h openless-all/app/windows-ime/src/text_service.cpp -git commit -m "feat: commit dictated text through TSF edit sessions" -``` - ---- - -### Task 8: C++ Named-Pipe Server in the IME DLL - -**Files:** -- Create: `openless-all/app/windows-ime/src/ipc_client.h` -- Create: `openless-all/app/windows-ime/src/ipc_client.cpp` -- Modify: `openless-all/app/windows-ime/src/text_service.h` -- Modify: `openless-all/app/windows-ime/src/text_service.cpp` - -- [ ] **Step 1: Add IPC server class** - -Create `ipc_client.h` with: - -```cpp -#pragma once - -#include <windows.h> - -#include <atomic> -#include <string> -#include <thread> - -class OpenLessTextService; - -class OpenLessPipeServer { -public: - OpenLessPipeServer(); - ~OpenLessPipeServer(); - - void Start(OpenLessTextService* service); - void Stop(); - -private: - void Run(); - HRESULT HandleSubmitLine(const std::wstring& line); - bool WriteResult(const std::wstring& session_id, const wchar_t* status, const wchar_t* error_code); - - std::atomic<bool> stop_requested_{false}; - std::thread thread_; - OpenLessTextService* service_ = nullptr; -}; -``` - -- [ ] **Step 2: Implement one-submit-at-a-time JSONL handling** - -Create `ipc_client.cpp` using Windows named pipes: - -- Pipe name: `\\.\pipe\OpenLessImeSubmit` -- Pipe mode: message pipe, byte read mode, blocking wait. -- Accept one client at a time. -- Read one UTF-8 JSON line. -- Extract `type`, `sessionId`, and `text`. -- Reject messages whose `type` is not `submitText`. -- Convert `text` from UTF-8 to UTF-16. -- Call `service_->SubmitTextFromPipe(session_id, text)`. -- Write one JSONL `submitResult` response with `committed`, `rejected`, or `failed`. -- Make `Stop` set `stop_requested_` and then unblock any pending `ConnectNamedPipe` (for example, by briefly connecting to the pipe itself, or by using overlapped I/O) before joining `thread_`; setting the flag alone cannot wake a blocked wait.
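The extraction step in the list above has to decode string values correctly. Escaped quotes, backslashes, and `\n` in dictated text must be unescaped, and a value ends at the first unescaped quote. The function below is a Rust model of what the C++ `ExtractJsonStringField` needs to handle, walking the object key by key. It assumes the flat, compact object layout produced by `serde_json::to_string`. The field name `protocolVersion` in the test is also an assumption. By default, serde_json emits `\u` escapes only for control characters, so this sketch rejects surrogate escapes rather than combining them.

```rust
/// Returns the string value of top-level `field` in one flat JSON object, or
/// `None` if the field is missing, not a string, or the line is malformed.
pub fn extract_json_string_field(json: &str, field: &str) -> Option<String> {
    let chars: Vec<char> = json.trim().chars().collect();
    if chars.first() != Some(&'{') {
        return None;
    }
    let mut pos = 1;
    loop {
        skip_ws(&chars, &mut pos);
        let key = parse_string(&chars, &mut pos)?; // also fails on `}` (field not found)
        skip_ws(&chars, &mut pos);
        if chars.get(pos) != Some(&':') {
            return None;
        }
        pos += 1;
        skip_ws(&chars, &mut pos);
        if chars.get(pos) == Some(&'"') {
            let value = parse_string(&chars, &mut pos)?;
            if key == field {
                return Some(value);
            }
        } else {
            if key == field {
                return None; // present but not a string
            }
            // Skip a scalar (number, true, false, null); nested values are out of scope.
            while let Some(&c) = chars.get(pos) {
                if c == ',' || c == '}' {
                    break;
                }
                if c == '{' || c == '[' {
                    return None;
                }
                pos += 1;
            }
        }
        skip_ws(&chars, &mut pos);
        match chars.get(pos).copied() {
            Some(',') => pos += 1,
            _ => return None, // end of object without a match, or malformed input
        }
    }
}

fn skip_ws(chars: &[char], pos: &mut usize) {
    while chars.get(*pos).map_or(false, |c| c.is_whitespace()) {
        *pos += 1;
    }
}

/// Parses a JSON string literal starting at `chars[*pos]` and advances past it.
fn parse_string(chars: &[char], pos: &mut usize) -> Option<String> {
    if chars.get(*pos) != Some(&'"') {
        return None;
    }
    *pos += 1;
    let mut out = String::new();
    while let Some(&c) = chars.get(*pos) {
        *pos += 1;
        match c {
            '"' => return Some(out),
            '\\' => {
                let escaped = *chars.get(*pos)?;
                *pos += 1;
                match escaped {
                    '"' | '\\' | '/' => out.push(escaped),
                    'b' => out.push('\u{8}'),
                    'f' => out.push('\u{c}'),
                    'n' => out.push('\n'),
                    'r' => out.push('\r'),
                    't' => out.push('\t'),
                    'u' => {
                        let hex: String = chars.get(*pos..*pos + 4)?.iter().collect();
                        if !hex.chars().all(|h| h.is_ascii_hexdigit()) {
                            return None;
                        }
                        *pos += 4;
                        out.push(char::from_u32(u32::from_str_radix(&hex, 16).ok()?)?);
                    }
                    _ => return None,
                }
            }
            _ => out.push(c),
        }
    }
    None // unterminated string
}
```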
- -Use a small local parser limited to the protocol keys: - -```cpp -std::wstring ExtractJsonStringField(const std::wstring& json, const wchar_t* field_name); -``` - -The parser only needs to handle JSON emitted by Rust `serde_json` for this protocol. It must reject missing fields and return `failed` with `protocolError`. - -- [ ] **Step 3: Start and stop pipe server from the text service** - -In `OpenLessTextService::ActivateEx`, call: - -```cpp -pipe_server_.Start(this); -``` - -In `OpenLessTextService::Deactivate`, call: - -```cpp -pipe_server_.Stop(); -``` - -Store `OpenLessPipeServer pipe_server_;` as a member of `OpenLessTextService`. - -- [ ] **Step 4: Build the DLL** - -Run: - -```powershell -MSBuild openless-all/app/windows-ime/OpenLessIme.sln /p:Configuration=Release /p:Platform=x64 -``` - -Expected: build succeeds. - -- [ ] **Step 5: Commit** - -```powershell -git add -- openless-all/app/windows-ime/src/ipc_client.h openless-all/app/windows-ime/src/ipc_client.cpp openless-all/app/windows-ime/src/text_service.h openless-all/app/windows-ime/src/text_service.cpp -git commit -m "feat: receive OpenLess IME submissions over a named pipe" -``` - ---- - -### Task 9: Registration and Build Scripts - -**Files:** -- Create: `openless-all/app/scripts/windows-ime-build.ps1` -- Create: `openless-all/app/scripts/windows-ime-register.ps1` -- Create: `openless-all/app/scripts/windows-ime-unregister.ps1` -- Modify: `openless-all/app/scripts/windows-preflight.ps1` - -- [ ] **Step 1: Add build script** - -Create `windows-ime-build.ps1`: - -```powershell -param( - [ValidateSet("Debug", "Release")] - [string]$Configuration = "Release" -) - -$ErrorActionPreference = "Stop" -$appRoot = (Resolve-Path (Join-Path $PSScriptRoot "..")).Path -$solution = Join-Path $appRoot "windows-ime\OpenLessIme.sln" - -$msbuild = Get-Command MSBuild.exe -ErrorAction SilentlyContinue -if (-not $msbuild) { - throw "MSBuild.exe not found. 
Run from Developer PowerShell or install Visual Studio Build Tools with Desktop development with C++." -} - -& $msbuild.Source $solution /p:Configuration=$Configuration /p:Platform=x64 -if ($LASTEXITCODE -ne 0) { - throw "OpenLessIme build failed with exit code $LASTEXITCODE" -} - -$dll = Join-Path $appRoot "windows-ime\x64\$Configuration\OpenLessIme.dll" -if (-not (Test-Path $dll)) { - throw "OpenLessIme.dll was not produced at $dll" -} - -Write-Host "[ok] $dll" -``` - -- [ ] **Step 2: Add register script** - -Create `windows-ime-register.ps1`: - -```powershell -param( - [ValidateSet("Debug", "Release")] - [string]$Configuration = "Release" -) - -$ErrorActionPreference = "Stop" -$appRoot = (Resolve-Path (Join-Path $PSScriptRoot "..")).Path -$dll = Join-Path $appRoot "windows-ime\x64\$Configuration\OpenLessIme.dll" - -if (-not (Test-Path $dll)) { - & (Join-Path $PSScriptRoot "windows-ime-build.ps1") -Configuration $Configuration -} - -$regsvr32 = Join-Path $env:WINDIR "System32\regsvr32.exe" -# regsvr32 is a GUI-subsystem program: the & operator does not wait for it, so $LASTEXITCODE is unreliable. -$proc = Start-Process -FilePath $regsvr32 -ArgumentList @("/s", "`"$dll`"") -Wait -PassThru -if ($proc.ExitCode -ne 0) { - throw "regsvr32 failed with exit code $($proc.ExitCode)" -} - -Write-Host "[ok] OpenLess TSF IME registered for current user" -``` - -- [ ] **Step 3: Add unregister script** - -Create `windows-ime-unregister.ps1`: - -```powershell -param( - [ValidateSet("Debug", "Release")] - [string]$Configuration = "Release" -) - -$ErrorActionPreference = "Stop" -$appRoot = (Resolve-Path (Join-Path $PSScriptRoot "..")).Path -$dll = Join-Path $appRoot "windows-ime\x64\$Configuration\OpenLessIme.dll" - -if (-not (Test-Path $dll)) { - Write-Host "[skip] OpenLessIme.dll not found at $dll" - exit 0 -} - -$regsvr32 = Join-Path $env:WINDIR "System32\regsvr32.exe" -$proc = Start-Process -FilePath $regsvr32 -ArgumentList @("/u", "/s", "`"$dll`"") -Wait -PassThru -if ($proc.ExitCode -ne 0) { - throw "regsvr32 /u failed with exit code $($proc.ExitCode)" -} - -Write-Host "[ok] OpenLess TSF IME unregistered" -``` - -- [ ] **Step 4: Extend preflight** - -In `windows-preflight.ps1`, add an `ime` option to the `ValidateSet` and check:
-```powershell -if ($Toolchain -eq "all" -or $Toolchain -eq "msvc" -or $Toolchain -eq "ime") { - Write-Host "" - Write-Host "== Windows IME route ==" - if (-not (Test-Command "MSBuild.exe")) { - Write-Host "[hint] Install Visual Studio Build Tools and run from Developer PowerShell." - $failed = $true - } - $msctf = Get-ChildItem -LiteralPath (Join-Path ${env:ProgramFiles(x86)} "Windows Kits\10\Lib") -Recurse -Filter msctf.lib -ErrorAction SilentlyContinue | - Where-Object { $_.FullName -match "\\um\\x64\\msctf\.lib$" } | - Select-Object -First 1 - if ($msctf) { - Write-Host "[ok] msctf.lib -> $($msctf.FullName)" - } else { - Write-Host "[missing] msctf.lib" - $failed = $true - } -} -``` - -- [ ] **Step 5: Run scripts** - -Run: - -```powershell -.\openless-all\app\scripts\windows-preflight.ps1 -Toolchain ime -.\openless-all\app\scripts\windows-ime-build.ps1 -``` - -Expected: preflight passes and the IME DLL builds. - -- [ ] **Step 6: Commit** - -```powershell -git add -- openless-all/app/scripts/windows-ime-build.ps1 openless-all/app/scripts/windows-ime-register.ps1 openless-all/app/scripts/windows-ime-unregister.ps1 openless-all/app/scripts/windows-preflight.ps1 -git commit -m "feat: add Windows IME build and registration scripts" -``` - ---- - -### Task 10: Tauri Commands and Settings Status - -**Files:** -- Modify: `openless-all/app/src-tauri/src/types.rs` -- Modify: `openless-all/app/src-tauri/src/commands.rs` -- Modify: `openless-all/app/src-tauri/src/lib.rs` -- Modify: `openless-all/app/src/lib/types.ts` -- Modify: `openless-all/app/src/lib/ipc.ts` -- Modify: `openless-all/app/src/i18n/zh-CN.ts` -- Modify: `openless-all/app/src/i18n/en.ts` -- Modify: `openless-all/app/src/pages/Settings.tsx` - -- [ ] **Step 1: Add backend status types** - -In `types.rs`, add: - -```rust -#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] -#[serde(rename_all = "camelCase")] -pub enum WindowsImeInstallState { - NotWindows, - NotInstalled, - Installed, - 
    RegistrationBroken, -} - -#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] -#[serde(rename_all = "camelCase")] -pub struct WindowsImeStatus { - pub state: WindowsImeInstallState, - pub using_tsf_backend: bool, - pub message: Option<String>, -} -``` - -- [ ] **Step 2: Add status command** - -In `commands.rs`, add: - -```rust -#[tauri::command] -pub fn get_windows_ime_status() -> WindowsImeStatus { - #[cfg(not(target_os = "windows"))] - { - WindowsImeStatus { - state: WindowsImeInstallState::NotWindows, - using_tsf_backend: false, - message: Some("Windows TSF IME is only available on Windows.".to_string()), - } - } - - #[cfg(target_os = "windows")] - { - match crate::windows_ime_profile::WindowsImeProfileManager::new() - .is_openless_profile_active() - { - Ok(_) => WindowsImeStatus { - state: WindowsImeInstallState::Installed, - using_tsf_backend: true, - message: None, - }, - Err(err) => WindowsImeStatus { - state: WindowsImeInstallState::NotInstalled, - using_tsf_backend: false, - message: Some(err.to_string()), - }, - } - } -} -``` - -Use this as a health signal only. An inactive profile is not a failure, because OpenLess should be the active input profile only during voice sessions. - -- [ ] **Step 3: Register command** - -Add `get_windows_ime_status` to the Tauri `invoke_handler!` list in `lib.rs`.
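The status command above treats any successful profile probe as "installed", whether or not the OpenLess profile is currently active. A minimal std-only sketch of that decision follows. It assumes a plain `Result<bool, String>` stands in for the value returned by `is_openless_profile_active()`, omits the serde derives, and uses a hypothetical `state_from_probe` helper that is not part of the plan.

```rust
/// Local copy of the plan's enum (serde derives omitted for a std-only sketch).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WindowsImeInstallState {
    NotWindows,
    NotInstalled,
    Installed,
    RegistrationBroken,
}

/// Hypothetical helper mirroring `get_windows_ime_status`.
/// `probe` stands in for the result of `is_openless_profile_active()`:
/// `Ok(active)` means the profile could be queried. `active` is ignored,
/// because outside a voice session the OpenLess profile is normally inactive.
pub fn state_from_probe(
    is_windows: bool,
    probe: &Result<bool, String>,
) -> (WindowsImeInstallState, bool) {
    if !is_windows {
        return (WindowsImeInstallState::NotWindows, false);
    }
    match probe {
        // Registered: the TSF backend can be used for the next session.
        Ok(_active) => (WindowsImeInstallState::Installed, true),
        // Any probe error falls back to the clipboard / WM_PASTE path.
        Err(_) => (WindowsImeInstallState::NotInstalled, false),
    }
}
```

Note that, as written, the plan's command never returns `RegistrationBroken`. Distinguishing it from `NotInstalled` would need a more specific probe error.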
- - [ ] **Step 4: Add frontend types and IPC wrapper** - -In `src/lib/types.ts`: - -```ts -export type WindowsImeInstallState = - | 'notWindows' - | 'notInstalled' - | 'installed' - | 'registrationBroken'; - -export interface WindowsImeStatus { - state: WindowsImeInstallState; - usingTsfBackend: boolean; - message?: string | null; -} -``` - -In `src/lib/ipc.ts`: - -```ts -export async function getWindowsImeStatus(): Promise<WindowsImeStatus> { - if (isBrowserDev()) { - return { - state: 'notWindows', - usingTsfBackend: false, - message: 'Browser dev mock', - }; - } - return invoke<WindowsImeStatus>('get_windows_ime_status'); -} -``` - -- [ ] **Step 5: Add Settings UI row** - -In `Settings.tsx`, add a Windows-only status row using existing UI atoms. Text keys: - -Chinese source: - -```ts -windowsImeTitle: 'Windows 输入法后端', -windowsImeInstalled: '已安装,语音输入会临时切换到 OpenLess 输入法', -windowsImeNotInstalled: '未安装,当前使用剪贴板/WM_PASTE 回退', -windowsImeRegistrationBroken: '注册异常,请重新安装 OpenLess 输入法', -windowsImeNotWindows: '仅 Windows 可用', -``` - -English: - -```ts -windowsImeTitle: 'Windows input method backend', -windowsImeInstalled: 'Installed. Voice input temporarily switches to the OpenLess IME.', -windowsImeNotInstalled: 'Not installed. OpenLess is using the clipboard/WM_PASTE fallback.', -windowsImeRegistrationBroken: 'Registration is broken. Reinstall the OpenLess IME.', -windowsImeNotWindows: 'Only available on Windows.', -``` - -- [ ] **Step 6: Run frontend build** - -Run: - -```powershell -npm --prefix openless-all/app run build -``` - -Expected: TypeScript and Vite build succeed. - -- [ ] **Step 7: Run backend type check** - -Run: - -```powershell -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: backend type-checks.
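The Rust enum and the TypeScript union only line up if `#[serde(rename_all = "camelCase")]` produces exactly the strings listed in `types.ts`. A std-only sketch of that contract follows, together with a hypothetical `i18n_key` helper that picks the Settings string for each state. The key names come from the plan; the mapping itself is an assumption about how the Settings row would use them.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WindowsImeInstallState {
    NotWindows,
    NotInstalled,
    Installed,
    RegistrationBroken,
}

impl WindowsImeInstallState {
    pub const ALL: [Self; 4] = [
        Self::NotWindows,
        Self::NotInstalled,
        Self::Installed,
        Self::RegistrationBroken,
    ];

    /// The string serde emits for each unit variant under camelCase renaming;
    /// it must match the TypeScript `WindowsImeInstallState` union.
    pub fn wire_name(self) -> &'static str {
        match self {
            Self::NotWindows => "notWindows",
            Self::NotInstalled => "notInstalled",
            Self::Installed => "installed",
            Self::RegistrationBroken => "registrationBroken",
        }
    }

    /// Hypothetical: the i18n key the Settings row would show for this state.
    pub fn i18n_key(self) -> &'static str {
        match self {
            Self::NotWindows => "windowsImeNotWindows",
            Self::NotInstalled => "windowsImeNotInstalled",
            Self::Installed => "windowsImeInstalled",
            Self::RegistrationBroken => "windowsImeRegistrationBroken",
        }
    }
}
```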
- -- [ ] **Step 8: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/types.rs openless-all/app/src-tauri/src/commands.rs openless-all/app/src-tauri/src/lib.rs openless-all/app/src/lib/types.ts openless-all/app/src/lib/ipc.ts openless-all/app/src/i18n/zh-CN.ts openless-all/app/src/i18n/en.ts openless-all/app/src/pages/Settings.tsx -git commit -m "feat: show Windows TSF IME backend status" -``` - ---- - -### Task 11: End-to-End Windows Verification - -**Files:** -- Modify only files needed to fix defects found during verification. - -- [ ] **Step 1: Run full automated checks** - -Run from the repository root: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml --lib -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -npm --prefix openless-all/app run build -.\openless-all\app\scripts\windows-ime-build.ps1 -``` - -Expected: - -- Rust tests pass. -- Rust backend type-checks. -- Frontend build succeeds. -- `OpenLessIme.dll` builds. - -- [ ] **Step 2: Register the IME** - -Run: - -```powershell -.\openless-all\app\scripts\windows-ime-register.ps1 -``` - -Expected: script prints `[ok] OpenLess TSF IME registered for current user`. - -- [ ] **Step 3: Manual Notepad verification** - -1. Open Notepad. -2. Switch to Microsoft Pinyin. -3. Start OpenLess. -4. Press the configured voice hotkey to start recording. -5. Speak a short phrase. -6. Press the configured voice hotkey again to finish. - -Expected: - -- Input indicator briefly switches to OpenLess during the voice session. -- Final text appears at the Notepad caret. -- Input indicator returns to Microsoft Pinyin. -- Clipboard content is unchanged when TSF commit succeeds. - -- [ ] **Step 4: Manual browser verification** - -Repeat Step 3 in a browser text field. - -Expected: text appears in the focused browser field and input profile restores. - -- [ ] **Step 5: Manual VS Code verification** - -Repeat Step 3 in a VS Code editor tab.
- -Expected: text appears at the editor caret and input profile restores. - -- [ ] **Step 6: Cancellation verification** - -1. Open Notepad with Microsoft Pinyin active. -2. Press the OpenLess voice hotkey to start. -3. Cancel during recording or processing using the existing cancel path. - -Expected: - -- No text is inserted. -- Input profile returns to Microsoft Pinyin. -- Clipboard content is unchanged. - -- [ ] **Step 7: Fallback verification** - -Unregister the IME: - -```powershell -.\openless-all\app\scripts\windows-ime-unregister.ps1 -``` - -Run a normal voice session in Notepad. - -Expected: - -- Voice input still inserts through the existing Windows fallback path. -- Settings reports the TSF backend as not installed. -- User text is not lost. - -- [ ] **Step 8: Final verification review** - -Run: - -```powershell -git status --short -git diff -- openless-all/app/src-tauri openless-all/app/windows-ime openless-all/app/scripts openless-all/app/src docs/superpowers/plans/2026-05-01-windows-temporary-tsf-ime.md -``` - -Expected: every remaining diff is tied to the TSF IME implementation or a verification fix discovered in this task. If no code changed during verification, leave the branch without an extra commit. If verification changed code, stage the exact files shown by `git status --short` that are tied to this TSF IME work and commit with: - -```powershell -git commit -m "fix: harden Windows TSF IME verification path" -``` - ---- - -## Self-Review Checklist - -- The plan covers TSF profile activation, final-text IPC, TSF edit-session commit, restore on success/failure/cancel, fallback behavior, settings status, registration scripts, and manual verification. -- The plan keeps ASR, polish, recorder, and UI ownership in the Tauri/Rust app. -- The plan keeps third-party Chinese IME behavior by restoring the user's previous input profile after each voice session. -- The plan preserves the existing Windows `WM_PASTE` fallback. 
-- The plan avoids putting network, ASR, LLM, or Tauri UI inside the IME DLL. diff --git a/docs/superpowers/plans/2026-05-06-windows-local-asr.md b/docs/superpowers/plans/2026-05-06-windows-local-asr.md deleted file mode 100644 index 01785022..00000000 --- a/docs/superpowers/plans/2026-05-06-windows-local-asr.md +++ /dev/null @@ -1,1396 +0,0 @@ -# Windows Local ASR Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add a Windows-only `foundry-local-whisper` ASR provider so new Windows users can dictate through OpenLess without external ASR keys or Windows Win+H Voice Typing. - -**Architecture:** Keep `coordinator::Coordinator` as the single owner of dictation state. Add a Windows Foundry Local Whisper provider that buffers existing recorder PCM, transcribes it locally, then returns `RawTranscript` into the existing polish, Windows TSF IME insertion, and history pipeline. - -**Tech Stack:** Tauri 2, Rust, React/TypeScript, Foundry Local Rust SDK, reqwest multipart REST call to local `/v1/audio/transcriptions`, existing Windows TSF IME backend. - ---- - -## File Map - -- Modify `openless-all/app/src-tauri/Cargo.toml`: add Windows-only Foundry Local SDK dependency after a compile probe. -- Create `openless-all/app/src-tauri/src/asr/wav.rs`: shared WAV encoder for Whisper HTTP and Foundry Local. -- Modify `openless-all/app/src-tauri/src/asr/mod.rs`: export `wav` and Windows Foundry Local modules. -- Modify `openless-all/app/src-tauri/src/asr/whisper.rs`: use the shared WAV encoder. -- Create `openless-all/app/src-tauri/src/asr/local/foundry.rs`: provider id, model registry, runtime status structs, and Windows runtime/proxy exports. 
-- Create `openless-all/app/src-tauri/src/asr/local/foundry_runtime.rs`: Windows-only Foundry Local SDK wrapper for model status, download, load, endpoint discovery, and local transcription. -- Create `openless-all/app/src-tauri/src/asr/local/foundry_provider.rs`: `FoundryLocalWhisperAsr` implementing `AudioConsumer` and producing `RawTranscript`. -- Modify `openless-all/app/src-tauri/src/asr/local/mod.rs`: keep Qwen3 macOS exports and add Foundry Whisper exports. -- Modify `openless-all/app/src-tauri/src/types.rs`: add Windows local ASR preferences and Windows default provider. -- Modify `openless-all/app/src-tauri/src/persistence.rs`: align credentials active ASR default with Windows local ASR for new installs. -- Modify `openless-all/app/src-tauri/src/commands.rs`: expose Foundry Local settings/status/download/test commands and ASR credential status. -- Modify `openless-all/app/src-tauri/src/lib.rs`: manage a shared Foundry Local runtime and register commands. -- Modify `openless-all/app/src-tauri/src/coordinator.rs`: add `ActiveAsr::FoundryLocalWhisper`, provider startup, transcribe branch, timeout, cancel, and preload/release hooks. -- Modify `openless-all/app/src/lib/localAsr.ts`: add Foundry Local IPC types and wrapper functions. -- Modify `openless-all/app/src/lib/types.ts` and `openless-all/app/src/lib/ipc.ts`: add preferences/mock defaults. -- Modify `openless-all/app/src/pages/Settings.tsx`: add `foundry-local-whisper` provider preset and local ASR hint behavior. -- Modify `openless-all/app/src/pages/LocalAsr.tsx`: show Windows Foundry Local model/runtime controls alongside macOS Qwen3. -- Modify `openless-all/app/src/i18n/zh-CN.ts` and `openless-all/app/src/i18n/en.ts`: add user-facing strings. -- Modify `openless-all/app/scripts/windows-real-asr-insertion-smoke.ps1`: add a local ASR mode that does not require Volcengine credentials. 
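The architecture above has the provider buffer recorder PCM until the session ends, then derive the duration from the byte count of 16 kHz mono 16-bit audio. A std-only sketch of that buffering pattern follows. It assumes a local `AudioConsumer` trait shaped like the `consume_pcm_chunk` method used in Task 4 (the real `crate::recorder::AudioConsumer` is not shown in this plan), uses `std::sync::Mutex` in place of `parking_lot`, and names the provider with the hypothetical `BufferingProvider`.

```rust
use std::sync::Mutex;

/// Stand-in for `crate::recorder::AudioConsumer` (assumed shape).
pub trait AudioConsumer {
    fn consume_pcm_chunk(&self, pcm: &[u8]);
}

/// Hypothetical provider that collects 16 kHz / mono / s16le PCM.
#[derive(Default)]
pub struct BufferingProvider {
    buffer: Mutex<Vec<u8>>,
}

impl AudioConsumer for BufferingProvider {
    fn consume_pcm_chunk(&self, pcm: &[u8]) {
        self.buffer.lock().unwrap().extend_from_slice(pcm);
    }
}

impl BufferingProvider {
    /// Take the buffered audio, leaving an empty buffer for the next session.
    pub fn take_pcm(&self) -> Vec<u8> {
        std::mem::take(&mut *self.buffer.lock().unwrap())
    }
}

/// 2 bytes per sample, 16 000 samples per second.
pub fn duration_ms_16k_mono(pcm_len: usize) -> u64 {
    (pcm_len as u64 / 2) * 1000 / 16_000
}
```

Unlike this sketch, the provider in Task 4 clones the buffer and clears it only after transcription succeeds, so a failed request leaves the audio in place.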
- -## Implementation Tasks - -### Task 1: Shared WAV Encoder - -**Files:** -- Create: `openless-all/app/src-tauri/src/asr/wav.rs` -- Modify: `openless-all/app/src-tauri/src/asr/mod.rs` -- Modify: `openless-all/app/src-tauri/src/asr/whisper.rs` - -- [ ] **Step 1: Write the shared WAV encoder and its test** - -Add this file: - -```rust -//! WAV helpers for ASR providers that accept complete audio files. - -/// Encode 16 kHz / mono / 16-bit little-endian PCM as a RIFF WAV file. -pub fn encode_wav_16k_mono(pcm: &[u8]) -> Vec<u8> { - let sample_rate: u32 = 16_000; - let num_channels: u16 = 1; - let bits_per_sample: u16 = 16; - let byte_rate = sample_rate * num_channels as u32 * (bits_per_sample as u32 / 8); - let block_align = num_channels * (bits_per_sample / 8); - let data_size = pcm.len() as u32; - let chunk_size = 36 + data_size; - - let mut wav = Vec::with_capacity(44 + pcm.len()); - wav.extend_from_slice(b"RIFF"); - wav.extend_from_slice(&chunk_size.to_le_bytes()); - wav.extend_from_slice(b"WAVE"); - wav.extend_from_slice(b"fmt "); - wav.extend_from_slice(&16u32.to_le_bytes()); - wav.extend_from_slice(&1u16.to_le_bytes()); - wav.extend_from_slice(&num_channels.to_le_bytes()); - wav.extend_from_slice(&sample_rate.to_le_bytes()); - wav.extend_from_slice(&byte_rate.to_le_bytes()); - wav.extend_from_slice(&block_align.to_le_bytes()); - wav.extend_from_slice(&bits_per_sample.to_le_bytes()); - wav.extend_from_slice(b"data"); - wav.extend_from_slice(&data_size.to_le_bytes()); - wav.extend_from_slice(pcm); - wav -} - -#[cfg(test)] -mod tests { - use super::encode_wav_16k_mono; - - #[test] - fn wav_header_matches_16k_mono_pcm() { - let pcm = [0x01, 0x00, 0xff, 0x7f]; - let wav = encode_wav_16k_mono(&pcm); - - assert_eq!(&wav[0..4], b"RIFF"); - assert_eq!(u32::from_le_bytes(wav[4..8].try_into().unwrap()), 40); - assert_eq!(&wav[8..12], b"WAVE"); - assert_eq!(&wav[12..16], b"fmt "); - assert_eq!(u16::from_le_bytes(wav[20..22].try_into().unwrap()), 1); - 
        assert_eq!(u16::from_le_bytes(wav[22..24].try_into().unwrap()), 1); - assert_eq!(u32::from_le_bytes(wav[24..28].try_into().unwrap()), 16_000); - assert_eq!(u16::from_le_bytes(wav[34..36].try_into().unwrap()), 16); - assert_eq!(&wav[36..40], b"data"); - assert_eq!(u32::from_le_bytes(wav[40..44].try_into().unwrap()), 4); - assert_eq!(&wav[44..], &pcm); - } -} -``` - -- [ ] **Step 2: Run the new unit test and verify the module is not wired yet** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml wav_header_matches_16k_mono_pcm -``` - -Expected: the filter matches 0 tests and the run still exits successfully, because Cargo does not compile `wav.rs` until `asr/mod.rs` declares it. - -- [ ] **Step 3: Register the module and replace Whisper's private encoder** - -In `openless-all/app/src-tauri/src/asr/mod.rs`, add: - -```rust -pub mod wav; -``` - -In `openless-all/app/src-tauri/src/asr/whisper.rs`, add: - -```rust -use crate::asr::wav::encode_wav_16k_mono; -``` - -Then remove the private `fn encode_wav_16k_mono(pcm: &[u8]) -> Vec<u8>` from the bottom of `whisper.rs`. - -- [ ] **Step 4: Run the WAV test** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml wav_header_matches_16k_mono_pcm -``` - -Expected: PASS.
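To sanity-check the 44-byte header layout outside the unit test, the encoder's output can be read back field by field. A std-only sketch follows, using a hypothetical `parse_canonical_wav` helper that accepts only the fixed layout `encode_wav_16k_mono` writes: RIFF/WAVE, one 16-byte `fmt ` chunk with format 1 (integer PCM), then `data`.

```rust
/// Header fields read back from a canonical 44-byte PCM WAV file.
#[derive(Debug, PartialEq, Eq)]
pub struct WavInfo {
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
    pub data_len: u32,
}

fn u16_at(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_at(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

/// Hypothetical check for the fixed layout the shared encoder produces.
/// Files with extra chunks (LIST, fact, ...) are rejected rather than scanned.
pub fn parse_canonical_wav(wav: &[u8]) -> Option<WavInfo> {
    if wav.len() < 44
        || &wav[0..4] != b"RIFF"
        || &wav[8..12] != b"WAVE"
        || &wav[12..16] != b"fmt "
        || u32_at(wav, 16) != 16 // fmt chunk size for plain PCM
        || u16_at(wav, 20) != 1 // audio format 1 = integer PCM
        || &wav[36..40] != b"data"
    {
        return None;
    }
    let info = WavInfo {
        channels: u16_at(wav, 22),
        sample_rate: u32_at(wav, 24),
        bits_per_sample: u16_at(wav, 34),
        data_len: u32_at(wav, 40),
    };
    // The RIFF size counts everything after the first 8 bytes: 36 header bytes + data.
    let riff_ok = u64::from(u32_at(wav, 4)) == 36 + u64::from(info.data_len);
    // The declared data length must match the bytes actually present.
    let data_ok = wav.len() == 44 + info.data_len as usize;
    if riff_ok && data_ok {
        Some(info)
    } else {
        None
    }
}
```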
- -- [ ] **Step 5: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/asr/mod.rs openless-all/app/src-tauri/src/asr/whisper.rs openless-all/app/src-tauri/src/asr/wav.rs -git commit -m "refactor(asr): share wav encoding" -``` - -### Task 2: Provider Constants, Preferences, and Defaults - -**Files:** -- Create: `openless-all/app/src-tauri/src/asr/local/foundry.rs` -- Modify: `openless-all/app/src-tauri/src/asr/local/mod.rs` -- Modify: `openless-all/app/src-tauri/src/types.rs` -- Modify: `openless-all/app/src-tauri/src/persistence.rs` -- Modify: `openless-all/app/src/lib/types.ts` -- Modify: `openless-all/app/src/lib/ipc.ts` - -- [ ] **Step 1: Add provider constants and model registry** - -Create `openless-all/app/src-tauri/src/asr/local/foundry.rs`: - -```rust -use serde::Serialize; - -pub const PROVIDER_ID: &str = "foundry-local-whisper"; -pub const DEFAULT_MODEL_ALIAS: &str = "whisper-small"; - -#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize)] -#[serde(rename_all = "camelCase")] -pub struct FoundryWhisperModel { - pub alias: &'static str, - pub display_name: &'static str, - pub quality_tier: &'static str, -} - -pub const MODELS: &[FoundryWhisperModel] = &[ - FoundryWhisperModel { - alias: "whisper-small", - display_name: "Whisper Small", - quality_tier: "balanced", - }, - FoundryWhisperModel { - alias: "whisper-base", - display_name: "Whisper Base", - quality_tier: "low-resource", - }, - FoundryWhisperModel { - alias: "whisper-tiny", - display_name: "Whisper Tiny", - quality_tier: "smoke-test", - }, -]; - -pub fn is_foundry_local_whisper(id: &str) -> bool { - id == PROVIDER_ID -} - -pub fn model_alias_is_known(alias: &str) -> bool { - MODELS.iter().any(|model| model.alias == alias) -} - -pub fn default_language_hint() -> Option<String> { - None -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn provider_id_is_stable() { - assert!(is_foundry_local_whisper("foundry-local-whisper")); - assert!(!is_foundry_local_whisper("local-qwen3")); - 
    } - - #[test] - fn default_model_is_registered() { - assert!(model_alias_is_known(DEFAULT_MODEL_ALIAS)); - } -} -``` - -- [ ] **Step 2: Export the Foundry module** - -In `openless-all/app/src-tauri/src/asr/local/mod.rs`, add: - -```rust -pub mod foundry; -``` - -- [ ] **Step 3: Add Rust preferences** - -In `openless-all/app/src-tauri/src/types.rs`, add fields to `UserPreferences` after `local_asr_keep_loaded_secs`: - -```rust -/// Model alias currently active for Windows Foundry Local Whisper. -#[serde(default = "default_foundry_local_asr_model")] -pub foundry_local_asr_model: String, -/// Language hint for Windows Foundry Local Whisper. Empty string = auto-detect. -#[serde(default)] -pub foundry_local_asr_language_hint: String, -/// How long the Windows Foundry Local Whisper model stays loaded in the runtime. -#[serde(default = "default_local_asr_keep_loaded_secs")] -pub foundry_local_asr_keep_loaded_secs: u32, -``` - -Add the default helper: - -```rust -fn default_foundry_local_asr_model() -> String { - crate::asr::local::foundry::DEFAULT_MODEL_ALIAS.into() -} -``` - -Update `impl Default for UserPreferences`: - -```rust -active_asr_provider: default_active_asr_provider(), -foundry_local_asr_model: default_foundry_local_asr_model(), -foundry_local_asr_language_hint: String::new(), -foundry_local_asr_keep_loaded_secs: default_local_asr_keep_loaded_secs(), -``` - -Add this helper near the existing preference defaults: - -```rust -fn default_active_asr_provider() -> String { - #[cfg(target_os = "windows")] - { - return crate::asr::local::foundry::PROVIDER_ID.into(); - } - #[cfg(not(target_os = "windows"))] - { - "volcengine".into() - } -} -``` - -- [ ] **Step 4: Align credentials active ASR default** - -In `openless-all/app/src-tauri/src/persistence.rs`, replace `creds_default_asr()` with: - -```rust -fn creds_default_asr() -> String { - #[cfg(target_os = "windows")] - { - return crate::asr::local::foundry::PROVIDER_ID.into(); - } - #[cfg(not(target_os = "windows"))] - { - "volcengine".into() - } -} -``` - -- [ ] **Step 5: Add TypeScript 
preference fields** - -In `openless-all/app/src/lib/types.ts`, add: - -```ts - foundryLocalAsrModel: string; - foundryLocalAsrLanguageHint: string; - foundryLocalAsrKeepLoadedSecs: number; -``` - -In `openless-all/app/src/lib/ipc.ts`, update mock defaults: - -```ts - activeAsrProvider: 'foundry-local-whisper', - foundryLocalAsrModel: 'whisper-small', - foundryLocalAsrLanguageHint: '', - foundryLocalAsrKeepLoadedSecs: 300, -``` - -- [ ] **Step 6: Run default and provider tests** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml asr::local::foundry::tests -npm --prefix openless-all/app run build -``` - -Expected: both Rust tests PASS; TypeScript build PASS. `cargo test` accepts a single name filter, so the module path selects both `provider_id_is_stable` and `default_model_is_registered`. - -- [ ] **Step 7: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/asr/local/foundry.rs openless-all/app/src-tauri/src/asr/local/mod.rs openless-all/app/src-tauri/src/types.rs openless-all/app/src-tauri/src/persistence.rs openless-all/app/src/lib/types.ts openless-all/app/src/lib/ipc.ts -git commit -m "feat(asr): add Foundry local provider defaults" -``` - -### Task 3: Foundry Runtime Compile Probe - -**Files:** -- Modify: `openless-all/app/src-tauri/Cargo.toml` -- Create: `openless-all/app/src-tauri/src/asr/local/foundry_runtime.rs` -- Modify: `openless-all/app/src-tauri/src/asr/local/foundry.rs` -- Modify: `openless-all/app/src-tauri/src/asr/local/mod.rs` - -- [ ] **Step 1: Add the official Windows SDK dependency** - -Run from the repository root, so the later `--manifest-path` commands still resolve: - -```powershell -cargo add --manifest-path openless-all/app/src-tauri/Cargo.toml foundry-local-sdk --features winml --target 'cfg(target_os = "windows")' -``` - -Expected: `Cargo.toml` gains a Windows-only `foundry-local-sdk` dependency and `Cargo.lock` is updated.
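The runtime wrapper in the next steps loads a model at most once per alias: `ensure_loaded` returns the cached `(model_id, endpoint)` when the alias matches and otherwise runs the expensive download/load path. A synchronous, std-only sketch of that rule follows. A caller-supplied `load` closure stands in for the Foundry Local SDK calls, and the `AliasCache` and `Loaded` names are hypothetical.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the runtime's `LoadedModel` record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Loaded {
    pub alias: String,
    pub model_id: String,
    pub endpoint: String,
}

/// Hypothetical alias-keyed cache mirroring `FoundryLocalRuntime::ensure_loaded`.
pub struct AliasCache {
    loaded: Mutex<Option<Loaded>>,
}

impl AliasCache {
    pub fn new() -> Self {
        Self { loaded: Mutex::new(None) }
    }

    /// Returns the cached model when `alias` matches; otherwise calls `load`
    /// (standing in for download + load + endpoint discovery) and caches it.
    pub fn ensure_loaded<F>(&self, alias: &str, load: F) -> Result<Loaded, String>
    where
        F: FnOnce(&str) -> Result<(String, String), String>,
    {
        if let Some(hit) = self.loaded.lock().unwrap().as_ref() {
            if hit.alias == alias {
                return Ok(hit.clone());
            }
        }
        // The lock guard is dropped here, so a slow `load` does not block readers.
        let (model_id, endpoint) = load(alias)?;
        let fresh = Loaded { alias: alias.to_string(), model_id, endpoint };
        *self.loaded.lock().unwrap() = Some(fresh.clone());
        Ok(fresh)
    }
}
```

As in the plan's version, two concurrent callers that both miss the cache can each run the loader. Holding the lock across the load would serialize them, but it would also block `status_snapshot` for the whole download.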
- - [ ] **Step 2: Add runtime status types** - -Append to `openless-all/app/src-tauri/src/asr/local/foundry.rs`: - -```rust -#[derive(Debug, Clone, Serialize)] -#[serde(rename_all = "camelCase")] -pub struct FoundryRuntimeStatus { - pub provider_id: String, - pub available: bool, - pub active_model: String, - pub loaded_model_id: Option<String>, - pub endpoint: Option<String>, - pub error: Option<String>, -} - -impl FoundryRuntimeStatus { - pub fn unavailable(active_model: String, error: impl Into<String>) -> Self { - Self { - provider_id: PROVIDER_ID.into(), - available: false, - active_model, - loaded_model_id: None, - endpoint: None, - error: Some(error.into()), - } - } -} -``` - -- [ ] **Step 3: Add the minimal Windows runtime wrapper** - -Create `openless-all/app/src-tauri/src/asr/local/foundry_runtime.rs`: - -```rust -#[cfg(target_os = "windows")] -mod imp { - use anyhow::{Context, Result}; - use parking_lot::Mutex; - - use super::super::foundry::{FoundryRuntimeStatus, PROVIDER_ID}; - use foundry_local_sdk::{FoundryLocalConfig, FoundryLocalManager}; - - #[derive(Debug, Clone)] - struct LoadedModel { - alias: String, - model_id: String, - endpoint: String, - } - - pub struct FoundryLocalRuntime { - loaded: Mutex<Option<LoadedModel>>, - } - - impl Default for FoundryLocalRuntime { - fn default() -> Self { - Self::new() - } - } - - impl FoundryLocalRuntime { - pub fn new() -> Self { - Self { - loaded: Mutex::new(None), - } - } - - pub fn status_snapshot(&self, active_model: &str) -> FoundryRuntimeStatus { - let loaded = self.loaded.lock().clone(); - FoundryRuntimeStatus { - provider_id: PROVIDER_ID.into(), - available: true, - active_model: active_model.to_string(), - loaded_model_id: loaded.as_ref().map(|model| model.model_id.clone()), - endpoint: loaded.as_ref().map(|model| model.endpoint.clone()), - error: None, - } - } - - pub async fn ensure_loaded(&self, alias: &str) -> Result<(String, String)> { - if let Some(loaded) = self.loaded.lock().as_ref() { - if loaded.alias == alias { - return 
Ok((loaded.model_id.clone(), loaded.endpoint.clone())); - } - } - - let manager = - FoundryLocalManager::create(FoundryLocalConfig::new("openless")) - .context("initialize Foundry Local manager")?; - manager - .download_and_register_eps_with_progress(None, |_ep, _percent| {}) - .await - .context("download/register Foundry execution providers")?; - let model = manager - .catalog() - .get_model(alias) - .await - .with_context(|| format!("get Foundry model {alias}"))?; - if !model.is_cached().await.context("check Foundry model cache")? { - model.download(Some(|_percent| {})).await.context("download Foundry model")?; - } - model.load().await.context("load Foundry model")?; - manager.start_web_service().await.context("start Foundry web service")?; - let endpoint = manager - .urls() - .context("read Foundry web service urls")? - .first() - .cloned() - .context("Foundry web service returned no endpoint")?; - let model_id = model.id().to_string(); - - *self.loaded.lock() = Some(LoadedModel { - alias: alias.to_string(), - model_id: model_id.clone(), - endpoint: endpoint.clone(), - }); - Ok((model_id, endpoint)) - } - - pub fn release_now(&self) { - self.loaded.lock().take(); - } - } -} - -#[cfg(target_os = "windows")] -pub use imp::FoundryLocalRuntime; - -#[cfg(not(target_os = "windows"))] -pub struct FoundryLocalRuntime; - -#[cfg(not(target_os = "windows"))] -impl FoundryLocalRuntime { - pub fn new() -> Self { - Self - } - - pub fn status_snapshot( - &self, - active_model: &str, - ) -> super::foundry::FoundryRuntimeStatus { - super::foundry::FoundryRuntimeStatus::unavailable( - active_model.to_string(), - "Foundry Local Whisper is only available on Windows", - ) - } - - pub fn release_now(&self) {} -} -``` - -- [ ] **Step 4: Export the runtime** - -In `openless-all/app/src-tauri/src/asr/local/mod.rs`, add: - -```rust -pub mod foundry_runtime; -pub use foundry_runtime::FoundryLocalRuntime; -``` - -- [ ] **Step 5: Compile-check the SDK API** - -Run: - -```powershell -cargo 
check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: PASS. If the Foundry SDK names differ from the Microsoft Learn documentation, update only `foundry_runtime.rs` and rerun until this command passes before continuing. - -- [ ] **Step 6: Commit** - -```powershell -git add -- openless-all/app/src-tauri/Cargo.toml openless-all/app/src-tauri/Cargo.lock openless-all/app/src-tauri/src/asr/local/foundry.rs openless-all/app/src-tauri/src/asr/local/foundry_runtime.rs openless-all/app/src-tauri/src/asr/local/mod.rs -git commit -m "feat(asr): add Foundry local runtime wrapper" -``` - -### Task 4: Foundry Local Whisper Provider - -**Files:** -- Create: `openless-all/app/src-tauri/src/asr/local/foundry_provider.rs` -- Modify: `openless-all/app/src-tauri/src/asr/local/mod.rs` - -- [ ] **Step 1: Add provider with fakeable HTTP transcription** - -Create `openless-all/app/src-tauri/src/asr/local/foundry_provider.rs`: - -```rust -#[cfg(target_os = "windows")] -use std::sync::Arc; - -use anyhow::{Context, Result}; -use parking_lot::Mutex; - -use crate::asr::wav::encode_wav_16k_mono; -use crate::asr::RawTranscript; - -#[cfg(target_os = "windows")] -use super::foundry_runtime::FoundryLocalRuntime; - -pub struct FoundryLocalWhisperAsr { - #[cfg(target_os = "windows")] - runtime: Arc<FoundryLocalRuntime>, - model_alias: String, - language_hint: Option<String>, - buffer: Mutex<Vec<u8>>, - client: reqwest::Client, -} - -impl FoundryLocalWhisperAsr { - #[cfg(target_os = "windows")] - pub fn new( - runtime: Arc<FoundryLocalRuntime>, - model_alias: String, - language_hint: Option<String>, - ) -> Self { - Self { - runtime, - model_alias, - language_hint, - buffer: Mutex::new(Vec::new()), - client: reqwest::Client::new(), - } - } - - pub async fn transcribe(&self) -> Result<RawTranscript> { - let pcm = self.buffer.lock().clone(); - if pcm.is_empty() { - return Ok(RawTranscript { - text: String::new(), - duration_ms: 0, - }); - } - let duration_ms = (pcm.len() as u64 / 2) * 1000 / 16_000; - let raw = self.transcribe_pcm(&pcm).await?; - self.buffer.lock().clear(); - 
        Ok(RawTranscript { - text: raw.trim().to_string(), - duration_ms, - }) - } - - #[cfg(target_os = "windows")] - async fn transcribe_pcm(&self, pcm: &[u8]) -> Result<String> { - let (model_id, endpoint) = self.runtime.ensure_loaded(&self.model_alias).await?; - self.post_transcription(&endpoint, &model_id, pcm).await - } - - #[cfg(not(target_os = "windows"))] - async fn transcribe_pcm(&self, _pcm: &[u8]) -> Result<String> { - anyhow::bail!("Foundry Local Whisper is only available on Windows") - } - - async fn post_transcription( - &self, - endpoint: &str, - model_id: &str, - pcm: &[u8], - ) -> Result<String> { - let wav = encode_wav_16k_mono(pcm); - let wav_part = reqwest::multipart::Part::bytes(wav) - .file_name("openless-foundry.wav") - .mime_str("audio/wav") - .context("set Foundry transcription MIME type")?; - let mut form = reqwest::multipart::Form::new() - .part("file", wav_part) - .text("model", model_id.to_string()) - .text("response_format", "json".to_string()); - if let Some(language) = self.language_hint.as_deref().filter(|s| !s.trim().is_empty()) { - form = form.text("language", language.to_string()); - } - let url = format!("{}/v1/audio/transcriptions", endpoint.trim_end_matches('/')); - let response = self - .client - .post(url) - .multipart(form) - .send() - .await - .context("Foundry Local transcription request failed")?; - if !response.status().is_success() { - let status = response.status(); - let body = response.text().await.unwrap_or_default(); - anyhow::bail!("Foundry Local transcription HTTP {status}: {body}"); - } - let json: serde_json::Value = response - .json() - .await - .context("parse Foundry Local transcription response")?; - Ok(json["text"].as_str().unwrap_or("").to_string()) - } - - pub fn cancel(&self) { - self.buffer.lock().clear(); - } -} - -impl crate::recorder::AudioConsumer for FoundryLocalWhisperAsr { - fn consume_pcm_chunk(&self, pcm: &[u8]) { - self.buffer.lock().extend_from_slice(pcm); - } -} -``` - -- [ ] **Step 2: Export the provider** - -In 
`openless-all/app/src-tauri/src/asr/local/mod.rs`, add: - -```rust -pub mod foundry_provider; -pub use foundry_provider::FoundryLocalWhisperAsr; -``` - -- [ ] **Step 3: Run cargo check** - -Run: - -```powershell -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: PASS. - -- [ ] **Step 4: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/asr/local/foundry_provider.rs openless-all/app/src-tauri/src/asr/local/mod.rs -git commit -m "feat(asr): add Foundry local Whisper provider" -``` - -### Task 5: Backend Commands and Runtime State - -**Files:** -- Modify: `openless-all/app/src-tauri/src/commands.rs` -- Modify: `openless-all/app/src-tauri/src/lib.rs` - -- [ ] **Step 1: Manage runtime in Tauri** - -In `openless-all/app/src-tauri/src/lib.rs`, after the local Qwen download manager: - -```rust -let foundry_local_runtime = Arc::new(asr::local::FoundryLocalRuntime::new()); -``` - -Add `.manage(foundry_local_runtime.clone())` to the Tauri builder. 
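The File Map describes a single shared Foundry Local runtime. Managing it as an `Arc` only helps if every owner holds a clone of the same `Arc`, so that a model loaded through one handle is visible through the other. A std-only sketch follows, where `Runtime`, `TauriState`, `Coordinator`, and `wire_shared` are hypothetical stand-ins for `FoundryLocalRuntime`, the Tauri-managed state, the coordinator's `Inner`, and the wiring in `lib.rs`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical minimal runtime: it only remembers which model id is loaded.
#[derive(Default)]
pub struct Runtime {
    loaded: Mutex<Option<String>>,
}

impl Runtime {
    pub fn mark_loaded(&self, model_id: &str) {
        *self.loaded.lock().unwrap() = Some(model_id.to_string());
    }

    pub fn loaded_model_id(&self) -> Option<String> {
        self.loaded.lock().unwrap().clone()
    }
}

/// Hypothetical owners: the Tauri-managed state and the coordinator.
pub struct TauriState {
    pub runtime: Arc<Runtime>,
}

pub struct Coordinator {
    pub runtime: Arc<Runtime>,
}

/// Create the runtime once and hand a clone of the same `Arc` to each owner.
pub fn wire_shared() -> (TauriState, Coordinator) {
    let runtime = Arc::new(Runtime::default());
    (
        TauriState { runtime: Arc::clone(&runtime) },
        Coordinator { runtime },
    )
}
```

A second, independent `Arc::new(...)` would start with no loaded model, so a status command reading it would never report the load performed by the coordinator.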
- -- [ ] **Step 2: Add status and preference commands** - -In `commands.rs`, import: - -```rust -use crate::asr::local::foundry::{ - model_alias_is_known, FoundryRuntimeStatus, DEFAULT_MODEL_ALIAS, - PROVIDER_ID as FOUNDRY_LOCAL_PROVIDER_ID, -}; -use crate::asr::local::FoundryLocalRuntime; -``` - -Add commands: - -```rust -#[tauri::command] -pub fn foundry_local_asr_status( - coord: CoordinatorState<'_>, - runtime: State<'_, Arc<FoundryLocalRuntime>>, -) -> FoundryRuntimeStatus { - let prefs = coord.prefs().get(); - let active_model = if model_alias_is_known(&prefs.foundry_local_asr_model) { - prefs.foundry_local_asr_model - } else { - DEFAULT_MODEL_ALIAS.to_string() - }; - runtime.status_snapshot(&active_model) -} - -#[tauri::command] -pub fn foundry_local_asr_set_model( - coord: CoordinatorState<'_>, - model_alias: String, -) -> Result<(), String> { - if !model_alias_is_known(&model_alias) { - return Err(format!("unknown Foundry Whisper model alias: {model_alias}")); - } - let mut prefs = coord.prefs().get(); - prefs.foundry_local_asr_model = model_alias; - coord.prefs().set(prefs).map_err(|e| e.to_string()) -} - -#[tauri::command] -pub fn foundry_local_asr_set_language_hint( - coord: CoordinatorState<'_>, - language_hint: String, -) -> Result<(), String> { - let normalized = language_hint.trim().to_string(); - if !normalized.is_empty() - && (normalized.len() != 2 || !normalized.bytes().all(|b| b.is_ascii_lowercase())) - { - return Err("language hint must be empty or a lowercase ISO 639-1 code".to_string()); - } - let mut prefs = coord.prefs().get(); - prefs.foundry_local_asr_language_hint = normalized; - coord.prefs().set(prefs).map_err(|e| e.to_string()) -} -``` - -- [ ] **Step 3: Make credential status treat Foundry as credential-free** - -In `asr_configured_for_provider`, add: - -```rust -if provider == FOUNDRY_LOCAL_PROVIDER_ID { - return true; -} -``` - -- [ ] **Step 4: Register commands** - -In `lib.rs` `invoke_handler`, add: - -```rust 
-commands::foundry_local_asr_status, -commands::foundry_local_asr_set_model, -commands::foundry_local_asr_set_language_hint, -``` - -- [ ] **Step 5: Add command tests** - -In `commands.rs` tests, add: - -```rust -#[test] -fn credentials_status_treats_foundry_local_asr_as_configured() { - assert!(asr_configured_for_provider( - crate::asr::local::foundry::PROVIDER_ID, - &CredentialsSnapshot::default() - )); -} -``` - -- [ ] **Step 6: Run tests and build** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml credentials_status_treats_foundry_local_asr_as_configured -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: PASS. - -- [ ] **Step 7: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/commands.rs openless-all/app/src-tauri/src/lib.rs -git commit -m "feat(asr): expose Foundry local ASR status" -``` - -### Task 6: Coordinator Integration - -**Files:** -- Modify: `openless-all/app/src-tauri/src/coordinator.rs` - -- [ ] **Step 1: Add runtime to `Inner`** - -Import Foundry types: - -```rust -#[cfg(target_os = "windows")] -use crate::asr::local::{foundry, FoundryLocalRuntime, FoundryLocalWhisperAsr}; -``` - -Add a field to `Inner`: - -```rust -#[cfg(target_os = "windows")] -foundry_local_runtime: Arc<FoundryLocalRuntime>, -``` - -Initialize it in `Coordinator::new()`: - -```rust -#[cfg(target_os = "windows")] -foundry_local_runtime: Arc::new(FoundryLocalRuntime::new()), -``` - -- [ ] **Step 2: Add active ASR variant** - -Add to `ActiveAsr`: - -```rust -#[cfg(target_os = "windows")] -FoundryLocalWhisper(Arc<FoundryLocalWhisperAsr>), -``` - -Update `cancel_active_asr`: - -```rust -#[cfg(target_os = "windows")] -ActiveAsr::FoundryLocalWhisper(local) => local.cancel(), -``` - -- [ ] **Step 3: Start Foundry local provider in `begin_session`** - -After `let active_asr = CredentialsVault::get_active_asr();`, add before the Whisper-compatible branch: - -```rust -#[cfg(target_os = "windows")] -if
foundry::is_foundry_local_whisper(&active_asr) { - let prefs = inner.prefs.get(); - let model_alias = if foundry::model_alias_is_known(&prefs.foundry_local_asr_model) { - prefs.foundry_local_asr_model.clone() - } else { - foundry::DEFAULT_MODEL_ALIAS.to_string() - }; - let language_hint = prefs - .foundry_local_asr_language_hint - .trim() - .to_string(); - let language_hint = if language_hint.is_empty() { - None - } else { - Some(language_hint) - }; - let local = Arc::new(FoundryLocalWhisperAsr::new( - Arc::clone(&inner.foundry_local_runtime), - model_alias, - language_hint, - )); - store_asr_for_session( - inner, - current_session_id, - ActiveAsr::FoundryLocalWhisper(Arc::clone(&local)), - ); - let consumer: Arc<dyn AudioConsumer> = local; - start_recorder_and_enter_listening(inner, current_session_id, &active_asr, consumer) - .await?; - return Ok(()); -} -``` - -- [ ] **Step 4: Transcribe Foundry local results in `end_session`** - -Add a match branch next to `ActiveAsr::Whisper`: - -```rust -#[cfg(target_os = "windows")] -ActiveAsr::FoundryLocalWhisper(local) => { - let timeout_duration = std::time::Duration::from_secs(COORDINATOR_GLOBAL_TIMEOUT_SECS); - match tokio::time::timeout(timeout_duration, local.transcribe()).await { - Ok(Ok(r)) => r, - Ok(Err(e)) => { - log::error!("[coord] Foundry Local Whisper transcribe failed: {e:#}"); - emit_capsule( - inner, - CapsuleState::Error, - 0.0, - elapsed, - Some(format!("本地识别失败: {e}")), - None, - ); - restore_prepared_windows_ime_session(inner, current_session_id); - inner.state.lock().phase = SessionPhase::Idle; - schedule_capsule_idle(inner, CAPSULE_AUTO_HIDE_DELAY_MS); - return Err(e.to_string()); - } - Err(_) => { - log::error!( - "[coord] Foundry Local Whisper 全局超时 {} 秒", - COORDINATOR_GLOBAL_TIMEOUT_SECS - ); - emit_capsule( - inner, - CapsuleState::Error, - 0.0, - elapsed, - Some("识别超时".to_string()), - None, - ); - restore_prepared_windows_ime_session(inner, current_session_id); - inner.state.lock().phase = SessionPhase::Idle; -
schedule_capsule_idle(inner, CAPSULE_AUTO_HIDE_DELAY_MS); - return Err("foundry local global timeout".to_string()); - } - } -} -``` - -- [ ] **Step 5: Relax ASR credential gate** - -In `ensure_asr_credentials`, add before the local Qwen3 check: - -```rust -#[cfg(target_os = "windows")] -if foundry::is_foundry_local_whisper(&active_asr) { - return Ok(()); -} -``` - -- [ ] **Step 6: Add a coordinator test for provider routing** - -Add to the `coordinator.rs` test module: - -```rust -#[test] -fn foundry_local_provider_is_not_whisper_compatible_cloud_provider() { - assert!(!is_whisper_compatible_provider( - crate::asr::local::foundry::PROVIDER_ID - )); -} -``` - -- [ ] **Step 7: Run backend checks** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml foundry_local_provider_is_not_whisper_compatible_cloud_provider -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: PASS. - -- [ ] **Step 8: Commit** - -```powershell -git add -- openless-all/app/src-tauri/src/coordinator.rs -git commit -m "feat(asr): route dictation through Foundry local Whisper" -``` - -### Task 7: Frontend IPC and Settings Provider - -**Files:** -- Modify: `openless-all/app/src/lib/localAsr.ts` -- Modify: `openless-all/app/src/pages/Settings.tsx` -- Modify: `openless-all/app/src/i18n/zh-CN.ts` -- Modify: `openless-all/app/src/i18n/en.ts` - -- [ ] **Step 1: Add TypeScript IPC wrappers** - -In `openless-all/app/src/lib/localAsr.ts`, add: - -```ts -export interface FoundryLocalAsrStatus { - providerId: string; - available: boolean; - activeModel: string; - loadedModelId: string | null; - endpoint: string | null; - error: string | null; -} - -export function getFoundryLocalAsrStatus(): Promise<FoundryLocalAsrStatus> { - return invokeOrMock('foundry_local_asr_status', undefined, () => ({ - providerId: 'foundry-local-whisper', - available: true, - activeModel: 'whisper-small', - loadedModelId: null, - endpoint: null, - error: null, - })); -} - -export function
setFoundryLocalAsrModel(modelAlias: string): Promise<void> { - return invokeOrMock('foundry_local_asr_set_model', { modelAlias }, () => undefined); -} - -export function setFoundryLocalAsrLanguageHint(languageHint: string): Promise<void> { - return invokeOrMock( - 'foundry_local_asr_set_language_hint', - { languageHint }, - () => undefined, - ); -} -``` - -- [ ] **Step 2: Add provider preset** - -In `Settings.tsx`, add to `ASR_PRESETS` before `local-qwen3`: - -```ts -{ id: 'foundry-local-whisper', nameKey: 'asrFoundryLocalWhisper', baseUrl: '', model: '' }, -``` - -The provider-id union type updates automatically because `ASR_PRESETS` is declared `as const`. - -- [ ] **Step 3: Render local provider hint** - -Change: - -```tsx -) : committedAsrProvider === 'local-qwen3' ? ( - <LocalAsrProviderHint /> -) : ( -``` - -to: - -```tsx -) : committedAsrProvider === 'local-qwen3' || committedAsrProvider === 'foundry-local-whisper' ? ( - <LocalAsrProviderHint provider={committedAsrProvider} /> -) : ( -``` - -Change `LocalAsrProviderHint` signature: - -```tsx -function LocalAsrProviderHint({ provider }: { provider: 'local-qwen3' | 'foundry-local-whisper' }) { -``` - -Use provider-specific text: - -```tsx -const hintKey = provider === 'foundry-local-whisper' - ? 'settings.providers.foundryLocalAsrHint' - : 'settings.providers.localAsrHint'; -``` - -- [ ] **Step 4: Add i18n strings** - -In `zh-CN.ts` under `settings.providers.presets`: - -```ts -asrFoundryLocalWhisper: '本地 Whisper(Foundry Local)', -``` - -Under `settings.providers`: - -```ts -foundryLocalAsrHint: 'Windows 本地 Whisper 在本机运行,无需 ASR API Key。首次使用需下载 Foundry Local 运行组件和 Whisper 模型;LLM 润色仍按你配置的模型供应商调用。', -``` - -In `en.ts`, add the matching keys (`asrFoundryLocalWhisper` under `settings.providers.presets`, `foundryLocalAsrHint` under `settings.providers`): - -```ts -asrFoundryLocalWhisper: 'Local Whisper (Foundry Local)', -foundryLocalAsrHint: 'Windows local Whisper runs on this device and does not need an ASR API key. First use downloads Foundry Local runtime components and a Whisper model; LLM polishing still uses your configured LLM provider.', -``` - -- [ ] **Step 5: Build frontend** - -Run: - -```powershell -npm --prefix openless-all/app run build -``` - -Expected: PASS.
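The language-hint rule appears twice in this plan: `foundry_local_asr_set_language_hint` (Task 5) rejects anything other than an empty string or two lowercase ASCII letters, and `begin_session` (Task 6) maps an empty hint to `None`. The fallback to `DEFAULT_MODEL_ALIAS` is likewise repeated in both places. A minimal sketch of both rules as pure helpers that could be unit-tested without Tauri; the function names, constants, and the alias list (taken from the design doc's `whisper-small` / `whisper-base` / `whisper-tiny`) are hypothetical.

```rust
/// Hypothetical helper: normalizes a stored language hint.
/// `Ok(None)` means "let Whisper auto-detect"; `Ok(Some(code))` is a
/// two-letter lowercase hint; `Err` mirrors the plan's rejection message.
pub fn normalize_language_hint(raw: &str) -> Result<Option<String>, String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Ok(None);
    }
    // Same rule as the plan's command: exactly two ASCII lowercase letters.
    if trimmed.len() == 2 && trimmed.bytes().all(|b| b.is_ascii_lowercase()) {
        Ok(Some(trimmed.to_string()))
    } else {
        Err("language hint must be empty or ISO 639-1 lowercase code".to_string())
    }
}

/// Hypothetical alias list for this sketch.
pub const KNOWN_MODEL_ALIASES: [&str; 3] = ["whisper-small", "whisper-base", "whisper-tiny"];
/// Hypothetical default, matching the design doc's default alias.
pub const DEFAULT_ALIAS: &str = "whisper-small";

/// Returns the stored alias if it is known, otherwise the default.
pub fn effective_model_alias(stored: &str) -> String {
    if KNOWN_MODEL_ALIASES.iter().any(|alias| *alias == stored) {
        stored.to_string()
    } else {
        DEFAULT_ALIAS.to_string()
    }
}
```

Like the plan's check, this accepts any two lowercase letters (for example `xx`), not only registered ISO 639-1 codes.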
- -- [ ] **Step 6: Commit** - -```powershell -git add -- openless-all/app/src/lib/localAsr.ts openless-all/app/src/pages/Settings.tsx openless-all/app/src/i18n/zh-CN.ts openless-all/app/src/i18n/en.ts -git commit -m "feat(ui): add Foundry local ASR provider" -``` - -### Task 8: Local ASR Page for Windows Foundry Models - -**Files:** -- Modify: `openless-all/app/src/pages/LocalAsr.tsx` -- Modify: `openless-all/app/src/i18n/zh-CN.ts` -- Modify: `openless-all/app/src/i18n/en.ts` - -- [ ] **Step 1: Load Foundry status on Local ASR page** - -In `LocalAsr.tsx`, import: - -```ts -getFoundryLocalAsrStatus, -setFoundryLocalAsrModel, -setFoundryLocalAsrLanguageHint, -type FoundryLocalAsrStatus, -``` - -Add state: - -```ts -const [foundryStatus, setFoundryStatus] = useState<FoundryLocalAsrStatus | null>(null); -``` - -Add refresh function: - -```ts -const refreshFoundryStatus = async () => { - try { - const status = await getFoundryLocalAsrStatus(); - setFoundryStatus(status); - } catch (err) { - console.warn('[localAsr] Foundry status query failed', err); - } -}; -``` - -Call it inside `refresh()`: - -```ts -void refreshFoundryStatus(); -``` - -- [ ] **Step 2: Add Windows Foundry model controls** - -Add this block after the top page header: - -```tsx -
-
-
- {t('localAsr.foundryTitle')} -
-
- {t('localAsr.foundryDesc')} -
-
- - {foundryStatus?.available ? t('localAsr.runtimeReady') : t('localAsr.runtimeUnavailable')} - -
-
- - -
- {foundryStatus?.error && ( -
- {foundryStatus.error} -
- )} -
-``` - -- [ ] **Step 3: Add i18n strings** - -In `zh-CN.ts` under `localAsr`: - -```ts -foundryTitle: 'Windows 本地 Whisper', -foundryDesc: '使用 Microsoft Foundry Local 在本机转写语音。无需 ASR API Key;首次使用会准备运行组件和 Whisper 模型。', -runtimeReady: '运行时可用', -runtimeUnavailable: '运行时不可用', -foundryModelLabel: 'Whisper 模型', -languageHintLabel: '识别语言', -languageAuto: '自动检测', -languageZh: '优先中文', -languageEn: '优先英文', -``` - -Add matching English strings in `en.ts`. - -- [ ] **Step 4: Build frontend** - -Run: - -```powershell -npm --prefix openless-all/app run build -``` - -Expected: PASS. - -- [ ] **Step 5: Commit** - -```powershell -git add -- openless-all/app/src/pages/LocalAsr.tsx openless-all/app/src/i18n/zh-CN.ts openless-all/app/src/i18n/en.ts -git commit -m "feat(ui): manage Windows local Whisper" -``` - -### Task 9: Windows Smoke Script Local ASR Mode - -**Files:** -- Modify: `openless-all/app/scripts/windows-real-asr-insertion-smoke.ps1` - -- [ ] **Step 1: Add ASR mode parameter** - -Add parameter: - -```powershell -[ValidateSet("volcengine", "foundry-local-whisper")] -[string]$AsrProvider = "volcengine", -``` - -- [ ] **Step 2: Write active ASR preference for smoke** - -In `Set-HoldHotkeyPreference`, replace the active ASR default line with: - -```powershell -if ($null -eq $prefs.activeAsrProvider) { - $prefs | Add-Member -NotePropertyName activeAsrProvider -NotePropertyValue $AsrProvider -} else { - $prefs.activeAsrProvider = $AsrProvider -} -``` - -- [ ] **Step 3: Skip Volcengine credential requirement for local ASR** - -Replace: - -```powershell -if ($RequireJsonCredentials -and (-not $credentialStatus.VolcengineConfigured -or -not $credentialStatus.ArkConfigured)) { - throw "Real ASR regression requires configured Volcengine ASR and Ark LLM credentials." 
-} -``` - -with: - -```powershell -if ($RequireJsonCredentials -and $AsrProvider -eq "volcengine" -and (-not $credentialStatus.VolcengineConfigured -or -not $credentialStatus.ArkConfigured)) { - throw "Real ASR regression requires configured Volcengine ASR and Ark LLM credentials." -} -if ($RequireJsonCredentials -and $AsrProvider -eq "foundry-local-whisper" -and (-not $credentialStatus.ArkConfigured)) { - Write-Warning "Ark LLM credentials are not configured; local ASR smoke will accept raw transcript fallback." -} -``` - -- [ ] **Step 4: Add a no-Win+H log assertion** - -After history verification, add: - -```powershell -$logText = Get-Content -Raw $logPath -if ($logText -match "Win\+H|Voice Typing|Windows\.Media\.SpeechRecognition|SAPI") { - throw "Unexpected Windows system dictation path appeared in OpenLess log." -} -``` - -- [ ] **Step 5: Run script syntax check** - -Run: - -```powershell -powershell -NoProfile -ExecutionPolicy Bypass -Command "[void][scriptblock]::Create((Get-Content -Raw '.\openless-all\app\scripts\windows-real-asr-insertion-smoke.ps1')); 'ok'" -``` - -Expected: prints `ok`. - -- [ ] **Step 6: Commit** - -```powershell -git add -- openless-all/app/scripts/windows-real-asr-insertion-smoke.ps1 -git commit -m "test(windows): add local ASR smoke mode" -``` - -### Task 10: End-to-End Verification - -**Files:** -- No code changes unless a verification step exposes a bug. - -- [ ] **Step 1: Run backend unit and type checks** - -Run: - -```powershell -cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml -cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml -``` - -Expected: PASS. - -- [ ] **Step 2: Run frontend build** - -Run: - -```powershell -npm --prefix openless-all/app run build -``` - -Expected: PASS.
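Task 9 Step 4 scans the smoke log with a PowerShell regex, where `+` and `.` must be escaped exactly once. A minimal Rust sketch of the same check using plain substring matching sidesteps regex escaping entirely, for example in a hypothetical Rust-side log test. The marker list is copied from the plan; the function name is hypothetical.

```rust
/// Markers copied from the smoke script's log assertion.
pub const FORBIDDEN_MARKERS: [&str; 4] = [
    "Win+H",
    "Voice Typing",
    "Windows.Media.SpeechRecognition",
    "SAPI",
];

/// Returns every forbidden marker found in `log_text`, in list order.
/// Plain substring matching, so `+` and `.` are literal characters.
pub fn forbidden_markers_in(log_text: &str) -> Vec<&'static str> {
    FORBIDDEN_MARKERS
        .iter()
        .copied()
        .filter(|marker| log_text.contains(*marker))
        .collect()
}
```

Unlike PowerShell's `-match`, which is case-insensitive by default, `str::contains` is case-sensitive; lowercase both sides first if case-insensitive matching is wanted.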
- -- [ ] **Step 3: Run the no-Win+H source search** - -Run: - -```powershell -rg -n "Win\+H|Voice Typing|Windows\.Media\.SpeechRecognition|SAPI|SendInput.*H" openless-all/app/src-tauri/src openless-all/app/windows-ime openless-all/app/src -``` - -Expected: no matches except documentation or explicit negative test strings. - -- [ ] **Step 4: Run local ASR smoke on Windows** - -Run after building a Windows executable: - -```powershell -powershell -ExecutionPolicy Bypass -File .\openless-all\app\scripts\windows-real-asr-insertion-smoke.ps1 -AsrProvider foundry-local-whisper -Target notepad -ManualSpeech -AllowClipboardFallback -``` - -Expected: - -- OpenLess observes the hotkey and starts a session. -- No Windows Voice Typing panel appears. -- History receives a new item with non-empty `rawTranscript` and `finalText`. -- If Ark is not configured, `finalText` equals the raw transcript or the history item records the polish fallback. -- Notepad receives the final text through TSF or a permitted fallback. - -- [ ] **Step 5: Confirm verification did not create file changes** - -Run: - -```powershell -git status --short -``` - -Expected: no output. If a verification step exposed a code defect, stop this task and write a new focused fix task before continuing. - -## Self-Review - -Spec coverage: - -- No Win+H: the Task 10 source search and the Task 9 smoke log assertion cover it. -- Existing interaction: Task 6 routes through `Coordinator`; no UI shortcut path bypasses recorder/capsule. -- Local transcript into polish/history: Task 6 returns `RawTranscript` before existing polish and history code. -- First-use UX: Tasks 7 and 8 expose provider and runtime/model state. -- Windows TSF insertion unchanged: Task 6 leaves `insert_with_windows_ime_first` intact. -- Offline behavior after cache: Task 3 runtime caches loaded model state; Task 10 smoke can be repeated after model download. - -Placeholder scan: - -- This plan contains no unresolved placeholders or unspecified file paths.
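- -Type consistency: - -- Provider id is consistently `foundry-local-whisper`. -- Rust preference fields are `foundry_local_asr_model`, `foundry_local_asr_language_hint`, and `foundry_local_asr_keep_loaded_secs`. -- TypeScript preference fields use camelCase equivalents. diff --git a/docs/superpowers/specs/2026-05-01-windows-temporary-tsf-ime-design.md b/docs/superpowers/specs/2026-05-01-windows-temporary-tsf-ime-design.md deleted file mode 100644 index 5b482249..00000000 --- a/docs/superpowers/specs/2026-05-01-windows-temporary-tsf-ime-design.md +++ /dev/null @@ -1,143 +0,0 @@ -# Windows Temporary-Activation TSF IME Design - -## Background - -OpenLess's current Windows insertion path still depends on the clipboard: it writes the final text to the clipboard and then sends `WM_PASTE` to the focused control. This is more reliable than simulating `Ctrl+V`, but it is still essentially a paste tool, not a Windows input method. - -The goal is to add a real TSF input method backend on Windows so that dictation results are committed through the system text input framework, without disrupting users' everyday manual typing with Microsoft Pinyin, Sogou, or an English keyboard. - -## Goals - -- OpenLess temporarily switches to the OpenLess TSF input method for the duration of a dictation session. -- Recording, ASR, polishing, the capsule UI, and history saving remain the responsibility of the existing Tauri/Rust main program. -- The OpenLess TSF IME DLL is responsible only for the system input method identity, receiving the final text, and committing it to the current text context through TSF. -- After a commit, cancellation, or failure, the input method profile that was active before the session is restored automatically. -- Users never have to switch input methods manually. -- Third-party Chinese input methods are not proxied through the OpenLess IME; everyday Chinese typing keeps using the user's original input method. 
- -## Non-goals - -- Do not move recording, network requests, ASR, LLM calls, or the Tauri UI into the IME DLL. -- Do not implement pinyin candidates, Chinese conversion, dictionaries, or third-party IME proxying. -- Do not remove the existing Windows `WM_PASTE` path; it remains the fallback when the TSF IME is not installed, profile switching fails, or the commit fails. -- Do not promise full functionality on the UAC secure desktop, in elevated (administrator) target windows, games, Remote Desktop, or strongly isolated applications. - -## Architecture - -The new Windows-only input layer consists of three parts: - -1. `OpenLess` main program: keeps the existing `Coordinator` state machine. When the dictation hotkey starts a session, it records the current input profile and temporarily activates the OpenLess TSF profile; when dictation ends, it sends the final text to the IME; when the session wraps up, it restores the original profile. -2. `OpenLess TSF IME DLL`: a COM in-proc text service registered as a TSF input processor. It implements the minimum viable activation, deactivation, edit session, and text commit capabilities, and holds no product business state. -3. `OpenLess IME IPC`: a local IPC channel connecting the main program to the IME instance currently loaded by TSF. The main program sends the final text tagged with a session id; the IME calls `ITfInsertAtSelection::InsertTextAtSelection` in a writable TSF context. - -The TSF IME uses the official profile registration path instead of hand-writing default input method registry entries. At install time it registers the COM in-proc server, the TSF text service, and the language profile, and adds the OpenLess profile to the current user's list of available input methods. - -## Session Timeline - -1. The user presses the current OpenLess global hotkey. -2. `Coordinator` moves from `Idle` into the recording startup flow. -3. The Windows input profile guard reads and saves the currently active profile, whether a keyboard layout or a TSF input processor. -4. 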
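The guard activates the OpenLess TSF profile, preferring the current desktop session as the activation scope. -5. The user speaks; the existing recorder, ASR, and polish flow is unchanged. -6. The user presses the hotkey again to stop recording; `Coordinator` obtains the final polished text. -7. The main program sends `{ session_id, text }` to the OpenLess IME over IPC. -8. The OpenLess IME instance in the currently focused application commits the text within a TSF edit session. -9. The main program receives a commit success, timeout, or failure result. -10. Whether the session succeeded, was cancelled, or failed, the guard attempts to restore the input profile saved in step 3. -11. `Coordinator` saves history and updates the capsule state according to the existing rules. - -## Profile Switching Strategy - -At session start, record the complete active profile, not just the language ID. The record includes at least: - -- profile type: keyboard layout or TSF input processor; -- language id; -- text service CLSID; -- profile GUID; -- HKL; -- activation scope. - -Activate the OpenLess profile through the TSF profile manager. If the current input language differs from the OpenLess profile's language, pass the flag that allows switching to the specified profile, so activation does not fail because of a language mismatch. - -On restore, prefer the original profile. If the original profile is no longer available, log a warning and keep the system's current input method instead of switching repeatedly. A failed restore does not block history saving. - -## IPC Protocol - -The MVP uses a low-latency local IPC channel with a small, explicit protocol: - -- `SubmitText { session_id, text, created_at }` -- `SubmitResult { session_id, status, error_code }` -- `CancelSession { session_id }` -- `Ping` - -`session_id` must be generated by the existing `DictationSession` or coordinator session. The IME accepts only the latest pending session, so stale text cannot land in the wrong application after focus changes. 
- -IPC timeout policy: - -- Waiting for the IME to connect: short timeout; on failure, fall back to the existing `WM_PASTE` path. -- Waiting for the commit result: short timeout; on failure, restore the original profile and either fall back or report `CopiedFallback`. -- Session cancellation: send `CancelSession`; the IME discards the pending text. - -## Failure and Recovery - -"The user's text is never lost" is a hard constraint: - -- OpenLess profile activation fails: skip the TSF commit flow and keep using the existing Windows insertion backend. -- IME DLL not installed or not registered: the settings page shows the status; dictation still works but uses the fallback backend. -- IPC disconnected or timed out: restore the original profile and use the existing `WM_PASTE` path. -- TSF commit reports read-only, no selection, context disconnected, or no lock: restore the original profile and use the existing fallback path. -- User cancels during the Processing phase: commit nothing and restore the original profile. -- OpenLess main program crashes: on the next launch, check for a "previous session's temporary switch was not restored" marker; if present, try to restore the most recently saved profile. - -## User Experience - -Day to day, users keep their original input method. Only during a dictation session may the system input indicator briefly switch to OpenLess. When the session ends, it returns to the original input method automatically. - -The settings page gains a Windows-only input backend status: - -- TSF input method installed and available; -- TSF input method not installed; -- TSF input method registration error; -- currently using the clipboard/`WM_PASTE` fallback. - -The default stays conservative: when the TSF IME is not installed, the existing insertion experience is unchanged. Once the TSF IME is installed, Windows prefers the temporary-activation TSF backend. 
- -## File and Module Boundaries - -Main areas planned to be added or changed: - -- `openless-all/app/src-tauri/src/insertion.rs`: keep the existing fallback backend and add a Windows TSF backend selection entry point. -- `openless-all/app/src-tauri/src/windows_ime_profile.rs`: wraps active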
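profile reads, OpenLess profile activation, and original profile restoration. -- `openless-all/app/src-tauri/src/windows_ime_ipc.rs`: wraps the IPC from the main program to the IME. -- `openless-all/app/windows-ime/`: new Windows-only TSF IME DLL project, including COM registration, the TSF text service, edit sessions, and the IPC client. -- `openless-all/app/scripts/`: new Windows IME register, unregister, and packaging scripts. -- `openless-all/app/src/lib/ipc.ts` and the settings page: expose Windows TSF backend install/health status. - -Rust business modules still follow the existing constraint: leaf modules do not call each other; cross-module orchestration stays in `coordinator.rs`. - -## Verification - -Automated verification: - -- Rust backend type check: `cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml` -- Windows IME project build. -- Unit tests for profile save/restore logic. -- Tests for IPC protocol encoding/decoding and stale-session discarding. -- Frontend build: `npm run build` - -Manual verification: - -- Notepad: with Microsoft Pinyin as the current input method, record with the OpenLess hotkey; after the commit, the text lands at the cursor and the input method returns to Microsoft Pinyin automatically. -- Browser text box: same as above. -- VS Code editor: same as above. -- Cancel recording: no text is inserted and the original input method is restored. -- OpenLess IME not installed: dictation still uses the existing fallback path. -- Target window not writable: no text is lost, the original input method is restored, and an understandable status is shown. 
- -## References - -- Microsoft Learn: Custom Input Method Editor requirements -- Microsoft Learn: Text Services Framework -- Microsoft Learn: Text Service Registration -- Microsoft Learn: `ITfInputProcessorProfileMgr::ActivateProfile` -- Microsoft Learn: `ITfInsertAtSelection::InsertTextAtSelection` diff --git a/docs/superpowers/specs/2026-05-06-windows-local-asr-design.md b/docs/superpowers/specs/2026-05-06-windows-local-asr-design.md deleted file mode 100644 index 069db72c..00000000 --- a/docs/superpowers/specs/2026-05-06-windows-local-asr-design.md +++ /dev/null @@ -1,247 +0,0 @@ -# Windows Local ASR Design - -## Background - -OpenLess's product contract is: a global hotkey starts dictation, the capsule shows the recording state, ASR produces a transcript, the existing LLM provider handles polishing, translation, or semantic processing, and the result is written back at the cursor through the current platform's insertion path and saved to history. - -New Windows users currently still have to configure an external ASR provider before real dictation works. The goal is to offer a local recognition option on Windows that needs no external ASR API key, never invokes `Win+H`, never shows the Windows Voice Typing system panel, and does not bypass the existing polish, insert, and history pipeline. - -Confirmed constraints: - -- Windows `Win+H` / Voice Typing is a user-level system feature; there is no public API that OpenLess could use to embed it and get the transcript back. -- Simulating `Win+H` with `SendInput` only opens the system panel; OpenLess cannot obtain the transcript, so it can neither polish it nor write history. -- `Windows.Media.SpeechRecognition` is not a suitable mainline option, given its support and authorization path for regular desktop apps. -- SAPI COM can do desktop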
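dictation, but its quality and modern experience fall short of the high-quality goal. - -## Official Documentation Check - -Checked on: 2026-05-06. - -Current Microsoft Learn documentation states: - -- Foundry Local is a local AI runtime supporting Windows, macOS on Apple silicon, and Linux, with C#, JavaScript, Rust, and Python SDKs; local inference data never leaves the device, but the first model and execution provider downloads still require network access. -- The Foundry Local catalog covers chat completion and audio transcription; the audio transcription examples explicitly use Whisper models. -- On Windows, the Rust SDK uses `foundry-local-sdk --features winml`, and the Windows package bundles the Windows ML runtime. -- The currently documented example for the Rust native audio API is: download and load a Whisper model, call `model.create_audio_client()`, then call `audio_client.transcribe(file_path).await`. -- Foundry Local can also start an OpenAI-compatible local REST service; the REST endpoint `POST /v1/audio/transcriptions` accepts multipart `file` and `model`, optionally `language`, `temperature`, and `response_format`, and returns `text`. -- The REST service port is assigned dynamically; the docs require obtaining it from the endpoint / urls exposed by the SDK rather than hard-coding it. -- The CLI is a development and management aid, not the mainline for app integration; production apps should embed the runtime through the SDK. -- Foundry Local is still in preview; its API, installation, and distribution may change. 
- -Primary sources: - -- https://learn.microsoft.com/en-us/azure/foundry-local/what-is-foundry-local -- https://learn.microsoft.com/en-us/azure/foundry-local/get-started -- https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-transcribe-audio -- https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-rest -- https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-sdk-current -- https://learn.microsoft.com/en-us/azure/foundry-local/how-to/how-to-use-foundry-local-cli -- https://learn.microsoft.com/en-us/azure/foundry-local/concepts/foundry-local-architecture - -## Goals - -- New Windows users can dictate without an external ASR API key from Volcengine, Whisper HTTP, DashScope, or similar providers. -- `Win+H` is never invoked; users never see the Windows Voice Typing popup. -- Existing interaction is unchanged: hotkey, OpenLess capsule, recording state, transcription, LLM polish / translation, insertion, and history saving all go through the current main pipeline. -- LLM polish keeps using the user's configured OpenAI-compatible LLM provider; when the LLM is not configured or fails, the raw transcript is inserted. -- When local ASR lacks the runtime / model, the user gets actionable guidance instead of a silent failure. -- After downloads complete, recognition works offline; the first model / execution provider download may use the network. - -## Non-goals - -- Do not embed Windows Voice Typing, SAPI, or the system dictation panel in OpenLess. -- Do not switch LLM polish to a local model in this phase; this design addresses ASR only. -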
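Do not bundle large models in the default Windows installer unless model licenses, redistribution terms, installer size, and updater impact are each confirmed later. -- Do not rewrite the Windows TSF IME insertion path. -- Do not guarantee TSF commit into every isolated target window; the existing TSF / Unicode / clipboard fallback strategy remains responsible for insertion availability. - -## Integration Points in the Existing System - -The main dictation state machine lives in `openless-all/app/src-tauri/src/coordinator.rs`: - -- `ActiveAsr` currently has `Volcengine`, `Whisper`, and the macOS-only `Local`. -- `begin_session` reads the active provider from `CredentialsVault::get_active_asr()` and then branches to local Qwen3, OpenAI-compatible Whisper, or Volcengine. -- Once `end_session` has a `RawTranscript`, it continues through `polish_or_passthrough` / `translate_or_passthrough`, Windows TSF-first insertion, and history append. -- `ensure_asr_credentials` is the pre-recording provider gate; for local ASR it must change to "no cloud credentials needed, but the runtime / model must be ready". -- `is_whisper_compatible_provider` covers only cloud OpenAI-compatible `/audio/transcriptions` providers; Foundry Local should not be added there, because it needs a runtime / model lifecycle. - -The existing local ASR module is in `openless-all/app/src-tauri/src/asr/local/`: - -- The provider id is `local-qwen3`, and the model enum is `qwen3-asr-0.6b` / `qwen3-asr-1.7b`. -- `LocalAsrCache` currently holds a `QwenAsrEngine` only on macOS. -- The download page and IPC commands already cover model status, download, delete, test, preload, and release, but the UI copy and directory semantics are tightly bound to Qwen3-ASR. -- On Windows, `engine_available` is currently false, and the settings page says "supported on macOS only". 
- -The Windows insertion path already meets this requirement: - -- At session start, `prepare_session()` captures the current input method profile and temporarily activates OpenLess TSF. -- At session end, `insert_with_windows_ime_first()` sends the final text to the TSF DLL over a named pipe. -- The TSF DLL calls `ITfInsertAtSelection::InsertTextAtSelection` inside the target application. -- If TSF fails, insertion falls back to Unicode `SendInput` or the clipboard according to user preference. - -## Recommended Approach - -Add a Windows-only provider: `foundry-local-whisper`. - -The implementation has two layers: - -1. `FoundryLocalWhisperAsr`: shaped like `WhisperBatchASR` and `LocalQwenAsr`, it implements `AudioConsumer`, buffers 16 kHz mono i16 PCM during recording, and after stop encodes a WAV and calls Foundry Local. -2.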
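`FoundryLocalRuntime`: wraps Foundry Local SDK initialization, catalog queries, execution provider download, model download, model loading, endpoint discovery, and unload / keep-loaded management. - -For the MVP call path, the recommendation is to start the local REST service through the Foundry Local SDK first and then call `/v1/audio/transcriptions`. Reasons: - -- OpenLess already has a mature multipart WAV transcription path. -- The REST API docs explicitly support the `language` parameter, which makes later tuning for Chinese / mixed Chinese-English input easier. -- The SDK still handles the dynamic port, model download, and loading, so no local service address is hard-coded. -- If the Rust native audio client later offers enough parameters and a stable API, the REST call can be replaced with a pure native audio client. - -## Provider and Model Naming - -New id: - -- `foundry-local-whisper`: mainline local ASR on Windows. - -Model aliases: - -- Default: `whisper-small`. -- Low-end option: `whisper-base`. -- Debug option: `whisper-tiny`. - -By default, do not force `language=zh`. For mixed Chinese-English speech, letting Whisper auto-detect is more robust and avoids forcing English product names, code terms, or mixed phrases into a Chinese-only mode. An advanced "Prefer Chinese recognition" setting can be added later that passes `language=zh` only when the user explicitly opts in. - -Do not fold `foundry-local-whisper` into the existing `local-qwen3` provider. Their model sources, runtimes, platform support, and download semantics differ; they should share the shell of the "Local ASR management" page but keep separate backend providers and model registries. - -## Session Timeline - -1. The user presses the current OpenLess global hotkey. -2. `Coordinator` enters `Starting`, and the Windows side prepares the TSF IME session. -3. 
`ensure_asr_credentials` sees that the active provider is `foundry-local-whisper`: - - runtime available and model cached: continue; - - model not cached: return an actionable error, show "Please download the local speech model first" in the capsule, and do not start recording; - - runtime initialization failed: show "Local speech runtime unavailable" and direct the user to the settings page. -4. Create `FoundryLocalWhisperAsr` and pass it to `Recorder::start` as the `AudioConsumer`. -5. During recording, the recorder keeps pushing PCM to the consumer, and the capsule keeps showing the level meter. -6. The user presses the hotkey again, or releases it, to stop recording. -7. `end_session` stops the recorder and calls `FoundryLocalWhisperAsr::transcribe()`: - - encode the PCM buffer into a temporary WAV; - - make sure the model is loaded; - - call `/v1/audio/transcriptions` at the SDK-provided endpoint; - - parse `{ text }` into a `RawTranscript`. -8.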
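From here on, existing logic is reused in full: empty transcript guard, polish / translate, Chinese script preference, Windows TSF-first insert, history append, capsule Done. - -## First-use UX - -New Windows users default to `foundry-local-whisper` as the active ASR, but only on the fresh-install path with "no existing preferences / credentials active ASR"; existing users are not overridden. - -Add or rework a "Local speech recognition" section on the settings page: - -- Show runtime status: available, initializing, unavailable. -- Show execution provider status: registered, needs download, downloading, failed. -- Show the model list: `whisper-small`, `whisper-base`, `whisper-tiny`, with size and license taken from the Foundry catalog / REST metadata. -- Offer one-click download / cancel / delete / set as default / load and test. -- Preload in the background after the download completes, to reduce the wait after the first hotkey recording ends. - -When the hotkey is pressed for the first time but the model is missing: - -- Do not invoke Win+H. -- Do not pop up the system Voice Typing panel. -- Do not start recording, so the user doesn't finish speaking only to find there is no model. -- The capsule shows a short error, and the main window jumps to the local speech recognition page or offers a "Download model" entry point. - -## Quality and Performance Assessment - -Chinese / mixed Chinese-English: - -- The Whisper family handles both Mandarin and English, but local `tiny/base/small` models are usually lower quality than large cloud ASR models or Whisper large. -- `whisper-small` is the better default quality tier; `whisper-base` is for low-end machines. -- Hotword bias does not currently feed into Whisper decoding; the vocabulary list can still be used as LLM polish context and for history hit statistics. 
- -First-use latency: - -- The first download of the execution provider and model may take several minutes, depending on network and hardware. -- The first model load may take several seconds; preload in the background after switching providers / finishing the download. -- Each transcription is batch-style, not Volcengine-style streaming finals; the capsule can stay in "Transcribing" until the result returns. - -Model size: - -- Sizes are not hard-coded. The UI shows the actual current `fileSizeMb` from the Foundry catalog / REST metadata. -- The installer does not bundle models, to avoid bloating release artifacts and to avoid license risk. - -Offline capability: - -- Once the model and execution provider are downloaded, ASR inference works offline. -- LLM polish still depends on the user's configured LLM provider; when the LLM is unavailable, the raw transcript is inserted per the existing rules. - -Privacy: - -- ASR audio is processed on the device and not sent to any external ASR service. -- The first download of models and components contacts the Foundry catalog / Microsoft distribution sources. -- LLM polish may still send the transcript to the user's configured LLM endpoint; settings copy must clearly distinguish "ASR is local" from "LLM is still called as configured". - -## Windows Installer and Distribution - -The MVP does not change the Windows TSF IME registration flow. - -To verify: - -- Which DLLs, runtime files, and redistributable requirements `foundry-local-sdk --features winml` pulls into the Tauri Windows build. -- Whether NSIS / MSI can collect these native dependencies automatically. -- The Windows release workflow currently has fixed red lines for NSIS / MSI; do not casually change the two-pass bundler invoke, the `-sice:ICE80` repair, or the `bash` shell constraint. 
-- If the Foundry Local runtime needs an extra install or dynamically downloaded components, the UI must treat "Preparing the local speech runtime" as part of a one-click flow rather than asking the user to run `winget` manually. - -## Failures and Fallback - -- Foundry runtime missing or failed to initialize: do not start recording, report that the local speech runtime is unavailable, and keep an entry point for switching back to cloud ASR.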
-- Model not downloaded: do not start recording; prompt the user to download the model. -- Model download failed: keep the partial / retry state; do not switch to Win+H. -- Transcription timeout: reuse the coordinator global timeout, record a failure state, and do not insert empty text. -- Empty transcription result: reuse the `emptyTranscript` history guard. -- LLM polish failed: insert the raw transcript and mark the history item `polishFailed`. -- TSF commit failed: follow the existing `allow_non_tsf_insertion_fallback` to the Unicode / clipboard fallback; when fallback is disabled, mark `windowsImeTsfRequired`. - -## File and Module Boundaries - -Scope the follow-up implementation plan will touch: - -- `openless-all/app/src-tauri/Cargo.toml`: add the Foundry Local Rust SDK to the Windows dependencies, enabling the `winml` feature if needed. -- `openless-all/app/src-tauri/src/asr/local/`: split out a provider-neutral local ASR registry and add the Foundry Whisper runtime / provider; keep the macOS Qwen3 code. -- `openless-all/app/src-tauri/src/coordinator.rs`: extend `ActiveAsr` and wire `FoundryLocalWhisperAsr` into the `begin_session` and `end_session` branches. -- `openless-all/app/src-tauri/src/commands.rs`: add Windows local Whisper runtime/model status, download, test, and preload commands, or extend the existing `local_asr_*` commands to multiple backends. -- `openless-all/app/src-tauri/src/types.rs`: add Windows local ASR preferences such as the active Foundry Whisper model, keep-loaded duration, and language hint. -- `openless-all/app/src/lib/localAsr.ts`, `src/pages/LocalAsr.tsx`, `src/pages/Settings.tsx`, `src/i18n/*`: show Windows local speech recognition and model management. -- `openless-all/app/scripts/windows-real-asr-insertion-smoke.ps1`: add a local ASR mode that no longer requires Volcengine credentials. - -Rust leaf modules still depend only on `types.rs` and their own provider-internal types. 
Cross-module orchestration stays in `coordinator.rs`. - -## Verification Plan - -Static and unit verification: - -- `asr_configured_for_provider("foundry-local-whisper")` returns true and requires no cloud API key. -- `ensure_asr_credentials` returns an explicit error when the model is missing. -- When a fake Foundry endpoint returns `{ "text": "..."
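}`, `FoundryLocalWhisperAsr` encodes the PCM as WAV and produces a `RawTranscript`. -- Serialization and migration tests for model id, provider id, and prefs defaults. - -Integration verification: - -- Launch OpenLess on a real Windows machine with active ASR set to `foundry-local-whisper` and no Volcengine / Whisper HTTP configured. -- On first use with the model missing, press the hotkey: no Win+H panel appears, recording does not start, and the user is prompted to download the model. -- After downloading the model, focus Notepad, record with the hotkey, and speak a short test phrase; afterwards history has a new session with non-empty `rawTranscript` and `finalText`. -- With Ark / LLM not configured, the raw transcript is inserted and recorded per the existing polish fallback rules. -- With Ark / LLM configured, the transcript goes through the existing polish / translation logic. -- With the Windows TSF IME installed, `insertStatus=inserted`; with TSF disabled or unsupported by the target, behavior follows the current fallback strategy. -- After disconnecting the network, repeat dictation with the downloaded model: ASR still completes, and when the LLM is unavailable the raw transcript is not lost. - -No-Win+H verification: - -- A code search confirms there is no `Win+H`, Voice Typing, `Windows.Media.SpeechRecognition`, or SAPI dictation call path. -- During the real-machine smoke, screenshots or window enumeration confirm there is no Voice Typing panel window. -- The log shows only OpenLess recorder, Foundry local ASR, polish, and Windows IME / fallback insertion events. - -## Open Risks - -- The Foundry Local preview API may change, especially the Rust audio client and WinML package distribution. -- Foundry Local Whisper model quality and Chinese punctuation style need validation with real-device samples, not just the official capability claims. -- Error codes, progress callbacks, and cache locations for the first execution provider and model downloads need hands-on testing. 
-- The Windows installer's collection of SDK native dependencies needs release workflow verification. -- If the Foundry Local runtime cannot be embedded reliably inside the Tauri app, the fallback route is to use the SDK to manage the local REST service; if REST is also unstable, evaluate a self-managed `whisper.cpp` / ONNX Runtime route. diff --git a/docs/windows-lifecycle-tracking/issue-154-dual-hotkey-sources.md b/docs/windows-lifecycle-tracking/issue-154-dual-hotkey-sources.md deleted file mode 100644 index f4df0348..00000000 --- a/docs/windows-lifecycle-tracking/issue-154-dual-hotkey-sources.md +++ /dev/null @@ -1,27 +0,0 @@ -# Issue #154 Tracking - -Scope: Windows dictation lifecycle driven by two hotkey event sources - -Current stage: - -- This branch is a draft PR placeholder. -- No runtime fix is included yet. -- The goal is to lock down source ownership and failure modes before changing behavior. - -Problem statement: - -- Windows currently has both OS-level low-level keyboard hook input and focused-window renderer forwarding.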
-- macOS/Linux do not have the same dual-source lifecycle driver. -- Shared dedupe exists, but source precedence is not yet a first-class contract. - -Implementation targets to settle before coding: - -- Decide whether Windows should have one owner source or an explicit precedence model. -- Define expected behavior for mixed-source press/release ordering. -- Add testable scenarios for hold mode, toggle mode, and focus switching. - -Non-goals in this draft: - -- No hotkey adapter rewrite yet -- No input-stack refactor without an agreed target contract -- No unrelated QA hotkey changes diff --git a/docs/windows-tauri-test-agent-research.md b/docs/windows-tauri-test-agent-research.md deleted file mode 100644 index 0ea1126f..00000000 --- a/docs/windows-tauri-test-agent-research.md +++ /dev/null @@ -1,127 +0,0 @@ -# Windows Tauri Test Agent / Workflow Research - -## Background - -OpenLess is a desktop app built on Tauri v2 + React/Vite with a Rust backend. Real-device issues on Windows concentrate in: - -- First screen at startup: empty borders, white screen, window shown too early, before the frontend's first frame. -- System capabilities: global hotkey, microphone privacy permission, clipboard, insertion into the foreground input box. -- Local state: credential reads/writes, history, settings persistence. -- Manual input: physical hotkeys cannot be reliably replaced by ordinary synthetic `SendInput`. - -## Reusable Options - -### 1. Official tauri-driver + WebDriver - -Sources: - -- https://v2.tauri.app/develop/tests/webdriver/ -- https://github.com/tauri-apps/webdriver-example - -Good fit for a CI baseline: - -- Launch the Tauri app. -- Check window appearance, DOM content, button clicks, and settings page navigation. -- Windows CI can use `msedgedriver`; Linux CI can use `webkit2gtk-driver + xvfb`. 
- -Reference workflow: - -- `tauri-apps/webdriver-example/.github/workflows/webdriver-v2.yml` -- That workflow installs `tauri-driver` on `ubuntu-latest` and `windows-latest`, installs `msedgedriver` on the Windows side, and then runs the selenium / webdriverio tests separately. - -Recommended rollout: - -- Start with WebdriverIO; its ecosystem and assertion style are closer to what the frontend team already uses. -- Add `openless-all/app/webdriver/`, covering: - - OpenLess UI appears within 1 second of app launch. - - Provider fields on the settings page read back the "non-empty state" of existing credentials. - - Opening the settings page does not write empty values back for unmodified fields. - - The Windows permissions page copy shows no macOS Accessibility authorization prompt. - -### 2.
tauri-plugin-playwright - -来源: - -- https://docs.rs/crate/tauri-plugin-playwright/0.1.0 - -适合做更接近 Playwright 的 E2E: - -- 在 Tauri app 内嵌控制 server。 -- 使用 Playwright API 做页面级断言。 -- 对前端团队迁移成本较低。 - -风险: - -- 需要引入 Tauri plugin,测试入口和生产入口要隔离。 -- 目前生态成熟度低于官方 WebDriver。 - -建议落地: - -- 暂不作为第一阶段 CI 基线。 -- 等 WebDriver 跑通后,再评估是否用它补截图、网络、前端状态断言。 - -### 3. Tauri MCP / AI Agent 调试插件 - -来源: - -- https://github.com/P3GLEG/tauri-plugin-mcp -- https://github.com/dirvine/tauri-mcp - -适合做 agent 辅助调试: - -- 截图。 -- 窗口管理。 -- DOM 读取。 -- 鼠标/键盘输入。 -- localStorage 检查。 - -风险: - -- 需要在 app 中接入调试插件,必须确保只在 dev/test 构建启用。 -- 不适合直接放进 production bundle。 - -建议落地: - -- 可以做 `devtools/agent` 分支实验。 -- 目标是让 Codex/Claude/Cursor 能直接看 Tauri 窗口截图和 DOM,降低“用户肉眼测试”的比例。 - -### 4. TestDriver AI - -来源: - -- https://testdriver.ai/vscode -- https://github.com/testdriverai/testdriverai - -适合黑盒探索: - -- 用自然语言描述流程。 -- 支持桌面应用、Windows、GitHub Actions。 -- 能生成测试报告/视频。 - -风险: - -- 外部服务/账号/成本依赖。 -- 对本项目当前开源 CI 基线不应作为唯一门禁。 - -建议落地: - -- 作为 nightly 或人工触发探索测试,不作为 PR 必过的第一层。 -- 可覆盖“打开 OpenLess Dev、进入设置、检查凭据字段非空、按热键后胶囊状态变化”等高层流程。 - -## 推荐实施顺序 - -1. 保留现有 PowerShell smoke:构建、启动、进程响应、hotkey listener 日志。 -2. 增加 WebDriverIO 基线:窗口、DOM、设置页、凭据字段非空状态、Windows 文案。 -3. 增加 Windows 手动门禁脚本:物理热键、真实 ASR、Notepad fallback、麦克风隐私开关。 -4. 评估 Tauri MCP:给 agent 提供截图/DOM/输入能力,减少人工描述。 -5. 
评估 TestDriver AI:做黑盒探索和视频报告。 - -## 第一批必须补的测试 - -- 启动首屏不能先显示空窗口边框。 -- Windows 启动不等待麦克风 input stream 探测。 -- 设置页凭据字段加载完成前 blur 不会保存空值。 -- 设置页打开后不修改字段,`credentials.json` 不发生变化。 -- `get_credentials` 与 `read_credential` 对同一文件返回一致状态。 -- 右 Control 默认热键文案在概览、历史、设置中一致。 -- Windows 权限页不显示 macOS 辅助功能授权引导。 diff --git a/docs/windows-ui-tracking/issue-142-capsule-geometry.md b/docs/windows-ui-tracking/issue-142-capsule-geometry.md deleted file mode 100644 index f524e225..00000000 --- a/docs/windows-ui-tracking/issue-142-capsule-geometry.md +++ /dev/null @@ -1,23 +0,0 @@ -# Issue #142 Placeholder / 占位 - -## 中文摘要 - -本 PR 是 issue #142 的 draft 占位,专门跟踪 Windows Capsule 变形、失真与尺寸错位问题。 -当前只保留问题边界、几何证据和后续修复准入条件,不引入业务逻辑改动。 - -## Scope / 范围 - -- Capsule native window bounds -- visual pill metrics -- badge position -- Windows DPI / transparent window clipping - -## Evidence / 证据入口 - -- `openless-all/app/src-tauri/src/lib.rs` -- `openless-all/app/src/components/Capsule.tsx` -- `openless-all/app/src/lib/capsuleLayout.ts` - -## Merge Rule / 合并规则 - -- 仅当 issue #142 的几何对齐与 Windows smoke 验证完成后才允许从 draft 转为 ready。 diff --git a/docs/windows-ui-tracking/issue-143-cold-start-ui.md b/docs/windows-ui-tracking/issue-143-cold-start-ui.md deleted file mode 100644 index 9cfe70a5..00000000 --- a/docs/windows-ui-tracking/issue-143-cold-start-ui.md +++ /dev/null @@ -1,23 +0,0 @@ -# Issue #143 Placeholder / 占位 - -## 中文摘要 - -本 PR 是 issue #143 的 draft 占位,专门跟踪 Windows 冷启动前几秒加载异常、闪烁与 ready 前展示错位问题。 -当前只记录时序边界、现象入口和后续修复出口,不引入无关功能修改。 - -## Scope / 范围 - -- visible / ready timing -- first stable paint -- startup shell exposure -- Windows cold start UX - -## Evidence / 证据入口 - -- `openless-all/app/src-tauri/tauri.conf.json` -- `openless-all/app/src/App.tsx` -- `openless-all/app/src/components/FloatingShell.tsx` - -## Merge Rule / 合并规则 - -- 仅当 issue #143 的启动时序统一且完成 Windows cold-start smoke 后才允许从 draft 转为 ready。 diff --git a/docs/windows-upstream-pr-workflow.md b/docs/windows-upstream-pr-workflow.md deleted file mode 100644 
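index d1a8bae4..00000000
--- a/docs/windows-upstream-pr-workflow.md
+++ /dev/null
@@ -1,65 +0,0 @@
-# Windows upstream PR workflow
-
-## Goal
-
-The Windows mainline first completes discovery, fixes, CI, self-review, and re-review on `fork/dev`, and only then converges into well-defined upstream maintenance items. Do not write unconverged real-device findings directly into upstream issues or upstream PRs.
-
-## Standard process
-
-1. Fix the issue on `fork/dev`.
-   - Each commit addresses exactly one well-defined problem.
-   - Record findings locally or in a fork issue first.
-   - Do not add noise issues upstream.
-
-2. Trigger CI on `fork/dev`.
-   - The Windows build must pass.
-   - New or modified Windows smoke tests must be re-runnable on the local machine.
-   - For items that cannot be fully automated in CI (real credentials, physical hotkeys, ASR, insertion fallback, and so on), leave the local evidence path and a log summary.
-
-3. Open your own PR on the fork.
-   - base: `fork/dev`
-   - head: the feature branch
-   - Write the PR description in Chinese, following the template.
-   - The PR must include the fork CI link, a real-device regression summary, and the self-review conclusion.
-
-4. Re-review the fork PR.
-   - First look for blockers the way a code review would.
-   - After fixing the review findings, run fork CI again.
-   - Only after the fork PR passes re-review can the work move on to upstream convergence.
-
-5. Converge the upstream maintenance items.
-   - Split the smallest upstream maintenance slice out of the fork PR.
-   - The upstream PR contains only the minimal, verified change.
-   - The upstream PR description must link the fork PR / fork CI and state that the slice comes from the verified `fork/dev` workflow.
-   - Use upstream issues only for problems that are confirmed, maintainable, reproducible, and need upstream tracking; do not dump exploratory findings upstream.
-
-## Entry criteria for upstream PRs
-
-- `fork/dev` already contains the fix.
-- The fork PR has passed CI.
-- The fork PR has completed self-review and re-review.
-- The upstream branch is cut from the latest upstream base.
-- The upstream diff can be explained on its own and does not depend on other uncommitted context in fork/dev.
-- The PR description includes:
-  - A single goal
-  - What is out of scope
-  - The fork PR link
-  - The fork CI link
-  - Local Windows regression evidence
-
-## Prohibited
-
-- Do not create upstream issues directly from unverified local findings.
-- Do not bypass fork/dev CI and push upstream PRs directly.
-- Do not combine multiple real-device Windows problems into one upstream PR.
-- Do not commit real service credentials, user-local configuration, build artifacts, or temporary directories in upstream PRs.
-
-## Current rules of execution
-
-The default order for the Windows mainline going forward is:
-
-```text
-fork/dev fix -> fork/dev CI -> fork PR -> self-review/re-review -> minimal upstream PR
-```
-
-If an upstream PR needs updating, first confirm the corresponding fork PR and fork CI evidence, then sync the upstream PR.
diff --git a/issue-420-wayland-plan.md b/issue-420-wayland-plan.md
deleted file mode 100644
index 1bcd729d..00000000
--- a/issue-420-wayland-plan.md
+++ /dev/null
@@ -1,317 +0,0 @@
-# #420 Wayland Support Plan
-
-> Scope: `/home/chris233/openless`
-> Related issue: [#420](https://github.com/Open-Less/openless/issues/420)
-> Goal: give OpenLess a reliable implementation path on Linux / Wayland that is consistent with the repository's current decisions, instead of continuing to force the X11 approach onto it.
-
-## 1. Breaking down the current problem
-
-#420 currently mixes three kinds of problems:
-
-1. **Global hotkeys do not work on Wayland**
-   - The Wayland security model does not let ordinary apps listen to global keyboard events the way X11 does.
-   - The repository already ships the CLI + single-instance path as the officially supported solution on Wayland; the portal remains a future research direction rather than a settled primary implementation.
-
-2. **Text output is unreliable on Wayland**
-   - Streaming output path: on Linux, `unicode_keystroke.rs` still goes through `enigo.text(...)`.
-   - One-shot output path: `insertion.rs` still goes through `clipboard + simulate_paste(enigo)`.
-   - Both paths rest on X11-style assumptions; on Wayland the call may "succeed" without any text actually landing.
-
-3. **Black-screen flicker in the settings page's hotkey recorder / UI on Wayland**
-   - This looks more like a separate WebKitGTK / compositor / input-recording UI problem.
-   - It should no longer be lumped together with "Wayland global hotkeys" or "Wayland text output" as a single fix surface.
-
-## 2. Key judgments
-
-### 2.1 Wayland offers several viable layers, but unverified portal capabilities must not be written up as the settled primary route
-
-These must be considered separately:
-
-- **Global hotkey triggering**:
-  - In terms of protocol direction, portal / compositor capabilities are worth researching;
-  - but in terms of **what the repository has already implemented** and cross-desktop deliverability, the officially supported path is already `CLI + single-instance forwarding`;
-  - `xdg-desktop-portal`'s `GlobalShortcuts` currently fits better as a research track than as a product commitment.
-- **Text insertion**: Wayland has no general X11-style ability for "any app to send keys to any other app".
-  - The clipboard offers a realistic, workable path.
-  - Automatic typing can only go through **permission-controlled** portal / libei / compositor capabilities.
-  - No single injection interface works identically, invisibly, and without authorization across all Wayland desktops.
-
-### 2.2 The top priority right now is not "fully automatic input in one step" but "the user's text must never be lost"
-
-The most dangerous problem today is not "the Wayland experience is not automated enough", but this:
-
-- The logs report success
-- OpenLess believes the text was inserted
-- The user's input field actually contains nothing
-
-This behavior directly breaks the product's core promise: **the user's words must never be lost**.
-
-## 3. Recommended overall plan
-
-Proceed in three phases rather than chasing full automation in one go.
-
----
-
-## Phase 1: Stop the bleeding and make sure no text is lost
-
-### Goal
-
-On Wayland, even without automatic typing, we must guarantee that:
-
-- The dictation result at least reliably reaches the clipboard
-- The UI / logs clearly tell the user which fallback is in use
-- The false-success state where "the code thinks it succeeded but no text appears on screen" no longer occurs
-
-### Proposed changes
-
-#### 3.1 Disable "streaming insert success semantics" on Wayland
-
-In the current logic, as soon as `type_unicode_chunk()` returns success on the Linux streaming path, the code:
-
-- Accumulates `typed_text`
-- Sets `already_streamed=true`
-- Skips the subsequent inserter
-
-This is unreliable on Wayland.
-
-**Recommendation:**
-- When `Linux + Wayland` is detected, do not let the return value of `enigo.text(...)` serve directly as evidence of a successful insertion.
-- On Wayland, do not take the `already_streamed=true` success short-circuit by default.
-
-#### 3.2 Degrade to copy-only by default on Wayland
-
-The current non-streaming path is:
-
-- Write to the clipboard
-- Then send the paste shortcut via `simulate_paste()`
-
-The second step is unreliable on Wayland.
-
-**Recommendation:**
-- When Wayland is detected, use the **copy-only fallback** by default.
-- Leave the text on the clipboard; do not restore the previous clipboard contents immediately.
-- Tell the user explicitly: `已复制到剪贴板,请手动粘贴` ("Copied to clipboard, please paste manually").
-
-#### 3.3 Make the status text tell the truth
-
-Avoid misleading messages such as:
-
-- "Inserted" when nothing was actually inserted
-- "Paste attempted" when the user has no way to tell whether the text reached the target app
-
-**Recommendation:**
-- Use explicit, consistent statuses for the Wayland fallback:
-  - `已复制到剪贴板,请手动粘贴` ("Copied to clipboard, please paste manually")
-  - `Wayland 当前未启用自动输入` ("Automatic typing is not currently enabled on Wayland")
-  - `剪贴板写入失败` ("Clipboard write failed")
-
-### Phase 1 acceptance criteria
-
-- After dictation on Wayland, text never silently disappears.
-- Even if automatic typing fails, the user can always recover the text from the clipboard.
-- Log and UI states match the actual behavior.
-
----
-
-## Phase 2: Harden the current Wayland trigger path
-
-### Goal
-
-Bring the `CLI + single-instance` approach already shipped on Wayland up to a genuinely stable, clear, and deliverable state, instead of prematurely documenting unverified portal capabilities as the primary route.
-
-### Proposed changes
-
-#### 3.4 Explicitly treat the CLI path as the currently supported approach
-
-The path the repository has already adopted is:
-
-1. Detect a Wayland session at startup
-2. Do not install the `rdev` global listener
-3. Run the following via desktop-environment shortcuts:
-   - `openless --toggle-dictation`
-   - `openless --toggle-qa`
-   - `openless --cancel-dictation`
-4. `tauri-plugin-single-instance` forwards the second instance's argv to the main instance's coordinator
-
-The task here is not to overturn this approach but to complete it:
-
-- State consistently in Settings / README / the Linux guide that this is the currently supported method;
-- Keep the GNOME / KDE / Hyprland / sway example text consistent;
-- Make sure "a hotkey can trigger dictation" is reproducible, explainable, and debuggable on Wayland.
-
-#### 3.5 Keep portal research as a future enhancement
-
-`xdg-desktop-portal` `GlobalShortcuts` can continue to be researched, but it should not be written up as a primary commitment until the repository has clearly verified:
-
-- Its actual availability on GNOME / KDE / other desktops
-- Whether the permission/interaction model fits users' mental model of the product
-- Whether the fallback chain is simpler than the current CLI approach rather than more fragmented
-
-### Why this layer should be done separately
-
-- It is the Wayland trigger approach the repository has already shipped;
-- It solves the core problem of #420: "how to trigger dictation";
-- Its maintenance cost and cross-desktop stability currently beat a hasty switch to a portal-first route.
-
-### Phase 2 acceptance criteria
-
-- After configuring per the docs / settings page, Wayland users can reliably trigger Dictation / QA / Cancel.
-- The settings page, README, and logs describe the Wayland trigger method consistently.
-- `GlobalShortcuts portal` is not described as a shipped capability; any further research should go in a separate research issue / PR.
-
----
-
-## Phase 3: Research permission-controlled automatic typing on Wayland
-
-### Goal
-
-Explore true "automatically send text to other apps" capability on Wayland, enabled only when there is **compositor support + user authorization**.
-
-### Candidate paths
-
-#### 3.6 `RemoteDesktop` portal + keyboard events
-
-Pros:
-- Official portal documentation exists
-- The permission model is explicit
-
-Cons:
-- Heavier session / authorization interaction
-- Behaves more like a "remote control permission", which may not match every user's mental model
-
-#### 3.7 `RemoteDesktop` / `InputCapture` + `ConnectToEIS` + `libei`
-
-Pros:
-- This is the more modern input-emulation path in the Wayland / compositor ecosystem
-- More reliable than betting directly on `enigo` / XTest
-
-Cons:
-- High implementation complexity
-- Fragmented compositor / backend support
-- Still not a "works invisibly on every desktop" solution
-
-#### 3.8 Do not bet the primary approach on `virtual-keyboard-unstable-v1`
-
-Reasons:
-- The protocol itself is marked as unsuitable as a general, stable dependency
-- Whether the compositor exposes it to third-party apps is outside our control
-- Product-level fragmentation risk is too high
-
-### Phase 3 product strategy
-
-Automatic typing must be:
-
-- Enabled only **after capability detection passes**
-- Enabled only **after authorization succeeds**
-- Explicitly falling back to the clipboard approach on failure
-
-In other words:
-
-> Automatic typing on Wayland should be an "optional enhancement", not a default baseline capability.
-
----
-
-## 4. Recommended split for #420
-
-We recommend splitting the follow-up work into four issue / PR tracks:
-
-### 4.1 `wayland-output-safety`
-Scope:
-- Disable false-success streaming insert on Wayland
-- Default to copy-only on Wayland
-- Align status text / logs with actual behavior
-
-This is the highest priority.
-
-### 4.2 `wayland-trigger-path-hardening`
-Scope:
-- Harden the `CLI + single-instance` trigger chain
-- Unify Settings / README / Linux docs
-- Align GNOME / KDE / Hyprland / sway examples and troubleshooting notes
-
-This is the second priority.
-
-### 4.3 `wayland-global-shortcuts-portal-research`
-Scope:
-- Assess the real desktop coverage of the `GlobalShortcuts` portal
-- Verify whether it is worth promoting from research to a product capability
-- Produce only research / prototypes; do not rewrite the current support commitment ahead of time
-
-This is a future research direction and should not be mixed with the currently deliverable approach.
-
-### 4.4 `wayland-hotkey-editor-flicker`
-Scope:
-- Flicker / black screen while recording hotkeys on the settings page
-- Handle only the UI / WebKitGTK / input-recording chain
-
-Do not tie this to "text output" again.
-
----
-
-## 5. Recommended implementation order
-
-### First cut (do this first)
-- Fix `unreliable Wayland text output`
-- Core goal: **never lose text**
-
-### Second cut
-- Harden the `CLI + single-instance` trigger chain
-- Core goal: **make the current Wayland approach genuinely stable, clear, and deliverable**
-
-### Third cut
-- Research `GlobalShortcuts portal` / `portal + libei` capabilities
-- Core goal: **assess which capabilities are worth promoting to future enhancements**
-
-### Fourth cut
-- Handle the settings-page flicker / black screen separately
-
----
-
-## 6. What not to do
-
-### 6.1 Do not keep treating the `enigo` return value as proof of success on Wayland
-
-It will keep producing:
-- Success in the logs
-- Success in the UI
-- No text actually visible to the user
-
-### 6.2 Do not document the unverified portal approach as the current primary implementation
-
-The repository has already shipped the CLI path. Presenting the portal prematurely as "the settled way forward" would again put the docs, the code, and user expectations out of sync.
-
-### 6.3 Do not use `virtual-keyboard-unstable-v1` directly as the primary implementation
-
-It is closer to a compositor-specific capability and is not suitable as a path that works across distributions.
-
----
-
-## 7. Conclusion
-
-Wayland should of course take a "Wayland-native" route, but in the current repository that route should be split into three layers:
-
-1. **Current official trigger path** → `CLI + single-instance`
-2. **Clipboard safety net** → Wayland-native clipboard / copy-only fallback
-3. **Future enhancement candidates** → `GlobalShortcuts portal`, `RemoteDesktop` / `InputCapture` + `libei/EIS` (capability detection + user authorization)
-
-If only one thing can be done first, the priority is clearly:
-
-> **Fix the text output chain first so the user's words are never lost.**
-
----
-
-## 8. References (for follow-up implementation, not end-user copy)
-
-- XDG Portal GlobalShortcuts
-  https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.GlobalShortcuts.html
-- XDG Portal RemoteDesktop
-  https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.RemoteDesktop.html
-- XDG Portal InputCapture
-  https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.InputCapture.html
-- XDG Portal Clipboard
-  https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.Clipboard.html
-- libei documentation
-  https://libinput.pages.freedesktop.org/libei/
-- Wayland core / data transfer model
-  https://wayland.pages.freedesktop.org/wayland.freedesktop.org/docs/html/ch04.html
-  https://wayland.freedesktop.org/docs/html/apa.html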
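The Wayland plan's Phase 1 rule has three parts. It detects Linux + Wayland, stops treating a successful `enigo.text(...)` chunk as proof of insertion, and falls back to copy-only with truthful status text. A minimal sketch of that rule follows. It is an illustration, not OpenLess code: `OutputMode`, `is_wayland_session`, `choose_output_mode`, `streaming_counts_as_inserted`, and `copy_only_status` are hypothetical names. Detecting Wayland from the `XDG_SESSION_TYPE` / `WAYLAND_DISPLAY` environment variables is an assumed heuristic.

```rust
use std::env;

// Hypothetical delivery modes for a dictation result (not OpenLess's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    // Streaming keystrokes / simulated paste may be reported as an insertion.
    AutoInsert,
    // Leave the text on the clipboard and ask the user to paste manually.
    CopyOnly,
}

// Assumed heuristic: treat the session as Wayland if XDG_SESSION_TYPE says so
// or a non-empty WAYLAND_DISPLAY is set.
fn is_wayland_session(session_type: Option<&str>, wayland_display: Option<&str>) -> bool {
    let by_type = session_type.map_or(false, |t| t.eq_ignore_ascii_case("wayland"));
    let by_display = wayland_display.map_or(false, |d| !d.is_empty());
    by_type || by_display
}

// Reads the real environment of the current process.
fn detect_wayland() -> bool {
    let session_type = env::var("XDG_SESSION_TYPE").ok();
    let wayland_display = env::var("WAYLAND_DISPLAY").ok();
    is_wayland_session(session_type.as_deref(), wayland_display.as_deref())
}

// Phase 1 rule: Linux + Wayland always degrades to copy-only.
fn choose_output_mode(is_linux: bool, is_wayland: bool) -> OutputMode {
    if is_linux && is_wayland {
        OutputMode::CopyOnly
    } else {
        OutputMode::AutoInsert
    }
}

// A successful keystroke chunk may only set "already streamed" outside copy-only
// mode, so a false success can never short-circuit the clipboard fallback.
fn streaming_counts_as_inserted(mode: OutputMode, chunk_ok: bool) -> bool {
    mode == OutputMode::AutoInsert && chunk_ok
}

// Status text that reflects what actually happened in copy-only mode.
fn copy_only_status(clipboard_ok: bool) -> &'static str {
    if clipboard_ok {
        "已复制到剪贴板,请手动粘贴" // "Copied to clipboard, please paste manually"
    } else {
        "剪贴板写入失败" // "Clipboard write failed"
    }
}

fn main() {
    let mode = choose_output_mode(cfg!(target_os = "linux"), detect_wayland());
    let streamed = streaming_counts_as_inserted(mode, true);
    println!("mode={:?} streamed_counts={}", mode, streamed);
    if mode == OutputMode::CopyOnly {
        println!("{}", copy_only_status(true));
    }
}
```

In this sketch the Wayland check gates the success semantics rather than the keystroke call itself. On X11 or non-Linux targets, `choose_output_mode` keeps `AutoInsert`, so existing behavior there would be unchanged.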