Add ExecuWhisper macOS dictation app with LFM2.5 smart formatter#237
Open
seyeong-han wants to merge 2 commits intometa-pytorch:mainfrom
Open
Add ExecuWhisper macOS dictation app with LFM2.5 smart formatter#237seyeong-han wants to merge 2 commits intometa-pytorch:mainfrom
seyeong-han wants to merge 2 commits intometa-pytorch:mainfrom
Conversation
c1d454c to
fdcbab8
Compare
Introduce ExecuWhisper, a Superwhisper-style on-device dictation app backed by ExecuTorch. Audio is captured locally with AVAudioEngine, transcribed by Parakeet TDT (Metal backend), and optionally rewritten by LFM2.5-350M (MLX backend) before paste/save. Both helpers run as warm subprocesses over a JSONL stdin/stdout protocol so first paste latency stays low across consecutive dictations. App features: - Global Ctrl+Space (configurable) overlay dictation that pastes the formatted text back into the active app via a stable Accessibility paste helper installed under Application Support, so Xcode rebuilds do not invalidate the granted permission. - Single smart formatting prompt with safety net: rewrites dictation into final text, never answers spoken questions, falls back to the raw transcript when the formatter output is suspicious (length-ratio guard, prompt-echo guard, "Mode:" / "Sure" / "Here is" guards). - Auto-preload of the Parakeet helper on launch and after foreground health checks, with a toolbar spinner while warming. - Replacements (case-aware, longest-match-first, word-boundary aware). - Session history with rename, pinning, recency grouping, search, and export to txt / json / srt. - Built-in microphone selection, silence detection, and dictation shortcut recorder. Build / packaging: - XcodeGen project (project.yml) with hardened runtime, deployment target macOS 14.0, dead-code-stripping, explicit ENABLE_USER_SCRIPT_SANDBOXING=NO so the post-compile script can bundle the parakeet/lfm25 helpers and mlx.metallib next to the app. - scripts/build.sh for lightweight or --bundle-models releases. - scripts/create_dmg.sh validates required helpers and the paste helper sub-app, then emits a verified UDZO DMG. - ExecuWhisper Paste Helper.app sub-bundle gives macOS a stable bundle identifier (org.pytorch.executorch.ExecuWhisper.PasteHelper) for Accessibility, so we are not asking for a new grant per build. Models: - First-launch download from younghan-meta/Parakeet-TDT-ExecuTorch-Metal and younghan-meta/LFM2.5-ExecuTorch-MLX into Application Support. - Optional --bundle-models build for fully offline distribution. Tests: - 68 Swift Testing cases covering replacements, prompt safety, formatter fallback paths (question-answering, prompt echo, metadata echo), preload state machine, session export, and helper protocol wire format. Made-with: Cursor
fdcbab8 to
29caabb
Compare
Add production-readiness fixes for audio routing, formatter validation, helper signing, release verification, and support diagnostics. This keeps the existing app PR focused while making the DMG safer to dogfood across different Mac audio setups. Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds ExecuWhisper, a Superwhisper-style on-device dictation app for macOS, lowered through ExecuTorch.
parakeet_helper).lfm25_formatter_helper), running with one rewriter-only prompt that never answers spoken questions and falls back to the raw transcript when the model output is suspicious..txt/.json/.srt.executorch→ExecuTorch).The app keeps every helper warm across requests so dictate→paste latency stays consistent on the second, third, … dictation. Models are downloaded from Hugging Face on first launch, or can be bundled into the app for fully offline distribution.
What's in the box
ExecuWhisper/: SwiftUI app, helpers, test target (Swift Testing).ExecuWhisper/Support/PasteHelper/: tiny.appsub-bundle for stable Accessibility identity.ExecuWhisper/scripts/build.sh: lightweight or--bundle-modelsrelease builds.ExecuWhisper/scripts/create_dmg.sh: validated DMG packager (verifies helpers + paste helper bundle).ExecuWhisper/scripts/evaluate_formatting.py+evaluation/formatting_cases.jsonl: prompt-quality corpus (categories: prose, list, spoken-intent, action-list, technical, fallback).ExecuWhisper/test_audio/formatting_samples/*.wav: short dictation samples for manual end-to-end checks.How it runs locally
The
Hugging Facemodel repos used by first-launch download:younghan-meta/Parakeet-TDT-ExecuTorch-Metalyounghan-meta/LFM2.5-ExecuTorch-MLXNotable design / hardening
Mode:), filler answers (Sure./Here is), prompt leaks (<|im_start|>etc.), and suspiciously short outputs (< max(2, ceil(input_words * 0.4))when input ≥ 4 words). On rejection the app falls back to the raw transcript and tags the resultformatter-fallback.kAudioOutputUnitProperty_CurrentDeviceinstead of toggling the system default. This eliminates theFailed to create tap, config change pending!race that caused the second consecutive dictation to capture no audio.TranscriptStore.runHealthCheck()rewarms the helper if the runtime drops tounloaded, so foreground re-checks never leave the user with a cold engine. A toolbar spinner is shown while warming.ExecuWhisper Paste Helper.app(org.pytorch.executorch.ExecuWhisper.PasteHelper) under Application Support so the Accessibility grant survives Xcode rebuilds.libomp+libmlx). Mic and network entitlements are kept inExecuWhisper.entitlements.DEAD_CODE_STRIPPING=YES,ENABLE_USER_SCRIPT_SANDBOXING=NO(the post-compile script needs to readparakeet_helper/lfm25_formatter_helper/libomp.dylibfrom outside the build dir),LOCALIZATION_PREFERS_STRING_CATALOGS=YES.Test plan
xcodebuild test -scheme ExecuWhisper -destination 'platform=macOS'→ 68 tests in 15 suites pass../scripts/build.sh→./scripts/create_dmg.sh→ ~12 MB lightweight DMG,hdiutil verifyVALID, mounted contents include the paste helper sub-app../scripts/build.sh --bundle-models→ ~1.0 GB self-contained DMG withmodel.pte/tokenizer.model/lfm2_5_350m_mlx_4w.pte/tokenizer.json/tokenizer_config.json,hdiutil verifyVALID.No audio was captured.regression).Yesreply path falls back to the raw transcript).Made with Cursor