Skip to content

Add ExecuWhisper macOS dictation app with LFM2.5 smart formatter#237

Open
seyeong-han wants to merge 2 commits intometa-pytorch:mainfrom
seyeong-han:execuwhisper-app-with-formatter
Open

Add ExecuWhisper macOS dictation app with LFM2.5 smart formatter#237
seyeong-han wants to merge 2 commits intometa-pytorch:mainfrom
seyeong-han:execuwhisper-app-with-formatter

Conversation

@seyeong-han
Copy link
Copy Markdown
Contributor

Summary

Adds ExecuWhisper, a Superwhisper-style on-device dictation app for macOS, lowered through ExecuTorch.

  • ASR: Parakeet TDT on the Metal backend (parakeet_helper).
  • Smart formatting: LFM2.5-350M on the MLX backend (lfm25_formatter_helper), running with one rewriter-only prompt that never answers spoken questions and falls back to the raw transcript when the model output is suspicious.
  • System dictation: configurable global hotkey (default ⌃Space) opens an overlay; the final text is auto-pasted into the front app via a stable, signed paste helper sub-app.
  • Session management: history, rename, pin, search, and export to .txt / .json / .srt.
  • Replacements: case-aware longest-match-first text rewriting (e.g. executorchExecuTorch).

The app keeps every helper warm across requests so dictate→paste latency stays consistent on the second, third, … dictation. Models are downloaded from Hugging Face on first launch, or can be bundled into the app for fully offline distribution.

What's in the box

  • ExecuWhisper/: SwiftUI app, helpers, test target (Swift Testing).
  • ExecuWhisper/Support/PasteHelper/: tiny .app sub-bundle for stable Accessibility identity.
  • ExecuWhisper/scripts/build.sh: lightweight or --bundle-models release builds.
  • ExecuWhisper/scripts/create_dmg.sh: validated DMG packager (verifies helpers + paste helper bundle).
  • ExecuWhisper/scripts/evaluate_formatting.py + evaluation/formatting_cases.jsonl: prompt-quality corpus (categories: prose, list, spoken-intent, action-list, technical, fallback).
  • ExecuWhisper/test_audio/formatting_samples/*.wav: short dictation samples for manual end-to-end checks.

How it runs locally

# Build helpers in your local executorch checkout
cd ~/executorch
gh pr checkout https://github.com/pytorch/executorch/pull/18861   # parakeet_helper
conda activate et-metal && make parakeet-metal
conda activate et-mlx   && make lfm_2_5_formatter-mlx

# Build app + lightweight DMG
cd executorch-examples/ExecuWhisper
./scripts/build.sh
./scripts/create_dmg.sh ./build/Build/Products/Release/ExecuWhisper.app ./ExecuWhisper.dmg

The Hugging Face model repos used by first-launch download:

  • younghan-meta/Parakeet-TDT-ExecuTorch-Metal
  • younghan-meta/LFM2.5-ExecuTorch-MLX

Notable design / hardening

  • Smart formatter prompt: rewriter-only instruction with three few-shot examples, including a question-stays-a-question case. Validator rejects metadata echoes (Mode:), filler answers (Sure. / Here is), prompt leaks (<|im_start|> etc.), and suspiciously short outputs (< max(2, ceil(input_words * 0.4)) when input ≥ 4 words). On rejection the app falls back to the raw transcript and tags the result formatter-fallback.
  • Audio recorder: binds AVAudioEngine input to the chosen device via kAudioOutputUnitProperty_CurrentDevice instead of toggling the system default. This eliminates the Failed to create tap, config change pending! race that caused the second consecutive dictation to capture no audio.
  • Preload UX: TranscriptStore.runHealthCheck() rewarms the helper if the runtime drops to unloaded, so foreground re-checks never leave the user with a cold engine. A toolbar spinner is shown while warming.
  • Accessibility for Cmd+V: install a stable signed ExecuWhisper Paste Helper.app (org.pytorch.executorch.ExecuWhisper.PasteHelper) under Application Support so the Accessibility grant survives Xcode rebuilds.
  • No app sandbox: hardened runtime is on; library validation is disabled (we link libomp + libmlx). Mic and network entitlements are kept in ExecuWhisper.entitlements.
  • Build settings explicit: DEAD_CODE_STRIPPING=YES, ENABLE_USER_SCRIPT_SANDBOXING=NO (the post-compile script needs to read parakeet_helper/lfm25_formatter_helper/libomp.dylib from outside the build dir), LOCALIZATION_PREFERS_STRING_CATALOGS=YES.

Test plan

  • xcodebuild test -scheme ExecuWhisper -destination 'platform=macOS' → 68 tests in 15 suites pass.
  • DMG pipeline:
    • ./scripts/build.sh./scripts/create_dmg.sh → ~12 MB lightweight DMG, hdiutil verify VALID, mounted contents include the paste helper sub-app.
    • ./scripts/build.sh --bundle-models → ~1.0 GB self-contained DMG with model.pte / tokenizer.model / lfm2_5_350m_mlx_4w.pte / tokenizer.json / tokenizer_config.json, hdiutil verify VALID.
  • Manual dogfood:
    • Cold launch → first-run downloads → preload spinner → ⌃Space dictation → paste into front app.
    • Two back-to-back dictations succeed (no No audio was captured. regression).
    • Formatter rewrites questions instead of answering them (Yes reply path falls back to the raw transcript).

Made with Cursor

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 30, 2026
@seyeong-han seyeong-han force-pushed the execuwhisper-app-with-formatter branch from c1d454c to fdcbab8 Compare April 30, 2026 21:21
Introduce ExecuWhisper, a Superwhisper-style on-device dictation app
backed by ExecuTorch. Audio is captured locally with AVAudioEngine,
transcribed by Parakeet TDT (Metal backend), and optionally rewritten
by LFM2.5-350M (MLX backend) before paste/save. Both helpers run as
warm subprocesses over a JSONL stdin/stdout protocol so first paste
latency stays low across consecutive dictations.

App features:
- Global Ctrl+Space (configurable) overlay dictation that pastes the
  formatted text back into the active app via a stable Accessibility
  paste helper installed under Application Support, so Xcode rebuilds
  do not invalidate the granted permission.
- Single smart formatting prompt with safety net: rewrites dictation
  into final text, never answers spoken questions, falls back to the
  raw transcript when the formatter output is suspicious (length-ratio
  guard, prompt-echo guard, "Mode:" / "Sure" / "Here is" guards).
- Auto-preload of the Parakeet helper on launch and after foreground
  health checks, with a toolbar spinner while warming.
- Replacements (case-aware, longest-match-first, word-boundary aware).
- Session history with rename, pinning, recency grouping, search, and
  export to txt / json / srt.
- Built-in microphone selection, silence detection, and dictation
  shortcut recorder.

Build / packaging:
- XcodeGen project (project.yml) with hardened runtime, deployment
  target macOS 14.0, dead-code-stripping, explicit
  ENABLE_USER_SCRIPT_SANDBOXING=NO so the post-compile script can
  bundle the parakeet/lfm25 helpers and mlx.metallib next to the app.
- scripts/build.sh for lightweight or --bundle-models releases.
- scripts/create_dmg.sh validates required helpers and the paste
  helper sub-app, then emits a verified UDZO DMG.
- ExecuWhisper Paste Helper.app sub-bundle gives macOS a stable
  bundle identifier (org.pytorch.executorch.ExecuWhisper.PasteHelper)
  for Accessibility, so we are not asking for a new grant per build.

Models:
- First-launch download from younghan-meta/Parakeet-TDT-ExecuTorch-Metal
  and younghan-meta/LFM2.5-ExecuTorch-MLX into Application Support.
- Optional --bundle-models build for fully offline distribution.

Tests:
- 68 Swift Testing cases covering replacements, prompt safety,
  formatter fallback paths (question-answering, prompt echo,
  metadata echo), preload state machine, session export, and
  helper protocol wire format.

Made-with: Cursor
@seyeong-han seyeong-han force-pushed the execuwhisper-app-with-formatter branch from fdcbab8 to 29caabb Compare April 30, 2026 21:23
Add production-readiness fixes for audio routing, formatter validation, helper signing, release verification, and support diagnostics. This keeps the existing app PR focused while making the DMG safer to dogfood across different Mac audio setups.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant