research(features): survey comparable STT/transcription projects for feature backlog #2
Merged
Surveys voxtype, BlahST, Handy, Meetily, WhisperKit, Hex, noScribe, and WhisperWriter to produce a prioritized feature backlog (11 High / 12 Medium / 8 Low) covering: push-to-talk/dictation mode, additional transcription engines (Parakeet, Moonshine, SenseVoice), LLM integration patterns, and export formats (SRT, VTT, JSON). All items include attribution notes for the README.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Adds a feature-research pack for Agrapha by surveying comparable local-first STT/meeting transcription projects and turning findings into a prioritized backlog with copy-paste-ready README attribution notes.
Changes:
- Added deep-dive research notes for VoxType, Handy, and BlahST.
- Added a comparable-projects survey (incl. additional discoveries) and a requirements spec for the research deliverable.
- Added an implementation backlog (`plan.md`) and a validation report (`validation.md`) tying research → prioritized items + attribution checks.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| project_plans/agrapha-feature-research/research/voxtype.md | VoxType deep-dive: feature inventory, architecture notes, Agrapha relevance, attribution note. |
| project_plans/agrapha-feature-research/research/handy.md | Handy deep-dive: dictation UX patterns, engine/VAD/history/Apple Intelligence notes, Agrapha relevance, attribution note. |
| project_plans/agrapha-feature-research/research/blahst.md | BlahST deep-dive: script-based patterns, LLM/TTS pipeline, dictation loop, attribution note. |
| project_plans/agrapha-feature-research/research/comparable-projects.md | Survey of additional comparable projects + “additional discoveries” section and per-project attribution notes. |
| project_plans/agrapha-feature-research/requirements.md | Defines scope, feature areas, constraints, and the required backlog item template. |
| project_plans/agrapha-feature-research/implementation/plan.md | 31-item prioritized backlog with “Inspired by” + README attribution notes + effort estimates. |
| project_plans/agrapha-feature-research/implementation/validation.md | Validation report cross-checking requirements coverage and attribution correctness. |
## Summary

Five additional open-source local-first STT and meeting-transcription projects were identified beyond the three seed projects. The most Agrapha-relevant are Meetily (meeting assistant closest in intent to Agrapha), OpenWhispr (macOS-native, VA + calendar integration, local diarization), and whisper-writer (four recording modes, VAD, continuous recording). All are MIT-licensed. Cloud-only tools and mobile-only apps were excluded.
Comment on lines +91 to +95
**URL**: https://github.com/savbell/whisper-writer
**Stars**: 1,049
**Language**: Python (PyQt5 GUI, faster-whisper)
**License**: MIT (implied; no LICENSE file found but standard open-source practices stated)
**Platform**: Windows, macOS, Linux
---

## Feature: Parakeet ONNX Engine
**Priority:** High
**Inspired by:** [Handy](https://github.com/cjpais/Handy), [WhisperWriter](https://github.com/savbell/whisper-writer)
**What they do:** Handy accepts a `custom_words` list that is injected as Whisper's `initial_prompt` parameter and as a Parakeet custom vocabulary, with fuzzy-match post-correction. WhisperWriter exposes `initial_prompt` directly as a config field for domain conditioning.
**What Agrapha would do:** Allow users to define a persistent list of names, project codes, and technical terms; inject them as Whisper's `initial_prompt` via the existing JNI bridge so beam search favors those tokens, with optional fuzzy-match correction post-transcription.
**Attribution note (README):** Custom vocabulary / dictionary injection pattern inspired by [Handy](https://github.com/cjpais/Handy) (MIT) and [WhisperWriter](https://github.com/savbell/whisper-writer) (MIT).
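For discussion, the custom-vocabulary idea quoted above (prompt injection plus fuzzy-match post-correction) might look roughly like the sketch below. It is not taken from Handy, WhisperWriter, or Agrapha: the function names `build_initial_prompt` and `correct_with_vocabulary`, the "Glossary:" prompt wording, and the 0.8 similarity cutoff are all assumptions made for illustration, and only the Python standard library is used.

```python
import difflib
import re


def build_initial_prompt(custom_words):
    """Join the user's vocabulary into one string for a Whisper-style initial_prompt.

    The "Glossary:" prefix is an arbitrary choice for this sketch.
    """
    return "Glossary: " + ", ".join(custom_words)


def correct_with_vocabulary(text, custom_words, cutoff=0.8):
    """Replace tokens that closely resemble a vocabulary entry with that entry.

    Matching is case-insensitive; the canonical spelling from custom_words
    is what ends up in the output. The cutoff is a hypothetical default.
    """
    lookup = {word.lower(): word for word in custom_words}

    def fix(match):
        token = match.group(0)
        hits = difflib.get_close_matches(
            token.lower(), list(lookup), n=1, cutoff=cutoff
        )
        return lookup[hits[0]] if hits else token

    # Only word-like tokens are considered; punctuation and spacing are untouched.
    return re.sub(r"[A-Za-z0-9_-]+", fix, text)


vocabulary = ["Agrapha", "Parakeet"]
print(build_initial_prompt(vocabulary))
print(correct_with_vocabulary("meeting about agrapa and parakeet", vocabulary))
```

A single-token matcher like this cannot repair multi-word terms or words the engine split in two, so a real implementation would likely need n-gram matching and a way to tune the cutoff per vocabulary.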
Comment on lines +13 to +17
| Additional transcription engines beyond Whisper | PARTIAL (no High item) | "Parakeet ONNX Engine" (Medium), "Moonshine Engine" (Low), "SenseVoice/Paraformer" (Low), "macOS Native Speech Framework" (Medium) |
| LLM integration patterns | YES | "Multiple Named LLM Post-Processing Prompts" (High), "One-Shot Speech-to-LLM" (Medium), "Apple Intelligence On-Device Post-Processing" (Medium) |
| Export formats (Markdown, JSON, SRT, VTT) | YES | "SRT and VTT Export" (High), "JSON Export" (High) |
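
As a point of reference for the "SRT and VTT Export" item in the table above, a minimal sketch of the two subtitle formats follows. The segment shape `(start_seconds, end_seconds, text)` and the function names are assumptions for illustration, not Agrapha's data model. SRT uses a comma before the milliseconds and numbered cues, while WebVTT uses a period and starts with a `WEBVTT` header line.

```python
def format_timestamp(seconds, separator):
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as SRT: numbered cues, comma separator."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)


def to_vtt(segments):
    """Render (start, end, text) tuples as WebVTT: header line, period separator."""
    cues = []
    for start, end, text in segments:
        cues.append(
            f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n"
            f"{text.strip()}\n"
        )
    return "WEBVTT\n\n" + "\n".join(cues)


segments = [(0.0, 2.5, "Welcome everyone."), (2.5, 6.04, "Let's review the backlog.")]
print(to_srt(segments))
print(to_vtt(segments))
```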

**Requirements gap:** Feature area 2 (additional transcription engines) has no High-priority backlog item. The requirements document states all four feature areas must be covered by at least one High-priority item. Parakeet ONNX Engine is the strongest candidate for promotion to High: it is the only alternative engine with a clear implementation path (ONNX Runtime for Java) and concrete evidence from three projects (VoxType, Handy, Meetily, plus the newly discovered Hex).
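
The coverage rule described above (every feature area needs at least one High-priority item) is mechanical enough to check in code. The sketch below is one hypothetical way to do that. The dictionary keys `area` and `priority` and the shortened area names are assumptions, since the backlog in `plan.md` is prose rather than structured data; the sample items are taken from the table rows quoted above.

```python
def uncovered_areas(backlog, required_areas):
    """Return the required areas that have no High-priority backlog item."""
    covered = {item["area"] for item in backlog if item["priority"] == "High"}
    return [area for area in required_areas if area not in covered]


# Sample data mirroring the three coverage rows quoted in this review.
required = [
    "Additional transcription engines",
    "LLM integration patterns",
    "Export formats",
]
backlog = [
    {"name": "Parakeet ONNX Engine", "area": "Additional transcription engines", "priority": "Medium"},
    {"name": "Moonshine Engine", "area": "Additional transcription engines", "priority": "Low"},
    {"name": "Multiple Named LLM Post-Processing Prompts", "area": "LLM integration patterns", "priority": "High"},
    {"name": "SRT and VTT Export", "area": "Export formats", "priority": "High"},
    {"name": "JSON Export", "area": "Export formats", "priority": "High"},
]

print(uncovered_areas(backlog, required))
```

With the priorities as listed in the table, only the transcription-engines area is reported, which matches the gap the validation report describes. Promoting "Parakeet ONNX Engine" to High empties the list.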
Comment on lines +89 to +100
## Verdict

**NEEDS REVISION**

The backlog requires the following changes before it is ready to use:

**Must fix (blocking):**
1. Promote "Parakeet ONNX Engine" from Medium to High priority to satisfy the requirements coverage rule for feature area 2 (additional transcription engines). This is the only gap against the four required High-priority coverage areas.
2. Fix "Parakeet ONNX Engine": the "What they do" field overstates Meetily's Parakeet implementation as "ONNX Runtime" when that is unconfirmed (Issue 1).
3. Add BlahST to the "Global Hotkey / Dictation Mode" attribution note (Issue 2).
4. Add WhisperWriter to the "Silero VAD" attribution note (Issue 3).
Comment on lines +175 to +185
- Low star count and no license specified; treat as inspiration only, not for attribution
- Most interesting differentiator: **macOS native Speech framework** (SFSpeechRecognizer) as one of the backends: zero additional model download, built into every Mac since macOS 10.15

### Agrapha Relevance

- **macOS native Speech framework as a fast/free engine**: SFSpeechRecognizer runs on-device (no download), supports English well, and is already optimised by Apple. Could be offered as the "quick start" engine before a user has downloaded a Whisper model. Latency is ~100–200 ms for short utterances
- Note: SFSpeechRecognizer sends audio to Apple servers by default unless `requiresOnDeviceRecognition = true` is set (available iOS 13+ / macOS 12+). This restriction must be surfaced to users in Agrapha's privacy model

### Attribution Note

> macOS native Speech framework engine integration pattern noted from [whisper-mac](https://github.com/Explosion-Scratch/whisper-mac).
…rder, and validation verdict Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Projects researched
condition_on_previous_text, Silero VAD design
Top 3 highest-leverage items
initial_prompt; immediate accuracy improvement for recurring names and product terms
Artifacts
Test plan
plan.md and verify each attribution note is accurate for the cited project
🤖 Generated with Claude Code