research(features): survey comparable STT/transcription projects for feature backlog by tstapler · Pull Request #2 · tstapler/agrapha

tstapler · 2026-05-09T23:01:10Z

Summary

Surveyed 8 open-source local-first STT and meeting-transcription projects to build a prioritized feature backlog for Agrapha
Produced 31 backlog items (11 High / 12 Medium / 8 Low) across four feature areas with README attribution notes for each
All attributions validated for accuracy; attribution notes are copy-paste-ready for the README

Projects researched

| Project | Stars | Key contributions to backlog |
|---|---|---|
| VoxType | 712 | Engine selection architecture, SRT/VTT/JSON export, meeting mode |
| BlahST | 172 | Continuous dictation loop, speech-to-LLM one-shot patterns |
| Handy | 21k | Custom vocabulary injection, Apple Intelligence FFI, transcription history schema |
| Meetily | 11.6k | Meeting app auto-detection, SortFormer diarization |
| WhisperKit | 6k | SRT/VTT subtitle data model, on-device TTS |
| Hex | 2k | Dual-engine Parakeet+Whisper toggle pattern (macOS) |
| noScribe | 2k | Transcript correction editor UX |
| WhisperWriter | 1k | `condition_on_previous_text`, Silero VAD design |

Top 3 highest-leverage items

Custom Vocabulary injection (S effort, High) — wires into existing JNI initial_prompt; immediate accuracy improvement for recurring names and product terms
Global Hotkey / Dictation Mode (M effort, High) — enables the entire push-to-talk cluster; opens a second daily-use pattern alongside meeting transcription
SRT + VTT + JSON Export (XS each, High) — 3 serializers on data Agrapha already stores; unlocks video editor and automation workflows
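
To illustrate why the export item is sized XS, here is a minimal sketch of the three serializers in Python. It assumes a hypothetical `Segment` record with start/end times in seconds and a text field; Agrapha's actual storage schema and implementation language are not shown in this PR, so the names below are illustrative only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; Agrapha's real schema may differ."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments) -> str:
    """Numbered cues separated by blank lines, comma before milliseconds."""
    blocks = [
        f"{i}\n{_timestamp(s.start, ',')} --> {_timestamp(s.end, ',')}\n{s.text}\n"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WEBVTT header followed by cues, period before milliseconds."""
    cues = [
        f"{_timestamp(s.start, '.')} --> {_timestamp(s.end, '.')}\n{s.text}\n"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


def to_json(segments) -> str:
    """Plain list of segment objects; field names follow the sketch above."""
    return json.dumps([asdict(s) for s in segments], indent=2)
```

All three functions read the same segment list, which is the sense in which the export formats are serializers over data that is already stored.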

Artifacts

project_plans/agrapha-feature-research/
  requirements.md                  feature areas, deliverable spec
  research/
    voxtype.md                     deep-dive: 7 engines, meeting mode, hooks
    blahst.md                      deep-dive: 6 scripts, LLM integration, TTS
    handy.md                       deep-dive: Tauri v2, VAD, Apple Intelligence
    comparable-projects.md         discovery: Meetily, OpenWhispr, WhisperKit, Hex, noScribe, …
  implementation/
    plan.md                        31-item prioritized backlog with attribution notes
    validation.md                  attribution audit, requirements coverage, all issues fixed

Test plan

Read plan.md and verify each attribution note is accurate for the cited project
Confirm all 4 required feature areas have at least one High-priority item
Spot-check 2–3 attribution URLs are live and resolve to the correct repos

🤖 Generated with Claude Code

Surveys voxtype, BlahST, Handy, Meetily, WhisperKit, Hex, noScribe, and WhisperWriter to produce a prioritized feature backlog (11 High / 12 Medium / 8 Low) covering: push-to-talk/dictation mode, additional transcription engines (Parakeet, Moonshine, SenseVoice), LLM integration patterns, and export formats (SRT, VTT, JSON). All items include attribution notes for the README. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a feature-research pack for Agrapha by surveying comparable local-first STT/meeting transcription projects and turning findings into a prioritized backlog with copy-paste-ready README attribution notes.

Changes:

Added deep-dive research notes for VoxType, Handy, and BlahST.
Added a comparable-projects survey (incl. additional discoveries) and a requirements spec for the research deliverable.
Added an implementation backlog (plan.md) and a validation report (validation.md) tying research → prioritized items + attribution checks.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| project_plans/agrapha-feature-research/research/voxtype.md | VoxType deep-dive: feature inventory, architecture notes, Agrapha relevance, attribution note. |
| project_plans/agrapha-feature-research/research/handy.md | Handy deep-dive: dictation UX patterns, engine/VAD/history/Apple Intelligence notes, Agrapha relevance, attribution note. |
| project_plans/agrapha-feature-research/research/blahst.md | BlahST deep-dive: script-based patterns, LLM/TTS pipeline, dictation loop, attribution note. |
| project_plans/agrapha-feature-research/research/comparable-projects.md | Survey of additional comparable projects + “additional discoveries” section and per-project attribution notes. |
| project_plans/agrapha-feature-research/requirements.md | Defines scope, feature areas, constraints, and the required backlog item template. |
| project_plans/agrapha-feature-research/implementation/plan.md | 31-item prioritized backlog with “Inspired by” + README attribution notes + effort estimates. |
| project_plans/agrapha-feature-research/implementation/validation.md | Validation report cross-checking requirements coverage and attribution correctness. |

+
+## Summary
+
+Five additional open-source local-first STT and meeting-transcription projects were identified beyond the three seed projects. The most Agrapha-relevant are Meetily (meeting assistant closest in intent to Agrapha), OpenWhispr (macOS-native, VA + calendar integration, local diarization), and whisper-writer (four recording modes, VAD, continuous recording). All are MIT-licensed. Cloud-only tools and mobile-only apps were excluded.


+**URL**: https://github.com/savbell/whisper-writer
+**Stars**: 1,049
+**Language**: Python (PyQt5 GUI, faster-whisper)
+**License**: MIT (implied — no LICENSE file found but standard open-source practices stated)
+**Platform**: Windows, macOS, Linux


+---
+
+## Feature: Parakeet ONNX Engine
+**Priority:** High


+**Inspired by:** [Handy](https://github.com/cjpais/Handy), [WhisperWriter](https://github.com/savbell/whisper-writer)
+**What they do:** Handy accepts a `custom_words` list that is injected as Whisper's `initial_prompt` parameter and as a Parakeet custom vocabulary, with fuzzy-match post-correction. WhisperWriter exposes `initial_prompt` directly as a config field for domain conditioning.
+**What Agrapha would do:** Allow users to define a persistent list of names, project codes, and technical terms; inject them as Whisper's `initial_prompt` via the existing JNI bridge so beam search favors those tokens, with optional fuzzy-match correction post-transcription.
+**Attribution note (README):** Custom vocabulary / dictionary injection pattern inspired by [Handy](https://github.com/cjpais/Handy) (MIT) and [WhisperWriter](https://github.com/savbell/whisper-writer) (MIT).
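
The two halves of this pattern (prompt conditioning, then fuzzy post-correction) can be sketched in Python as follows. This is an illustration under stated assumptions, not Handy's or WhisperWriter's code: `build_initial_prompt`, `correct_transcript`, the `max_chars` budget, and the `cutoff` threshold are all hypothetical names and values, and the fuzzy matching uses the standard-library `difflib` rather than whatever matcher those projects use.

```python
import difflib
import re


def build_initial_prompt(vocabulary, max_chars=800):
    """Join terms into a comma-separated prompt string.

    max_chars is an illustrative budget: Whisper keeps only a limited
    number of prompt tokens, so terms that would exceed it are dropped.
    """
    parts, length = [], 0
    for term in vocabulary:
        added = len(term) + (2 if parts else 0)  # account for the ", " separator
        if length + added > max_chars:
            break
        parts.append(term)
        length += added
    return ", ".join(parts)


def correct_transcript(text, vocabulary, cutoff=0.8):
    """Replace words that closely match a vocabulary term (case-insensitive)."""
    lookup = {term.lower(): term for term in vocabulary}

    def fix(match):
        word = match.group(0)
        hits = difflib.get_close_matches(
            word.lower(), list(lookup), n=1, cutoff=cutoff
        )
        return lookup[hits[0]] if hits else word

    return re.sub(r"[A-Za-z0-9']+", fix, text)
```

The prompt string would be passed through whatever parameter the JNI bridge exposes for `initial_prompt`. A high cutoff keeps the correction pass conservative, since a loose threshold risks rewriting ordinary words that happen to resemble a vocabulary term.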


+| Additional transcription engines beyond Whisper | PARTIAL — no High item | "Parakeet ONNX Engine" (Medium), "Moonshine Engine" (Low), "SenseVoice/Paraformer" (Low), "macOS Native Speech Framework" (Medium) |
+| LLM integration patterns | YES | "Multiple Named LLM Post-Processing Prompts" (High), "One-Shot Speech-to-LLM" (Medium), "Apple Intelligence On-Device Post-Processing" (Medium) |
+| Export formats (Markdown, JSON, SRT, VTT) | YES | "SRT and VTT Export" (High), "JSON Export" (High) |
+
+**Requirements gap:** Feature area 2 (additional transcription engines) has no High-priority backlog item. The requirements document states all four feature areas must be covered by at least one High-priority item. Parakeet ONNX Engine is the strongest candidate for promotion to High: it is the only alternative engine with a clear implementation path (ONNX Runtime for Java) and concrete evidence from four projects (VoxType, Handy, Meetily, and the newly discovered Hex).
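
The coverage rule behind this gap is simple enough to check mechanically. The sketch below assumes a hypothetical in-memory representation of the backlog (a list of dicts with `name`, `area`, and `priority` keys); the real backlog lives in plan.md as prose, so parsing is out of scope here. The item names and priorities are taken from the rows above.

```python
def areas_missing_high(backlog, required_areas):
    """Return the required feature areas that have no High-priority item."""
    covered = {item["area"] for item in backlog if item["priority"] == "High"}
    return [area for area in required_areas if area not in covered]


REQUIRED_AREAS = [
    "Push-to-talk / dictation mode",
    "Additional transcription engines",
    "LLM integration patterns",
    "Export formats",
]

# Subset of the 31 items, enough to reproduce the gap described above.
BACKLOG = [
    {"name": "Global Hotkey / Dictation Mode",
     "area": "Push-to-talk / dictation mode", "priority": "High"},
    {"name": "Parakeet ONNX Engine",
     "area": "Additional transcription engines", "priority": "Medium"},
    {"name": "Moonshine Engine",
     "area": "Additional transcription engines", "priority": "Low"},
    {"name": "Multiple Named LLM Post-Processing Prompts",
     "area": "LLM integration patterns", "priority": "High"},
    {"name": "SRT and VTT Export",
     "area": "Export formats", "priority": "High"},
]
```

Calling `areas_missing_high(BACKLOG, REQUIRED_AREAS)` on this subset reports the transcription-engines area; changing the Parakeet item's priority to High makes the list empty.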


+## Verdict
+
+**NEEDS REVISION**
+
+The backlog requires the following changes before it is ready to use:
+
+**Must fix (blocking):**
+1. Promote "Parakeet ONNX Engine" from Medium to High priority to satisfy the requirements coverage rule for feature area 2 (additional transcription engines). This is the only gap against the four required High-priority coverage areas.
+2. Fix "Parakeet ONNX Engine" — "What they do" field overstates Meetily's Parakeet implementation as "ONNX Runtime" when that is unconfirmed (Issue 1).
+3. Add BlahST to the "Global Hotkey / Dictation Mode" attribution note (Issue 2).
+4. Add WhisperWriter to the "Silero VAD" attribution note (Issue 3).
+


+- Low star count and no license specified; treat as inspiration only, not for attribution
+- Most interesting differentiator: **macOS native Speech framework** (SFSpeechRecognizer) as one of the backends — zero additional model download, built into every Mac since macOS 10.15
+
+### Agrapha Relevance
+
+- **macOS native Speech framework as a fast/free engine**: SFSpeechRecognizer can run on-device (no model download), supports English well, and is already optimised by Apple. Could be offered as the "quick start" engine before a user has downloaded a Whisper model. Latency is ~100–200 ms for short utterances
+- Note: SFSpeechRecognizer sends audio to Apple servers by default unless `requiresOnDeviceRecognition = true` is set (available iOS 13+ / macOS 10.15+). This default must be surfaced to users in Agrapha's privacy model
+
+### Attribution Note
+
+> macOS native Speech framework engine integration pattern noted from [whisper-mac](https://github.com/Explosion-Scratch/whisper-mac).


Copilot AI review requested due to automatic review settings May 9, 2026 23:01

Copilot started reviewing on behalf of tstapler May 9, 2026 23:01

Copilot AI reviewed May 9, 2026

tstapler and others added 2 commits May 9, 2026 16:14

fix(research): address Copilot review comments on licenses, section o…

48094b0

…rder, and validation verdict Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(research): add feature comparison table vs reference projects

d672bb3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tstapler merged commit 7c8112d into main May 9, 2026
1 check passed

tstapler deleted the agrapha-feature-research branch May 9, 2026 23:38

tstapler merged 3 commits into main from agrapha-feature-research
