research(features): survey comparable STT/transcription projects for feature backlog#2

Merged: tstapler merged 3 commits into main from agrapha-feature-research on May 9, 2026

@tstapler tstapler commented May 9, 2026

Summary

  • Surveyed 8 open-source local-first STT and meeting-transcription projects to build a prioritized feature backlog for Agrapha
  • Produced 31 backlog items (11 High / 12 Medium / 8 Low) across four feature areas with README attribution notes for each
  • All attributions validated for accuracy; attribution notes are copy-paste-ready for the README

Projects researched

| Project | Stars | Key contributions to backlog |
|---|---|---|
| VoxType | 712 | Engine selection architecture, SRT/VTT/JSON export, meeting mode |
| BlahST | 172 | Continuous dictation loop, speech-to-LLM one-shot patterns |
| Handy | 21k | Custom vocabulary injection, Apple Intelligence FFI, transcription history schema |
| Meetily | 11.6k | Meeting app auto-detection, SortFormer diarization |
| WhisperKit | 6k | SRT/VTT subtitle data model, on-device TTS |
| Hex | 2k | Dual-engine Parakeet+Whisper toggle pattern (macOS) |
| noScribe | 2k | Transcript correction editor UX |
| WhisperWriter | 1k | `condition_on_previous_text`, Silero VAD design |

Top 3 highest-leverage items

  1. Custom Vocabulary injection (S effort, High) — wires into existing JNI initial_prompt; immediate accuracy improvement for recurring names and product terms
  2. Global Hotkey / Dictation Mode (M effort, High) — enables the entire push-to-talk cluster; opens a second daily-use pattern alongside meeting transcription
  3. SRT + VTT + JSON Export (XS each, High) — 3 serializers on data Agrapha already stores; unlocks video editor and automation workflows
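
To illustrate why item 3 is rated XS, here is a minimal sketch of the three serializers. It is written in Python for brevity and is not Agrapha's code: the `Segment` shape (start and end in seconds plus text) and every function name are assumptions, since the stored transcript schema is not shown in this PR.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; the real stored schema may differ."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments):
    """Numbered cues separated by a blank line."""
    blocks = [
        f"{i}\n{_timestamp(s.start, ',')} --> {_timestamp(s.end, ',')}\n{s.text.strip()}\n"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(segments):
    """Same cues as SRT, with a WEBVTT header and '.' before the milliseconds."""
    cues = [
        f"{_timestamp(s.start, '.')} --> {_timestamp(s.end, '.')}\n{s.text.strip()}\n"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


def to_json(segments):
    """Plain list of segment objects for automation workflows."""
    return json.dumps([asdict(s) for s in segments], ensure_ascii=False, indent=2)
```

All three formats are thin views over the same segment list, which is the reason the backlog treats them as serializers rather than new features.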

Artifacts

project_plans/agrapha-feature-research/
  requirements.md                  feature areas, deliverable spec
  research/
    voxtype.md                     deep-dive: 7 engines, meeting mode, hooks
    blahst.md                      deep-dive: 6 scripts, LLM integration, TTS
    handy.md                       deep-dive: Tauri v2, VAD, Apple Intelligence
    comparable-projects.md         discovery: Meetily, OpenWhispr, WhisperKit, Hex, noScribe, …
  implementation/
    plan.md                        31-item prioritized backlog with attribution notes
    validation.md                  attribution audit, requirements coverage, all issues fixed

Test plan

  • Read plan.md and verify each attribution note is accurate for the cited project
  • Confirm all 4 required feature areas have at least one High-priority item
  • Spot-check 2–3 attribution URLs are live and resolve to the correct repos

🤖 Generated with Claude Code

Surveys voxtype, BlahST, Handy, Meetily, WhisperKit, Hex, noScribe, and
WhisperWriter to produce a prioritized feature backlog (11 High / 12 Medium /
8 Low) covering: push-to-talk/dictation mode, additional transcription engines
(Parakeet, Moonshine, SenseVoice), LLM integration patterns, and export formats
(SRT, VTT, JSON). All items include attribution notes for the README.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 9, 2026 23:01

Copilot AI left a comment

Pull request overview

Adds a feature-research pack for Agrapha by surveying comparable local-first STT/meeting transcription projects and turning findings into a prioritized backlog with copy-paste-ready README attribution notes.

Changes:

  • Added deep-dive research notes for VoxType, Handy, and BlahST.
  • Added a comparable-projects survey (incl. additional discoveries) and a requirements spec for the research deliverable.
  • Added an implementation backlog (plan.md) and a validation report (validation.md) tying research → prioritized items + attribution checks.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| project_plans/agrapha-feature-research/research/voxtype.md | VoxType deep-dive: feature inventory, architecture notes, Agrapha relevance, attribution note. |
| project_plans/agrapha-feature-research/research/handy.md | Handy deep-dive: dictation UX patterns, engine/VAD/history/Apple Intelligence notes, Agrapha relevance, attribution note. |
| project_plans/agrapha-feature-research/research/blahst.md | BlahST deep-dive: script-based patterns, LLM/TTS pipeline, dictation loop, attribution note. |
| project_plans/agrapha-feature-research/research/comparable-projects.md | Survey of additional comparable projects + “additional discoveries” section and per-project attribution notes. |
| project_plans/agrapha-feature-research/requirements.md | Defines scope, feature areas, constraints, and the required backlog item template. |
| project_plans/agrapha-feature-research/implementation/plan.md | 31-item prioritized backlog with “Inspired by” + README attribution notes + effort estimates. |
| project_plans/agrapha-feature-research/implementation/validation.md | Validation report cross-checking requirements coverage and attribution correctness. |


## Summary

Five additional open-source local-first STT and meeting-transcription projects were identified beyond the three seed projects. The most Agrapha-relevant are Meetily (meeting assistant closest in intent to Agrapha), OpenWhispr (macOS-native, VA + calendar integration, local diarization), and whisper-writer (four recording modes, VAD, continuous recording). All are MIT-licensed. Cloud-only tools and mobile-only apps were excluded.
Comment on lines +91 to +95
**URL**: https://github.com/savbell/whisper-writer
**Stars**: 1,049
**Language**: Python (PyQt5 GUI, faster-whisper)
**License**: MIT (implied — no LICENSE file found but standard open-source practices stated)
**Platform**: Windows, macOS, Linux
---

## Feature: Parakeet ONNX Engine
**Priority:** High
**Inspired by:** [Handy](https://github.com/cjpais/Handy), [WhisperWriter](https://github.com/savbell/whisper-writer)
**What they do:** Handy accepts a `custom_words` list that is injected as Whisper's `initial_prompt` parameter and as a Parakeet custom vocabulary, with fuzzy-match post-correction. WhisperWriter exposes `initial_prompt` directly as a config field for domain conditioning.
**What Agrapha would do:** Allow users to define a persistent list of names, project codes, and technical terms; inject them as Whisper's `initial_prompt` via the existing JNI bridge so beam search favors those tokens, with optional fuzzy-match correction post-transcription.
**Attribution note (README):** Custom vocabulary / dictionary injection pattern inspired by [Handy](https://github.com/cjpais/Handy) (MIT) and [WhisperWriter](https://github.com/savbell/whisper-writer) (MIT).
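
The two steps described in this backlog item (prompt injection, then optional fuzzy correction) could look roughly like the sketch below. It is Python for illustration only and does not reproduce Handy's or WhisperWriter's implementation. Both function names are hypothetical, the character budget is an assumed stand-in for Whisper's bounded prompt window (which is counted in tokens), and the `0.8` cutoff is an arbitrary starting point.

```python
import difflib
import re


def build_initial_prompt(words, max_chars=800):
    """Join vocabulary terms into one prompt string.

    Drops blanks and case-insensitive duplicates, and stops adding terms
    once the assumed character budget would be exceeded.
    """
    seen, kept, length = set(), [], 0
    for word in words:
        word = word.strip()
        key = word.lower()
        if not word or key in seen:
            continue
        extra = len(word) + (2 if kept else 0)  # 2 accounts for the ", " separator
        if length + extra > max_chars:
            break
        seen.add(key)
        kept.append(word)
        length += extra
    return ", ".join(kept)


def correct_terms(text, words, cutoff=0.8):
    """Replace near-miss tokens in a transcript with their vocabulary spelling."""
    lookup = {w.lower(): w for w in words}

    def fix(match):
        token = match.group(0)
        hits = difflib.get_close_matches(token.lower(), list(lookup), n=1, cutoff=cutoff)
        return lookup[hits[0]] if hits else token

    return re.sub(r"[A-Za-z][A-Za-z0-9_-]*", fix, text)
```

The prompt string would be passed as `initial_prompt` through the existing JNI bridge, and `correct_terms` would run on the returned transcript. A lower cutoff catches more misspellings at the cost of rewriting ordinary words that happen to resemble a vocabulary term.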
Comment on lines +13 to +17
| Additional transcription engines beyond Whisper | PARTIAL — no High item | "Parakeet ONNX Engine" (Medium), "Moonshine Engine" (Low), "SenseVoice/Paraformer" (Low), "macOS Native Speech Framework" (Medium) |
| LLM integration patterns | YES | "Multiple Named LLM Post-Processing Prompts" (High), "One-Shot Speech-to-LLM" (Medium), "Apple Intelligence On-Device Post-Processing" (Medium) |
| Export formats (Markdown, JSON, SRT, VTT) | YES | "SRT and VTT Export" (High), "JSON Export" (High) |

**Requirements gap:** Feature area 2 (additional transcription engines) has no High-priority backlog item. The requirements document states all four feature areas must be covered by at least one High-priority item. Parakeet ONNX Engine is the strongest candidate for promotion to High — it is the only alternative engine with a clear implementation path (ONNX Runtime for Java) and concrete evidence from three projects (VoxType, Handy, Meetily, plus the newly discovered Hex).
Comment on lines +89 to +100
## Verdict

**NEEDS REVISION**

The backlog requires the following changes before it is ready to use:

**Must fix (blocking):**
1. Promote "Parakeet ONNX Engine" from Medium to High priority to satisfy the requirements coverage rule for feature area 2 (additional transcription engines). This is the only gap against the four required High-priority coverage areas.
2. Fix "Parakeet ONNX Engine" — "What they do" field overstates Meetily's Parakeet implementation as "ONNX Runtime" when that is unconfirmed (Issue 1).
3. Add BlahST to the "Global Hotkey / Dictation Mode" attribution note (Issue 2).
4. Add WhisperWriter to the "Silero VAD" attribution note (Issue 3).

Comment on lines +175 to +185
- Low star count and no license specified; treat as inspiration only, not for attribution
- Most interesting differentiator: **macOS native Speech framework** (SFSpeechRecognizer) as one of the backends — zero additional model download, built into every Mac since macOS 10.15

### Agrapha Relevance

- **macOS native Speech framework as a fast/free engine**: SFSpeechRecognizer runs on-device (no download), supports English well, and is already optimised by Apple. Could be offered as the "quick start" engine before a user has downloaded a Whisper model. Latency is ~100–200 ms for short utterances
- Note: SFSpeechRecognizer sends audio to Apple servers by default unless `requiresOnDeviceRecognition = true` is set (available iOS 13+ / macOS 10.15+). This restriction must be surfaced to users in Agrapha's privacy model

### Attribution Note

> macOS native Speech framework engine integration pattern noted from [whisper-mac](https://github.com/Explosion-Scratch/whisper-mac).
tstapler and others added 2 commits May 9, 2026 16:14
…rder, and validation verdict

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tstapler tstapler merged commit 7c8112d into main May 9, 2026
1 check passed
@tstapler tstapler deleted the agrapha-feature-research branch May 9, 2026 23:38