Commit ea02ebe (1 parent: 2bebb57)

docs: update wake word + roadmap for hark PR #19 (#8)

- wake-word.mdx: rewritten around the shipped foreground service architecture (WakeWordService, showSession(), Stop action) and added Lifecycle + User Controls sections.
- architecture.mdx: replaced the "Phase 2 planned" wake word paragraph with the shipped FG service + VIS.showSession() overlay launch flow.
- roadmap.mdx: moved wake word robustness + overlay launch + lifecycle refresh to Shipped. Replaced stale "self-hosted / BYOK / Gemma 4" in-progress bullets with current priorities (Settings, action chips, wake word polish, STT, release packaging) and a Deferred section.

3 files changed: 68 additions, 34 deletions

content/docs/hark/architecture.mdx (4 additions, 2 deletions)
````diff
@@ -59,9 +59,11 @@ Main (Dart) -> HarkMainApi (Kotlin) -> Overlay relay -> HarkOverlayFlu
 
 ## Wake word detection
 
-Hark supports hands-free activation via the wake phrase "Hey Hark". Wake word detection runs on-device using [openWakeWord](https://github.com/dscripka/openWakeWord) (Apache 2.0) with ONNX Runtime for inference. A custom-trained model (201KB) listens continuously for the trigger phrase and auto-starts the microphone when detected. The wake word engine is integrated through Pigeon bindings in `hark_platform`, consistent with the rest of the native bridge.
+Hark supports hands-free activation via the wake phrase "Hey Hark". Wake word detection runs on-device using [openWakeWord](https://github.com/dscripka/openWakeWord) (Apache 2.0) with ONNX Runtime for inference. A custom-trained 201KB model listens continuously for the trigger phrase. Detection runs inside a dedicated foreground `WakeWordService` (`FOREGROUND_SERVICE_TYPE_MICROPHONE`) with a persistent notification, so it survives app swipe-from-recents and OEM background kill behavior. The service is controlled through Pigeon bindings in `hark_platform`, consistent with the rest of the native bridge.
 
-AudioRecord is a shared resource on Android, so the wake word listener and STT cannot run simultaneously. When the wake phrase is detected, the wake word engine stops its audio stream before STT begins listening. Once the voice command completes, the wake word engine restarts. This mutual exclusion is handled automatically in the platform plugin. The wake word engine persists even when the app is backgrounded, so detection continues across app switches. Background service support (Phase 2) will allow detection when the app is fully closed.
+On detection, the service pauses the detector (releasing the mic for STT), then calls `HarkApplication.onWakeWordDetected()`, which invokes `HarkVoiceInteractionService.showSession()`. This is the system-sanctioned path for voice assistants to open their UI from the background — Android 12+'s normal background activity launch restrictions don't apply. That triggers `HarkSession.onShow()`, which starts `OverlayActivity`, and the overlay's `onOverlayOpened` callback auto-starts the mic. Say "Hey Hark" from any screen — including the lock screen — and the assistant panel appears with the mic already listening.
+
+AudioRecord is a shared resource on Android, so the wake word listener and STT cannot run simultaneously. After STT finishes, the Dart side sends a resume intent to the service, which restarts the detector. The notification has a **Stop** action that releases the mic without relaunching the app.
 
 ## Two-stage resolution
 
````
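The pause/resume handshake described above can be pictured as a tiny mic-ownership state machine. The sketch below is illustrative only: `MicArbiter` and `MicOwner` are invented names standing in for logic that, per the diff, is split across `WakeWordService` and the Dart side.

```kotlin
// Illustrative sketch only: MicArbiter/MicOwner are invented names for the
// mic-ownership handshake the docs describe, not Hark's actual classes.
enum class MicOwner { NONE, WAKE_WORD, STT }

class MicArbiter {
    var owner: MicOwner = MicOwner.WAKE_WORD
        private set

    /** Wake phrase heard: the detector releases AudioRecord so STT can take it. */
    fun onWakeWordDetected() {
        check(owner == MicOwner.WAKE_WORD) { "detector did not own the mic" }
        owner = MicOwner.STT // the real service calls detector.pause() here
    }

    /** STT finished: the Dart side sends a resume intent and the detector restarts. */
    fun onSttFinished() {
        check(owner == MicOwner.STT) { "STT did not own the mic" }
        owner = MicOwner.WAKE_WORD // the real service restarts the detector here
    }

    /** Notification Stop action: release the mic entirely, no auto-restart. */
    fun onStop() {
        owner = MicOwner.NONE
    }
}
```

The point of the single-owner invariant is that AudioRecord can never be requested by two components at once, which is exactly the constraint the docs call out.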

content/docs/hark/wake-word.mdx (43 additions, 18 deletions)
````diff
@@ -1,58 +1,83 @@
 ---
 title: Wake Word
-last_edited: '2026-04-11T00:00:00.000Z'
+last_edited: '2026-04-12T00:00:00.000Z'
 tocIsHidden: false
 ---
 
-Hark supports "Hey Hark" wake word detection using openWakeWord, an open-source library (Apache 2.0) running entirely on-device. Say "Hey Hark" and the microphone activates automatically.
+Hark supports "Hey Hark" wake word detection using openWakeWord, an open-source library (Apache 2.0) running entirely on-device. Say "Hey Hark" from any screen and the assistant panel appears with the mic already listening — no need to open the app first.
 
 The name "Hark" means "to listen." It's also my name. The coincidence was too good to ignore.
 
 ## How it works
 
-The pipeline: AudioRecord (16kHz mono) feeds into Silero VAD (voice activity gate), then into openWakeWord (mel-spectrogram, speech embeddings, keyword detection), across the Pigeon bridge to ChatNotifier, which auto-starts the mic.
+A foreground Service (`WakeWordService`) owns the detection pipeline. AudioRecord (16kHz mono) feeds into Silero VAD (voice activity gate), then into openWakeWord (mel-spectrogram, speech embeddings, keyword detection). On detection, the service pauses the detector (releasing the mic for STT), then calls `VoiceInteractionService.showSession()` — the system-sanctioned path that lets voice assistants open their UI from the background without tripping Android 12+'s background activity launch restrictions. That triggers `HarkSession.onShow()`, which launches `OverlayActivity`, and the overlay's open callback auto-starts the mic.
 
 Components:
 
 - **openWakeWord**: ONNX-based wake word engine. Runs a mel-spectrogram model and an embedding model to detect the keyword.
 - **Silero VAD**: Voice Activity Detection pre-filter. Saves battery by only running wake word inference when speech is detected.
 - **Custom model**: `hey_harkh.onnx` (201 KB), trained via the openWakeWord Google Colab notebook.
+- **Foreground Service**: `WakeWordService` with `FOREGROUND_SERVICE_TYPE_MICROPHONE`, persistent notification, and `START_STICKY` so Android restarts it if the process is killed.
+- **VoiceInteractionService**: `HarkVoiceInteractionService.showSession()` is the cross-background overlay launcher. Works from any screen, including when the user has swiped Hark away from Recents.
 
 ## Architecture
 
 ```
-AudioRecord (16kHz)
+AudioRecord (16kHz, inside WakeWordService)
   → Silero VAD (is someone speaking?)
   → openWakeWord engine (is it "Hey Hark"?)
   → WakeWordDetector.kt (Kotlin wrapper)
-  → HarkPlatformPlugin (Pigeon bridge)
-  → HarkResultFlutterApi.onWakeWordDetected()
-  → ChatNotifier (auto-starts microphone)
+  → WakeWordService.onWakeWordDetected()
+      ├─→ detector.pause() (release mic for STT)
+      ├─→ HarkApplication.onWakeWordDetected()
+      │     → HarkVoiceInteractionService.showSession()
+      │     → HarkSession.onShow() → startActivity(OverlayActivity)
+      │     → OverlayBridgeService.onOverlayOpened() → ChatNotifier.onMicPressed()
+      └─→ HarkResultFlutterApi.onWakeWordDetected() (notify Dart for logging)
 ```
 
 Detection threshold: 0.3. Cooldown: 1500ms between detections.
 
 ## AudioRecord mutual exclusion
 
-Android only allows one AudioRecord at a time. Wake word detection and speech recognition (SpeechRecognizer) both need the mic. Hark handles this by:
+Android only allows one AudioRecord at a time. Wake word detection and speech recognition (SpeechRecognizer) both need the mic. The service handles this by:
 
-1. Wake word detected. Pause wake word engine (fully stops AudioRecord).
-2. STT starts. Speech is transcribed.
-3. STT finishes. Resume wake word engine (restarts AudioRecord).
+1. Wake word detected → pause wake word engine (fully stops AudioRecord).
+2. STT starts and takes the mic.
+3. STT finishes → Dart side calls `setWakeWordPaused(false)`, which sends an `ACTION_RESUME` intent to the service, which restarts the detector.
 
-The pause/resume cycle causes a roughly 25-second buffer rebuild delay. openWakeWord needs about 10 seconds of audio context in its embedding buffer before it can reliably detect again. This is a known limitation being addressed.
+The pause/resume cycle causes a roughly 25-second buffer rebuild delay. openWakeWord needs about 10 seconds of audio context in its embedding buffer before it can reliably detect again. This is a known limitation being investigated.
 
-## Current status: Phase 1 (shipped)
+## User controls
 
-In-app wake word detection is working. When Hark is open, "Hey Hark" activates the mic.
+The foreground notification is always visible while wake word is running. It shows:
 
-## Phase 2 (planned): Background service
+- **Small icon**: Monochrome Hark robot silhouette (required by Android's small-icon tinting rules).
+- **Title**: "Hark".
+- **Text**: "Listening for 'Hey Hark'".
+- **Tap**: Opens the main chat screen.
+- **Stop action**: Sends `ACTION_STOP` to the service, which releases the mic and removes the notification. Returns `START_NOT_STICKY` so Android does not auto-restart after an explicit stop.
 
-A foreground service running wake word detection even when Hark is closed. "Hey Hark" from any screen would launch the overlay and start listening.
+Permissions required on Android 13+:
 
-## Phase 3 (planned): Continuous listening session
+- `RECORD_AUDIO` (runtime, requested by the chat screen)
+- `POST_NOTIFICATIONS` (runtime, requested at init so the FG notification is visible)
+- `FOREGROUND_SERVICE` and `FOREGROUND_SERVICE_MICROPHONE` (manifest)
 
-After "Hey Hark" starts a session, the mic stays open for the entire conversation. No need to say the wake word again. The user can interrupt while Hark is speaking (barge-in). This requires acoustic echo cancellation to prevent Hark from hearing its own TTS output.
+## Lifecycle
+
+- The service starts from `ChatNotifier._initAsync()` via `startWakeWordService()` in the Pigeon plugin, which sends an `ACTION_START` intent.
+- Android's `VoiceInteractionService` system binding keeps the process alive indefinitely, even when the user swipes Hark from Recents.
+- `START_STICKY` is an additional safety net: if the OS does kill the service for some reason, it will be restarted with a null intent, which the service handles by re-starting detection.
+- When the user taps **Stop** on the notification, the service releases the detector, removes the notification, and returns `START_NOT_STICKY` so Android does not resurrect it.
+
+## What's next
+
+- **Sensitivity slider** exposed via a Settings screen.
+- **Privacy indicator** in the app when the mic is hot.
+- **Battery impact measurement** on Moto G56 and one other mid-range device.
+- **Buffer-rebuild shortening** for the ~25s recovery after STT.
+- **Barge-in**: interrupt Hark's TTS mid-sentence. Requires acoustic echo cancellation research.
 
 ## Dependencies
 
````
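The "Detection threshold: 0.3. Cooldown: 1500ms" line in the new doc implies a small debounce gate in front of the detection callback. Here is a minimal sketch of that gate; `WakeGate` and its API are invented for illustration (the diff says the real logic lives in `WakeWordDetector.kt`).

```kotlin
// Illustrative debounce gate for "threshold 0.3, cooldown 1500ms".
// WakeGate is an invented name; the real logic lives in WakeWordDetector.kt.
class WakeGate(
    private val threshold: Float = 0.3f,
    private val cooldownMs: Long = 1_500,
) {
    private var lastFireMs: Long? = null

    /** Returns true when a model score should count as a detection. */
    fun shouldFire(score: Float, nowMs: Long): Boolean {
        if (score < threshold) return false // below confidence threshold
        val last = lastFireMs
        if (last != null && nowMs - last < cooldownMs) return false // still cooling down
        lastFireMs = nowMs
        return true
    }
}
```

The cooldown prevents one utterance of "Hey Hark" from firing the overlay-launch path several times as consecutive audio frames all score above threshold.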

content/docs/roadmap.mdx (21 additions, 14 deletions)
````diff
@@ -1,6 +1,6 @@
 ---
 title: Roadmap
-last_edited: '2026-04-11T00:00:00.000Z'
+last_edited: '2026-04-12T00:00:00.000Z'
 tocIsHidden: false
 ---
 
@@ -25,26 +25,33 @@ Other SDKs are not documented as installable products until there is code to ins
 ### Shipped
 
 - On-device discovery via `ContentProvider` scanning
-- Two-stage NLU: EmbeddingGemma 308M for intent selection, Qwen3 0.6B for slot filling
+- Two-stage NLU: EmbeddingGemma 308M for intent selection, Qwen3 0.5B for slot filling
 - Foreground and broadcast dispatch paths with async result correlation
 - Registration as Android system assistant (`VoiceInteractionService`, `ROLE_ASSISTANT`)
 - Continuous listening mode after assistant-gesture activation
-- Lightweight overlay via FlutterEngineGroup with thin UI shell (zero model loading, instant startup)
-- hark_platform plugin with Pigeon for type-safe platform communication
-- Tested integrations with the OACP Test App demo, Breezy Weather, Binary Eye, Voice Recorder, Wikipedia, ArchiveTune
+- Lightweight overlay via `FlutterEngineGroup` with thin UI shell (zero model loading, instant startup)
+- `hark_platform` plugin with Pigeon for type-safe platform communication
+- **"Hey Hark" wake word** using openWakeWord with ONNX Runtime and a custom-trained 201 KB model
+- **Wake word foreground service** with persistent notification, Stop action, and `START_STICKY` restart — detection survives swipe-from-recents
+- **Overlay launch on detection** via `VoiceInteractionService.showSession()` — say "Hey Hark" from any screen and the assistant panel appears
+- **Lifecycle capability refresh** so uninstalled OACP apps drop out of the registry on app resume without a restart
+- Tested integrations with the OACP Test App demo, Breezy Weather, Binary Eye, Voice Recorder, Libre Camera, Wikipedia, ArchiveTune
 
 ### In progress
 
-- **Wake word** - "Hey Hark" on-device detection using openWakeWord with ONNX Runtime. Phase 1 (in-app listening, auto-mic activation) is working. Phase 2 (background service for detection when the app is fully closed) is next.
-- **Self-hosted inference** - connect Hark to Ollama or LM Studio on a local network for unlimited context and `OACP.md` consumption
-- **BYOK cloud fallback** - bring-your-own-key for OpenAI, Gemini, Anthropic when no local model is available
-- **Better STT** - evaluate whisper.cpp and sherpa-onnx for fully on-device speech recognition
-- **Disambiguation UI** - show top candidates as chips when scores are close and learn from user feedback
+- **Settings screen** — permissions, wake word toggle, model info, about
+- **Action chips and disambiguation** — tappable chips in chat bubbles, disambiguation buttons when top scores are close
+- **Wake word polish** — sensitivity slider, privacy indicator, battery measurement, buffer-rebuild shortening
+- **Better STT** — evaluate whisper.cpp and sherpa-onnx for fully on-device speech recognition
+- **Release packaging** — proper release signing, GitHub Releases, F-Droid submission
 
-### Planned
-- **Personalization** - assistant learns user preferences and per-app vocabulary over time
-- **Gemma 4 single-model pipeline** - once `flutter_gemma` supports it, collapse the two-stage pipeline into a single call
-- **iOS** - exploring feasibility
+### Deferred
+
+- **Self-hosted inference** (Ollama, LM Studio) — the two-stage local pipeline is good enough for current capabilities. Revisit once a real quality ceiling appears.
+- **BYOK cloud** — defer until self-hosted lands or users demand cloud.
+- **Gemma 4 single-model pipeline** — waiting on `flutter_gemma` support and a clear win over the two-stage stack.
+- **Barge-in** (interrupt Hark's TTS mid-sentence) — requires acoustic echo cancellation research.
+- **iOS** — exploring feasibility.
 
 ## Ecosystem
 
````
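The roadmap pairs embedding-based intent selection with "disambiguation buttons when top scores are close." A hedged sketch of how that first NLU stage could rank intents by cosine similarity and flag near-ties for the disambiguation UI; all names here and the 0.05 margin are assumptions for illustration, not values from the Hark codebase.

```kotlin
// Illustrative stage-one intent selection: rank intents by cosine similarity
// against the utterance embedding and flag a near-tie for disambiguation.
// Names and the 0.05 margin are assumptions, not Hark's real values.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (kotlin.math.sqrt(na) * kotlin.math.sqrt(nb))
}

data class IntentChoice(val best: String, val needsDisambiguation: Boolean)

fun selectIntent(
    utterance: FloatArray,
    intents: Map<String, FloatArray>,
    margin: Float = 0.05f,
): IntentChoice {
    val ranked = intents.entries
        .map { it.key to cosine(utterance, it.value) }
        .sortedByDescending { it.second }
    // A close top-two gap is where the UI would show disambiguation buttons.
    val close = ranked.size > 1 && ranked[0].second - ranked[1].second < margin
    return IntentChoice(ranked[0].first, close)
}
```

Stage two (slot filling with the small generative model) would then run only on the single winning intent, or wait for the user's chip tap when `needsDisambiguation` is true.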
