|
1 | 1 | --- |
2 | 2 | title: Wake Word |
3 | | -last_edited: '2026-04-11T00:00:00.000Z' |
| 3 | +last_edited: '2026-04-12T00:00:00.000Z' |
4 | 4 | tocIsHidden: false |
5 | 5 | --- |
6 | 6 |
|
7 | | -Hark supports "Hey Hark" wake word detection using openWakeWord, an open-source library (Apache 2.0) running entirely on-device. Say "Hey Hark" and the microphone activates automatically. |
| 7 | +Hark supports "Hey Hark" wake word detection using openWakeWord, an open-source library (Apache 2.0) running entirely on-device. Say "Hey Hark" from any screen and the assistant panel appears with the mic already listening — no need to open the app first. |
8 | 8 |
|
9 | 9 | The name "Hark" means "to listen." It's also my name. The coincidence was too good to ignore. |
10 | 10 |
|
11 | 11 | ## How it works |
12 | 12 |
|
13 | | -The pipeline: AudioRecord (16kHz mono) feeds into Silero VAD (voice activity gate), then into openWakeWord (mel-spectrogram, speech embeddings, keyword detection), across the Pigeon bridge to ChatNotifier, which auto-starts the mic. |
| 13 | +A foreground Service (`WakeWordService`) owns the detection pipeline. AudioRecord (16kHz mono) feeds into Silero VAD (voice activity gate), then into openWakeWord (mel-spectrogram, speech embeddings, keyword detection). On detection, the service pauses the detector, which releases the mic for STT, and then calls `VoiceInteractionService.showSession()`. This is the system-sanctioned path that lets voice assistants open their UI from the background without tripping the background activity launch restrictions that Android has enforced since version 10. The call triggers `HarkSession.onShow()`, which launches `OverlayActivity`, and the overlay's open callback auto-starts the mic. |
14 | 14 |
|
15 | 15 | Components: |
16 | 16 |
|
17 | 17 | - **openWakeWord**: ONNX-based wake word engine. Runs a mel-spectrogram model and an embedding model to detect the keyword. |
18 | 18 | - **Silero VAD**: Voice Activity Detection pre-filter. Saves battery by only running wake word inference when speech is detected. |
19 | 19 | - **Custom model**: `hey_harkh.onnx` (201 KB), trained via the openWakeWord Google Colab notebook. |
| 20 | +- **Foreground Service**: `WakeWordService` with `FOREGROUND_SERVICE_TYPE_MICROPHONE`, persistent notification, and `START_STICKY` so Android restarts it if the process is killed. |
| 21 | +- **VoiceInteractionService**: `HarkVoiceInteractionService.showSession()` is the cross-background overlay launcher. Works from any screen, including when the user has swiped Hark away from Recents. |
20 | 22 |
|
21 | 23 | ## Architecture |
22 | 24 |
|
23 | 25 | ``` |
24 | | -AudioRecord (16kHz) |
| 26 | +AudioRecord (16kHz, inside WakeWordService) |
25 | 27 | → Silero VAD (is someone speaking?) |
26 | 28 | → openWakeWord engine (is it "Hey Hark"?) |
27 | 29 | → WakeWordDetector.kt (Kotlin wrapper) |
28 | | - → HarkPlatformPlugin (Pigeon bridge) |
29 | | - → HarkResultFlutterApi.onWakeWordDetected() |
30 | | - → ChatNotifier (auto-starts microphone) |
| 30 | + → WakeWordService.onWakeWordDetected() |
| 31 | + ├─→ detector.pause() (release mic for STT) |
| 32 | + ├─→ HarkApplication.onWakeWordDetected() |
| 33 | + │ → HarkVoiceInteractionService.showSession() |
| 34 | + │ → HarkSession.onShow() → startActivity(OverlayActivity) |
| 35 | + │ → OverlayBridgeService.onOverlayOpened() → ChatNotifier.onMicPressed() |
| 36 | + └─→ HarkResultFlutterApi.onWakeWordDetected() (notify Dart for logging) |
31 | 37 | ``` |
32 | 38 |
|
33 | 39 | Detection threshold: 0.3. Cooldown: 1500ms between detections. |
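The threshold and cooldown combine into a simple gate on the model's score. The sketch below is a minimal illustration, not Hark's actual code: `DetectionGate` and `accept` are hypothetical names, and treating a score equal to the threshold as a hit is an assumption.

```kotlin
// Hypothetical helper, not taken from Hark's source. It gates raw model
// scores using the threshold (0.3) and cooldown (1500 ms) quoted above.
class DetectionGate(
    private val threshold: Float = 0.3f,
    private val cooldownMs: Long = 1500L,
) {
    // Timestamp of the last accepted detection, or null if there is none yet.
    private var lastDetectionMs: Long? = null

    /** Returns true when [score] at time [nowMs] counts as a new detection. */
    fun accept(score: Float, nowMs: Long): Boolean {
        // Assumption: a score exactly at the threshold is accepted.
        if (score < threshold) return false
        val last = lastDetectionMs
        // Suppress repeat triggers that fall inside the cooldown window.
        if (last != null && nowMs - last < cooldownMs) return false
        lastDetectionMs = nowMs
        return true
    }
}
```

With this gate, a strong score that arrives less than 1500 ms after an accepted detection is ignored, which keeps one utterance of "Hey Hark" from firing twice.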
34 | 40 |
|
35 | 41 | ## AudioRecord mutual exclusion |
36 | 42 |
|
37 | | -Android only allows one AudioRecord at a time. Wake word detection and speech recognition (SpeechRecognizer) both need the mic. Hark handles this by: |
| 43 | +Android only allows one AudioRecord at a time. Wake word detection and speech recognition (SpeechRecognizer) both need the mic. The service handles this by: |
38 | 44 |
|
39 | | -1. Wake word detected. Pause wake word engine (fully stops AudioRecord). |
40 | | -2. STT starts. Speech is transcribed. |
41 | | -3. STT finishes. Resume wake word engine (restarts AudioRecord). |
| 45 | +1. Wake word detected → pause wake word engine (fully stops AudioRecord). |
| 46 | +2. STT starts and takes the mic. |
| 47 | +3. STT finishes → the Dart side calls `setWakeWordPaused(false)`, which sends an `ACTION_RESUME` intent to the service, and the service restarts the detector. |
42 | 48 |
|
43 | | -The pause/resume cycle causes a roughly 25-second buffer rebuild delay. openWakeWord needs about 10 seconds of audio context in its embedding buffer before it can reliably detect again. This is a known limitation being addressed. |
| 49 | +The pause/resume cycle causes a roughly 25-second buffer rebuild delay. openWakeWord needs about 10 seconds of audio context in its embedding buffer before it can reliably detect again. This is a known limitation being investigated. |
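The three-step handoff can be modelled as a small state machine. This is a sketch under stated assumptions: `MicOwner` and `MicArbiter` are hypothetical names for illustration, and the real service coordinates through `detector.pause()` and the `ACTION_RESUME` intent rather than through a class like this.

```kotlin
// Hypothetical model of the mic handoff. The names are illustrative and
// are not part of Hark's API.
enum class MicOwner { WAKE_WORD, STT, NONE }

class MicArbiter {
    // The detector holds the mic while it is listening for the wake word.
    var owner: MicOwner = MicOwner.WAKE_WORD
        private set

    // Step 1: detection pauses the engine, which releases AudioRecord.
    fun onWakeWordDetected() {
        check(owner == MicOwner.WAKE_WORD) { "detector is not holding the mic" }
        owner = MicOwner.NONE
    }

    // Step 2: STT may only start once the detector has let go.
    fun onSttStarted() {
        check(owner == MicOwner.NONE) { "mic is still held by $owner" }
        owner = MicOwner.STT
    }

    // Step 3: STT finished, so the detector restarts and takes the mic back.
    fun onSttFinished() {
        check(owner == MicOwner.STT) { "STT is not holding the mic" }
        owner = MicOwner.WAKE_WORD
    }
}
```

Calling the steps out of order throws an `IllegalStateException`, which makes the "one owner at a time" rule explicit instead of leaving it to timing.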
44 | 50 |
|
45 | | -## Current status: Phase 1 (shipped) |
| 51 | +## User controls |
46 | 52 |
|
47 | | -In-app wake word detection is working. When Hark is open, "Hey Hark" activates the mic. |
| 53 | +The foreground notification is always visible while wake word detection is running. It shows: |
48 | 54 |
|
49 | | -## Phase 2 (planned): Background service |
| 55 | +- **Small icon**: Monochrome Hark robot silhouette (required by Android's small-icon tinting rules). |
| 56 | +- **Title**: "Hark". |
| 57 | +- **Text**: "Listening for 'Hey Hark'". |
| 58 | +- **Tap**: Opens the main chat screen. |
| 59 | +- **Stop action**: Sends `ACTION_STOP` to the service, which releases the mic and removes the notification. The service then returns `START_NOT_STICKY` so Android does not auto-restart it after an explicit stop. |
50 | 60 |
|
51 | | -A foreground service running wake word detection even when Hark is closed. "Hey Hark" from any screen would launch the overlay and start listening. |
| 61 | +Permissions required on Android 13+: |
52 | 62 |
|
53 | | -## Phase 3 (planned): Continuous listening session |
| 63 | +- `RECORD_AUDIO` (runtime, requested by the chat screen) |
| 64 | +- `POST_NOTIFICATIONS` (runtime, requested at init so the FG notification is visible) |
| 65 | +- `FOREGROUND_SERVICE` and `FOREGROUND_SERVICE_MICROPHONE` (manifest; the latter is required from Android 14) |
54 | 66 |
|
55 | | -After "Hey Hark" starts a session, the mic stays open for the entire conversation. No need to say the wake word again. The user can interrupt while Hark is speaking (barge-in). This requires acoustic echo cancellation to prevent Hark from hearing its own TTS output. |
| 67 | +## Lifecycle |
| 68 | + |
| 69 | +- The service starts from `ChatNotifier._initAsync()` via `startWakeWordService()` in the Pigeon plugin, which sends an `ACTION_START` intent. |
| 70 | +- Android's `VoiceInteractionService` system binding keeps the process alive indefinitely, even when the user swipes Hark from Recents. |
| 71 | +- `START_STICKY` is an additional safety net: if the OS does kill the service, Android restarts it with a null intent, which the service handles by restarting detection. |
| 72 | +- When the user taps **Stop** on the notification, the service releases the detector, removes the notification, and returns `START_NOT_STICKY` so Android does not resurrect it. |
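The start, restart, and stop rules above reduce to one decision on the incoming intent action. The sketch below is a plain-Kotlin approximation that runs without the Android SDK: `StartMode`, `StartResult`, and `handleStart` are hypothetical names, and the action strings are assumed values for the `ACTION_START`, `ACTION_RESUME`, and `ACTION_STOP` constants.

```kotlin
// Hypothetical stand-ins for Service.START_STICKY and Service.START_NOT_STICKY.
enum class StartMode { STICKY, NOT_STICKY }

// What the service would do in response to one onStartCommand call.
data class StartResult(val detectorRunning: Boolean, val mode: StartMode)

// Assumption: the action strings match the constant names used in the text.
fun handleStart(action: String?): StartResult = when (action) {
    // Explicit stop: release the detector and tell Android not to restart.
    "ACTION_STOP" -> StartResult(detectorRunning = false, mode = StartMode.NOT_STICKY)
    // ACTION_START, ACTION_RESUME, or a null intent after the OS restarted
    // the service: make sure detection is running and stay sticky.
    else -> StartResult(detectorRunning = true, mode = StartMode.STICKY)
}
```

Treating the null intent the same as `ACTION_START` is what lets a sticky restart bring detection back without any involvement from the Dart side.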
| 73 | + |
| 74 | +## What's next |
| 75 | + |
| 76 | +- **Sensitivity slider** exposed via a Settings screen. |
| 77 | +- **Privacy indicator** in the app when the mic is hot. |
| 78 | +- **Battery impact measurement** on Moto G56 and one other mid-range device. |
| 79 | +- **Buffer-rebuild shortening** for the ~25s recovery after STT. |
| 80 | +- **Barge-in**: interrupt Hark's TTS mid-sentence. Requires acoustic echo cancellation research. |
56 | 81 |
|
57 | 82 | ## Dependencies |
58 | 83 |
|
|