feat(voiceclaw): Edge voice processing — local VAD, KWS, ASR, TTS with privacy routing by HenryZ838978 · Pull Request #3 · OpenBMB/EdgeClaw

HenryZ838978 · 2026-03-19T05:51:39Z

Summary

VoiceClaw is a new extension that adds edge-side voice processing to EdgeClaw, closing the audio privacy gap that exists when voice data is sent to cloud STT/TTS providers before any privacy checks.

Key Features

Component	Implementation	Privacy
VAD	Silero VAD via sherpa-onnx-node (~0 ms latency)	Always local
KWS	Configurable wake words, sherpa-onnx-node	Always local
ASR	SenseVoice / Whisper / Paraformer (ONNX)	Local for S2/S3, cloud fallback for S1
TTS	VITS / MatchaTTS / edge-tts	Local for S3, cloud fallback for S1/S2

Privacy Guarantee

S1 (Safe): Cloud ASR/TTS allowed for best quality
S2 (Sensitive): ASR forced local, transcript desensitized before cloud
S3 (Private): Both ASR and TTS forced local — raw audio NEVER leaves the device

Integrates with GuardClaw's existing three-tier privacy system via OpenClaw plugin hooks.
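
To make the three tiers concrete, here is a minimal sketch of how the routing decision could be expressed. It is not taken from the VoiceClaw source: the names `PrivacyLevel`, `VoiceRoute`, `POLICY`, and `routeVoice` are hypothetical, and the sketch assumes that cloud backends are only ever used when explicitly configured. It flags transcript desensitization only for S2, since that is the only tier where the description mentions it.

```typescript
// Hypothetical names; this mirrors the S1/S2/S3 rules in the PR description,
// not the actual VoiceClaw implementation.
type PrivacyLevel = "S1" | "S2" | "S3";
type Backend = "local" | "cloud";

interface VoiceRoute {
  asr: Backend;
  tts: Backend;
  desensitizeTranscript: boolean;
}

// Per-tier policy: which stages may use the cloud, and whether the
// transcript must be desensitized before it is sent anywhere remote.
const POLICY: Record<
  PrivacyLevel,
  { asrCloudOk: boolean; ttsCloudOk: boolean; desensitize: boolean }
> = {
  S1: { asrCloudOk: true, ttsCloudOk: true, desensitize: false },
  S2: { asrCloudOk: false, ttsCloudOk: true, desensitize: true },
  S3: { asrCloudOk: false, ttsCloudOk: false, desensitize: false },
};

function routeVoice(level: PrivacyLevel, cloudConfigured: boolean): VoiceRoute {
  const rule = POLICY[level];
  // Assumption: without a configured cloud provider, every stage stays local.
  const pick = (cloudOk: boolean): Backend =>
    cloudOk && cloudConfigured ? "cloud" : "local";
  return {
    asr: pick(rule.asrCloudOk),
    tts: pick(rule.ttsCloudOk),
    desensitizeTranscript: rule.desensitize,
  };
}
```

Keeping the policy in a single lookup table means the S3 guarantee can be checked by reading one row rather than tracing branches.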

What's Included

extensions/voiceclaw/ — 17 files, ~2400 lines of TypeScript + HTML
WebSocket audio server for real-time browser microphone streaming
Session state machine with barge-in detection
Browser test console with VU meter and event logging
Unit tests for VAD engine and privacy manager
Full README with architecture docs and config examples
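
The session state machine with barge-in detection listed above could look roughly like the following. This is a sketch under assumptions: the state names, event names, and the `step` function are hypothetical and may not match the extension's code. Barge-in here means that speech detected while the assistant is speaking interrupts playback and returns the session to listening.

```typescript
// Hypothetical states and events for illustration only.
type SessionState = "idle" | "listening" | "processing" | "speaking";
type SessionEvent =
  | "wakeWord"      // KWS detected a wake word
  | "speechStart"   // VAD detected the start of speech
  | "speechEnd"     // VAD detected the end of speech
  | "replyReady"    // a reply is ready to be synthesized
  | "playbackDone"; // TTS playback finished

interface Transition {
  next: SessionState;
  bargeIn: boolean; // true when the user interrupted TTS playback
}

// Regular (non-barge-in) transitions; unlisted pairs leave the state unchanged.
const TABLE: Record<SessionState, Partial<Record<SessionEvent, SessionState>>> = {
  idle: { wakeWord: "listening" },
  listening: { speechEnd: "processing" },
  processing: { replyReady: "speaking" },
  speaking: { playbackDone: "idle" },
};

function step(state: SessionState, event: SessionEvent): Transition {
  // Barge-in: the user starts talking while the assistant is still speaking.
  if (state === "speaking" && event === "speechStart") {
    return { next: "listening", bargeIn: true };
  }
  const next = TABLE[state][event] ?? state;
  return { next, bargeIn: false };
}
```

A caller would stop TTS playback whenever `bargeIn` is true before feeding further audio to the ASR.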

Technical Choices

sherpa-onnx-node for all local inference — pure native addon, no Python dependency
Zero changes to existing EdgeClaw/GuardClaw code
Follows existing plugin conventions (openclaw.plugin.json, hooks, configSchema)

Test Plan

pnpm vitest run extensions/voiceclaw/test/
Load plugin with pnpm openclaw gateway run and verify VoiceClaw banner
Open browser console at http://localhost:8501/voiceclaw/ and test microphone streaming
Verify S3 session forces local ASR/TTS (no outbound network calls)
Verify S1 session allows cloud fallback when configured
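
One way to check the "no outbound network calls" step without inspecting real traffic is to inject a recording test double in place of the cloud backend and assert that it is never invoked. The sketch below is illustrative only: `SpeechBackend`, `RecordingCloudAsr`, `FakeLocalAsr`, and `transcribeWithPolicy` are hypothetical stand-ins, not names from the VoiceClaw test suite.

```typescript
// Hypothetical interface and test doubles; not the extension's real API.
interface SpeechBackend {
  transcribe(audio: Float32Array): string;
}

// Counts every attempted cloud call so a test can assert it stayed at zero.
class RecordingCloudAsr implements SpeechBackend {
  calls = 0;
  transcribe(_audio: Float32Array): string {
    this.calls += 1;
    return "cloud transcript";
  }
}

class FakeLocalAsr implements SpeechBackend {
  transcribe(_audio: Float32Array): string {
    return "local transcript";
  }
}

// Stand-in for the code under test: only S1 may reach the cloud backend,
// and only when one has been configured.
function transcribeWithPolicy(
  level: "S1" | "S2" | "S3",
  audio: Float32Array,
  local: SpeechBackend,
  cloud?: SpeechBackend,
): string {
  if (level === "S1" && cloud) {
    return cloud.transcribe(audio);
  }
  return local.transcribe(audio);
}
```

In a test, an S3 call would be followed by an assertion that the recorder's `calls` counter is still 0, and an S1 call by an assertion that it is 1.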

…routing

VoiceClaw brings local VAD, KWS (keyword spotting), ASR, and TTS to EdgeClaw using sherpa-onnx-node. Integrates with GuardClaw's three-tier privacy system to ensure voice data stays on-device for S2/S3 scenarios.

Key features:
- Silero VAD for real-time speech detection (~0ms latency)
- Keyword spotting for configurable wake words
- Multi-backend ASR: SenseVoice, Whisper, Paraformer (all local ONNX)
- Multi-backend TTS: VITS, MatchaTTS, edge-tts
- Voice privacy router: forces local ASR/TTS based on S1/S2/S3 level
- WebSocket audio server for browser microphone streaming
- Session state machine with barge-in detection
- Browser test console with VU meter and event logging
- Full test suite for VAD engine and privacy manager

Privacy guarantee: for S2/S3 sessions, raw audio NEVER leaves the device.

17 files, 2423 lines of TypeScript + HTML.

Made-with: Cursor

HenryZ838978 wants to merge 1 commit into OpenBMB:main from HenryZ838978:feat/voiceclaw
