feat(voiceclaw): Edge voice processing — local VAD, KWS, ASR, TTS with privacy routing#3

Open
HenryZ838978 wants to merge 1 commit into OpenBMB:main from HenryZ838978:feat/voiceclaw

@HenryZ838978
Summary

VoiceClaw is a new extension that adds edge-side voice processing to EdgeClaw, closing the audio privacy gap that exists when voice data is sent to cloud STT/TTS providers before any privacy checks.

Key Features

| Component | Implementation | Privacy |
| --- | --- | --- |
| VAD | Silero VAD via sherpa-onnx-node (~0ms) | Always local |
| KWS | Configurable wake words, sherpa-onnx-node | Always local |
| ASR | SenseVoice / Whisper / Paraformer (ONNX) | Local for S2/S3, cloud fallback for S1 |
| TTS | VITS / MatchaTTS / edge-tts | Local for S3, cloud fallback for S1/S2 |

Privacy Guarantee

  • S1 (Safe): Cloud ASR/TTS allowed for best quality
  • S2 (Sensitive): ASR forced local, transcript desensitized before cloud
  • S3 (Private): Both ASR and TTS forced local — raw audio NEVER leaves the device

Integrates with GuardClaw's existing three-tier privacy system via OpenClaw plugin hooks.
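
The three-tier routing described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PrivacyLevel`, `VoiceRoute`, `routeVoice`, and the `cloudConfigured` flag are hypothetical names chosen for the example, and the real privacy manager may be structured differently.

```typescript
// Hypothetical names throughout: PrivacyLevel, VoiceRoute, and routeVoice are
// illustrative and are not taken from the VoiceClaw source.
type PrivacyLevel = "S1" | "S2" | "S3";
type Backend = "local" | "cloud";

interface VoiceRoute {
  asr: Backend;
  tts: Backend;
  // True when the transcript must be desensitized before any cloud call.
  desensitizeTranscript: boolean;
}

// Map a privacy level to ASR/TTS backends. `cloudConfigured` models the
// "cloud fallback when configured" case; without it everything stays local.
function routeVoice(level: PrivacyLevel, cloudConfigured: boolean): VoiceRoute {
  const cloud: Backend = cloudConfigured ? "cloud" : "local";
  switch (level) {
    case "S1":
      // Safe: cloud ASR/TTS allowed.
      return { asr: cloud, tts: cloud, desensitizeTranscript: false };
    case "S2":
      // Sensitive: ASR forced local, transcript desensitized before cloud.
      return { asr: "local", tts: cloud, desensitizeTranscript: true };
    case "S3":
      // Private: both forced local, so this voice path makes no cloud call.
      return { asr: "local", tts: "local", desensitizeTranscript: false };
  }
}
```

Under these assumptions, `routeVoice("S3", true)` still returns local backends for both ASR and TTS, which is the property the S3 item in the test plan is meant to verify.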

What's Included

  • extensions/voiceclaw/ — 17 files, ~2400 lines of TypeScript + HTML
  • WebSocket audio server for real-time browser microphone streaming
  • Session state machine with barge-in detection
  • Browser test console with VU meter and event logging
  • Unit tests for VAD engine and privacy manager
  • Full README with architecture docs and config examples
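
For readers unfamiliar with barge-in, the session state machine listed above might look something like the sketch below. The state names, event names, and the `step` and `run` functions are assumptions made for illustration; the PR description does not spell out the actual states or transitions.

```typescript
// Hypothetical states and events; the real VoiceClaw session may differ.
type State = "idle" | "listening" | "processing" | "speaking";
type VoiceEvent = "speechStart" | "speechEnd" | "replyReady" | "playbackDone";

interface Transition {
  next: State;
  // True when the user started talking over TTS playback.
  bargeIn: boolean;
}

// Pure transition function: unknown (state, event) pairs leave the state unchanged.
function step(state: State, event: VoiceEvent): Transition {
  switch (state) {
    case "idle":
      if (event === "speechStart") return { next: "listening", bargeIn: false };
      break;
    case "listening":
      if (event === "speechEnd") return { next: "processing", bargeIn: false };
      break;
    case "processing":
      if (event === "replyReady") return { next: "speaking", bargeIn: false };
      break;
    case "speaking":
      // Barge-in: speech detected during playback interrupts the reply.
      if (event === "speechStart") return { next: "listening", bargeIn: true };
      if (event === "playbackDone") return { next: "idle", bargeIn: false };
      break;
  }
  return { next: state, bargeIn: false };
}

// Fold a sequence of events from the idle state, counting barge-ins.
function run(events: VoiceEvent[]): { state: State; bargeIns: number } {
  let state: State = "idle";
  let bargeIns = 0;
  for (const event of events) {
    const t = step(state, event);
    if (t.bargeIn) bargeIns += 1;
    state = t.next;
  }
  return { state, bargeIns };
}
```

In this sketch a caller would stop TTS playback whenever `bargeIn` is true; keeping `step` free of side effects makes that behavior easy to unit test.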

Technical Choices

  • sherpa-onnx-node for all local inference — pure native addon, no Python dependency
  • Zero changes to existing EdgeClaw/GuardClaw code
  • Follows existing plugin conventions (openclaw.plugin.json, hooks, configSchema)

Test Plan

  • pnpm vitest run extensions/voiceclaw/test/
  • Load plugin with pnpm openclaw gateway run and verify VoiceClaw banner
  • Open browser console at http://localhost:8501/voiceclaw/ and test microphone streaming
  • Verify S3 session forces local ASR/TTS (no outbound network calls)
  • Verify S1 session allows cloud fallback when configured

…routing

VoiceClaw brings local VAD, KWS (keyword spotting), ASR, and TTS to
EdgeClaw using sherpa-onnx-node. Integrates with GuardClaw's three-tier
privacy system to ensure voice data stays on-device for S2/S3 scenarios.

Key features:
- Silero VAD for real-time speech detection (~0ms latency)
- Keyword spotting for configurable wake words
- Multi-backend ASR: SenseVoice, Whisper, Paraformer (all local ONNX)
- Multi-backend TTS: VITS, MatchaTTS, edge-tts
- Voice privacy router: forces local ASR/TTS based on S1/S2/S3 level
- WebSocket audio server for browser microphone streaming
- Session state machine with barge-in detection
- Browser test console with VU meter and event logging
- Full test suite for VAD engine and privacy manager

Privacy guarantee: for S2/S3 sessions, raw audio NEVER leaves the device.

17 files, 2423 lines of TypeScript + HTML.

Made-with: Cursor