A minimal example demonstrating real-time voice conversations using talkio.
- Offline correctness demo using @talkio/testkit
- Real-time voice input via browser microphone
- Speech-to-text using Deepgram Nova 3
- LLM responses using OpenAI GPT-4o-mini
- Text-to-speech using Deepgram Aura 2
- Interruption support - speak to interrupt the agent
- Audio conversion utilities from talkio
- Bun installed
- Deepgram API key (get one at deepgram.com)
- OpenAI API key (get one at platform.openai.com)
- Create a .env.local file with your API keys:
  DEEPGRAM_API_KEY=your_deepgram_key
  OPENAI_API_KEY=your_openai_key
- Install dependencies:
  bun install
- Start the development server:
  bun dev
- Open http://localhost:3000 in your browser.
Use "Run Correctness Demo" without API keys to verify Talkio orchestration and provider contracts. Use "Start Conversation" after setting API keys for live voice.
Browser (client.tsx)          Server (server.ts)
┌─────────────────────┐       ┌──────────────────────────┐
│ Microphone          │       │ talkio Agent             │
│         ↓           │       │ ├── STT (Deepgram)       │
│ ScriptProcessor     │       │ ├── LLM (OpenAI)         │
│         ↓           │       │ └── TTS (Deepgram)       │
│ float32ToLinear16() │       │                          │
│         ↓           │──ws──→│ sendAudio() accepts:     │
│ WebSocket send      │       │   - ArrayBuffer          │
└─────────────────────┘       │   - Uint8Array           │
                              │   - Buffer (Node.js)     │
┌─────────────────────┐       │   (auto-converted)       │
│ linear16ToFloat32() │←─ws───│                          │
│         ↓           │       └──────────────────────────┘
│ AudioBufferSource   │
│         ↓           │
│ Speakers            │
└─────────────────────┘
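
The diagram notes that sendAudio() accepts ArrayBuffer, Uint8Array, and Node.js Buffer and converts them automatically. As a rough illustration of what such normalization can look like, here is a minimal sketch. The helper name `toBytes` and the `BinaryInput` type are hypothetical and are not part of talkio; the library's internal handling may differ.

```typescript
// Hypothetical helper (not a talkio API): normalize any binary payload to a Uint8Array.
type BinaryInput = ArrayBuffer | ArrayBufferView;

function toBytes(input: BinaryInput): Uint8Array {
  if (input instanceof ArrayBuffer) {
    // Wrap the whole buffer without copying.
    return new Uint8Array(input);
  }
  // Typed arrays, DataView, and Node.js Buffer are all ArrayBufferViews.
  // Honor byteOffset and byteLength so a view into a larger buffer is not over-read.
  return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
}

// A two-byte window into a six-byte buffer stays two bytes long.
const pool = new Uint8Array([0, 1, 2, 3, 4, 5]);
console.log(toBytes(pool.subarray(2, 4)).length); // 2
```

Respecting the view's offset matters for Node.js Buffer in particular, since small Buffers are often slices of a shared allocation.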
- Input: 16kHz mono Linear16 PCM
- Output: 24kHz mono Linear16 PCM
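
Because Linear16 uses two bytes per sample, the formats above translate directly into payload sizes. The following sketch works through that arithmetic; the function name `linear16Bytes` and the 20 ms frame length are illustrative assumptions, not values taken from talkio.

```typescript
// Bytes needed for `ms` milliseconds of Linear16 PCM (2 bytes per sample).
function linear16Bytes(sampleRate: number, ms: number, channels = 1): number {
  const samples = Math.round((sampleRate * ms) / 1000);
  return samples * channels * 2;
}

console.log(linear16Bytes(16000, 20));   // 640   (20 ms of 16kHz mono input)
console.log(linear16Bytes(24000, 20));   // 960   (20 ms of 24kHz mono output)
console.log(linear16Bytes(16000, 1000)); // 32000 (one second of microphone audio)
```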
The client uses talkio audio conversion utilities:
- float32ToLinear16() - Convert Web Audio API Float32Array to Linear16
- linear16ToFloat32() - Convert Linear16 to Float32Array for playback
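
For readers unfamiliar with the conversion itself, the sketch below shows one common way to map between Float32 samples in [-1, 1] and little-endian 16-bit PCM. The functions `floatToPcm16Sketch` and `pcm16ToFloatSketch` are illustrative stand-ins written for this example; talkio's own float32ToLinear16() and linear16ToFloat32() may use different scaling or rounding.

```typescript
// Illustrative only: not talkio's implementation.
function floatToPcm16Sketch(input: Float32Array): ArrayBuffer {
  const out = new ArrayBuffer(input.length * 2);
  const view = new DataView(out);
  for (let i = 0; i < input.length; i++) {
    // Clamp to [-1, 1] so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Int16 is asymmetric (-32768..32767), so scale each sign separately.
    const v = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, Math.round(v), true); // true = little-endian
  }
  return out;
}

function pcm16ToFloatSketch(input: ArrayBuffer): Float32Array {
  const view = new DataView(input);
  const out = new Float32Array(input.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    const v = view.getInt16(i * 2, true);
    out[i] = v < 0 ? v / 0x8000 : v / 0x7fff;
  }
  return out;
}

// Round trip: 2.0 is clamped to 1.0, and 0.5 survives within quantization error.
const roundTrip = pcm16ToFloatSketch(
  floatToPcm16Sketch(new Float32Array([0, 1, -1, 0.5, 2]))
);
console.log(Array.from(roundTrip));
```

Clamping before scaling is the important design choice here: without it, a sample slightly above 1.0 would overflow and produce an audible click.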
// server.ts - sendAudio() accepts multiple input types directly
websocket: {
  message(ws, message) {
    if (message instanceof ArrayBuffer || ArrayBuffer.isView(message)) {
      session.agent.sendAudio(message);
    }
  },
}

// client.tsx - Using talkio conversion functions
import { float32ToLinear16, linear16ToFloat32 } from "talkio";
// Sending audio from microphone
processor.onaudioprocess = (e) => {
  const float32 = e.inputBuffer.getChannelData(0);
  const pcm16 = float32ToLinear16(float32);
  ws.send(pcm16);
};
// Playing received audio
const float32 = linear16ToFloat32(buffer);
audioBuffer.getChannelData(0).set(float32);

talkio also exports a broader set of audio conversion utilities:
import {
  // Format conversion
  float32ToLinear16, // Float32Array → ArrayBuffer (Linear16)
  linear16ToFloat32, // ArrayBuffer → Float32Array
  float32ToInt16, // Float32Array → Int16Array
  int16ToFloat32, // Int16Array → Float32Array
  // Channel conversion
  stereoToMono, // Stereo → Mono (Float32Array or Int16Array)
  // Resampling
  resample, // Float32Array resampling
  resampleInt16, // Int16Array resampling
  // Telephony codecs
  mulawToLinear16, // G.711 μ-law → Linear16
  alawToLinear16, // G.711 A-law → Linear16
  linear16ToMulaw, // Linear16 → G.711 μ-law
  linear16ToAlaw, // Linear16 → G.711 A-law
} from "talkio";

Build and run for production:
bun run build
bun start