Merged
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -188,7 +188,7 @@ console.log('Speech echo WebSocket app listening on port 3000');

Both `WebhookResponse` and `Session` support the same chainable verb methods:

`.say(opts)` `.play(opts)` `.gather(opts)` `.dial(opts)` `.llm(opts)` `.s2s(opts)` `.openai_s2s(opts)` `.google_s2s(opts)` `.elevenlabs_s2s(opts)` `.deepgram_s2s(opts)` `.ultravox_s2s(opts)` `.dialogflow(opts)` `.conference(opts)` `.enqueue(opts)` `.dequeue(opts)` `.hangup()` `.pause(opts)` `.redirect(opts)` `.config(opts)` `.tag(opts)` `.dtmf(opts)` `.listen(opts)` `.transcribe(opts)` `.message(opts)` `.stream(opts)` `.pipeline(opts)` `.dub(opts)` `.alert(opts)` `.answer(opts)` `.leave()` `.sipDecline(opts)` `.sipRefer(opts)` `.sipRequest(opts)`
`.say(opts)` `.play(opts)` `.gather(opts)` `.dial(opts)` `.llm(opts)` `.s2s(opts)` `.openai_s2s(opts)` `.google_s2s(opts)` `.elevenlabs_s2s(opts)` `.deepgram_s2s(opts)` `.ultravox_s2s(opts)` `.dialogflow(opts)` `.conference(opts)` `.enqueue(opts)` `.dequeue(opts)` `.hangup()` `.pause(opts)` `.redirect(opts)` `.config(opts)` `.tag(opts)` `.dtmf(opts)` `.listen(opts)` `.transcribe(opts)` `.message(opts)` `.stream(opts)` `.agent(opts)` `.dub(opts)` `.alert(opts)` `.answer(opts)` `.leave()` `.sipDecline(opts)` `.sipRefer(opts)` `.sipRequest(opts)`

All methods accept the same options as the corresponding verb JSON Schema. Methods are chainable — they return `this`.
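The return-`this` chaining convention described above can be sketched as follows. This is a minimal illustration of the pattern, not the SDK's actual `WebhookResponse` implementation — the `VerbChain` class and its `toJSON()` method are hypothetical stand-ins:

```typescript
// Minimal sketch of the chainable-verb pattern: each verb method
// records its verb and returns `this` so calls can be chained.
class VerbChain {
  private verbs: Array<Record<string, unknown>> = [];

  say(opts: { text: string }): this {
    this.verbs.push({ verb: 'say', ...opts });
    return this; // chainable: every verb method returns `this`
  }

  pause(opts: { length: number }): this {
    this.verbs.push({ verb: 'pause', ...opts });
    return this;
  }

  hangup(): this {
    this.verbs.push({ verb: 'hangup' });
    return this;
  }

  toJSON(): Array<Record<string, unknown>> {
    return this.verbs;
  }
}

const chain = new VerbChain()
  .say({ text: 'Goodbye' })
  .pause({ length: 1 })
  .hangup();

console.log(chain.toJSON().map((v) => v.verb)); // ['say', 'pause', 'hangup']
```

Because each method returns the same instance, an entire application response can be built as one expression and serialized at the end.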

@@ -370,7 +370,7 @@ Beyond verbs, WebSocket apps can perform async operations at any time during a c
Key capabilities:
- **TTS token streaming** — `sendTtsTokens()`, `flushTtsTokens()`, `clearTtsTokens()` — pipe LLM tokens to jambonz incrementally for lowest-latency TTS playback. **Not the same as `autoStreamTts`** (which is a jambonz-internal audio optimization).
- **Inject commands** — `injectMute()`, `injectWhisper()`, `injectDtmf()`, `injectRecord()`, `injectTag()`, `injectListenStatus()` — modify the call mid-stream.
- **LLM tool output** — `toolOutput()` — return tool call results to the pipeline verb's LLM.
- **LLM tool output** — `toolOutput()` — return tool call results to the agent verb's LLM.
- **Cascaded voice AI agents** — build your own STT→LLM→TTS loop using `config` (ttsStream + bargeIn) + `sendTtsTokens()`. Full control over LLM interaction and conversation history.
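The token-streaming flow in the first bullet can be sketched as below, assuming a session object exposing `sendTtsTokens()` and `flushTtsTokens()` as listed; `streamLlmTokens` is a hypothetical stand-in for an LLM client's streamed token output:

```typescript
// Hedged sketch: pipe streamed LLM tokens to jambonz TTS incrementally.
// Stand-in for a real LLM client's token stream.
async function* streamLlmTokens(): AsyncGenerator<string> {
  for (const t of ['Hello', ', ', 'world', '!']) yield t;
}

// Structural subset of the SDK session surface used here.
interface TtsSink {
  sendTtsTokens(tokens: string): Promise<void>;
  flushTtsTokens(): void;
}

async function pipeTokens(session: TtsSink): Promise<string> {
  let sent = '';
  for await (const token of streamLlmTokens()) {
    await session.sendTtsTokens(token); // forward each token as it arrives
    sent += token;
  }
  session.flushTtsTokens(); // signal end of the utterance
  return sent;
}
```

The point of the pattern is that TTS playback starts as soon as the first tokens arrive, rather than waiting for the full LLM completion.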

### Session Events (SDK)
2 changes: 1 addition & 1 deletion README.md
@@ -369,7 +369,7 @@ import { JambonzClient } from '@jambonz/sdk/client';

Both `WebhookResponse` and WebSocket `Session` support the same chainable verb methods:

`.say()` `.play()` `.gather()` `.dial()` `.llm()` `.conference()` `.enqueue()` `.dequeue()` `.hangup()` `.pause()` `.redirect()` `.config()` `.tag()` `.dtmf()` `.listen()` `.transcribe()` `.message()` `.stream()` `.pipeline()` `.dub()` `.alert()` `.answer()` `.leave()` `.sipDecline()` `.sipRefer()` `.sipRequest()`
`.say()` `.play()` `.gather()` `.dial()` `.llm()` `.conference()` `.enqueue()` `.dequeue()` `.hangup()` `.pause()` `.redirect()` `.config()` `.tag()` `.dtmf()` `.listen()` `.transcribe()` `.message()` `.stream()` `.agent()` `.dub()` `.alert()` `.answer()` `.leave()` `.sipDecline()` `.sipRefer()` `.sipRequest()`

All methods accept the same options as the corresponding [verb JSON schemas](schema/verbs/) and are chainable.

10 changes: 5 additions & 5 deletions examples/pipeline/ws-app.ts → examples/agent/ws-app.ts
@@ -15,7 +15,7 @@ const makeService = createEndpoint({
},
});

const svc = makeService({ path: '/pipeline' });
const svc = makeService({ path: '/agent' });

const systemPrompt = `You are a helpful weather assistant.
You can look up current weather for any location using the get_weather tool.
@@ -56,7 +56,7 @@ svc.on('session:new', (session) => {

session
.on('/event', (evt: Record<string, any>) => {
console.log('pipeline event:', evt.type);
console.log('agent event:', evt.type);
})
.on('/toolCall', async (evt: Record<string, any>) => {
const { tool_call_id, name, arguments: args } = evt;
@@ -95,7 +95,7 @@ svc.on('session:new', (session) => {
}
})
.on('/action', (evt: Record<string, any>) => {
console.log('pipeline ended:', evt.completion_reason);
console.log('agent ended:', evt.completion_reason);
session.reply();
})
.on('close', (code: number) => {
@@ -106,7 +106,7 @@ svc.on('session:new', (session) => {
});

session
.pipeline({
.agent({
stt: {
vendor: 'deepgram',
language: 'en-US',
Expand Down Expand Up @@ -135,4 +135,4 @@ svc.on('session:new', (session) => {
.send();
});

console.log('Pipeline voice agent listening on port 3000');
console.log('Agent listening on port 3000');
24 changes: 12 additions & 12 deletions examples/bedrock-pipeline.ts → examples/bedrock-agent.ts
@@ -60,14 +60,14 @@ interface TtsConfig {
options?: Record<string, unknown>;
}

interface PipelineOptions {
interface AgentOptions {
stt: SttConfig;
tts: TtsConfig;
turnDetection: 'krisp' | 'stt';
noiseIsolation?: 'krisp' | 'rnnoise';
}

function handleSession(session: Session, opts: PipelineOptions) {
function handleSession(session: Session, opts: AgentOptions) {
const log = logger.child({ call_sid: session.callSid });
const llmVendor = session.data.env_vars?.LLM_VENDOR || 'openai';
const model = session.data.env_vars?.LLM_MODEL || 'gpt-4.1-mini';
@@ -78,19 +78,19 @@ function handleSession(session: Session, opts: PipelineOptions) {
/* Demo: update_tools mid-conversation to add web search capability.
After the user's second question (turn_end #2), inject a web_search tool.
The agent starts without web search, so early questions get stale answers.
Once the tool is added, the agent can search the web via Tavily. */
Once the tool is added, it can search the web via Tavily. */
let turnCount = 0;
let toolsInjected = false;

session.on('/pipeline-event', (evt: Record<string, unknown>) => {
log.info({payload: evt}, `pipeline event: ${evt.type}`);
session.on('/agent-event', (evt: Record<string, unknown>) => {
log.info({payload: evt}, `agent event: ${evt.type}`);

if (evt.type === 'turn_end') {
turnCount++;
if (turnCount === 2 && !toolsInjected) {
toolsInjected = true;
log.info('injecting web_search tool');
session.updatePipeline({
session.updateAgent({
type: 'update_tools',
tools: [
{
@@ -112,7 +112,7 @@ function handleSession(session: Session, opts: PipelineOptions) {
},
],
});
session.updatePipeline({
session.updateAgent({
type: 'inject_context',
messages: [
{
@@ -155,13 +155,13 @@ function handleSession(session: Session, opts: PipelineOptions) {
}
});

session.on('/pipeline-complete', (evt: Record<string, unknown>) => {
log.info({payload: evt}, 'pipeline completed');
session.on('/agent-complete', (evt: Record<string, unknown>) => {
log.info({payload: evt}, 'agent completed');
session.hangup().reply();
});

session
.pipeline({
.agent({
stt: opts.stt,
tts: {
vendor: opts.tts.vendor,
Expand All @@ -183,9 +183,9 @@ function handleSession(session: Session, opts: PipelineOptions) {
bargeIn: {
enable: true,
},
eventHook: '/pipeline-event',
eventHook: '/agent-event',
toolHook: '/tool-call',
actionHook: '/pipeline-complete',
actionHook: '/agent-complete',
})
.send();
}
12 changes: 6 additions & 6 deletions jambonz/SKILL.md
@@ -3,7 +3,7 @@ name: jambonz
description: >-
Build voice applications on jambonz, an open-source CPaaS. Covers the verb
model, webhook and WebSocket transports, IVR menus, AI voice agents (OpenAI
Realtime, Deepgram, ElevenLabs, Google, Ultravox, pipeline), call routing,
Realtime, Deepgram, ElevenLabs, Google, Ultravox, agent), call routing,
queuing, recording, mid-call control, and SIP. Works with @jambonz/sdk
(TypeScript) or raw JSON from any language. Use with the jambonz MCP server
for schema lookups.
@@ -38,7 +38,7 @@ jambonz has two editions: **v0.9.x (open source)** and **v10.x (commercial)**. E
Choose the transport based on what the application needs:

### Use WebSocket when:
- Using any speech-to-speech verb (`openai_s2s`, `google_s2s`, `deepgram_s2s`, `ultravox_s2s`, `elevenlabs_s2s`, `s2s`, `pipeline`) — **mandatory**
- Using any speech-to-speech verb (`openai_s2s`, `google_s2s`, `deepgram_s2s`, `ultravox_s2s`, `elevenlabs_s2s`, `s2s`, `agent`) — **mandatory**
- Streaming raw audio (`listen`/`stream` verb with bidirectional audio)
- Using TTS token streaming
- Building complex conversational flows with session state
@@ -69,11 +69,11 @@ The user wants a caller to have a conversation with an LLM.

**Is the vendor determined at runtime** (e.g. from an env var)? Use `s2s` with `vendor` property.

**Does the user want jambonz to orchestrate STT + LLM + TTS as separate components?** Use `pipeline`.
**Does the user want jambonz to orchestrate STT + LLM + TTS as separate components?** Use `agent`.

**Never use `llm` in generated code** — it is a legacy name. Use either a vendor shortcut or `s2s`.

See [references/voice-ai-guide.md](references/voice-ai-guide.md) for details on s2s vs pipeline, tool calling, and vendor specifics.
See [references/voice-ai-guide.md](references/voice-ai-guide.md) for details on s2s vs agent, tool calling, and vendor specifics.

### "Build an IVR menu / collect input"

@@ -209,7 +209,7 @@ Use `get_jambonz_schema` to look up the exact JSON structure for any verb.
4. **Missing `anchorMedia: true` on `dial`** — Required for recording during bridged calls. Without it, audio doesn't flow through the media server.
5. **Using `process.env`** — jambonz apps should use application environment variables (`session.data.env_vars` / `req.body.env_vars`), not `process.env`.
6. **`env_vars` only on initial call** — The `env_vars` object is only present in the first webhook POST or `session:new`. Store values in a variable if needed in actionHook handlers.
7. **Webhook transport for s2s/pipeline apps** — These verbs require WebSocket. Always use `createEndpoint` from `@jambonz/sdk/websocket`.
7. **Webhook transport for s2s/agent apps** — These verbs require WebSocket. Always use `createEndpoint` from `@jambonz/sdk/websocket`.
8. **ElevenLabs: passing `model` or `messages`** — ElevenLabs uses `agent_id` auth. The model and prompt are configured in the ElevenLabs dashboard. Pass `llmOptions: {}`.
9. **Marks silently failing** — Marks require `bidirectionalAudio: { enabled: true, streaming: true }` on the listen/stream verb. Without streaming mode, marks are accepted but never fire.
10. **Not binding actionHook listeners before `.send()`** — In WebSocket mode, if no listener is bound for an actionHook, the SDK auto-replies with an empty verb array, which usually means the call hangs up unexpectedly.
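Mistake #10 can be illustrated with a minimal stand-in for the SDK's event wiring — this `MiniSession` is hypothetical, not the real `Session` class, but it shows why the listener must exist before the hook fires:

```typescript
// Sketch of mistake #10: bind actionHook listeners before the verb is sent.
type Handler = (evt: Record<string, unknown>) => void;

class MiniSession {
  private handlers = new Map<string, Handler>();

  on(hook: string, handler: Handler): this {
    this.handlers.set(hook, handler);
    return this;
  }

  // Simulates jambonz invoking an actionHook after the verb completes.
  dispatch(hook: string, evt: Record<string, unknown>): 'handled' | 'auto-reply' {
    const h = this.handlers.get(hook);
    if (!h) return 'auto-reply'; // no listener -> SDK auto-replies empty; call may hang up
    h(evt);
    return 'handled';
  }
}

const session = new MiniSession();
// Correct order: listener first, then the verb would be sent.
session.on('/dial-action', (evt) => console.log('dial ended:', evt.call_status));
```

If `dispatch('/dial-action', …)` fires with no listener bound, the stand-in returns `'auto-reply'` — the analogue of the SDK's empty-verb-array reply that ends the call.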
@@ -237,7 +237,7 @@ Use `get_jambonz_schema` to look up the exact JSON structure for any verb.

Load these on demand based on the task:

- [references/voice-ai-guide.md](references/voice-ai-guide.md) — **Load when** building s2s or pipeline voice AI apps. Covers s2s vs pipeline decision, vendor shortcuts, tool/function calling, TTS streaming, eventHook patterns.
- [references/voice-ai-guide.md](references/voice-ai-guide.md) — **Load when** building s2s or agent voice AI apps. Covers s2s vs agent decision, vendor shortcuts, tool/function calling, TTS streaming, eventHook patterns.
- [references/ivr-patterns.md](references/ivr-patterns.md) — **Load when** building IVR menus or gather-based input collection. Covers speech/DTMF/mixed input, multi-level menus, timeout and retry patterns.
- [references/call-control.md](references/call-control.md) — **Load when** building apps with dial, transfer, queuing, conference, recording, or mid-call control. Covers dial targets, SIP ops, enqueue/dequeue, REST API control, inject commands.
- [references/env-vars-and-config.md](references/env-vars-and-config.md) — **Load when** the app needs configurable parameters. Covers the two-step declare+read pattern, schema properties, the "only on initial call" gotcha.
18 changes: 9 additions & 9 deletions jambonz/references/voice-ai-guide.md
@@ -1,8 +1,8 @@
# Voice AI Guide

This reference covers building AI-powered voice agents with jambonz — speech-to-speech (s2s) verbs and the pipeline verb.
This reference covers building AI-powered voice agents with jambonz — speech-to-speech (s2s) verbs and the agent verb.

## s2s vs Pipeline: When to Use Which
## s2s vs Agent: When to Use Which

### Speech-to-Speech (s2s) Verbs

@@ -17,14 +17,14 @@ Available vendor shortcuts (always prefer these over generic `s2s`):

Use generic `s2s` with `vendor` property **only** when the vendor is determined at runtime.

### Pipeline Verb
### Agent Verb

Use `pipeline` when you want **jambonz to orchestrate separate STT, LLM, and TTS components**. This gives you:
Use `agent` when you want **jambonz to orchestrate separate STT, LLM, and TTS components**. This gives you:
- Mix-and-match: e.g. Deepgram STT + Anthropic LLM + ElevenLabs TTS
- More control over each component's configuration
- Built-in turn detection and interruption handling

The pipeline verb has three main configuration blocks: `recognizer` (STT), `llm` (text LLM), and `synthesizer` (TTS).
The agent verb has three main configuration blocks: `recognizer` (STT), `llm` (text LLM), and `synthesizer` (TTS).

## Vendor-Specific Details

@@ -131,7 +131,7 @@ session.on('llm:event', (evt) => {

## TTS Token Streaming

For pipeline or custom flows where you generate text and want incremental TTS:
For agent or custom flows where you generate text and want incremental TTS:

1. Enable streaming: `session.config({ ttsStream: { enable: true } })`
2. Send tokens: `await session.sendTtsTokens('chunk of text')`
@@ -170,11 +170,11 @@ session

Add `toolHook` and bind a handler for tool execution. See the Tool / Function Calling section above.

### Pipeline (mix-and-match components)
### Agent (mix-and-match components)

```typescript
session
.pipeline({
.agent({
recognizer: { vendor: 'deepgram', language: 'en-US' },
llm: {
vendor: 'anthropic',
Expand All @@ -187,4 +187,4 @@ session
.send();
```

Look up full schema: `get_jambonz_schema('pipeline')`
Look up full schema: `get_jambonz_schema('agent')`
12 changes: 6 additions & 6 deletions typescript/package-lock.json


4 changes: 2 additions & 2 deletions typescript/package.json
@@ -1,6 +1,6 @@
{
"name": "@jambonz/sdk",
"version": "0.2.0",
"version": "0.3.0",
"description": "jambonz SDK for building voice applications — optimized for AI agents",
"author": "Dave Horton",
"license": "MIT",
@@ -90,7 +90,7 @@
"prepublishOnly": "npm run build"
},
"dependencies": {
"@jambonz/schema": "^0.1.5",
"@jambonz/schema": "^0.2.1",
"ajv": "^8.17.1",
"ws": "^8.18.0"
},
6 changes: 3 additions & 3 deletions typescript/src/client/api.ts
@@ -147,9 +147,9 @@ export class CallsResource {
return this.update(callSid, { mute_status: status });
}

/** Send a mid-conversation update to an active pipeline verb. */
async updatePipeline(callSid: string, data: NonNullable<UpdateCallRequest['pipeline_update']>): Promise<void> {
return this.update(callSid, { pipeline_update: data });
/** Send a mid-conversation update to an active agent verb. */
async updateAgent(callSid: string, data: NonNullable<UpdateCallRequest['agent_update']>): Promise<void> {
return this.update(callSid, { agent_update: data });
}

/** Enable or disable server-side noise isolation. */
18 changes: 9 additions & 9 deletions typescript/src/types/index.ts
@@ -20,6 +20,7 @@ export type {

// Verbs
export type {
AgentVerb,
AlertVerb,
AnswerVerb,
ConferenceVerb,
@@ -38,7 +39,6 @@ export type {
McpServerConfig,
MessageVerb,
PauseVerb,
PipelineVerb,
PlayVerb,
RedirectVerb,
SayVerb,
@@ -54,15 +54,15 @@ export type {

// Session
export type {
AgentEvent,
AgentEventType,
AgentLlmResponseEvent,
AgentPreflightMetrics,
AgentTurnEndEvent,
AgentTurnLatency,
AgentUserInterruptionEvent,
AgentUserTranscriptEvent,
CallSession,
PipelineAgentResponseEvent,
PipelineEvent,
PipelineEventType,
PipelinePreflightMetrics,
PipelineTurnEndEvent,
PipelineTurnLatency,
PipelineUserInterruptionEvent,
PipelineUserTranscriptEvent,
TtsStreamingEvent,
TtsStreamingEventType,
WsMessage,
4 changes: 2 additions & 2 deletions typescript/src/types/rest.ts
@@ -98,8 +98,8 @@ export interface UpdateCallRequest {
dtmf?: { digit: string; duration?: number };
/** Tag metadata. */
tag?: Record<string, unknown>;
/** Mid-conversation pipeline update. */
pipeline_update?: {
/** Mid-conversation agent update. */
agent_update?: {
type: 'update_instructions' | 'inject_context' | 'update_tools' | 'generate_reply';
instructions?: string;
messages?: Array<{ role: string; content: string }>;