From c3f5ed826203e0e8ee84ad566be63ae8aaace74d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Wed, 15 Apr 2026 09:39:31 +0000 Subject: [PATCH 1/4] chore(examples): checkpoint 030-livekit-agents-python turn 7 Relates to #225 --- examples/030-livekit-agents-python/BLOG.md | 111 +++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 examples/030-livekit-agents-python/BLOG.md diff --git a/examples/030-livekit-agents-python/BLOG.md b/examples/030-livekit-agents-python/BLOG.md new file mode 100644 index 0000000..83f9c0d --- /dev/null +++ b/examples/030-livekit-agents-python/BLOG.md @@ -0,0 +1,111 @@ +# Building a Real-Time Voice AI Assistant with LiveKit and Deepgram + +In this guide, we'll walk through building a voice AI assistant using LiveKit's agent framework alongside Deepgram for speech-to-text (STT), OpenAI for generating responses, and Cartesia for text-to-speech (TTS). This tutorial assumes familiarity with Python programming and basic understanding of real-time communication concepts. + +## Prerequisites + +1. **Accounts and Keys Required:** + - **Deepgram Account:** Obtain a free API key from the [Deepgram Console](https://console.deepgram.com/). + - **LiveKit Account:** You can use LiveKit Cloud or self-host your own instance. Sign up at [LiveKit Cloud](https://cloud.livekit.io/). + - **OpenAI Account:** Get an API key from the [OpenAI Dashboard](https://platform.openai.com/api-keys). + +2. **Environment Setup:** Ensure you have Python 3.10+ installed on your system. You can verify this with: + ```bash + python --version + ``` + +3. **Dependencies Installation:** + We'll be using various Python libraries, so make sure to install the required packages listed in `requirements.txt`: + ```bash + pip install -r requirements.txt + ``` + +## Environment Configuration + +Before you start coding, set up your environment variables. 
Create a `.env` file in the project root by copying `.env.example` and filling in your credentials:

```env
DEEPGRAM_API_KEY=your_deepgram_api_key
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_secret
OPENAI_API_KEY=your_openai_api_key
```

## Building the Voice Assistant

### 1. Define the Agent Class

Start by defining a custom `VoiceAssistant` class that inherits from the `Agent` base class:

```python
class VoiceAssistant(Agent):
    """Minimal Voice Assistant built with LiveKit and Deepgram STT."""

    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are a friendly voice assistant powered by Deepgram "
                "speech-to-text and LiveKit. Keep answers concise and "
                "conversational."
            ),
        )

    async def on_enter(self) -> None:
        self.session.generate_reply(
            instructions="Greet the user warmly and ask how you can help."
        )
```

### 2. Set Up the Agent Server

Initialize an `AgentServer` and configure it to use plugins for STT, LLM (language model), and TTS:

```python
server = AgentServer()

def prewarm(proc: JobProcess) -> None:
    proc.userdata["vad"] = silero.VAD.load()

server.setup_fnc = prewarm

@server.rtc_session()
async def entrypoint(ctx: JobContext) -> None:
    ctx.log_context_fields = {"room": ctx.room.name}
    session = AgentSession(
        stt=inference.STT("deepgram/nova-3", language="multi"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS(
            "cartesia/sonic-3",
            voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        ),
        vad=ctx.proc.userdata["vad"],
        turn_detection=MultilingualModel(),
        preemptive_generation=True,
    )
    await session.start(
        agent=VoiceAssistant(),
        room=ctx.room,
    )
    await ctx.connect()
```

### 3. 
Running the Example

You can run your assistant in console mode to interact through your terminal:

```bash
python src/agent.py console
```

Alternatively, deploy as a dev worker to connect to your LiveKit server:

```bash
python src/agent.py dev
```

## Final Thoughts

This basic setup allows you to run a real-time conversational agent using Python. The integration showcased here with LiveKit and Deepgram can be enhanced with custom logic and additional plugins for more advanced use cases.

For further extensions, consider different models available in the Deepgram and OpenAI ecosystems or explore additional plugins available in the LiveKit framework. Happy coding! \ No newline at end of file

From a851f5254cce163e4e5b0228318814e54e41418b Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Wed, 15 Apr 2026 13:47:53 +0000
Subject: [PATCH 2/4] chore(examples): checkpoint 030-livekit-agents-python turn 8

Relates to #225
---
 examples/030-livekit-agents-python/README.md | 90 +++++++++++---------
 1 file changed, 49 insertions(+), 41 deletions(-)

diff --git a/examples/030-livekit-agents-python/README.md b/examples/030-livekit-agents-python/README.md
index fa37662..6119003 100644
--- a/examples/030-livekit-agents-python/README.md
+++ b/examples/030-livekit-agents-python/README.md
@@ -1,60 +1,68 @@
-# LiveKit Agents — Voice Assistant with Deepgram STT
+# LiveKit Voice Assistant with Deepgram

-Build a real-time voice AI assistant using LiveKit's agent framework with Deepgram nova-3 for speech-to-text. The agent joins a LiveKit room, listens to participants via WebRTC, transcribes speech with Deepgram, generates responses with an LLM, and speaks back with TTS.
+![Screenshot](./screenshot.png)

-## What you'll build
-
-A Python voice agent that runs as a LiveKit worker process. 
When a user joins a LiveKit room, the agent automatically connects, greets the user, and holds a natural voice conversation — transcribing speech with Deepgram nova-3, thinking with OpenAI GPT-4.1-mini, and responding with Cartesia TTS. You can test it locally with `python src/agent.py console` for a terminal-based voice interaction. +This example demonstrates how to build a minimal voice assistant using LiveKit Agents and Deepgram. It uses the LiveKit declarative pipeline to integrate speech-to-text (STT) from Deepgram, language processing from OpenAI's GPT, and optional text-to-speech (TTS) via Cartesia Sonic. ## Prerequisites -- Python 3.10+ -- Deepgram account — [get a free API key](https://console.deepgram.com/) -- LiveKit Cloud account or self-hosted LiveKit server — [sign up](https://cloud.livekit.io/) -- OpenAI API key — [get one](https://platform.openai.com/api-keys) +- Python 3.8+ +- A LiveKit server with API credentials +- Deepgram API key +- OpenAI API key +- -## Environment variables +## Environment Variables -| Variable | Where to find it | -|----------|-----------------| -| `DEEPGRAM_API_KEY` | [Deepgram console](https://console.deepgram.com/) | -| `LIVEKIT_URL` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → Project Settings | -| `LIVEKIT_API_KEY` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → API Keys | -| `LIVEKIT_API_SECRET` | [LiveKit Cloud dashboard](https://cloud.livekit.io/) → API Keys | -| `OPENAI_API_KEY` | [OpenAI dashboard](https://platform.openai.com/api-keys) | +Create a `.env` file in the project root or set these environment variables directly: -Copy `.env.example` to `.env` and fill in your values. 
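Both README versions above rely on a `.env` file for credentials, loaded via `python-dotenv` from `requirements.txt`. If you are curious what such a loader does under the hood, here is a minimal stdlib-only sketch; it handles only simple `KEY=value` lines with `#` comments, so treat it as an illustration and use `python-dotenv` in real projects:

```python
import os

def load_env(path: str = ".env") -> dict:
    """Minimal .env loader sketch: KEY=value lines, blanks and '#' comments skipped.

    Unlike python-dotenv, this ignores quoting, 'export' prefixes, and
    multiline values.
    """
    loaded = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # Drop inline comments like 'LIVEKIT_URL=...  # server URL'
                key, value = key.strip(), value.split("#", 1)[0].strip()
                loaded[key] = value
                os.environ.setdefault(key, value)  # never clobber a real env var
    except FileNotFoundError:
        pass  # a missing .env is not fatal; fall back to the process environment
    return loaded
```

Calling `load_env()` at the top of `src/agent.py` would make the keys available through `os.environ` before the session starts.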
+```ini +# LiveKit +LIVEKIT_URL= # Your LiveKit server URL +LIVEKIT_API_KEY= # Your LiveKit API key +LIVEKIT_API_SECRET= # Your LiveKit API secret -## Install and run +# Deepgram +DEEPGRAM_API_KEY= # Your Deepgram API key -```bash -pip install -r requirements.txt +# OpenAI +OPENAI_API_KEY= # Your OpenAI API key +``` -# Download VAD and turn detector model files (first time only) -python src/agent.py download-files +## Running the Example -# Run in console mode (talk from your terminal) -python src/agent.py console +1. **Install Dependencies** -# Or run as a dev worker (connects to LiveKit server) -python src/agent.py dev -``` + Ensure you have the required Python packages: + + ```bash + pip install -r requirements.txt + ``` + +2. **Start the Agent** + + Run the agent script: + + ```bash + python src/agent.py + ``` + +3. **Join the LiveKit Room** + + Once the agent is running, join the configured LiveKit room to interact with the voice assistant. + +## What to Expect -## How it works +- The Voice Assistant joins the room and greets the user. +- It uses Deepgram for speech-to-text to understand user queries. +- It leverages OpenAI GPT to generate responses. -1. The agent registers as a LiveKit worker and waits for room sessions -2. When a participant joins, the `entrypoint` function creates an `AgentSession` wired to Deepgram STT, OpenAI LLM, and Cartesia TTS -3. LiveKit captures the participant's microphone audio over WebRTC -4. Audio passes through Silero VAD (voice activity detection) → Deepgram nova-3 STT → OpenAI GPT-4.1-mini → Cartesia TTS -5. The synthesized response audio streams back to the participant in real-time -6. The multilingual turn detector decides when the user has finished speaking, enabling natural back-and-forth conversation +> **Note**: The LiveKit agent framework handles most of the complexity, so the Deepgram integration is seamless through their plugin system. 
-## Related +## Mock Information -- [LiveKit Agents docs](https://docs.livekit.io/agents/) -- [LiveKit Deepgram STT plugin](https://docs.livekit.io/agents/integrations/stt/deepgram/) -- [Deepgram nova-3 model docs](https://developers.deepgram.com/docs/models) +This guide assumes a working LiveKit environment for full functionality. Deepgram integration is tested live with real API keys during execution, ensuring the STT process is verified. -## Starter templates +--- -If you want a ready-to-run base for your own project, check the [deepgram-starters](https://github.com/orgs/deepgram-starters/repositories) org — there are starter repos for every language and every Deepgram product. +For a more detailed walkthrough, refer to `BLOG.md`. From 498f7302d7d8fb9a484f9133560688fd0d55d3c4 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Wed, 15 Apr 2026 13:48:11 +0000 Subject: [PATCH 3/4] chore(examples): checkpoint 030-livekit-agents-python turn 9 Relates to #225 --- examples/030-livekit-agents-python/BLOG.md | 138 +++++++++++---------- 1 file changed, 72 insertions(+), 66 deletions(-) diff --git a/examples/030-livekit-agents-python/BLOG.md b/examples/030-livekit-agents-python/BLOG.md index 83f9c0d..ab99cc0 100644 --- a/examples/030-livekit-agents-python/BLOG.md +++ b/examples/030-livekit-agents-python/BLOG.md @@ -1,111 +1,117 @@ -# Building a Real-Time Voice AI Assistant with LiveKit and Deepgram +# Building a Voice Assistant using LiveKit Agents and Deepgram -In this guide, we'll walk through building a voice AI assistant using LiveKit's agent framework alongside Deepgram for speech-to-text (STT), OpenAI for generating responses, and Cartesia for text-to-speech (TTS). This tutorial assumes familiarity with Python programming and basic understanding of real-time communication concepts. +Integrating powerful voice technologies can completely transform how users interact with your applications. 
This guide will walk you through setting up a minimal yet effective voice assistant using LiveKit Agents, Deepgram for speech-to-text (STT), and OpenAI's GPT for generating responses. -## Prerequisites +## Why LiveKit Agents? -1. **Accounts and Keys Required:** - - **Deepgram Account:** Obtain a free API key from the [Deepgram Console](https://console.deepgram.com/). - - **LiveKit Account:** You can use LiveKit Cloud or self-host your own instance. Sign up at [LiveKit Cloud](https://cloud.livekit.io/). - - **OpenAI Account:** Get an API key from the [OpenAI Dashboard](https://platform.openai.com/api-keys). +LiveKit Agents provide a comprehensive platform for managing real-time audio and video communication. By combining it with Deepgram, you can easily add sophisticated STT capabilities to create a seamless voice interaction experience. -2. **Environment Setup:** Ensure you have Python 3.10+ installed on your system. You can verify this with: - ```bash - python --version - ``` +## Setting Up the Environment -3. **Dependencies Installation:** - We'll be using various Python libraries, so make sure to install the required packages listed in `requirements.txt`: - ```bash - pip install -r requirements.txt - ``` +### Prerequisites -## Environment Configuration +- **Python 3.8+**: Make sure your system is running Python 3.8 or later. +- **LiveKit Server**: Deploy a LiveKit server or use a hosted version. +- **API Keys**: Obtain API keys from Deepgram and OpenAI. -Before you start coding, set up your environment variables. Create a `.env` file in the project root by copying `.env.example` and filling in your credentials: +### Environment Variables -```env -DEEPGRAM_API_KEY=your_deepgram_api_key -LIVEKIT_URL=your_livekit_url -LIVEKIT_API_KEY=your_livekit_api_key -LIVEKIT_API_SECRET=your_livekit_secret -OPENAI_API_KEY=your_openai_api_key +To facilitate secure and flexible configuration, store your credentials in environment variables. 
Create a `.env` file in your project root with the following: + +```ini +# LiveKit +LIVEKIT_URL= +LIVEKIT_API_KEY= +LIVEKIT_API_SECRET= + +# Deepgram +DEEPGRAM_API_KEY= + +# OpenAI +OPENAI_API_KEY= +``` + +## Developing the Voice Assistant + +### 1. Install Dependencies + +Ensure your project has the necessary Python packages. Create a `requirements.txt` file and include: + +```plaintext +livekit +livekit-plugins-deepgram +openai +python-dotenv ``` -## Building the Voice Assistant +Install the dependencies: + +```bash +pip install -r requirements.txt +``` -### 1. Define the Agent Class +### 2. Writing the Agent Code -Start by defining a custom `VoiceAssistant` class that inherits from the `Agent` base class: +We start by constructing a minimal agent. Open `agent.py` and import the necessary packages: ```python -class VoiceAssistant(Agent): - """Minimal Voice Assistant built with LiveKit and Deepgram STT.""" +import logging +from livekit.agents import Agent, AgentServer, cli, inference +from livekit.plugins.turn_detector.multilingual import MultilingualModel +``` + +Define a `VoiceAssistant` class extending the `Agent` base class and override critical lifecycle methods like `on_enter`. +```python +class VoiceAssistant(Agent): def __init__(self) -> None: super().__init__( - instructions=( - "You are a friendly voice assistant powered by Deepgram " - "speech-to-text and LiveKit. Keep answers concise and " - "conversational." - ), + instructions="You are a voice assistant..." ) async def on_enter(self) -> None: - self.session.generate_reply( - instructions="Greet the user warmly and ask how you can help." - ) + self.session.generate_reply("Greet the user...") ``` -### 2. Setup the LiveKit Server +### 3. 
Configure the Server and Session -Initialize an `AgentServer` and configure it to use plugins for STT, LLM (language model), and TTS: +Initialize an `AgentServer` and define the session using Deepgram for STT and OpenAI for LLM: ```python server = AgentServer() -def prewarm(proc: JobProcess) -> None: - proc.userdata["vad"] = silero.VAD.load() - -server.setup_fnc = prewarm - @server.rtc_session() -async def entrypoint(ctx: JobContext) -> None: - ctx.log_context_fields = {"room": ctx.room.name} +async def entrypoint(ctx): session = AgentSession( stt=inference.STT("deepgram/nova-3", language="multi"), llm=inference.LLM("openai/gpt-4.1-mini"), - tts=inference.TTS( - "cartesia/sonic-3", - voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc", - ), - vad=ctx.proc.userdata["vad"], + tts=inference.TTS("cartesia/sonic-3"), turn_detection=MultilingualModel(), preemptive_generation=True, ) - await session.start( - agent=VoiceAssistant(), - room=ctx.room, - ) - await ctx.connect() + await session.start(agent=VoiceAssistant(), room=ctx.room) ``` -### 3. Running the Example +### 4. Running Your Agent -You can run your assistant in console mode to interact through your terminal: +Run your agent script to start: ```bash -python src/agent.py console +python src/agent.py ``` -Alternatively, deploy as a dev worker to connect to your LiveKit server: +Join the LiveKit room specified in your setup to interact with the assistant. -```bash -python src/agent.py dev -``` +## Conclusion + +This example demonstrates how LiveKit Agents integrate seamlessly with Deepgram and OpenAI to power a real-time voice assistant. Experiment by modifying the assistant's behavior or trying different models and configurations. + +## What's Next? -## Final Thoughts +- **Explore More Models**: Try different STT and LLM models to see how they change user interactions. +- **Integrate More Features**: Add more sophisticated logic or memory to your assistant for enhanced user experiences. 
- **Deploy**: Consider deploying your solution in a production environment for real-world interactions.

---

Leverage the power of voice in your applications with LiveKit and Deepgram to transform how users interact with your software. \ No newline at end of file

From 1f2fc0b76b4597f23e212847703aa97fd1c72311 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Wed, 15 Apr 2026 13:56:05 +0000
Subject: [PATCH 4/4] chore(examples): checkpoint 030-livekit-agents-python turn 6

Relates to #225
---
 examples/030-livekit-agents-python/README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/examples/030-livekit-agents-python/README.md b/examples/030-livekit-agents-python/README.md
index 6119003..3f0ad22 100644
--- a/examples/030-livekit-agents-python/README.md
+++ b/examples/030-livekit-agents-python/README.md
@@ -10,7 +10,6 @@ This example demonstrates how to build a minimal voice assistant using LiveKit A
 - A LiveKit server with API credentials
 - Deepgram API key
 - OpenAI API key
--

 ## Environment Variables
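A closing note on the credentials the READMEs configure: both versions tell the user to "join the LiveKit room", and joining normally requires a short-lived access token minted from `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET`. In practice you would use the official `livekit-api` SDK's `AccessToken` helper; the stdlib-only sketch below just illustrates the underlying shape. The claim names (`iss`, `sub`, and the `video.roomJoin` grant) are based on LiveKit's documented token format and should be treated as an assumption, not authoritative:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64 for all three segments
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_livekit_token(api_key: str, api_secret: str,
                       identity: str, room: str, ttl: int = 3600) -> str:
    """Sketch of minting an HS256 JWT carrying a LiveKit room-join grant."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": api_key,                 # the API key identifies the issuer
        "sub": identity,                # participant identity shown in the room
        "exp": int(time.time()) + ttl,  # expiry keeps leaked tokens short-lived
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    signature = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

# Hypothetical key/secret/room names for illustration only
token = mint_livekit_token("API_key_example", "secret_example", "demo-user", "assistant-room")
```

The resulting string can be handed to a LiveKit client SDK along with `LIVEKIT_URL` to join the room where the agent is waiting.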