feat(eot): add audio eot model support #1613

Open
chenghao-mou wants to merge 6 commits into main from feat/AGT-2520-multimodal-eou

Conversation

@chenghao-mou
Member

Description

Add an audio EOT model with local inference support, and deprecate the silero and turn detector plugins.

Changes Made

  • Add audio EOT model support for both cloud and local versions
  • Deprecate the silero and turn detector plugins

Pre-Review Checklist

  • Build passes: All builds (lint, typecheck, tests) pass locally
  • AI-generated code reviewed: Removed unnecessary comments and ensured code quality
  • Changes explained: All changes are properly documented and justified above
  • Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
  • Video demo: A small video demo showing changes works as expected and did not break any existing functionality using Agent Playground (if applicable)

Testing

  • Automated tests added/updated (if applicable)
  • All tests pass
  • Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional Notes


Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

add audio eot model and local inference support, deprecating silero and turn detector plugins
@changeset-bot

changeset-bot Bot commented May 27, 2026

🦋 Changeset detected

Latest commit: 679f639

The changes in this PR will be included in the next version bump.

@chenghao-mou chenghao-mou requested a review from a team May 27, 2026 09:27
@chenghao-mou chenghao-mou marked this pull request as ready for review May 27, 2026 18:24
devin-ai-integration[bot]

This comment was marked as resolved.

*
* Port of Python `livekit.agents.inference.eot.detector`.
*/
import type { InferenceExecutor } from '../../ipc/inference_executor.js';
Contributor

I wonder whether there will be any overlapping code between the EOT under inference and the turn detector plugins. Or is this specific to audio-based EOT?

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

View 13 additional findings in Devin Review.

@@ -1053,11 +1317,32 @@ export class AudioRecognition {
}
Contributor

🔴 Audio EOT detector bypassed by early return when STT is enabled but no transcript arrived

In runEOUDetection, the guard at line 1313 returns early when STT is enabled but this.audioTranscript is empty. This guard was correct for text-based turn detectors (which need text to score) but is incorrect for the new audio-based AudioTurnDetector, which operates on audio features and doesn't need text. The PR adds proper audio-detector handling at lines 1330-1345 (hasAudioDetector / useDetector), but the early return prevents the code from ever reaching that logic.

When VAD fires END_OF_SPEECH before any STT transcript arrives (short utterances, STT lag), the audio EOT prediction — which may already be cached from the warmup phase at agents/src/voice/audio_recognition.ts:1749-1757 — is completely skipped. Turn detection stalls until an STT transcript eventually arrives, defeating the latency advantage of the audio EOT model.

(Refers to lines 1313-1317)

Prompt for agents
The early return guard in runEOUDetection at line 1313 skips EOU detection when STT is enabled but no transcript has arrived yet. This guard is appropriate for text-based turn detectors but incorrectly blocks audio-based turn detectors (AudioTurnDetector) that operate on audio features and don't need text.

The fix should allow audio EOT detectors to proceed even without a transcript. The hasAudioDetector variable (computed at line 1330) needs to be evaluated earlier, or the guard needs to be adjusted.

A possible fix: move the hasAudioDetector computation before the guard, then add it as an exclusion condition:

const hasAudioDetector = this.turnDetector instanceof AudioTurnDetector;
if (this.stt && !this.audioTranscript && this.turnDetectionMode !== 'manual' && !hasAudioDetector) {
  this.logger.debug('skipping EOU detection');
  return;
}

This ensures the text-based guard still applies for text detectors while allowing audio detectors to proceed to the predictEndOfTurn path. The existing useDetector logic at lines 1330-1345 already correctly handles the audio detector case.
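
The adjusted guard can be sketched as a standalone predicate. This is a minimal illustration under stated assumptions, not the code in the PR: `GuardInput` and `shouldSkipEouDetection` are hypothetical names, and the fields are simplified stand-ins for the `AudioRecognition` state discussed above.

```typescript
// Hypothetical, simplified view of the state the guard inspects.
// The real AudioRecognition class holds these as instance members.
interface GuardInput {
  sttEnabled: boolean; // stands in for `this.stt` being set
  audioTranscript: string; // stands in for `this.audioTranscript`
  turnDetectionMode: string; // e.g. 'manual'; other values are not enumerated here
  hasAudioDetector: boolean; // stands in for `turnDetector instanceof AudioTurnDetector`
}

// Returns true when EOU detection should be skipped.
// Text-based detectors need a transcript to score; audio-based ones do not,
// so an audio detector is excluded from the early return.
function shouldSkipEouDetection(input: GuardInput): boolean {
  const waitingForTranscript = input.sttEnabled && input.audioTranscript.length === 0;
  return (
    waitingForTranscript &&
    input.turnDetectionMode !== 'manual' &&
    !input.hasAudioDetector
  );
}
```

Keeping the condition in a pure function like this would also make the text-detector and audio-detector cases easy to cover with unit tests.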

Comment on lines +9 to +14
log().warn(
'The text-based turn detector from @livekit/agents-plugins-livekit is deprecated. ' +
'The audio EOT detector in `@livekit/agents` inference (AudioTurnDetector) replaces ' +
'it and runs natively on-device via @livekit/local-inference. ' +
'This text-based path will be removed in a future release.',
);
Contributor

🔴 Module-level log().warn() crashes import when logger is not yet initialized

The deprecation warning at line 9 calls log().warn() at the top level of the module (during ES module evaluation). The log() function (agents/src/log.ts:35-41) throws TypeError('logger not initialized') if initializeLogger() hasn't been called yet. Since static imports are evaluated before any module body code runs, any file that does import * as livekit from '@livekit/agents-plugin-livekit' will crash if the logger hasn't been initialized by a prior module in the evaluation order. This affects test environments (Vitest, Jest), standalone scripts, and any code path that imports the package outside the CLI workflow where initializeLogger() is called before dynamic agent module imports.

The same issue exists in plugins/silero/src/index.ts:8-14.

Prompt for agents
The deprecation warning uses log().warn() at the top level of the module (import time). Since log() throws TypeError when the logger hasn't been initialized, this crashes any import that happens before initializeLogger() is called.

Two possible approaches:

1. Use console.warn() instead of log().warn() for the deprecation message, since console is always available:
   console.warn('[livekit] The text-based turn detector is deprecated...');

2. Defer the warning to first use (e.g., in the constructor of EnglishModel/MultilingualModel) rather than at import time.

The same fix needs to be applied in plugins/silero/src/index.ts which has the identical pattern.
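
The second approach (deferring the warning to first use) might look like the sketch below. It is illustrative only: `createDeprecationWarner` and `ExampleModel` are hypothetical names, not part of `@livekit/agents`, and the default sink uses `console.warn` on the assumption that the package logger may not be initialized yet.

```typescript
// Hypothetical helper: builds a function that emits a deprecation warning
// at most once. The returned function reports whether it actually warned.
function createDeprecationWarner(
  message: string,
  sink: (msg: string) => void = (msg) => console.warn(msg),
): () => boolean {
  let warned = false;
  return () => {
    if (warned) return false;
    warned = true;
    sink(message);
    return true;
  };
}

// Nothing is logged at import time; creating the warner has no side effects.
const warnTurnDetectorDeprecated = createDeprecationWarner(
  '[livekit] The text-based turn detector is deprecated.',
);

// Hypothetical stand-in for EnglishModel / MultilingualModel.
class ExampleModel {
  constructor() {
    // The warning fires on first construction only.
    warnTurnDetectorDeprecated();
  }
}
```

Because the module body no longer calls a logger, importing the package would stay safe in test runners and standalone scripts, and users who never construct the deprecated model would not see the message.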

…frame

The AudioFrame emitted on START_OF_SPEECH / END_OF_SPEECH sliced off
the prefix-padding samples but still reported `samplesPerChannel =
speechBufferIndex`, so the frame's metadata claimed more samples than
its data contained and downstream consumers (STT, transcription) lost
the pre-roll context the buffer machinery is designed to preserve.

Slice from 0 instead so data length matches samplesPerChannel and the
prefix-padding pre-roll is delivered, matching the Python original.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
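
The slicing fix described in the commit message above can be illustrated with a small sketch. It assumes a mono buffer and uses a hypothetical `FrameLike` shape and `frameFromSpeechBuffer` helper rather than the real `AudioFrame` class; only `speechBufferIndex` and `samplesPerChannel` are names taken from the commit message.

```typescript
// Hypothetical minimal frame shape; the real AudioFrame comes from the LiveKit SDK.
interface FrameLike {
  data: Int16Array;
  sampleRate: number;
  channels: number;
  samplesPerChannel: number;
}

// Builds a frame from the first `speechBufferIndex` samples of the buffer.
// Slicing from 0 keeps the prefix-padding pre-roll, and deriving
// samplesPerChannel from the sliced data keeps metadata and payload in sync.
// Assumption: mono audio, so samplesPerChannel equals the data length.
function frameFromSpeechBuffer(
  speechBuffer: Int16Array,
  speechBufferIndex: number,
  sampleRate: number,
): FrameLike {
  const data = speechBuffer.slice(0, speechBufferIndex);
  return { data, sampleRate, channels: 1, samplesPerChannel: data.length };
}
```

Deriving the sample count from the sliced array, instead of passing the index separately, is one way to make the mismatch described above impossible by construction.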
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 15 additional findings in Devin Review.

import { INFERENCE_METHOD_MULTILINGUAL } from './multilingual.js';

log().warn(
'The text-based turn detector from @livekit/agents-plugins-livekit is deprecated. ' +
Contributor

🟡 Typo in deprecation warning: wrong package name

The deprecation warning at module load time references @livekit/agents-plugins-livekit (plural "plugins"), but the actual package name is @livekit/agents-plugin-livekit (singular "plugin"). This makes the warning confusing for users trying to identify which dependency to update.

Suggested change
- 'The text-based turn detector from @livekit/agents-plugins-livekit is deprecated. ' +
+ 'The text-based turn detector from @livekit/agents-plugin-livekit is deprecated. ' +

2 participants