feat(eot): add audio eot model support #1613
add audio eot model and local inference support, deprecating silero and turn detector plugins
🦋 Changeset detected. Latest commit: 679f639. The changes in this PR will be included in the next version bump.
 *
 * Port of Python `livekit.agents.inference.eot.detector`.
 */
import type { InferenceExecutor } from '../../ipc/inference_executor.js';
I wonder whether there will be any overlapping code between the EOT detector under inference and the turn detector plugins. Or is this specifically for audio-based EOT?
| @@ -1053,11 +1317,32 @@ export class AudioRecognition { | |||
| } | |||
🔴 Audio EOT detector bypassed by early return when STT is enabled but no transcript arrived
In runEOUDetection, the guard at line 1313 returns early when STT is enabled but this.audioTranscript is empty. This guard was correct for text-based turn detectors (which need text to score) but is incorrect for the new audio-based AudioTurnDetector, which operates on audio features and doesn't need text. The PR adds proper audio-detector handling at lines 1330-1345 (hasAudioDetector / useDetector), but the early return prevents the code from ever reaching that logic.
When VAD fires END_OF_SPEECH before any STT transcript arrives (short utterances, STT lag), the audio EOT prediction — which may already be cached from the warmup phase at agents/src/voice/audio_recognition.ts:1749-1757 — is completely skipped. Turn detection stalls until an STT transcript eventually arrives, defeating the latency advantage of the audio EOT model.
(Refers to lines 1313-1317)
Prompt for agents
The early return guard in runEOUDetection at line 1313 skips EOU detection when STT is enabled but no transcript has arrived yet. This guard is appropriate for text-based turn detectors but incorrectly blocks audio-based turn detectors (AudioTurnDetector) that operate on audio features and don't need text.
The fix should allow audio EOT detectors to proceed even without a transcript. The hasAudioDetector variable (computed at line 1330) needs to be evaluated earlier, or the guard needs to be adjusted.
A possible fix: move the hasAudioDetector computation before the guard, then add it as an exclusion condition:
    const hasAudioDetector = this.turnDetector instanceof AudioTurnDetector;
    if (this.stt && !this.audioTranscript && this.turnDetectionMode !== 'manual' && !hasAudioDetector) {
      this.logger.debug('skipping EOU detection');
      return;
    }
This ensures the text-based guard still applies for text detectors while allowing audio detectors to proceed to the predictEndOfTurn path. The existing useDetector logic at lines 1330-1345 already correctly handles the audio detector case.
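The adjusted guard can be checked in isolation. The sketch below is a simplified stand-in and not the code in `audio_recognition.ts`: the `EouGuardState` interface and the `shouldSkipEouDetection` helper are hypothetical names introduced only for illustration, and the detector check is reduced to a boolean.

```typescript
// Hypothetical, simplified view of the state that runEOUDetection reads.
// In the real class these would be fields on AudioRecognition.
interface EouGuardState {
  sttEnabled: boolean; // stands in for `this.stt` being set
  audioTranscript: string; // empty until an STT transcript arrives
  turnDetectionMode: string | undefined;
  hasAudioDetector: boolean; // stands in for `turnDetector instanceof AudioTurnDetector`
}

// Returns true when EOU detection should be skipped.
// Text-based detectors need a transcript; audio-based detectors do not.
function shouldSkipEouDetection(state: EouGuardState): boolean {
  return (
    state.sttEnabled &&
    !state.audioTranscript &&
    state.turnDetectionMode !== 'manual' &&
    !state.hasAudioDetector
  );
}

// VAD fired END_OF_SPEECH before any transcript arrived.
const noTranscriptYet = { sttEnabled: true, audioTranscript: '', turnDetectionMode: 'vad' };

const skipForTextDetector = shouldSkipEouDetection({ ...noTranscriptYet, hasAudioDetector: false });
const skipForAudioDetector = shouldSkipEouDetection({ ...noTranscriptYet, hasAudioDetector: true });
```

With a text detector the guard still returns early, while an audio detector falls through to the prediction path even though no transcript exists.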
log().warn(
  'The text-based turn detector from @livekit/agents-plugins-livekit is deprecated. ' +
    'The audio EOT detector in `@livekit/agents` inference (AudioTurnDetector) replaces ' +
    'it and runs natively on-device via @livekit/local-inference. ' +
    'This text-based path will be removed in a future release.',
);
🔴 Module-level log().warn() crashes import when logger is not yet initialized
The deprecation warning at line 9 calls log().warn() at the top level of the module (during ES module evaluation). The log() function (agents/src/log.ts:35-41) throws TypeError('logger not initialized') if initializeLogger() hasn't been called yet. Since static imports are evaluated before any module body code runs, any file that does import * as livekit from '@livekit/agents-plugin-livekit' will crash if the logger hasn't been initialized by a prior module in the evaluation order. This affects test environments (Vitest, Jest), standalone scripts, and any code path that imports the package outside the CLI workflow where initializeLogger() is called before dynamic agent module imports.
The same issue exists in plugins/silero/src/index.ts:8-14.
Prompt for agents
The deprecation warning uses log().warn() at the top level of the module (import time). Since log() throws TypeError when the logger hasn't been initialized, this crashes any import that happens before initializeLogger() is called.
Two possible approaches:
1. Use console.warn() instead of log().warn() for the deprecation message, since console is always available:
console.warn('[livekit] The text-based turn detector is deprecated...');
2. Defer the warning to first use (e.g., in the constructor of EnglishModel/MultilingualModel) rather than at import time.
The same fix needs to be applied in plugins/silero/src/index.ts which has the identical pattern.
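Both approaches can be combined: defer the warning to first use and fall back to `console.warn` if the logger accessor throws. The sketch below is only an illustration under stated assumptions. `makeDeprecationWarner`, `WarnFn`, and `ExampleModel` are hypothetical names, and the accessor passed in stands in for something like `() => log().warn`; none of this is the plugin's actual code.

```typescript
type WarnFn = (message: string) => void;

// Builds a function that emits `message` at most once, on first call.
// `getWarn` is resolved lazily, so nothing runs at module evaluation time.
// If resolving the logger throws (e.g. "logger not initialized"),
// the warning falls back to console.warn, which is always available.
function makeDeprecationWarner(message: string, getWarn: () => WarnFn): () => void {
  let warned = false;
  return () => {
    if (warned) return;
    warned = true;
    let warn: WarnFn;
    try {
      warn = getWarn();
    } catch {
      warn = (m) => console.warn(m);
    }
    warn(message);
  };
}

// Recording sink used here in place of a real logger.
const emitted: string[] = [];
const warnDeprecated = makeDeprecationWarner(
  'The text-based turn detector is deprecated.',
  () => (m) => {
    emitted.push(m);
  },
);

// Hypothetical model class: the warning fires from the constructor,
// not from the top level of the module.
class ExampleModel {
  constructor() {
    warnDeprecated();
  }
}

new ExampleModel();
new ExampleModel(); // second construction does not warn again
```

Importing a module written this way has no logging side effect, so test runners and standalone scripts can import it before the logger is set up.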
…frame

The AudioFrame emitted on START_OF_SPEECH / END_OF_SPEECH sliced off the prefix-padding samples but still reported `samplesPerChannel = speechBufferIndex`, so the frame's metadata claimed more samples than its data contained and downstream consumers (STT, transcription) lost the pre-roll context the buffer machinery is designed to preserve.

Slice from 0 instead so data length matches samplesPerChannel and the prefix-padding pre-roll is delivered, matching the Python original.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
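The invariant described in this commit (data length equals `samplesPerChannel`, with the pre-roll kept) can be sketched as follows. This is a minimal illustration assuming mono 16-bit audio; `SimpleFrame` and `frameFromSpeechBuffer` are hypothetical names and not the `AudioFrame` class or the VAD code in this PR.

```typescript
// Hypothetical minimal frame shape, assuming mono 16-bit PCM.
interface SimpleFrame {
  data: Int16Array;
  sampleRate: number;
  channels: number;
  samplesPerChannel: number;
}

// Builds a frame from the first `speechBufferIndex` samples of the buffer.
// Slicing from 0 keeps the prefix-padding pre-roll, and deriving
// samplesPerChannel from the sliced data keeps metadata and data in sync.
function frameFromSpeechBuffer(
  speechBuffer: Int16Array,
  speechBufferIndex: number,
  sampleRate: number,
): SimpleFrame {
  const data = speechBuffer.slice(0, speechBufferIndex);
  return { data, sampleRate, channels: 1, samplesPerChannel: data.length };
}

// Ten-sample buffer of which the first six samples have been written;
// the first two are treated as prefix padding in this example.
const buffer = Int16Array.from([11, 12, 21, 22, 23, 24, 0, 0, 0, 0]);
const frame = frameFromSpeechBuffer(buffer, 6, 16000);
```

Deriving `samplesPerChannel` from the sliced array, instead of setting it independently, makes it impossible for the two values to drift apart.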
import { INFERENCE_METHOD_MULTILINGUAL } from './multilingual.js';

log().warn(
  'The text-based turn detector from @livekit/agents-plugins-livekit is deprecated. ' +
🟡 Typo in deprecation warning: wrong package name
The deprecation warning at module load time references @livekit/agents-plugins-livekit (plural "plugins"), but the actual package name is @livekit/agents-plugin-livekit (singular "plugin"). This makes the warning confusing for users trying to identify which dependency to update.
- 'The text-based turn detector from @livekit/agents-plugins-livekit is deprecated. ' +
+ 'The text-based turn detector from @livekit/agents-plugin-livekit is deprecated. ' +
Description
add audio eot model and local inference support, deprecating silero and turn detector plugins
Changes Made
Pre-Review Checklist
Testing
restaurant_agent.ts and realtime_agent.ts work properly (for major changes)
Additional Notes
Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.