fix(amd): defer SIP listening until answer #1639
rosetta-livekit-bot[bot] wants to merge 1 commit into
```ts
if (!participant) {
  return;
}
```
🔴 AMD never starts listening when participant resolution fails after track publication
In `gateListening()`, when `participant` is `undefined` at line 584 (e.g., the participant disconnected between track publication and the lookup, or `publicationSid` is falsy), the `.then()` handler returns without calling `startListening()`. As a result, the no-speech timer never starts, VAD events and transcripts are silently dropped (`handleUserStateChanged` and `consumeTranscript` both return early when `this.listening` is `false`), and AMD only settles when the detection timer fires after 20 seconds, producing a meaningless `UNCERTAIN` result.
This is a regression from the previous code, which always called `startNoSpeechTimer()` after track publication. The `.catch()` handler at `agents/src/voice/amd.ts:610-622` correctly falls back to `startListening()`, but the `!participant` branch in `.then()` does not.
```diff
-if (!participant) {
-  return;
-}
+if (!participant) {
+  this.startListening();
+  return;
+}
```
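To show why the suggested change closes the gap, here is a minimal sketch of the gate with every settlement path ending in `startListening()`: a missing participant, a successful wait for the answer, or a rejected wait. `GatedDetector`, `resolveParticipant` and `waitForAnswer` are hypothetical stand-ins for illustration, not the actual `amd.ts` API.

```typescript
// Hypothetical stand-ins for the participant lookup and SIP-answer wait in amd.ts.
type Participant = { identity: string };

class GatedDetector {
  listening = false;
  private resolveParticipant: () => Participant | undefined;
  private waitForAnswer: (p: Participant) => Promise<void>;

  constructor(
    resolveParticipant: () => Participant | undefined,
    waitForAnswer: (p: Participant) => Promise<void>,
  ) {
    this.resolveParticipant = resolveParticipant;
    this.waitForAnswer = waitForAnswer;
  }

  startListening(): void {
    // Idempotent; per the review, this is what starts the no-speech timer.
    this.listening = true;
  }

  // Runs once the SIP audio track has been published.
  gateListening(): Promise<void> {
    return Promise.resolve()
      .then(async () => {
        const participant = this.resolveParticipant();
        if (!participant) {
          // The suggested fix: fall back to listening instead of returning silently.
          this.startListening();
          return;
        }
        await this.waitForAnswer(participant);
        this.startListening();
      })
      .catch(() => {
        // Same fallback as the existing .catch() path.
        this.startListening();
      });
  }
}
```

Making `startListening()` idempotent lets every branch call it unconditionally without double-arming anything.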
```ts
const current = room.remoteParticipants.get(identity);
if (current && current.attributes[attribute] === value) {
  return;
}
```
🟡 `waitForParticipantAttribute` hangs if the participant disconnects between the existence check and listener setup
In `waitForParticipantAttribute`, the participant's existence is checked at line 1078, before event listeners are registered at lines 1114-1117. If the participant disconnects in that window, the `ParticipantDisconnected` event fires before our listener is registered (and is missed), and at line 1120 `current` is `undefined` (the participant is gone from the map). The function then falls through to `await fut.await`, which can only resolve via room disconnect or signal abort; there is no path that detects that the participant already left.
Comparison with existing waitForParticipant pattern
The sibling function `waitForParticipant` (`agents/src/utils.ts:1028-1048`) correctly avoids this by setting up listeners first and then checking existing participants, ensuring no events are missed. `waitForParticipantAttribute` should similarly check `!current` after listener registration and reject/throw.
In the AMD caller context this is mitigated by the `trackGateAbort` signal (eventually aborted by cleanup), but the utility function itself is incorrect for any caller without such a safety net.
```diff
-const current = room.remoteParticipants.get(identity);
-if (current && current.attributes[attribute] === value) {
-  return;
-}
+const current = room.remoteParticipants.get(identity);
+if (!current) {
+  throw new Error(`Participant ${identity} is no longer in the room`);
+}
+if (current.attributes[attribute] === value) {
+  return;
+}
```
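The listener-first ordering can be sketched end to end as follows. This assumes a tiny hand-rolled room model: `MiniRoom`, its `on`/`off`/`emit` methods and the event names are hypothetical, not the `@livekit/rtc-node` API, and `ParticipantAttributeWaitAborted` is a TypeScript mirror of the exception named in the PR description. Since nothing can interleave between synchronous JavaScript statements, subscribing before the state check (with no `await` in between) is enough to close the window.

```typescript
type RoomEvent = "attributesChanged" | "participantDisconnected";
type Listener = (identity: string) => void;

// Hypothetical minimal room model; not the real Room class.
class MiniRoom {
  remoteParticipants = new Map<string, { attributes: Record<string, string> }>();
  private listeners = new Map<RoomEvent, Set<Listener>>();

  on(event: RoomEvent, fn: Listener): void {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event)!.add(fn);
  }
  off(event: RoomEvent, fn: Listener): void {
    this.listeners.get(event)?.delete(fn);
  }
  emit(event: RoomEvent, identity: string): void {
    for (const fn of Array.from(this.listeners.get(event) ?? [])) fn(identity);
  }
}

// Mirrors the dedicated abort exception named in the PR description.
class ParticipantAttributeWaitAborted extends Error {
  constructor() {
    super("participant attribute wait aborted");
    this.name = "ParticipantAttributeWaitAborted";
    Object.setPrototypeOf(this, ParticipantAttributeWaitAborted.prototype);
  }
}

function waitForParticipantAttribute(
  room: MiniRoom,
  identity: string,
  attribute: string,
  value: string,
  signal?: AbortSignal,
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const cleanup = (): void => {
      room.off("attributesChanged", onAttributes);
      room.off("participantDisconnected", onDisconnected);
      signal?.removeEventListener("abort", onAbort);
    };
    const finish = (err?: Error): void => {
      cleanup();
      if (err) reject(err);
      else resolve();
    };
    const onAttributes = (who: string): void => {
      if (who === identity && room.remoteParticipants.get(who)?.attributes[attribute] === value) finish();
    };
    const onDisconnected = (who: string): void => {
      if (who === identity) finish(new Error(`Participant ${identity} disconnected`));
    };
    const onAbort = (): void => finish(new ParticipantAttributeWaitAborted());

    // 1. Subscribe before inspecting state (no await in between), so a disconnect cannot be missed.
    room.on("attributesChanged", onAttributes);
    room.on("participantDisconnected", onDisconnected);
    signal?.addEventListener("abort", onAbort);
    if (signal?.aborted) {
      onAbort();
      return;
    }

    // 2. Then check current state; a participant that already left rejects instead of hanging.
    const current = room.remoteParticipants.get(identity);
    if (!current) {
      finish(new Error(`Participant ${identity} is no longer in the room`));
      return;
    }
    if (current.attributes[attribute] === value) finish();
  });
}
```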
Previously, AMD started all of its timers as soon as the SIP audio track was subscribed. Because audio tracks can be published during ringing or carrier early media (before the SIP ANSWER), this poisoned the classifier with pre-answer audio and burned the no-speech budget.

This PR waits for the SIP call to reach the active state before starting the no-speech timer. The detection timeout is still armed when the track is subscribed.
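The split between the two timers can be sketched as below. This is an illustration under assumptions: the `AmdTimers` class, its `onTrackSubscribed`/`onCallAnswered` hooks and the durations used in the usage are hypothetical, not the PR's code. The detection timeout starts with the track, while the no-speech budget and the audio gate only open at answer.

```typescript
// Hypothetical sketch: detection timeout armed on track subscription,
// no-speech timer armed only once the call is answered.
class AmdTimers {
  events: string[] = [];
  answered = false;
  private detectionMs: number;
  private noSpeechMs: number;
  private detectionTimer?: ReturnType<typeof setTimeout>;
  private noSpeechTimer?: ReturnType<typeof setTimeout>;

  constructor(detectionMs: number, noSpeechMs: number) {
    this.detectionMs = detectionMs;
    this.noSpeechMs = noSpeechMs;
  }

  onTrackSubscribed(): void {
    // Armed immediately so AMD is bounded even if the call is never answered.
    this.detectionTimer = setTimeout(() => this.events.push("detection-timeout"), this.detectionMs);
  }

  onCallAnswered(): void {
    // The no-speech budget starts at SIP answer, not during ringing or early media.
    this.answered = true;
    this.noSpeechTimer = setTimeout(() => this.events.push("no-speech"), this.noSpeechMs);
  }

  // Pre-answer audio is dropped instead of being fed to the classifier.
  acceptsAudio(): boolean {
    return this.answered;
  }

  // Clears both timers when the deferred setup is torn down.
  close(): void {
    clearTimeout(this.detectionTimer);
    clearTimeout(this.noSpeechTimer);
  }
}
```

Here a ringing period longer than the no-speech budget no longer fires the no-speech timer, because that timer does not exist until `onCallAnswered()` runs.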
It also adds a reusable `wait_for_participant_attribute` helper in `utils.participant` with a dedicated `ParticipantAttributeWaitAborted` exception, and properly tracks and cleans up the deferred setup task.

Now it maps to `uncertain`.

Previously, the verdict was emitted once the prediction arrived and the silence threshold was satisfied, but that does not mean the user turn had ended. A new option, `wait_until_finished` (default `False`), now makes AMD wait for both end of turn (EOT) and the silence threshold for machines. This addresses the following case: after AMD had interrupted any preemptive generations, emitted the verdict, and released the hold on playout, a new normal generation could still be triggered in parallel with a `generate_reply` call, too late for AMD to interrupt it (for example, because of late STT).

If EOT is awaited in addition to the silence threshold, the parallel generation will be interrupted when `interrupt_on_machine=True`. This flag still respects the overall `timeout` value if no speech or transcript arrives.
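The emission rule described above can be expressed as a small predicate. This is a sketch under assumptions: `VerdictState`, `decideMachineVerdict` and the camelCase `waitUntilFinished` field are hypothetical names, and the `"timeout"` branch only models the statement that the overall `timeout` still applies; it does not claim which verdict is emitted on timeout.

```typescript
interface VerdictState {
  predictionArrived: boolean; // classifier has produced a machine prediction
  silenceMs: number; // trailing silence observed so far
  endOfTurn: boolean; // EOT detected for the current user turn
  elapsedMs: number; // time since detection started
}

interface VerdictOptions {
  silenceThresholdMs: number;
  timeoutMs: number;
  waitUntilFinished: boolean; // hypothetical mirror of wait_until_finished (default false)
}

type Decision = "emit" | "wait" | "timeout";

function decideMachineVerdict(s: VerdictState, o: VerdictOptions): Decision {
  // The overall timeout still bounds the wait when no speech or transcript arrives.
  if (s.elapsedMs >= o.timeoutMs) return "timeout";
  // Previous behaviour: prediction plus silence threshold was enough.
  if (!s.predictionArrived || s.silenceMs < o.silenceThresholdMs) return "wait";
  // With waitUntilFinished, the user turn must also have ended.
  if (o.waitUntilFinished && !s.endOfTurn) return "wait";
  return "emit";
}
```

Checking the timeout first is a choice made in this sketch: it guarantees that enabling `waitUntilFinished` can delay a verdict but never stall it indefinitely.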