Fully configure frame processors when they are used directly on an audio stream #679
1egoman wants to merge 20 commits into
…io stream And extracting metadata from that room that can be fed into the frame processor.
3e5a9ab to f62c247
…oStream This makes it less complex.
The agents sdk can pass this opt-out flag so that it can reuse the frame processor across many audio tracks
Need to think about this a bit more: this pattern as written won't work, since the FrameProcessor today can't have a set of no-op credentials pushed.
…Processor methods, and use them when moving a track out of a room
564b2c7 to 8d3f4fe
These tests exercise all the frame processor track reparenting paths (moving a track under a room, etc.).
num_channels: int = 1,
frame_size_ms: int | None = None,
noise_cancellation: Optional[NoiseCancellationOptions | FrameProcessor[AudioFrame]] = None,
noise_cancellation_leave_open: bool = False,
Can we move that inside NoiseCancellationOptions?
Unfortunately, no - this is important to the FrameProcessor[AudioFrame] side of that noise_cancellation union. Open to putting it somewhere else but it needs to be settable in the FrameProcessor path.
hmm, not sure if it's a good idea, but could it be a field on the FrameProcessor interface instead?
Then we could add it to NoiseCancellationOptions and new FrameProcessors would be able to set it on the processor itself
It's not a setting that a frame processor would always want to have set or not have set, so I'm not sure that would really make sense either.
For context, the reason this is here is so the agents sdk can reuse a single FrameProcessor across multiple underlying tracks. Previously this wasn't a problem, because the agents sdk had the responsibility of closing the FrameProcessor, so it could easily do it at room disconnection time. But in order to support using FrameProcessors directly on an AudioStream, calling close needs to be pushed down deeper than the agents sdk layer. This flag allows the caller to explicitly tell AudioStream that they will manage cleaning up the FrameProcessor so that both use cases can continue to work.
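To make the ownership question concrete, here is a minimal sketch of the two cleanup modes. It does not use the real SDK: `SketchFrameProcessor` and `SketchAudioStream` are hypothetical stand-ins that model only who calls `close`, and the only name taken from this PR is the `noise_cancellation_leave_open` flag.

```python
class SketchFrameProcessor:
    """Hypothetical stand-in for a FrameProcessor; tracks only open/closed state."""

    def __init__(self) -> None:
        self.closed = False

    def process(self, frame: bytes) -> bytes:
        if self.closed:
            raise RuntimeError("processor used after close")
        return frame  # a real processor would transform the audio here

    def close(self) -> None:
        self.closed = True


class SketchAudioStream:
    """Hypothetical stand-in for AudioStream; models only processor ownership."""

    def __init__(
        self,
        processor: SketchFrameProcessor,
        *,
        noise_cancellation_leave_open: bool = False,
    ) -> None:
        self._processor = processor
        self._leave_open = noise_cancellation_leave_open

    def close(self) -> None:
        # By default the stream owns the processor and closes it along with itself.
        if not self._leave_open:
            self._processor.close()


# Direct use: the stream cleans up the processor when it closes.
owned = SketchFrameProcessor()
SketchAudioStream(owned).close()

# Agents-style use: one processor shared across several streams,
# with the caller responsible for closing it.
shared = SketchFrameProcessor()
for _ in range(3):
    stream = SketchAudioStream(shared, noise_cancellation_leave_open=True)
    shared.process(b"\x00\x00")
    stream.close()
still_open_after_streams = not shared.closed
shared.close()  # e.g. at room disconnection time
```

With the flag left at its default, closing the stream closes the processor. With the flag set, the processor survives every stream close and the caller decides when it ends.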
I think this flag is not really configuring the noise suppression behavior, but how AudioStream deals with its own noise suppression. Maybe the name noise_cancellation_leave_open is a bit confusing?
How about close_noise_cancellation_on_stream_close or manage_noise_cancellation_processor?
Updates the python sdk so that FrameProcessor-based noise cancellation providers can be used directly on AudioStream, without having to go through the agent's RoomIO to initialize themselves with credentials.
For example, with this change, something like the below becomes possible:
The way this works -
Tracks now keep track of which room they are part of (holding a weakref value). When the room a track is in changes, it computes new frame processor options and sends these to any AudioStreams which are associated with the track.
The noise_cancellation_leave_open parameter allows the agents sdk to call this from_track method with a frame processor which remains open across the whole session, and won't be auto-closed when the track is closed.
This goes along with livekit/agents#5867, which removes the relevant event handling logic in the agents sdk. I will follow up with a node version of this once the python one is in a good state.
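The room-tracking mechanism described above can be sketched roughly as follows. This is an illustration under assumptions, not the SDK's implementation: every class and method name here (`SketchTrack`, `SketchStream`, `SketchProcessorOptions`, `set_room`, `attach`, `update_options`) is hypothetical, and the real options presumably carry credentials and other room metadata rather than just a name.

```python
from __future__ import annotations

import weakref
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchProcessorOptions:
    """Hypothetical options derived from a room; the field is illustrative only."""
    room_name: Optional[str]


class SketchRoom:
    def __init__(self, name: str) -> None:
        self.name = name


class SketchStream:
    """Hypothetical stream that records the most recent options it was sent."""

    def __init__(self) -> None:
        self.options: Optional[SketchProcessorOptions] = None

    def update_options(self, options: SketchProcessorOptions) -> None:
        self.options = options


class SketchTrack:
    def __init__(self) -> None:
        self._room_ref = None  # weak reference to the current room, if any
        # WeakSet so the track does not keep its streams alive either.
        self._streams = weakref.WeakSet()

    def attach(self, stream: SketchStream) -> None:
        self._streams.add(stream)
        stream.update_options(self._compute_options())

    def set_room(self, room: Optional[SketchRoom]) -> None:
        # Only a weak reference is held, so the track never keeps a room alive.
        self._room_ref = weakref.ref(room) if room is not None else None
        options = self._compute_options()
        for stream in list(self._streams):
            stream.update_options(options)

    def _compute_options(self) -> SketchProcessorOptions:
        room = self._room_ref() if self._room_ref is not None else None
        return SketchProcessorOptions(room_name=room.name if room is not None else None)


track = SketchTrack()
stream = SketchStream()
track.attach(stream)   # no room yet
before = stream.options
room = SketchRoom("demo-room")
track.set_room(room)   # track becomes part of a room
during = stream.options
track.set_room(None)   # track is moved out of the room
after = stream.options
```

Recomputing the options on every room change, rather than caching them on the stream, means a stream created before the track joins a room still receives the room's settings once they exist.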
Todo