Add source@v1 role for audio input devices#52
rudyberends wants to merge 7 commits into Sendspin:main from
Conversation
This would be amazing! One thing that I've been toying with on my current setup (turntable preamp -> ADC -> ffmpeg -> Icecast) is using an audio fingerprinting library or service to identify what's playing on the turntable and injecting the metadata into the stream. If it's in the scope of what Sendspin is intended to do, it would be awesome to support that natively. edit: or would this be better handled by the server?
Any updates or comments? Btw, I believe I read somewhere they were considering calling this new type of client role "sender"? Ping @maximmaxim345 and @marcelveldt
For reference, and if you're interested in discussing this further, please also see these related discussions and requests:
|
There isn't much progress yet toward getting a source or sender role into the specification, since it would be nice to have a working implementation in Music Assistant first (so we can figure out issues with the specification of a new role before it's part of the spec). Right now it's rather convoluted to add new roles in
maximmaxim345
left a comment
Thanks for the proposal and reference implementations @rudyberends !
One major thing missing from this role is sending the base64-encoded header for Opus and FLAC.
I think the most consistent way to solve this is to create a copy of the stream messages:
- `input_stream/start`
- `input_stream/request-format` (this can replace the format section of `server/command.source`)
- `input_stream/end`
In case we ever have another role that sends data from the client to the server, these input_stream messages can be reused.
- `supported_commands`: string[] - subset of: 'play' | 'pause' | 'stop' | 'next' | 'previous' | 'volume' | 'mute' | 'repeat_off' | 'repeat_one' | 'repeat_all' | 'shuffle' | 'unshuffle' | 'switch' | 'select_source'
- `volume`: integer - volume of the whole group, range 0-100
- `muted`: boolean - mute state of the whole group
- `sources?`: object[] - list of available/known sources on the server
Let's remove the `select_source` command from this PR.
If we include this, it should rather be part of a future role, since it adds quite a lot of data for basic controller use cases.
Just an idea: maybe that future role will also allow you to see your library and select an album or playlist for playback? But that's something for later.
Agreed. I removed `select_source` from this PR and left it for a future "media/inputs" role. The reference implementation has been updated accordingly (no controller command, no select/clear CLI; only source listing remains).
Can you drop sources here as well?
I think this also belongs to that future "media/inputs" role, since on its own it doesn't really bring helpful information to the clients.
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
What is the use case of unknown?
Maybe I'm missing something, but couldn't the client just set line_sense to false?
It isn’t strictly required. We could simplify by only using present/absent, and treat signal as “unknown” when it’s omitted (or when line_sense=false).
The only reason to keep unknown is semantic clarity for clients that do support line sensing but can’t determine it yet (startup, device not ready, no samples). If we want to keep the spec minimal, dropping unknown is perfectly fine.
I think we can drop unknown here to keep it simple.
For the server it doesn't really matter if the state is either unknown or absent since I imagine most implementations would treat it the same.
I think we should consider the synchronization implications for sources with native local output that cannot be modelled as simply having both a source and client role. Many users are drawn to Sendspin for its open-source, easy-to-implement multi-room sync [citation needed], but certain hardware inputs present a challenge:
Proposed Requirement: If we want sources with native output to remain viable in a synced environment, the spec should optionally allow sources to:
Thanks for the note, I totally get the concern. The key point is that source@v1 is intentionally capture-only. The client is meant to be as dumb as possible: it timestamps audio in the server time domain (using the existing time-sync offset) and sends frames upstream.

From there, the server already does what it does for every stream: buffer, resample/encode if needed, and distribute synchronized playback to the group. If the device also wants to hear its own input, the correct model is simply source + player on the same client, and the server will send the synchronized stream back to it like any other player. The reference implementation already demonstrates this: a source can be selected and played back in perfect sync across multiple clients, including the device that captured the input.

So "synced playback" isn't missing; it's already solved by the existing server → player pipeline. What is outside scope is a source trying to keep its native local output in sync with the network stream. That would require hardware-specific delay control and isn't part of the source role by design. In short: capture stays dumb, the server owns sync, and local playback is handled by the standard player path.
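The "timestamps audio in the server time domain" step boils down to mapping the local capture clock through the offset learned from the existing time-sync exchange. A minimal sketch, with illustrative names (the real sync protocol and units may differ):

```python
import time

class ServerClock:
    """Maps local monotonic time into the server time domain using an
    offset from a time-sync exchange. Names and units are illustrative."""

    def __init__(self, offset_us: int = 0):
        # offset_us = server_time - local_time, in microseconds
        self.offset_us = offset_us

    def update_offset(self, offset_us: int) -> None:
        """Called whenever a new time-sync measurement arrives."""
        self.offset_us = offset_us

    def now_server_us(self) -> int:
        """Server-domain timestamp to attach to a captured audio chunk."""
        return time.monotonic_ns() // 1000 + self.offset_us

clock = ServerClock(offset_us=250_000)
stamp = clock.now_server_us()  # attach this to each captured frame
```

With every chunk stamped this way, the server can buffer and schedule the returning stream exactly as it does for server-originated audio, which is why the source itself never needs sync logic.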
@HarvsG also see these related discussions and requests, as they talk more about different use-case scenarios for this feature, including ideas for how a client could act as an appliance-like "product" (e.g. a device based on an ESP32 or a Raspberry Pi) with two roles, both source (capture) and player (output), at the same time: and
For such a scenario to work practically, I think the client device needs to have two active roles, both source (capture) and player (output), at the same time: that would allow the client to send the source to the server and then play the returning synchronized stream locally. The client device cannot simply pass the local audio through, because then it would be impossible to synchronize. Therefore the physical client "product" needs both inputs and outputs on the same device. While not software-compatible, check out the ports on existing devices such as the "UniFi PoE Audio Port" and the "WiiM Ultra", mentioned solely as a visual aid to show the kinds of audio input and output ports that could be featured on the same device in order to both capture audio as an input source for the server and play back the output to local speakers at the same time:
I agree, and I implemented exactly that in the reference flow: added `input_stream/start` with `codec_header` (base64) for Opus/FLAC. I also added optional source control commands in the reference implementation (play/pause/next/previous/activate/deactivate). These are advertised via `source@v1_support.controls` and sent as `command.source.control`. They're purely optional and intended for controllable sources (e.g. networked players), while line-in sources simply omit them.
### Server → Client: `input_stream/request-format`
One of the things I miss in the spec is: what if a client starts streaming data to the server, but the server doesn't care?
We would want a way to specify that. Maybe a request-format message asking for a `none` codec, with that as the default at start?
The use cases I want to make sure that are covered by this role:
- A user starts a turntable connected to a Sendspin source device. The server starts playing the music in the same room as the turntable the moment the turntable starts playing, without interaction from the user on the Sendspin source device or Sendspin server.
- A user has the output of their computer available as a source in Sendspin, and from the Sendspin server can say: start streaming this source to speakers
Great point. This is already the intended behavior, and it is how the implementation works.
The model is:
- `source.command` (`start` / `stop`) defines server ingest interest.
- Default is effectively "not interested" (`stop`) until the server sends `start`.
- A source should only send media after `start` and `input_stream/start`.
- If a source sends chunks while not started, the server drops/ignores them.
So we do not need a none codec to represent “server doesn’t care”; stop already covers that.
For the two use cases:
Turntable auto-start
- Source reports signal/state (signal: present, optional started event).
- Server policy decides to ingest and sends command: "start".
- Audio is routed immediately to the target room.
Computer output selectable from server
- User selects source in server UI/control plane.
- Server sends command: "start" to that source.
- Source sends input_stream/start + chunks.
- Server sends command: "stop" when done.
I agree we should make this explicit in the spec text (default stop + ignore/drop when not started), but no new mechanism is required.
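The "default stop + drop when not started" rule described above amounts to a small gating state machine on the ingest side. A minimal sketch under the assumption of two states (`idle`, `streaming`); names are illustrative, not from the spec:

```python
class SourceIngest:
    """Sketch of ingest gating: the server is 'not interested' by default,
    and chunks arriving while stopped are simply dropped."""

    def __init__(self):
        self.state = "idle"   # default: no start command received yet
        self.accepted = []
        self.dropped = 0

    def on_command(self, command: str) -> None:
        if command == "start":
            self.state = "streaming"
        elif command == "stop":
            self.state = "idle"

    def on_chunk(self, chunk: bytes) -> None:
        if self.state == "streaming":
            self.accepted.append(chunk)
        else:
            self.dropped += 1  # ignore audio outside start/stop

ingest = SourceIngest()
ingest.on_chunk(b"early")      # dropped: no start yet
ingest.on_command("start")
ingest.on_chunk(b"audio")      # accepted
ingest.on_command("stop")
ingest.on_chunk(b"late")       # dropped again
```

This is why no `none` codec is needed: the stop state already expresses "server doesn't care", and a misbehaving source only costs the server a counter increment per dropped chunk.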
I have another idea for the turntable use case: can't the client just directly start the input_stream?
If we require the server to immediately switch to the turntable once it's detected, we don't need the negotiation checking for `client/state.source.signal = 'present'`.
I think it's a good idea to make this explicit; can you update the text @rudyberends ?
- `stop`: server requests ingest to become inactive. The client should send `input_stream/end`, stop sending source audio chunks, and transition to `state: "idle"`.
- `control` is optional upstream-device control intent and only applies when advertised in `source@v1_support.controls`.
- `play` | `pause` | `next` | `previous`: control content playback behavior on the upstream source device (if supported).
- `activate` | `deactivate`: prepare or power-manage the upstream source path (for example power on/off, wake/sleep, input enable/disable).
How would the server know when to call this? Why can't the source do this automatically on play, using a hook?
#### `vad` semantics
`vad` is an optional server hint for source-side line-sense behavior (`threshold_db`, `hold_ms`). It allows centralized tuning and consistent behavior across sources/groups. Clients may ignore unsupported hints.
This feels out of scope and more like a fleet-management feature. Shouldn't this just be locally configured?
I'm also in favor of moving the VAD configuration outside the Sendspin protocol.
What we could do is recommend specific threshold_db and hold_ms values, but using those shouldn't be mandatory.
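Whether `threshold_db` and `hold_ms` end up in the protocol or in local config, line sensing typically combines an RMS threshold with a hold (debounce) time so brief quiet passages don't flap the signal state. A minimal sketch, assuming float samples in [-1.0, 1.0] and a caller-supplied clock; the exact detection method is up to each source:

```python
import math

class LineSense:
    """Illustrative line-sense: 'present' while RMS stays above
    threshold_db, and for hold_ms after it last did."""

    def __init__(self, threshold_db: float = -50.0, hold_ms: int = 2000):
        self.threshold_db = threshold_db
        self.hold_ms = hold_ms
        self.last_above_ms = None  # time signal last exceeded the threshold

    def feed(self, samples: list, now_ms: float) -> str:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
        db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if db >= self.threshold_db:
            self.last_above_ms = now_ms
        if self.last_above_ms is not None and now_ms - self.last_above_ms <= self.hold_ms:
            return "present"
        return "absent"

vad = LineSense(threshold_db=-50.0, hold_ms=1000)
```

The hold time is what makes the recommended-defaults idea workable: even a conservative `hold_ms` keeps gaps between turntable tracks from reporting `absent`.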
- `start`: server requests ingest to become active. The client should transition to `state: "streaming"`, send `input_stream/start`, and then send source audio chunks.
- `stop`: server requests ingest to become inactive. The client should send `input_stream/end`, stop sending source audio chunks, and transition to `state: "idle"`.
- `control` is optional upstream-device control intent and only applies when advertised in `source@v1_support.controls`.
- `play` | `pause` | `next` | `previous`: control content playback behavior on the upstream source device (if supported).
If we include this, then we should also include the state of the source? (Basically the same info we send about Sendspin being played)
I wonder if this is scope creep and we shouldn't include this for now.
What is the use case?
Hey @rudyberends ! Now that the aiosendspin refactor is done, the biggest blocker for this PR is gone. Can you take a look at the remaining open review comments? I closed the ones you already addressed before. And as a heads-up for Music Assistant: we won't implement this source role fully for now due to security concerns. But we can still merge the full


Summary
This PR introduces a new source@v1 role to the Sendspin protocol, allowing audio input devices (e.g. line-in, turntable preamps, HDMI, Bluetooth receivers, microphones) to be represented and selected in a consistent, protocol-native way.
The goal is to enable remote audio inputs without increasing client complexity, while keeping the Sendspin server as the single place where all heavy processing happens.
All changes are additive and backward-compatible.
Motivation
Several real-world setups require audio to enter Sendspin from a device rather than originate inside the server:
• Line-in or turntable inputs connected to speakers or satellites
• HDMI / ARC inputs from TVs
• Bluetooth receivers acting as a local input
• Voice assistant or microphone satellites forwarding captured audio
Today the protocol focuses primarily on server-originated playback streams. This PR adds an additive source@v1 role to represent audio inputs as first-class sources, while keeping the server responsible for processing and distribution.
Design overview
The source role represents a client that:
• captures audio locally
• streams it to the server
• optionally reports basic signal presence or level
• does not perform any heavy processing
The server remains fully authoritative:
• resampling, transcoding, EQ
• buffering and synchronization
• visualization and distribution to players
Sources are intentionally kept simple so they can run on constrained devices.
Input semantics
Sources explicitly describe their behavior using two orthogonal concepts:
Input type
• analog – line-level style inputs (AUX, turntable preamp)
Audio presence depends on physical user interaction.
• digital – HDMI, S/PDIF, Bluetooth, or similar
Audio is usually continuous or remotely controllable.
Activation model
• manual – cannot be reliably started remotely (e.g. turntable)
• remote – server can start/stop capture predictably
• always_on – capture is always available
This allows controllers and UIs to behave sensibly without hard-coding device assumptions.
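To make the "behave sensibly without hard-coding device assumptions" point concrete, here is a sketch of how a controller could key UI decisions off the two orthogonal fields. The descriptor shape and helper are hypothetical, not part of the proposed schema:

```python
from dataclasses import dataclass

@dataclass
class SourceInfo:
    """Hypothetical view of a source's declared semantics."""
    input_type: str   # 'analog' | 'digital'
    activation: str   # 'manual' | 'remote' | 'always_on'

def ui_can_start_remotely(src: SourceInfo) -> bool:
    """Only show a 'start' button when the server can actually start
    capture predictably; manual sources need physical interaction."""
    return src.activation in ("remote", "always_on")

turntable = SourceInfo(input_type="analog", activation="manual")
hdmi = SourceInfo(input_type="digital", activation="remote")
```

A controller can thus hide the start control for the turntable (and instead rely on signal presence) while offering it for the HDMI input, without knowing anything about either device model.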
Signal presence and feedback
For analog inputs especially, it is important to avoid “playing silence”.
Therefore the role optionally supports:
• signal: present | absent | unknown (line sensing)
• level: normalized audio level (RMS/peak)
Both are optional and only reported if the source supports them.
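For illustration, the normalized `level` could be derived from raw PCM like this; the sketch assumes signed integer samples and peak measurement, though the proposal leaves the exact method (RMS or peak) to the source:

```python
def normalized_level(pcm: list, bits: int = 16) -> float:
    """Map signed PCM samples to the 0.0-1.0 level range.
    Peak-based here; a source could equally report RMS."""
    if not pcm:
        return 0.0
    full_scale = float(2 ** (bits - 1))
    peak = max(abs(s) for s in pcm)
    return min(peak / full_scale, 1.0)

level = normalized_level([0, 16384, -8192])  # peak 16384 of 32768
```

Normalizing on the client keeps the wire value independent of sample format, so the server can drive meters or silence detection without knowing the source's bit depth.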
Controller integration
• Controllers receive a list of available sources via server/state
• A new controller command select_source allows selecting an active source for a group
• Server-local inputs can be exposed as virtual source clients, using the same model
This keeps source selection aligned with existing group control concepts.
Protocol changes (high level)
• New role: source@v1
• Additions to:
• client/hello
• client/state
• client/command
• server/state
• server/command
• Binary message allocation for source audio frames
• Controller extensions for listing and selecting sources
No existing roles or message semantics are changed.
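As a sketch of what the "binary message allocation for source audio frames" could look like on the wire, here is one possible layout: a 1-byte message type, an 8-byte big-endian server-time timestamp in microseconds, then the encoded payload. Both the layout and the type value are assumptions for illustration, not allocations from the spec:

```python
import struct

# Hypothetical frame layout: [type: u8][timestamp_us: u64][payload...]
FRAME_HEADER = struct.Struct("!BQ")
SOURCE_AUDIO_TYPE = 0x20  # illustrative value, not allocated by the spec

def pack_frame(timestamp_us: int, payload: bytes) -> bytes:
    return FRAME_HEADER.pack(SOURCE_AUDIO_TYPE, timestamp_us) + payload

def unpack_frame(frame: bytes):
    msg_type, ts = FRAME_HEADER.unpack_from(frame)
    return msg_type, ts, frame[FRAME_HEADER.size:]

frame = pack_frame(1_234_567, b"\x01\x02")
```

Carrying the server-domain timestamp in every frame is what lets the server buffer and schedule source audio with the same machinery it uses for playback streams.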
Compatibility
• Fully backward-compatible
• Existing clients and servers can ignore the new role
• No impact on playback, grouping, timing, or synchronization
Open for feedback
This proposal is meant as a starting point.
Feedback is very welcome on:
• field naming and structure
• activation semantics
• signal/level reporting
• whether anything should be simplified or removed
If parts of this feel out of scope or misaligned with Sendspin’s direction, I’m very happy to adjust or iterate.
If helpful, I am also happy to adapt or provide reference implementations.
Thanks for the great project and taking the time to review this.