Add source@v1 role for audio input devices#52
rudyberends wants to merge 7 commits into Sendspin:main from
Conversation
This would be amazing! One thing that I've been toying with on my current setup (turntable preamp -> ADC -> ffmpeg -> Icecast) is using an audio fingerprinting library or service to identify what's playing on the turntable and injecting the metadata into the stream. If it's in the scope of what Sendspin is intended to do, it would be awesome to support that natively. edit: or would this be better handled by the server?
Any updates or comments? Btw, I believe I read somewhere they were considering calling this new type of client role "sender"? Ping @maximmaxim345 and @marcelveldt
For reference, and if you're interested in discussing this further, please also see these related discussions and requests:
|
There isn't much progress yet toward getting a source or sender role into the specification, since it would be nice to have a working implementation in Music Assistant first (so we can figure out issues with the specification of a new role before it's part of the spec). Right now it's rather convoluted to add new roles in
maximmaxim345
left a comment
Thanks for the proposal and reference implementations @rudyberends !
One major thing missing from this role is sending the base64-encoded header for Opus and FLAC.
I think the most consistent way to solve this is to create a copy of the stream messages:
- `input_stream/start`
- `input_stream/request-format` (this can replace the format section of `server/command.source`)
- `input_stream/end`
In case we ever have another role that sends data from the client to the server, these input_stream messages can be reused.
- `supported_commands`: string[] - subset of: 'play' | 'pause' | 'stop' | 'next' | 'previous' | 'volume' | 'mute' | 'repeat_off' | 'repeat_one' | 'repeat_all' | 'shuffle' | 'unshuffle' | 'switch' | 'select_source'
- `volume`: integer - volume of the whole group, range 0-100
- `muted`: boolean - mute state of the whole group
- `sources?`: object[] - list of available/known sources on the server
Let's remove the `select_source` command from this PR.
If we include this, it should rather be part of a future role, since it adds quite a lot of data for basic controller use cases.
Just an idea: maybe that future role will also allow you to see your library and select an album or playlist for playback? But that's something for later.
Agreed. I removed `select_source` from this PR and left it for a future "media/inputs" role. The reference implementation has been updated accordingly (no controller command, no select/clear CLI; only source listing remains).
Can you drop sources here as well?
I think this also belongs to that future "media/inputs" role, since on its own it doesn't really bring helpful information to the clients.
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
What is the use case of unknown?
Maybe I'm missing something, but couldn't the client just set line_sense to false?
It isn’t strictly required. We could simplify by only using present/absent, and treat signal as “unknown” when it’s omitted (or when line_sense=false).
The only reason to keep unknown is semantic clarity for clients that do support line sensing but can’t determine it yet (startup, device not ready, no samples). If we want to keep the spec minimal, dropping unknown is perfectly fine.
I think we can drop unknown here to keep it simple.
For the server it doesn't really matter if the state is either unknown or absent since I imagine most implementations would treat it the same.
I think we should consider the synchronization implications for sources with native local output that cannot be modelled as simply having both a source and client role. Many users are drawn to Sendspin for its open-source, easy-to-implement multi-room sync [citation needed], but certain hardware inputs present a challenge:
Proposed Requirement: If we want sources with native output to remain viable in a synced environment, the spec should optionally allow sources to:
Thanks for the note, I totally get the concern. The key point is that source@v1 is intentionally capture-only. The client is meant to be as dumb as possible: it timestamps audio in the server time domain (using the existing time-sync offset) and sends frames upstream.

From there, the server already does what it does for every stream: buffer, resample/encode if needed, and distribute synchronized playback to the group. If the device also wants to hear its own input, the correct model is simply source + player on the same client, and the server will send the synchronized stream back to it like any other player. The reference implementation already demonstrates this: a source can be selected and played back in perfect sync across multiple clients, including the device that captured the input.

So "synced playback" isn't missing; it's already solved by the existing server → player pipeline. What is outside scope is a source trying to keep its native local output in sync with the network stream. That would require hardware-specific delay control and isn't part of the source role by design. In short: capture stays dumb, the server owns sync, and local playback is handled by the standard player path.
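The "timestamps audio in the server time domain" step boils down to mapping the local capture clock through the offset learned from the existing time-sync exchange. A minimal sketch, with illustrative names (the real sync protocol and units may differ):

```python
import time

class ServerClock:
    """Maps local monotonic time into the server time domain using an
    offset from a time-sync exchange. Names and units are illustrative."""

    def __init__(self, offset_us: int = 0):
        # offset_us = server_time - local_time, in microseconds
        self.offset_us = offset_us

    def update_offset(self, offset_us: int) -> None:
        """Called whenever a new time-sync measurement arrives."""
        self.offset_us = offset_us

    def now_server_us(self) -> int:
        """Server-domain timestamp to attach to a captured audio chunk."""
        return time.monotonic_ns() // 1000 + self.offset_us

clock = ServerClock(offset_us=250_000)
stamp = clock.now_server_us()  # attach this to each captured frame
```

With every chunk stamped this way, the server can buffer and schedule the returning stream exactly as it does for server-originated audio, which is why the source itself never needs sync logic.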
@HarvsG also see these related discussions and requests, as they talk more about different use-case scenarios for this feature, including ideas for how a client could act as an appliance-like "product" (e.g. a device based on an ESP32 or a Raspberry Pi) with two roles, both source (capture) and player (output), at the same time: and
For such a scenario to work practically, I think the client device needs to have two active roles, both source (capture) and player (output), at the same time: that would allow the client to send the source to the server and then play the returning synchronized stream locally. The client device cannot simply pass the local audio through, because then it would be impossible to synchronize. Therefore the physical client "product" needs both inputs and outputs on the same device. While not software-compatible, check out the ports on existing devices such as the "UniFi PoE Audio Port" and the "WiiM Ultra", mentioned solely as a visual aid to show the kinds of audio input and output ports that could be featured on the same device in order to both capture audio as an input source for the server and play back the output to local speakers at the same time:
I agree, and I implemented exactly that in the reference flow: added `input_stream/start` with `codec_header` (base64) for Opus/FLAC. I also added optional source control commands in the reference implementation (play/pause/next/previous/activate/deactivate). These are advertised via `source@v1_support.controls` and sent as `command.source.control`. They're purely optional and intended for controllable sources (e.g. networked players), while line-in sources simply omit them.
### Server → Client: `input_stream/request-format`
One of the things I miss in the spec is: what if a client starts streaming data to the server, but the server doesn't care?
We would want a way to specify that. Maybe a request-format message asking for a `none` codec, with that as the default at start?
The use cases I want to make sure that are covered by this role:
- A user starts a turntable connected to a Sendspin source device. The server starts playing the music in the same room as the turntable the moment the turntable starts playing, without interaction from the user on the Sendspin source device or Sendspin server.
- A user has the output of their computer available as a source in Sendspin, and from the Sendspin server can say: start streaming this source to speakers
Great point. This is already the intended behavior, and it is how the implementation works.
The model is:
- `source.command` (`start` / `stop`) defines server ingest interest.
- Default is effectively "not interested" (`stop`) until the server sends `start`.
- A source should only send media after `start` and `input_stream/start`.
- If a source sends chunks while not started, the server drops/ignores them.
So we do not need a none codec to represent “server doesn’t care”; stop already covers that.
For the two use cases:
Turntable auto-start
- Source reports signal/state (signal: present, optional started event).
- Server policy decides to ingest and sends command: "start".
- Audio is routed immediately to the target room.
Computer output selectable from server
- User selects source in server UI/control plane.
- Server sends command: "start" to that source.
- Source sends input_stream/start + chunks.
- Server sends command: "stop" when done.
I agree we should make this explicit in the spec text (default stop + ignore/drop when not started), but no new mechanism is required.
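The "default stop + drop when not started" rule described above amounts to a small gating state machine on the ingest side. A minimal sketch under the assumption of two states (`idle`, `streaming`); names are illustrative, not from the spec:

```python
class SourceIngest:
    """Sketch of ingest gating: the server is 'not interested' by default,
    and chunks arriving while stopped are simply dropped."""

    def __init__(self):
        self.state = "idle"   # default: no start command received yet
        self.accepted = []
        self.dropped = 0

    def on_command(self, command: str) -> None:
        if command == "start":
            self.state = "streaming"
        elif command == "stop":
            self.state = "idle"

    def on_chunk(self, chunk: bytes) -> None:
        if self.state == "streaming":
            self.accepted.append(chunk)
        else:
            self.dropped += 1  # ignore audio outside start/stop

ingest = SourceIngest()
ingest.on_chunk(b"early")      # dropped: no start yet
ingest.on_command("start")
ingest.on_chunk(b"audio")      # accepted
ingest.on_command("stop")
ingest.on_chunk(b"late")       # dropped again
```

This is why no `none` codec is needed: the stop state already expresses "server doesn't care", and a misbehaving source only costs the server a counter increment per dropped chunk.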
I have another idea for the turntable use case: can't the client just directly start the input_stream?
If we require the server to immediately switch to the turntable once it's detected, we don't need the negotiation checking for `client/state.source.signal = 'present'`.
I think it's a good idea to make this explicit; can you update the text @rudyberends ?
- `stop`: server requests ingest to become inactive. The client should send `input_stream/end`, stop sending source audio chunks, and transition to `state: "idle"`.
- `control` is optional upstream-device control intent and only applies when advertised in `source@v1_support.controls`.
- `play` | `pause` | `next` | `previous`: control content playback behavior on the upstream source device (if supported).
- `activate` | `deactivate`: prepare or power-manage the upstream source path (for example power on/off, wake/sleep, input enable/disable).
How would the server know when to call this? Why can't the source do this automatically on play, using a hook?
#### `vad` semantics
`vad` is an optional server hint for source-side line-sense behavior (`threshold_db`, `hold_ms`). It allows centralized tuning and consistent behavior across sources/groups. Clients may ignore unsupported hints.
This feels out of scope and more like a fleet-management feature. Shouldn't this just be locally configured?
I'm also in favor of moving the VAD configuration outside the Sendspin protocol.
What we could do is recommend specific threshold_db and hold_ms values, but using those shouldn't be mandatory.
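Whether `threshold_db` and `hold_ms` end up in the protocol or in local config, line sensing typically combines an RMS threshold with a hold (debounce) time so brief quiet passages don't flap the signal state. A minimal sketch, assuming float samples in [-1.0, 1.0] and a caller-supplied clock; the exact detection method is up to each source:

```python
import math

class LineSense:
    """Illustrative line-sense: 'present' while RMS stays above
    threshold_db, and for hold_ms after it last did."""

    def __init__(self, threshold_db: float = -50.0, hold_ms: int = 2000):
        self.threshold_db = threshold_db
        self.hold_ms = hold_ms
        self.last_above_ms = None  # time signal last exceeded the threshold

    def feed(self, samples: list, now_ms: float) -> str:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
        db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if db >= self.threshold_db:
            self.last_above_ms = now_ms
        if self.last_above_ms is not None and now_ms - self.last_above_ms <= self.hold_ms:
            return "present"
        return "absent"

vad = LineSense(threshold_db=-50.0, hold_ms=1000)
```

The hold time is what makes the recommended-defaults idea workable: even a conservative `hold_ms` keeps gaps between turntable tracks from reporting `absent`.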
- `start`: server requests ingest to become active. The client should transition to `state: "streaming"`, send `input_stream/start`, and then send source audio chunks.
- `stop`: server requests ingest to become inactive. The client should send `input_stream/end`, stop sending source audio chunks, and transition to `state: "idle"`.
- `control` is optional upstream-device control intent and only applies when advertised in `source@v1_support.controls`.
- `play` | `pause` | `next` | `previous`: control content playback behavior on the upstream source device (if supported).
If we include this, then we should also include the state of the source? (Basically the same info we send about Sendspin being played)
I wonder if this is scope creep and we shouldn't include this for now.
What is the use case?
Hey @rudyberends ! Now that the aiosendspin refactor is done, the biggest blocker for this PR is gone. Can you take a look at the remaining open review comments? I closed the ones you already addressed before. And as a heads-up for Music Assistant: we won't implement this source role fully for now due to security concerns. But we can still merge the full


Summary
This PR introduces a new source@v1 role to the Sendspin protocol, allowing audio input devices (e.g. line-in, turntable preamps, HDMI, Bluetooth receivers, microphones) to be represented and selected in a consistent, protocol-native way.
The goal is to enable remote audio inputs without increasing client complexity, while keeping the Sendspin server as the single place where all heavy processing happens.
All changes are additive and backward-compatible.
Motivation
Several real-world setups require audio to enter Sendspin from a device rather than originate inside the server:
• Line-in or turntable inputs connected to speakers or satellites
• HDMI / ARC inputs from TVs
• Bluetooth receivers acting as a local input
• Voice assistant or microphone satellites forwarding captured audio
Today the protocol focuses primarily on server-originated playback streams. This PR adds an additive source@v1 role to represent audio inputs as first-class sources, while keeping the server responsible for processing and distribution.
Design overview
The source role represents a client that:
• captures audio locally
• streams it to the server
• optionally reports basic signal presence or level
• does not perform any heavy processing
The server remains fully authoritative:
• resampling, transcoding, EQ
• buffering and synchronization
• visualization and distribution to players
Sources are intentionally kept simple so they can run on constrained devices.
Input semantics
Sources explicitly describe their behavior using two orthogonal concepts:
Input type
• analog – line-level style inputs (AUX, turntable preamp)
Audio presence depends on physical user interaction.
• digital – HDMI, S/PDIF, Bluetooth, or similar
Audio is usually continuous or remotely controllable.
Activation model
• manual – cannot be reliably started remotely (e.g. turntable)
• remote – server can start/stop capture predictably
• always_on – capture is always available
This allows controllers and UIs to behave sensibly without hard-coding device assumptions.
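To make the "behave sensibly without hard-coding device assumptions" point concrete, here is a sketch of how a controller could key UI decisions off the two orthogonal fields. The descriptor shape and helper are hypothetical, not part of the proposed schema:

```python
from dataclasses import dataclass

@dataclass
class SourceInfo:
    """Hypothetical view of a source's declared semantics."""
    input_type: str   # 'analog' | 'digital'
    activation: str   # 'manual' | 'remote' | 'always_on'

def ui_can_start_remotely(src: SourceInfo) -> bool:
    """Only show a 'start' button when the server can actually start
    capture predictably; manual sources need physical interaction."""
    return src.activation in ("remote", "always_on")

turntable = SourceInfo(input_type="analog", activation="manual")
hdmi = SourceInfo(input_type="digital", activation="remote")
```

A controller can thus hide the start control for the turntable (and instead rely on signal presence) while offering it for the HDMI input, without knowing anything about either device model.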
Signal presence and feedback
For analog inputs especially, it is important to avoid “playing silence”.
Therefore the role optionally supports:
• signal: present | absent | unknown (line sensing)
• level: normalized audio level (RMS/peak)
Both are optional and only reported if the source supports them.
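For illustration, the normalized `level` could be derived from raw PCM like this; the sketch assumes signed integer samples and peak measurement, though the proposal leaves the exact method (RMS or peak) to the source:

```python
def normalized_level(pcm: list, bits: int = 16) -> float:
    """Map signed PCM samples to the 0.0-1.0 level range.
    Peak-based here; a source could equally report RMS."""
    if not pcm:
        return 0.0
    full_scale = float(2 ** (bits - 1))
    peak = max(abs(s) for s in pcm)
    return min(peak / full_scale, 1.0)

level = normalized_level([0, 16384, -8192])  # peak 16384 of 32768
```

Normalizing on the client keeps the wire value independent of sample format, so the server can drive meters or silence detection without knowing the source's bit depth.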
Controller integration
• Controllers receive a list of available sources via server/state
• A new controller command select_source allows selecting an active source for a group
• Server-local inputs can be exposed as virtual source clients, using the same model
This keeps source selection aligned with existing group control concepts.
Protocol changes (high level)
• New role: source@v1
• Additions to:
• client/hello
• client/state
• client/command
• server/state
• server/command
• Binary message allocation for source audio frames
• Controller extensions for listing and selecting sources
No existing roles or message semantics are changed.
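As a sketch of what the "binary message allocation for source audio frames" could look like on the wire, here is one possible layout: a 1-byte message type, an 8-byte big-endian server-time timestamp in microseconds, then the encoded payload. Both the layout and the type value are assumptions for illustration, not allocations from the spec:

```python
import struct

# Hypothetical frame layout: [type: u8][timestamp_us: u64][payload...]
FRAME_HEADER = struct.Struct("!BQ")
SOURCE_AUDIO_TYPE = 0x20  # illustrative value, not allocated by the spec

def pack_frame(timestamp_us: int, payload: bytes) -> bytes:
    return FRAME_HEADER.pack(SOURCE_AUDIO_TYPE, timestamp_us) + payload

def unpack_frame(frame: bytes):
    msg_type, ts = FRAME_HEADER.unpack_from(frame)
    return msg_type, ts, frame[FRAME_HEADER.size:]

frame = pack_frame(1_234_567, b"\x01\x02")
```

Carrying the server-domain timestamp in every frame is what lets the server buffer and schedule source audio with the same machinery it uses for playback streams.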
Compatibility
• Fully backward-compatible
• Existing clients and servers can ignore the new role
• No impact on playback, grouping, timing, or synchronization
Open for feedback
This proposal is meant as a starting point.
Feedback is very welcome on:
• field naming and structure
• activation semantics
• signal/level reporting
• whether anything should be simplified or removed
If parts of this feel out of scope or misaligned with Sendspin’s direction, I’m very happy to adjust or iterate.
If helpful, I am also happy to adapt or provide reference implementations.
Thanks for the great project and taking the time to review this.