Music duck and keep-awake behaviour during dictation by AndrewBeniston · Pull Request #325 · altic-dev/FluidVoice

AndrewBeniston · 2026-05-06T20:05:03Z

What this adds

Two independent settings, both wired through the existing pause-media hooks in ASRService. Either could be merged on its own; happy to split into two PRs if you prefer.

Music During Transcription (replaces the existing pause toggle)

The current Pause Media During Transcription boolean becomes a three-mode picker: Leave Playing, Pause (the existing behaviour, unchanged), or Lower Volume.

Lower Volume snapshots the system output volume, fades it down to 10% over 200ms, and fades it back up the same way when you stop. The reason for the duck rather than the existing pause: psychologically it feels much nicer to have the music keep playing quietly. When the audio cuts entirely, the absence of background sound is jarring — a bit like in a film when the room tone drops out and you suddenly notice the silence. Dipping the volume to 10% keeps the ambience without competing with what you're saying into the mic. 10% turns out to be the sweet spot — quiet enough not to corrupt the transcript, loud enough that the room still feels alive.

Keep Mac Awake While Dictating

Holds an IOPMAssertion of type kIOPMAssertionTypePreventUserIdleDisplaySleep for the duration of a recording so the screen doesn't dim or lock mid-session. Released the instant recording stops. Useful for long-form dictation where the user is sitting still and the display would otherwise time out and trigger the screen-lock cascade.

Defaults to true. Doesn't cover lid-close sleep (that would need kIOPMAssertionTypePreventSystemSleep with the lid-close option, which feels like overreach — happy to add it if you want).
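
A minimal sketch of how a keep-awake wrapper along these lines might look. The PR's actual SleepPreventionService is not shown here, so `KeepAwake`, `AssertionBackend`, and `IOKitAssertionBackend` are hypothetical names, and the default reason string is an assumption. Only `IOPMAssertionCreateWithName` and `IOPMAssertionRelease` are real IOKit calls.

```swift
import Foundation
#if os(macOS)
import IOKit.pwr_mgt
#endif

/// Abstracts the two power-assertion calls so the lifecycle can be exercised without IOKit.
protocol AssertionBackend {
    func create(reason: String) -> UInt32?   // returns an assertion ID, or nil on failure
    func release(_ id: UInt32)
}

#if os(macOS)
/// Real backend: asks macOS not to idle-sleep the display while the assertion is held.
struct IOKitAssertionBackend: AssertionBackend {
    func create(reason: String) -> UInt32? {
        var id = IOPMAssertionID(0)
        let status = IOPMAssertionCreateWithName(
            kIOPMAssertionTypePreventUserIdleDisplaySleep as CFString,
            IOPMAssertionLevel(kIOPMAssertionLevelOn),
            reason as CFString,
            &id)
        return status == kIOReturnSuccess ? id : nil
    }

    func release(_ id: UInt32) {
        _ = IOPMAssertionRelease(id)
    }
}
#endif

/// Hypothetical wrapper: at most one assertion is held, and `end()` is safe on any exit path.
final class KeepAwake {
    private let backend: AssertionBackend
    private var assertionID: UInt32?

    init(backend: AssertionBackend) { self.backend = backend }

    var isHolding: Bool { assertionID != nil }

    func begin(reason: String = "Dictation in progress") {
        guard assertionID == nil else { return }   // already held, do not stack assertions
        assertionID = backend.create(reason: reason)
    }

    func end() {
        guard let id = assertionID else { return } // nothing held, e.g. start() failed early
        backend.release(id)
        assertionID = nil
    }
}
```

Putting the IOKit calls behind a protocol is a design choice of this sketch, not something the PR states; it lets the begin/end pairing be checked with a fake backend. The sketch is not thread-safe as written.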

Implementation notes

SystemVolumeController (new) — thread-safe wrapper around AudioObjectGet/SetPropertyData on the default output device's master volume scalar. Falls back to writing channels 1+2 individually for devices that don't expose kAudioObjectPropertyElementMain.
SleepPreventionService (new) — IOPMAssertionCreateWithName / IOPMAssertionRelease wrapper.
MediaPlaybackService gains duckSystemVolume() / restoreSystemVolume(previous:) and a unified restore(from: MediaSessionAction) that handles either pause-resume or volume-restore at every exit path.
The fade ramp runs on a detached Task so the main actor isn't blocked between steps. A new fade cancels any in-flight one, which matters when the user releases the hotkey before the duck-down has finished and the fade-up needs to take over cleanly. If the user's volume is already at or below 10%, the duck is skipped entirely (would otherwise raise a quieter volume up to 10%, which is the wrong direction).
Volume restore fires the moment the audio engine is fully torn down, not at the end of stop() after final transcription completes. That way the volume snaps back when the user releases the hotkey rather than 1-3s later when the model finishes.
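
The fade and skip rules in these notes can be sketched roughly as follows. This is an illustration rather than the PR's code: `rampLevels`, `duckTarget`, and `runFade` are hypothetical names, the linear ramp and the 20-step default are assumptions, and `apply` stands in for whatever actually writes the system volume.

```swift
/// Levels for a linear ramp from `start` to `end`, ending exactly on `end`.
func rampLevels(from start: Float, to end: Float, steps: Int) -> [Float] {
    precondition(steps > 0, "a ramp needs at least one step")
    return (1...steps).map { i in
        i == steps ? end : start + (end - start) * Float(i) / Float(steps)
    }
}

/// The level to duck to, or nil when the volume is already at or below the target
/// (ducking would otherwise raise a quieter volume).
func duckTarget(current: Float, target: Float = 0.10) -> Float? {
    current > target ? target : nil
}

/// Runs the ramp off the caller's actor. Cancelling the returned task stops the ramp
/// wherever it has got to, so a following fade can start from the current level.
func runFade(from start: Float,
             to end: Float,
             durationNanos: UInt64 = 200_000_000,   // 200ms, as described above
             steps: Int = 20,                       // assumed step count
             apply: @escaping @Sendable (Float) -> Void) -> Task<Void, Never> {
    return Task.detached {
        let levels = rampLevels(from: start, to: end, steps: steps)
        for level in levels {
            do {
                try await Task.sleep(nanoseconds: durationNanos / UInt64(steps))
            } catch {
                return   // sleep throws when the task is cancelled
            }
            if Task.isCancelled { return }
            apply(level)
        }
    }
}
```

A caller would keep the returned task and call `cancel()` on it before starting the next fade, which matches the "new fade cancels any in-flight one" behaviour described above.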

Backwards compatibility

The legacy pauseMediaDuringTranscription UserDefaults key migrates to a new MediaBehaviorDuringTranscription enum on first read. The pauseMediaDuringTranscription Bool getter/setter remains on SettingsStore as a shim (true ↔ .pause, false ↔ .none) so the BackupService payload stays format-compatible without a schema bump. Old backups restore cleanly as .pause or .none. The duck setting itself doesn't survive a backup/restore round trip — felt acceptable since it's a UX preference rather than load-bearing config, but happy to bump the backup schema if you'd rather.
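
As a rough illustration of the migrate-on-first-read step, assuming the new setting is stored as a raw string: the `mediaBehaviorDuringTranscription` key name, the `MediaBehavior` enum, and the `.none` default for fresh installs are all assumptions of this sketch, not details taken from the PR.

```swift
import Foundation

enum MediaBehavior: String {
    case none, pause, duck
}

let legacyPauseKey = "pauseMediaDuringTranscription"
let behaviorKey = "mediaBehaviorDuringTranscription"   // hypothetical key name

/// Reads the new enum, migrating the legacy Bool the first time it is asked for.
func loadMediaBehavior(from defaults: UserDefaults) -> MediaBehavior {
    if let raw = defaults.string(forKey: behaviorKey),
       let stored = MediaBehavior(rawValue: raw) {
        return stored                                   // already migrated
    }
    let migrated: MediaBehavior
    if defaults.object(forKey: legacyPauseKey) != nil {
        // Legacy Bool present: true maps to .pause, false to .none.
        migrated = defaults.bool(forKey: legacyPauseKey) ? .pause : .none
    } else {
        migrated = .none                                // assumed default for a fresh install
    }
    defaults.set(migrated.rawValue, forKey: behaviorKey)
    return migrated
}
```

Writing the migrated value back on first read means later reads never consult the legacy key again, so a user who then picks Lower Volume is not reset by the old Bool.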

Testing

Built and tested locally on Apple Silicon (macOS 15) against upstream/main. Verified manually:

Picker switches cleanly between modes; legacy boolean migrates on first launch
Lower Volume + Spotify: smooth fade-down on hotkey hold, smooth fade-up on release; volume returns to original level (not just 100%)
Lower Volume with system volume already at 5%: skipped, no false volume raise
Tap-and-immediately-release hotkey: fade-up starts from wherever the fade-down had got to, no jump
Keep-awake assertion holds while recording, releases on stop (verified via pmset -g assertions | grep FluidVoice)
Keep-awake released even on start() failure path (forced by denying mic mid-test)
Plist round-trip: settings persist across app restarts under the same com.FluidApp.app bundle ID

Not added: integration tests for either feature. Felt awkward to test system-volume mutation and IOPMAssertion lifecycle in CI, but if you'd like me to attempt it I'll happily add them.

Two new things you can turn on in Settings.

Music During Transcription is now a picker (Leave Playing / Pause / Lower Volume) instead of a boolean. Lower Volume snapshots the system output volume, fades it down to 10% over 200ms, and fades it back up the same way when you stop. The fade ramp runs on a detached Task so the main actor isn't blocked between steps; a new fade cancels any in-flight one, which matters when the user releases the hotkey before the duck-down has finished and the fade-up has to take over cleanly. If the user's volume is already at or below the duck target, the duck is skipped entirely.

Keep Mac Awake While Dictating holds an IOPMAssertion of type kIOPMAssertionTypePreventUserIdleDisplaySleep for the duration of a recording so the screen doesn't dim or lock mid-session. Released as soon as recording stops, including in the start() failure path.

The volume restore fires the moment the audio engine is fully torn down, not at the end of stop() after final transcription completes. That way the volume snaps back when the user releases the hotkey rather than 1-3s later when the model finishes.

Two new files: SystemVolumeController wraps the CoreAudio volume reads and writes (AudioObjectGet/SetPropertyData on the default output device's master volume scalar, with per-channel fallback for devices that don't expose master), and SleepPreventionService wraps the IOPMAssertion.

The legacy pauseMediaDuringTranscription Bool migrates to a new MediaBehaviorDuringTranscription enum on first read, and remains as a compatibility shim on the SettingsStore so the BackupService payload stays format-compatible; old backups restore as .pause or .none. The duck setting itself doesn't survive a backup/restore round trip, which seems fine since it's a UX preference rather than load-bearing config.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5685ffe5a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T20:08:23Z

+            } else if self.mediaBehaviorDuringTranscription == .pause {
+                self.mediaBehaviorDuringTranscription = .none


Make media-backup restore override duck mode on false

When SettingsStore.restore(from:) applies payload.pauseMediaDuringTranscription, a false value will not clear .duck because the legacy setter only maps false to .none if the current mode is .pause. This makes restore behavior depend on pre-existing local state: importing a backup that intends “leave playing” can silently keep ducking enabled if the target machine is already on .duck. Restore should apply the backup deterministically regardless of current mode.
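
One way to make the legacy setter deterministic, sketched under the assumption that the store holds a single enum property. `SettingsSketch` and its property names are hypothetical stand-ins for SettingsStore.

```swift
enum MediaBehavior: String {
    case none, pause, duck
}

final class SettingsSketch {
    var mediaBehavior: MediaBehavior = .none

    /// Legacy Bool view of the enum. The setter ignores the current mode,
    /// so `false` always lands on `.none`, even when the mode was `.duck`.
    var pauseMedia: Bool {
        get { mediaBehavior == .pause }
        set { mediaBehavior = newValue ? .pause : .none }
    }
}
```

With this shape, restoring `false` onto a machine currently set to `.duck` yields `.none`, which is the outcome the comment above asks for.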


chatgpt-codex-connector · 2026-05-06T20:08:23Z

+        let left = readChannelVolume(deviceID: deviceID, channel: 1)
+        let right = readChannelVolume(deviceID: deviceID, channel: 2)
+        switch (left, right) {
+        case let (l?, r?): return (l + r) / 2


Preserve per-channel volume when ducking fallback devices

The duck snapshot stores only a single averaged scalar in the fallback path ((l + r) / 2), then restore writes that one value back to both channels. On output devices that do not expose master volume but do have distinct left/right levels (for example non-centered balance setups), one duck cycle permanently flattens channel balance. This should capture and restore per-channel values instead of collapsing to an average.
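
A sketch of what capturing per-channel values could look like. `VolumeSnapshot`, `VolumeDevice`, and the three free functions are hypothetical; `VolumeDevice` is a toy model of an output device and does not touch CoreAudio.

```swift
enum VolumeSnapshot: Equatable {
    case master(Float)
    case perChannel(left: Float, right: Float)

    /// Single value used only to drive a scalar fade.
    var scalar: Float {
        switch self {
        case .master(let level): return level
        case .perChannel(let left, let right): return (left + right) / 2
        }
    }
}

/// Toy device model: `master` is nil when the device exposes no master control.
protocol VolumeDevice: AnyObject {
    var master: Float? { get set }
    var left: Float { get set }
    var right: Float { get set }
}

func takeSnapshot(of device: VolumeDevice) -> VolumeSnapshot {
    if let master = device.master { return .master(master) }
    return .perChannel(left: device.left, right: device.right)
}

/// Scalar write used during the fade; on fallback devices it sets both channels alike.
func setScalar(_ level: Float, on device: VolumeDevice) {
    if device.master != nil {
        device.master = level
    } else {
        device.left = level
        device.right = level
    }
}

/// Final write after the fade-up: puts back exactly what was captured.
func restore(_ snapshot: VolumeSnapshot, on device: VolumeDevice) {
    switch snapshot {
    case .master(let level):
        device.master = level
    case .perChannel(let left, let right):
        device.left = left
        device.right = right
    }
}
```

The fade itself can stay a scalar interpolation; only the last write needs the per-channel values for the balance to survive.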


Two fixes for issues the automated reviewer flagged.

1. Backup restore: previously the legacy pauseMediaDuringTranscription bool's setter only mapped false to .none if the current mode was .pause, so restoring a backup that meant "leave playing" onto a machine currently set to .duck silently kept ducking. Restore now reads the new lossless mediaBehaviorDuringTranscription enum field from the payload when present, falling back to the bool only for backups created by older builds. The legacy bool's setter is also tightened so false always means .none, deterministic regardless of prior state.
2. Per-channel volume preservation: the duck snapshot path used to average L/R into a single scalar on devices without master volume, then write that average back to both channels on restore. One duck cycle would permanently flatten any non-centred balance. Replace the Float scalar in MediaSessionAction.ducked and the duck/restore API surface with a SystemVolumeSnapshot enum that holds either the master scalar or distinct L/R values, then restore the snapshot exactly at the end of the fade-up ramp so balance survives. Fade itself stays a scalar interpolation; the precise channel restore only happens on the final write, so the ramp's CPU profile is unchanged.

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: bf40a9af80

chatgpt-codex-connector · 2026-05-07T21:42:22Z

+    /// Re-applies a snapshot exactly. For master snapshots this writes the
+    /// master scalar; for per-channel snapshots this writes the original
+    /// left and right values independently, preserving stereo balance that
+    /// `setVolume(_:)` would otherwise have flattened.


Restore ducked volume on the same output device

restore(_:) always resolves defaultOutputDeviceID() at restore time, but the snapshot does not carry the device that was originally ducked. If the default output changes during dictation (for example, Bluetooth headphones disconnect/reconnect or the user switches outputs), the old device can remain stuck at the ducked level while restore is applied to a different device. This leaves a persistent user-visible volume mutation after recording stops.
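
A sketch of a snapshot that remembers which device it was taken from. `DuckedVolume`, `AudioSystemModel`, and the two free functions are hypothetical; `DeviceID` stands in for CoreAudio's `AudioDeviceID`, and the model is a toy that does not call CoreAudio.

```swift
typealias DeviceID = UInt32   // stand-in for CoreAudio's AudioDeviceID

/// What was ducked, and on which device.
struct DuckedVolume: Equatable {
    let deviceID: DeviceID
    let previousLevel: Float
}

/// Toy model of the audio system: one default output and a level per device.
final class AudioSystemModel {
    var defaultOutput: DeviceID
    var levels: [DeviceID: Float]

    init(defaultOutput: DeviceID, levels: [DeviceID: Float]) {
        self.defaultOutput = defaultOutput
        self.levels = levels
    }
}

/// Ducks the current default output and records its identity alongside the old level.
func duck(_ system: AudioSystemModel, to target: Float = 0.10) -> DuckedVolume? {
    let id = system.defaultOutput
    guard let level = system.levels[id], level > target else { return nil }
    system.levels[id] = target
    return DuckedVolume(deviceID: id, previousLevel: level)
}

/// Restores the device that was ducked, not whichever device is default now.
func restore(_ ducked: DuckedVolume, in system: AudioSystemModel) {
    guard system.levels[ducked.deviceID] != nil else { return }   // device has gone away
    system.levels[ducked.deviceID] = ducked.previousLevel
}
```

What to do when the original device has disappeared is left open here; the sketch simply skips the write.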


chatgpt-codex-connector · 2026-05-07T21:42:22Z

    let pauseMediaDuringTranscription: Bool
+    /// Lossless capture of the unified media-behaviour enum (none / pause /
+    /// duck). Optional so that backups created by older builds (which only
+    /// wrote the legacy bool) still decode cleanly. New builds prefer this
+    /// field on restore and fall back to the bool only when it's nil.
+    let mediaBehaviorDuringTranscription: SettingsStore.MediaBehaviorDuringTranscription?
    let vocabularyBoostingEnabled: Bool


Include keep-awake preference in backup payload

The new preventSleepDuringTranscription setting is exposed in UI and persisted locally, but it is not part of SettingsBackupPayload, so backup/restore silently drops the user’s choice and reverts to the default (true) after restore. Since this commit explicitly updates backup format for adjacent media settings, omitting this new toggle makes restore behavior inconsistent for users migrating machines.
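
A sketch of how optional payload fields keep older backups decodable while carrying the new settings. `BackupPayloadSketch`, `RestoredSettings`, and `applyBackup` are hypothetical and much smaller than the real SettingsBackupPayload; leaving the local keep-awake value untouched when the field is missing is an assumption of this sketch.

```swift
import Foundation

enum MediaBehavior: String, Codable {
    case none, pause, duck
}

/// Optional fields are decoded with decodeIfPresent by the synthesized Codable
/// conformance, so backups written before the fields existed decode them as nil.
struct BackupPayloadSketch: Codable {
    let pauseMediaDuringTranscription: Bool
    let mediaBehaviorDuringTranscription: MediaBehavior?
    let preventSleepDuringTranscription: Bool?
}

struct RestoredSettings: Equatable {
    var mediaBehavior: MediaBehavior
    var preventSleep: Bool
}

func applyBackup(_ payload: BackupPayloadSketch, to current: RestoredSettings) -> RestoredSettings {
    var result = current
    // Prefer the lossless enum; fall back to the legacy Bool for older backups.
    let legacy: MediaBehavior = payload.pauseMediaDuringTranscription ? .pause : .none
    result.mediaBehavior = payload.mediaBehaviorDuringTranscription ?? legacy
    // Only overwrite keep-awake when the backup actually recorded a choice.
    if let keepAwake = payload.preventSleepDuringTranscription {
        result.preventSleep = keepAwake
    }
    return result
}
```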


Three changes to make the duck feel snappy rather than lagging behind the start sound.

The duck was previously fired inside ASRService.start(), which is called via Task { await asr.start() } after captureRecordingContext + applyDictationShortcutSelectionContext + setActiveRecordingMode + setOverlayMode have all run. That puts the duck ~80ms behind the hotkey on a typical Mac, while the start sound (which is fired from the same beginDictationRecording call site) becomes audible from CoreAudio in ~30ms. So the user heard the sound first and the fade a beat later.

1. Hoist the duck and start sound to the very top of beginDictationRecording in ContentView. They now fire alongside the hotkey press, before any of the pre-recording UI work. The duck snapshot is threaded through to ASRService.start() as a `preAppliedMediaAction` parameter so the existing in-start duck-handling code adopts the action without re-firing it.
2. Snap the volume halfway down to the duck target SYNCHRONOUSLY inside duckSystemVolume() before kicking off the detached fade Task. CoreAudio property writes are sub-millisecond round trip, so the audible drop now lands within ~1ms of the call returning, bypassing both Task.detached scheduling latency (occasionally 10-30ms on Debug builds) and the fade ramp's first few steps where each per-step volume change is too small to perceive. The detached fade then smoothly lands the rest of the way, so the duck still has a soft edge.
3. ASRService.start() restores the pre-applied media action on its guard-failure paths (mic-not-authorized, already-running) so a hotkey press that doesn't end up starting a session doesn't leave the user's volume stuck at the duck target.
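
The halfway snap and the guard-failure restore can be sketched together as below. `DictationSketch`, `MediaAction`, and `halfwayLevel` are hypothetical, `volume` is a plain property standing in for the system volume, and the detached fade that would finish the duck is left out.

```swift
enum MediaAction: Equatable {
    case none
    case ducked(previous: Float)
}

/// Midpoint between the current level and the duck target.
func halfwayLevel(from current: Float, to target: Float) -> Float {
    current + (target - current) / 2
}

final class DictationSketch {
    var volume: Float                 // stands in for the system output volume
    var micAuthorized = true
    private(set) var isRunning = false
    private var activeAction: MediaAction = .none
    let duckTarget: Float = 0.10

    init(volume: Float) { self.volume = volume }

    /// Called at hotkey press, before any other pre-recording work.
    func duckNow() -> MediaAction {
        guard volume > duckTarget else { return .none }
        let previous = volume
        // Synchronous snap; a fade would carry the volume the rest of the way.
        volume = halfwayLevel(from: previous, to: duckTarget)
        return .ducked(previous: previous)
    }

    /// Adopts an action the caller already applied instead of ducking again.
    func start(preApplied: MediaAction) -> Bool {
        guard micAuthorized, !isRunning else {
            restore(preApplied)       // no session started, so undo the early duck
            return false
        }
        activeAction = preApplied
        isRunning = true
        return true
    }

    func stop() {
        restore(activeAction)
        activeAction = .none
        isRunning = false
    }

    private func restore(_ action: MediaAction) {
        if case .ducked(let previous) = action { volume = previous }
    }
}
```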

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 1a83116966

chatgpt-codex-connector · 2026-05-07T22:18:17Z

+        self.activeFadeTask?.cancel()
+


Preserve original volume across overlapping duck sessions

Cancelling any in-flight fade before starting a new one can lock in a partially-restored volume as the new baseline. If a user starts a second recording while the previous restoreSystemVolume fade-up is still running (the 100ms window), duckSystemVolume() snapshots that transient level and startFade cancels the prior restore, so the next stop restores to the transient value instead of the true pre-session volume. This produces cumulative volume drift across rapid back-to-back recordings.
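
One possible way to avoid the drift, sketched here as an assumption rather than as the fix the PR adopts: remember the level a restore is heading towards, and reuse it as the baseline if a new duck starts before that restore finishes. `DuckBaseline` and its method names are hypothetical.

```swift
/// Tracks the true pre-session volume across overlapping duck and restore cycles.
final class DuckBaseline {
    /// Set while a fade-up is in flight; holds the level that fade is restoring to.
    private var pendingRestoreTarget: Float?

    /// Call when a duck starts. Returns the level the session should later restore to.
    func beginDuck(currentVolume: Float) -> Float {
        // If a restore is still running, the current volume is a transient value,
        // so the level that restore was heading for is the real baseline.
        let baseline = pendingRestoreTarget ?? currentVolume
        pendingRestoreTarget = nil
        return baseline
    }

    /// Call when a fade-up starts.
    func beginRestore(to baseline: Float) {
        pendingRestoreTarget = baseline
    }

    /// Call when a fade-up completes without being cancelled.
    func restoreFinished() {
        pendingRestoreTarget = nil
    }
}
```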


altic-dev · 2026-05-07T22:42:15Z

@AndrewBeniston Thanks for the PR - seems useful to me :) My only ask would be to fix the Codex suggestions and follow the PR template we have, to help me review better. Also, please attach some demo / pictures to show it working. Thank you!

AndrewBeniston wants to merge 3 commits into altic-dev:main from AndrewBeniston:pr/duck-and-keep-awake
