
Music duck and keep-awake behaviour during dictation #325

Open
AndrewBeniston wants to merge 3 commits into altic-dev:main from AndrewBeniston:pr/duck-and-keep-awake


@AndrewBeniston

What this adds

Two independent settings, both wired through the existing pause-media hooks in ASRService. Either could be merged on its own; happy to split into two PRs if you prefer.

Music During Transcription (replaces the existing pause toggle)

The current Pause Media During Transcription boolean becomes a three-mode picker: Leave Playing, Pause (the existing behaviour, unchanged), or Lower Volume.

Lower Volume snapshots the system output volume, fades it down to 10% over 200ms, and fades it back up the same way when you stop. The reason for ducking rather than pausing: it feels much nicer to have the music keep playing quietly. When the audio cuts out entirely, the absence of background sound is jarring, a bit like a film where the room tone drops out and you suddenly notice the silence. Dipping the volume to 10% keeps the ambience without competing with what you're saying into the mic. 10% seems to be the sweet spot: quiet enough not to corrupt the transcript, loud enough that the room still feels alive.

Keep Mac Awake While Dictating

Holds an IOPMAssertion of type kIOPMAssertionTypePreventUserIdleDisplaySleep for the duration of a recording so the screen doesn't dim or lock mid-session. Released the instant recording stops. Useful for long-form dictation where the user is sitting still and the display would otherwise time out and trigger the screen-lock cascade.

Defaults to true. Doesn't cover lid-close sleep (that would need kIOPMAssertionTypePreventSystemSleep with the lid-close option, which feels like overreach — happy to add it if you want).
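
For reference, a minimal sketch of the assertion lifecycle described above. The class name `DisplaySleepGuard` and its methods are hypothetical and are not the PR's `SleepPreventionService`; only the IOKit calls and the assertion type come from the description. macOS only.

```swift
import Foundation
import IOKit.pwr_mgt

/// Hypothetical wrapper for illustration; the PR's SleepPreventionService may differ.
final class DisplaySleepGuard {
    private var assertionID: IOPMAssertionID = 0
    private(set) var isHeld = false

    /// Takes the display-sleep assertion. Returns false if IOKit refuses it.
    @discardableResult
    func begin(reason: String) -> Bool {
        guard !isHeld else { return true }  // already holding, nothing to do
        let result = IOPMAssertionCreateWithName(
            kIOPMAssertionTypePreventUserIdleDisplaySleep as CFString,
            IOPMAssertionLevel(kIOPMAssertionLevelOn),
            reason as CFString,
            &assertionID
        )
        isHeld = (result == kIOReturnSuccess)
        return isHeld
    }

    /// Releases the assertion if one is held. Safe to call on every exit path.
    func end() {
        guard isHeld else { return }
        IOPMAssertionRelease(assertionID)
        isHeld = false
    }
}
```

Calling `end()` unconditionally on every exit path (including a failed start) keeps the assertion from leaking; `pmset -g assertions` shows whether it is currently held.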

Implementation notes

  • SystemVolumeController (new) — thread-safe wrapper around AudioObjectGet/SetPropertyData on the default output device's master volume scalar. Falls back to writing channels 1+2 individually for devices that don't expose kAudioObjectPropertyElementMain.
  • SleepPreventionService (new) — IOPMAssertionCreateWithName / IOPMAssertionRelease wrapper.
  • MediaPlaybackService gains duckSystemVolume() / restoreSystemVolume(previous:) and a unified restore(from: MediaSessionAction) that handles either pause-resume or volume-restore at every exit path.
  • The fade ramp runs on a detached Task so the main actor isn't blocked between steps. A new fade cancels any in-flight one, which matters when the user releases the hotkey before the duck-down has finished and the fade-up needs to take over cleanly. If the user's volume is already at or below 10%, the duck is skipped entirely (would otherwise raise a quieter volume up to 10%, which is the wrong direction).
  • Volume restore fires the moment the audio engine is fully torn down, not at the end of stop() after final transcription completes. That way the volume snaps back when the user releases the hotkey rather than 1-3s later when the model finishes.
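
The ramp and the skip rule from the notes above can be sketched as pure functions. This is an illustration only: the function names and the step count of 10 (20ms per step over 200ms) are assumptions, and the real code writes each level to CoreAudio from a detached Task rather than returning an array.

```swift
/// Levels for a linear fade that ends exactly on `target`.
/// The starting level itself is not included.
func fadeSteps(from start: Float, to target: Float, stepCount: Int) -> [Float] {
    guard stepCount > 0 else { return [target] }
    return (1...stepCount).map { i in
        // Force the final step onto the exact target to avoid rounding drift.
        i == stepCount ? target : start + (target - start) * Float(i) / Float(stepCount)
    }
}

/// Returns the duck-down ramp, or nil when the current volume is already
/// at or below the target (ducking would otherwise raise the volume).
func duckRamp(current: Float, target: Float = 0.10, stepCount: Int = 10) -> [Float]? {
    guard current > target else { return nil }
    return fadeSteps(from: current, to: target, stepCount: stepCount)
}
```

The same `fadeSteps` helper serves the fade-up: starting it from whatever level the cancelled fade-down had reached is what avoids a jump on a quick tap-and-release.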

Backwards compatibility

The legacy pauseMediaDuringTranscription UserDefaults key migrates to a new MediaBehaviorDuringTranscription enum on first read. The pauseMediaDuringTranscription Bool getter/setter remains on SettingsStore as a shim (true ↔ .pause, false ↔ .none) so the BackupService payload stays format-compatible without a schema bump. Old backups restore cleanly as .pause or .none. The duck setting itself doesn't survive a backup/restore round trip — felt acceptable since it's a UX preference rather than load-bearing config, but happy to bump the backup schema if you'd rather.
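
A minimal sketch of the shim mapping, to show why the duck setting is lost in a backup round trip. `SettingsSketch` and its `mediaBehavior` property are stand-ins for illustration, not the real `SettingsStore`, and the real setter may apply extra conditions.

```swift
enum MediaBehaviorDuringTranscription: String, Codable {
    case none, pause, duck
}

/// Stand-in for SettingsStore; hypothetical and heavily simplified.
struct SettingsSketch {
    var mediaBehavior: MediaBehaviorDuringTranscription = .pause

    /// Legacy Bool view of the enum: true <-> .pause, false <-> .none.
    var pauseMediaDuringTranscription: Bool {
        get { mediaBehavior == .pause }
        set { mediaBehavior = newValue ? .pause : .none }
    }
}
```

With this mapping, `.duck` reads back as `false` through the legacy getter, so a backup that stores only the Bool restores it as `.none`.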

Testing

Built and tested locally on Apple Silicon (macOS 15) against upstream/main. Verified manually:

  • Picker switches cleanly between modes; legacy boolean migrates on first launch
  • Lower Volume + Spotify: smooth fade-down on hotkey hold, smooth fade-up on release; volume returns to original level (not just 100%)
  • Lower Volume with system volume already at 5%: skipped, no false volume raise
  • Tap-and-immediately-release hotkey: fade-up starts from wherever the fade-down had got to, no jump
  • Keep-awake assertion holds while recording, releases on stop (verified via pmset -g assertions | grep FluidVoice)
  • Keep-awake released even on start() failure path (forced by denying mic mid-test)
  • Plist round-trip: settings persist across app restarts under the same com.FluidApp.app bundle ID

Not added: integration tests for either feature. Felt awkward to test system-volume mutation and IOPMAssertion lifecycle in CI, but if you'd like me to attempt it I'll happily add them.

Two new things you can turn on in Settings.

Music During Transcription is now a picker (Leave Playing / Pause /
Lower Volume) instead of a boolean. Lower Volume snapshots the system
output volume, fades it down to 10% over 200ms, and fades it back up
the same way when you stop. The fade ramp runs on a detached Task so
the main actor isn't blocked between steps; a new fade cancels any
in-flight one, which matters when the user releases the hotkey before
the duck-down has finished and the fade-up has to take over cleanly.
If the user's volume is already at or below the duck target, the duck
is skipped entirely.

Keep Mac Awake While Dictating holds an IOPMAssertion of type
kIOPMAssertionTypePreventUserIdleDisplaySleep for the duration of a
recording so the screen doesn't dim or lock mid-session. Released as
soon as recording stops, including in the start() failure path.

The volume restore fires the moment the audio engine is fully torn
down, not at the end of stop() after final transcription completes.
That way the volume snaps back when the user releases the hotkey
rather than 1-3s later when the model finishes.

Two new files: SystemVolumeController wraps the CoreAudio volume reads
and writes (AudioObjectGet/SetPropertyData on the default output
device's master volume scalar, with per-channel fallback for devices
that don't expose master), SleepPreventionService wraps the
IOPMAssertion. The legacy pauseMediaDuringTranscription Bool migrates
to a new MediaBehaviorDuringTranscription enum on first read, and
remains as a compatibility shim on the SettingsStore so the
BackupService payload stays format-compatible — old backups restore
as .pause or .none. The duck setting itself doesn't survive a
backup/restore round trip, which seems fine since it's a UX preference
rather than load-bearing config.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5685ffe5a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2956 to +2957
} else if self.mediaBehaviorDuringTranscription == .pause {
    self.mediaBehaviorDuringTranscription = .none

[P2] Make media-backup restore override duck mode on false

When SettingsStore.restore(from:) applies payload.pauseMediaDuringTranscription, a false value will not clear .duck because the legacy setter only maps false to .none if the current mode is .pause. This makes restore behavior depend on pre-existing local state: importing a backup that intends “leave playing” can silently keep ducking enabled if the target machine is already on .duck. Restore should apply the backup deterministically regardless of current mode.
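
One way to make the restore independent of current state, sketched under assumptions: the payload carries an optional enum field next to the legacy Bool, and the restore derives the mode from the payload alone. `BackupPayloadSketch`, `MediaBehavior`, and `restoredBehavior(from:)` are hypothetical names, not the project's types.

```swift
import Foundation

enum MediaBehavior: String, Codable { case none, pause, duck }

/// Hypothetical, cut-down payload: the legacy Bool plus an optional enum
/// that older backups simply do not contain.
struct BackupPayloadSketch: Codable {
    let pauseMediaDuringTranscription: Bool
    let mediaBehaviorDuringTranscription: MediaBehavior?
}

/// Derives the restored mode from the payload only, never from local state.
func restoredBehavior(from payload: BackupPayloadSketch) -> MediaBehavior {
    if let exact = payload.mediaBehaviorDuringTranscription { return exact }
    // Older backup: fall back to the Bool, where false always means .none.
    return payload.pauseMediaDuringTranscription ? .pause : .none
}
```

Because the optional field decodes as `nil` when the key is absent, old backups still decode without a schema bump.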


let left = readChannelVolume(deviceID: deviceID, channel: 1)
let right = readChannelVolume(deviceID: deviceID, channel: 2)
switch (left, right) {
case let (l?, r?): return (l + r) / 2

[P2] Preserve per-channel volume when ducking fallback devices

The duck snapshot stores only a single averaged scalar in the fallback path ((l + r) / 2), then restore writes that one value back to both channels. On output devices that do not expose master volume but do have distinct left/right levels (for example non-centered balance setups), one duck cycle permanently flattens channel balance. This should capture and restore per-channel values instead of collapsing to an average.


Two fixes for issues the automated reviewer flagged.

1. Backup restore: previously the legacy pauseMediaDuringTranscription
   bool's setter only mapped false to .none if the current mode was
   .pause, so restoring a backup that meant "leave playing" onto a
   machine currently set to .duck silently kept ducking. Restore now
   reads the new lossless mediaBehaviorDuringTranscription enum field
   from the payload when present, falling back to the bool only for
   backups created by older builds. The legacy bool's setter is also
   tightened so false always means .none — deterministic regardless of
   prior state.

2. Per-channel volume preservation: the duck snapshot path used to
   average L/R into a single scalar on devices without master volume,
   then write that average back to both channels on restore. One duck
   cycle would permanently flatten any non-centred balance. Replace
   the Float scalar in MediaSessionAction.ducked and the duck/restore
   API surface with a SystemVolumeSnapshot enum that holds either the
   master scalar or distinct L/R values, then restore the snapshot
   exactly at the end of the fade-up ramp so balance survives. Fade
   itself stays a scalar interpolation — the precise channel restore
   only happens on the final write, so the ramp's CPU profile is
   unchanged.
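
A rough sketch of the snapshot shape described in item 2. The case names, the `scalar` helper, and `FakeStereoDevice` are assumptions for illustration; the real type writes through CoreAudio.

```swift
/// Either the master scalar or distinct left/right values.
enum SystemVolumeSnapshot: Equatable {
    case master(Float)
    case perChannel(left: Float, right: Float)

    /// Single level used only to drive the fade interpolation.
    var scalar: Float {
        switch self {
        case .master(let v): return v
        case .perChannel(let l, let r): return (l + r) / 2
        }
    }
}

/// Hypothetical stand-in for an output device without a master control.
struct FakeStereoDevice {
    var left: Float
    var right: Float

    /// What the fade does on every step: one scalar to both channels.
    mutating func setBoth(_ v: Float) { left = v; right = v }

    /// Final write of the fade-up: re-apply the snapshot exactly.
    mutating func restore(_ snapshot: SystemVolumeSnapshot) {
        switch snapshot {
        case .master(let v): setBoth(v)
        case .perChannel(let l, let r): left = l; right = r
        }
    }
}
```

The ramp itself still interpolates a single scalar; only the last write uses the per-channel values, which is what preserves a non-centred balance.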

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf40a9af80


/// Re-applies a snapshot exactly. For master snapshots this writes the
/// master scalar; for per-channel snapshots this writes the original
/// left and right values independently, preserving stereo balance that
/// `setVolume(_:)` would otherwise have flattened.

[P1] Restore ducked volume on the same output device

restore(_:) always resolves defaultOutputDeviceID() at restore time, but the snapshot does not carry the device that was originally ducked. If the default output changes during dictation (for example, Bluetooth headphones disconnect/reconnect or the user switches outputs), the old device can remain stuck at the ducked level while restore is applied to a different device. This leaves a persistent user-visible volume mutation after recording stops.
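
To illustrate the point, a sketch in which the snapshot records the device it was taken from and the restore targets that device rather than the current default. This is not what the PR does at this commit; `DeviceBoundSnapshot` and `FakeAudioSystem` are hypothetical, and `DeviceID` stands in for CoreAudio's `AudioDeviceID`.

```swift
typealias DeviceID = UInt32  // stand-in for CoreAudio's AudioDeviceID

/// Snapshot that remembers which device was ducked.
struct DeviceBoundSnapshot {
    let deviceID: DeviceID
    let volume: Float
}

/// In-memory model of a set of output devices; hypothetical.
final class FakeAudioSystem {
    var defaultOutput: DeviceID
    var volumes: [DeviceID: Float]

    init(defaultOutput: DeviceID, volumes: [DeviceID: Float]) {
        self.defaultOutput = defaultOutput
        self.volumes = volumes
    }

    /// Ducks the current default output and records its identity.
    func duck(to target: Float) -> DeviceBoundSnapshot? {
        guard let current = volumes[defaultOutput] else { return nil }
        let snapshot = DeviceBoundSnapshot(deviceID: defaultOutput, volume: current)
        volumes[defaultOutput] = target
        return snapshot
    }

    /// Restores the device that was ducked, even if the default has changed.
    func restore(_ snapshot: DeviceBoundSnapshot) {
        // Skip devices that have disappeared (for example, unplugged).
        guard volumes[snapshot.deviceID] != nil else { return }
        volumes[snapshot.deviceID] = snapshot.volume
    }
}
```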


Comment on lines 63 to 69
let pauseMediaDuringTranscription: Bool
/// Lossless capture of the unified media-behaviour enum (none / pause /
/// duck). Optional so that backups created by older builds (which only
/// wrote the legacy bool) still decode cleanly. New builds prefer this
/// field on restore and fall back to the bool only when it's nil.
let mediaBehaviorDuringTranscription: SettingsStore.MediaBehaviorDuringTranscription?
let vocabularyBoostingEnabled: Bool

[P2] Include keep-awake preference in backup payload

The new preventSleepDuringTranscription setting is exposed in UI and persisted locally, but it is not part of SettingsBackupPayload, so backup/restore silently drops the user’s choice and reverts to the default (true) after restore. Since this commit explicitly updates backup format for adjacent media settings, omitting this new toggle makes restore behavior inconsistent for users migrating machines.


Three changes to make the duck feel snappy rather than lagging behind
the start sound.

The duck was previously fired inside ASRService.start(), which is
called via Task { await asr.start() } after captureRecordingContext +
applyDictationShortcutSelectionContext + setActiveRecordingMode +
setOverlayMode have all run. That puts the duck ~80ms behind the
hotkey on a typical Mac, while the start sound (which is fired from
the same beginDictationRecording call site) becomes audible from
CoreAudio in ~30ms. So the user heard the sound first and the fade a
beat later.

1. Hoist the duck and start sound to the very top of
   beginDictationRecording in ContentView. They now fire alongside
   the hotkey press, before any of the pre-recording UI work. The
   duck snapshot is threaded through to ASRService.start() as a
   `preAppliedMediaAction` parameter so the existing in-start
   duck-handling code adopts the action without re-firing it.

2. Snap the volume halfway down to the duck target SYNCHRONOUSLY
   inside duckSystemVolume() before kicking off the detached fade
   Task. CoreAudio property writes are sub-millisecond round trip,
   so the audible drop now lands within ~1ms of the call returning,
   bypassing both Task.detached scheduling latency (occasionally
   10-30ms on Debug builds) and the fade ramp's first few steps where
   each per-step volume change is too small to perceive. The detached
   fade then smoothly lands the rest of the way, so the duck still
   has a soft edge.

3. ASRService.start() restores the pre-applied media action on its
   guard-failure paths (mic-not-authorized, already-running) so a
   hotkey press that doesn't end up starting a session doesn't leave
   the user's volume stuck at the duck target.
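
Changes 2 and 3 can be sketched with an in-memory volume, as below. `VolumeBox`, `duckSynchronously`, and `startSession` are hypothetical names, and `MediaSessionAction` is reduced to two cases; the real code writes to CoreAudio and runs the rest of the fade on a detached Task.

```swift
/// Cut-down version of the action threaded through to start().
enum MediaSessionAction: Equatable {
    case none
    case ducked(previous: Float)
}

/// In-memory stand-in for the system output volume.
final class VolumeBox {
    var level: Float
    init(level: Float) { self.level = level }
}

/// Midpoint between the current level and the duck target.
func halfwayLevel(current: Float, target: Float) -> Float {
    current + (target - current) / 2
}

/// Snaps halfway down immediately; the remaining fade is omitted here.
func duckSynchronously(volume: VolumeBox, target: Float) -> MediaSessionAction {
    guard volume.level > target else { return .none }
    let previous = volume.level
    volume.level = halfwayLevel(current: previous, target: target)
    return .ducked(previous: previous)
}

/// Guard-failure paths undo the pre-applied duck before bailing out.
func startSession(micAuthorized: Bool, alreadyRunning: Bool,
                  preApplied: MediaSessionAction, volume: VolumeBox) -> Bool {
    guard micAuthorized, !alreadyRunning else {
        if case .ducked(let previous) = preApplied { volume.level = previous }
        return false
    }
    return true
}
```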

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a83116966


Comment on lines +265 to +266
self.activeFadeTask?.cancel()


[P2] Preserve original volume across overlapping duck sessions

Cancelling any in-flight fade before starting a new one can lock in a partially-restored volume as the new baseline. If a user starts a second recording while the previous restoreSystemVolume fade-up is still running (the 100ms window), duckSystemVolume() snapshots that transient level and startFade cancels the prior restore, so the next stop restores to the transient value instead of the true pre-session volume. This produces cumulative volume drift across rapid back-to-back recordings.
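
One possible way to avoid the drift, sketched with an in-memory model: while a fade-up is in flight, remember the level it is heading towards and use that as the baseline for the next duck. `DuckCoordinator` and its methods are hypothetical and are not part of the PR.

```swift
/// In-memory model of duck/restore bookkeeping; hypothetical.
final class DuckCoordinator {
    private(set) var level: Float
    /// Level an in-flight fade-up is heading towards, if any.
    private var pendingRestoreTarget: Float?

    init(level: Float) { self.level = level }

    /// Ducks and returns the baseline the caller should restore later.
    func duck(to target: Float) -> Float {
        // Prefer the true pre-session level over a transient mid-fade level.
        let baseline = pendingRestoreTarget ?? level
        pendingRestoreTarget = nil  // the new duck supersedes the fade-up
        level = target
        return baseline
    }

    /// Models a fade-up that has only reached `reachedSoFar` so far.
    func beginRestore(to original: Float, reachedSoFar: Float) {
        pendingRestoreTarget = original
        level = reachedSoFar
    }

    /// Models the fade-up completing normally.
    func finishRestore() {
        if let target = pendingRestoreTarget {
            level = target
            pendingRestoreTarget = nil
        }
    }
}
```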


@altic-dev
Owner

@AndrewBeniston Thanks for the PR, it seems useful to me :) My only asks would be to fix the Codex suggestions and follow the PR template we have, which helps me review better. Also, please attach a demo or some pictures showing it working. Thank you!
