Music duck and keep-awake behaviour during dictation #325
AndrewBeniston wants to merge 3 commits into altic-dev:main
Two new things you can turn on in Settings.

Music During Transcription is now a picker (Leave Playing / Pause / Lower Volume) instead of a boolean. Lower Volume snapshots the system output volume, fades it down to 10% over 200ms, and fades it back up the same way when you stop. The fade ramp runs on a detached Task so the main actor isn't blocked between steps; a new fade cancels any in-flight one, which matters when the user releases the hotkey before the duck-down has finished and the fade-up has to take over cleanly. If the user's volume is already at or below the duck target, the duck is skipped entirely.

Keep Mac Awake While Dictating holds an IOPMAssertion of type kIOPMAssertionTypePreventUserIdleDisplaySleep for the duration of a recording so the screen doesn't dim or lock mid-session. It is released as soon as recording stops, including in the start() failure path.

The volume restore fires the moment the audio engine is fully torn down, not at the end of stop() after final transcription completes. That way the volume snaps back when the user releases the hotkey rather than 1-3s later when the model finishes.

Two new files: SystemVolumeController wraps the CoreAudio volume reads and writes (AudioObjectGet/SetPropertyData on the default output device's master volume scalar, with per-channel fallback for devices that don't expose master), and SleepPreventionService wraps the IOPMAssertion.

The legacy pauseMediaDuringTranscription Bool migrates to a new MediaBehaviorDuringTranscription enum on first read, and remains as a compatibility shim on the SettingsStore so the BackupService payload stays format-compatible; old backups restore as .pause or .none. The duck setting itself doesn't survive a backup/restore round trip, which seems fine since it's a UX preference rather than load-bearing config.
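As a rough illustration of the Lower Volume fade described above, the sketch below computes the levels a linear ramp would write and applies the "skip when already at or below the target" rule. The function name `duckRamp`, the 20-step count, and the linear interpolation are assumptions made for this example; the PR does not state how many steps the ramp uses or which curve it follows.

```swift
/// Hypothetical helper (not from the PR): the volume levels a linear fade
/// would write, one per step. Returns an empty array when the current level
/// is already at or below the duck target, so the duck is skipped.
func duckRamp(from current: Float, to target: Float = 0.10, steps: Int = 20) -> [Float] {
    guard steps > 0, current > target else { return [] }
    return (1...steps).map { step in
        let progress = Float(step) / Float(steps)      // 0...1
        return current + (target - current) * progress // linear interpolation
    }
}

// Assumed example values: user volume at 80%, duck target at 10%.
let ramp = duckRamp(from: 0.8)
// With 20 steps spread over 200 ms, writes would land about 10 ms apart.

// A user already at 5% is left alone rather than raised to 10%.
let skipped = duckRamp(from: 0.05)
```

The fade-up can reuse the same interpolation with the arguments reversed, provided the guard is adjusted for the upward direction.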
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5685ffe5a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
} else if self.mediaBehaviorDuringTranscription == .pause {
    self.mediaBehaviorDuringTranscription = .none
Make media-backup restore override duck mode on false
When SettingsStore.restore(from:) applies payload.pauseMediaDuringTranscription, a false value will not clear .duck because the legacy setter only maps false to .none if the current mode is .pause. This makes restore behavior depend on pre-existing local state: importing a backup that intends “leave playing” can silently keep ducking enabled if the target machine is already on .duck. Restore should apply the backup deterministically regardless of current mode.
let left = readChannelVolume(deviceID: deviceID, channel: 1)
let right = readChannelVolume(deviceID: deviceID, channel: 2)
switch (left, right) {
case let (l?, r?): return (l + r) / 2
Preserve per-channel volume when ducking fallback devices
The duck snapshot stores only a single averaged scalar in the fallback path ((l + r) / 2), then restore writes that one value back to both channels. On output devices that do not expose master volume but do have distinct left/right levels (for example non-centered balance setups), one duck cycle permanently flattens channel balance. This should capture and restore per-channel values instead of collapsing to an average.
Two fixes for issues the automated reviewer flagged.
1. Backup restore: previously the legacy pauseMediaDuringTranscription bool's setter only mapped false to .none if the current mode was .pause, so restoring a backup that meant "leave playing" onto a machine currently set to .duck silently kept ducking. Restore now reads the new lossless mediaBehaviorDuringTranscription enum field from the payload when present, falling back to the bool only for backups created by older builds. The legacy bool's setter is also tightened so false always means .none, deterministic regardless of prior state.
2. Per-channel volume preservation: the duck snapshot path used to average L/R into a single scalar on devices without master volume, then write that average back to both channels on restore. One duck cycle would permanently flatten any non-centred balance. Replace the Float scalar in MediaSessionAction.ducked and the duck/restore API surface with a SystemVolumeSnapshot enum that holds either the master scalar or distinct L/R values, then restore the snapshot exactly at the end of the fade-up ramp so balance survives. Fade itself stays a scalar interpolation; the precise channel restore only happens on the final write, so the ramp's CPU profile is unchanged.
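A minimal sketch of the snapshot idea in the second fix, assuming a two-case enum. The case names, the `scalar` property, and `FakeStereoDevice` (a stand-in for a device without a master control, not a CoreAudio type) are all hypothetical; the PR's actual `SystemVolumeSnapshot` may be shaped differently.

```swift
/// Hypothetical shape for the snapshot: either one master level or
/// distinct left/right levels.
enum SystemVolumeSnapshot: Equatable {
    case master(Float)
    case stereo(left: Float, right: Float)

    /// Single level used to drive the scalar fade ramp.
    var scalar: Float {
        switch self {
        case .master(let value): return value
        case .stereo(let left, let right): return (left + right) / 2
        }
    }
}

/// Stand-in for an output device that only exposes per-channel volume.
struct FakeStereoDevice {
    var left: Float
    var right: Float

    /// What a scalar ramp step does: the same value on both channels.
    mutating func setBoth(_ value: Float) {
        left = value
        right = value
    }

    /// Final write after the fade-up: put back exactly what was captured.
    mutating func restore(_ snapshot: SystemVolumeSnapshot) {
        switch snapshot {
        case .master(let value):
            setBoth(value)
        case .stereo(let l, let r):
            left = l
            right = r
        }
    }
}

var device = FakeStereoDevice(left: 0.9, right: 0.5) // off-centre balance
let snapshot = SystemVolumeSnapshot.stereo(left: device.left, right: device.right)
device.setBoth(0.10)            // ducked
device.setBoth(snapshot.scalar) // last ramp step lands on the average
device.restore(snapshot)        // exact restore brings the balance back
```

Keeping the ramp scalar and doing the per-channel write only once at the end matches the trade-off the commit message describes.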
💡 Codex Review
Reviewed commit: bf40a9af80
/// Re-applies a snapshot exactly. For master snapshots this writes the
/// master scalar; for per-channel snapshots this writes the original
/// left and right values independently, preserving stereo balance that
/// `setVolume(_:)` would otherwise have flattened.
Restore ducked volume on the same output device
restore(_:) always resolves defaultOutputDeviceID() at restore time, but the snapshot does not carry the device that was originally ducked. If the default output changes during dictation (for example, Bluetooth headphones disconnect/reconnect or the user switches outputs), the old device can remain stuck at the ducked level while restore is applied to a different device. This leaves a persistent user-visible volume mutation after recording stops.
let pauseMediaDuringTranscription: Bool
/// Lossless capture of the unified media-behaviour enum (none / pause /
/// duck). Optional so that backups created by older builds (which only
/// wrote the legacy bool) still decode cleanly. New builds prefer this
/// field on restore and fall back to the bool only when it's nil.
let mediaBehaviorDuringTranscription: SettingsStore.MediaBehaviorDuringTranscription?
let vocabularyBoostingEnabled: Bool
Include keep-awake preference in backup payload
The new preventSleepDuringTranscription setting is exposed in UI and persisted locally, but it is not part of SettingsBackupPayload, so backup/restore silently drops the user’s choice and reverts to the default (true) after restore. Since this commit explicitly updates backup format for adjacent media settings, omitting this new toggle makes restore behavior inconsistent for users migrating machines.
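One way the payload could carry both new settings while still decoding older backups is sketched below. `BackupPayloadSketch` is a cut-down, hypothetical stand-in for `SettingsBackupPayload`, and the optional `preventSleepDuringTranscription` field is an assumption about how the suggestion might be addressed, not something the PR is known to contain. The sketch relies on Swift's synthesized `Decodable` treating a missing key for an optional property as `nil`.

```swift
import Foundation

/// Simplified stand-in for SettingsStore.MediaBehaviorDuringTranscription.
enum MediaBehavior: String, Codable {
    case none, pause, duck
}

/// Hypothetical, reduced payload. Optional fields decode as nil when an
/// older backup does not contain the key.
struct BackupPayloadSketch: Codable {
    let pauseMediaDuringTranscription: Bool
    let mediaBehaviorDuringTranscription: MediaBehavior?
    let preventSleepDuringTranscription: Bool? // assumed addition
}

/// Prefer the lossless enum; fall back to the legacy bool only when it is
/// absent. The result never depends on the current local setting.
func resolvedBehavior(from payload: BackupPayloadSketch) -> MediaBehavior {
    if let behavior = payload.mediaBehaviorDuringTranscription {
        return behavior
    }
    return payload.pauseMediaDuringTranscription ? MediaBehavior.pause : MediaBehavior.none
}

func decodePayload(_ json: String) -> BackupPayloadSketch {
    // try! is acceptable here only because the sample JSON is fixed.
    try! JSONDecoder().decode(BackupPayloadSketch.self, from: Data(json.utf8))
}

// Backup written by an older build: legacy bool only.
let older = decodePayload(#"{"pauseMediaDuringTranscription": true}"#)

// Backup written by a newer build: enum and keep-awake flag present.
let newer = decodePayload(#"""
{"pauseMediaDuringTranscription": false,
 "mediaBehaviorDuringTranscription": "duck",
 "preventSleepDuringTranscription": false}
"""#)

// A missing keep-awake key falls back to the documented default of true.
let olderKeepAwake = older.preventSleepDuringTranscription ?? true
```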
Three changes to make the duck feel snappy rather than lagging behind
the start sound.
The duck was previously fired inside ASRService.start(), which is
called via Task { await asr.start() } after captureRecordingContext +
applyDictationShortcutSelectionContext + setActiveRecordingMode +
setOverlayMode have all run. That puts the duck ~80ms behind the
hotkey on a typical Mac, while the start sound (which is fired from
the same beginDictationRecording call site) becomes audible from
CoreAudio in ~30ms. So the user heard the sound first and the fade a
beat later.
1. Hoist the duck and start sound to the very top of
beginDictationRecording in ContentView. They now fire alongside
the hotkey press, before any of the pre-recording UI work. The
duck snapshot is threaded through to ASRService.start() as a
`preAppliedMediaAction` parameter so the existing in-start
duck-handling code adopts the action without re-firing it.
2. Snap the volume halfway down to the duck target SYNCHRONOUSLY
inside duckSystemVolume() before kicking off the detached fade
Task. CoreAudio property writes are sub-millisecond round trip,
so the audible drop now lands within ~1ms of the call returning,
bypassing both Task.detached scheduling latency (occasionally
10-30ms on Debug builds) and the fade ramp's first few steps where
each per-step volume change is too small to perceive. The detached
fade then smoothly lands the rest of the way, so the duck still
has a soft edge.
3. ASRService.start() restores the pre-applied media action on its
guard-failure paths (mic-not-authorized, already-running) so a
hotkey press that doesn't end up starting a session doesn't leave
the user's volume stuck at the duck target.
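The second change above (snap halfway, then fade the rest) can be sketched as a pure calculation. `duckPlan`, the 10-step count, and the linear ramp are assumptions for illustration; in the PR the immediate value would be written synchronously through CoreAudio and the ramp applied from the detached Task.

```swift
/// Hypothetical helper (not from the PR): splits a duck into the level to
/// write immediately and the levels left for the background fade.
func duckPlan(from current: Float, to target: Float, steps: Int)
    -> (immediate: Float, ramp: [Float]) {
    precondition(steps > 0, "steps must be positive")
    // Halfway between the current level and the duck target.
    let immediate = current - (current - target) / 2
    // The remaining distance is covered by a linear ramp.
    let ramp = (1...steps).map { step in
        immediate + (target - immediate) * Float(step) / Float(steps)
    }
    return (immediate, ramp)
}

// Assumed example values: 80% volume ducking to 10%.
let plan = duckPlan(from: 0.8, to: 0.1, steps: 10)
// plan.immediate is about 0.45, so half of the drop is audible at once.
// plan.ramp then walks from just below 0.45 down to about 0.10.
```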
💡 Codex Review
Reviewed commit: 1a83116966
self.activeFadeTask?.cancel()
Preserve original volume across overlapping duck sessions
Cancelling any in-flight fade before starting a new one can lock in a partially-restored volume as the new baseline. If a user starts a second recording while the previous restoreSystemVolume fade-up is still running (the 100ms window), duckSystemVolume() snapshots that transient level and startFade cancels the prior restore, so the next stop restores to the transient value instead of the true pre-session volume. This produces cumulative volume drift across rapid back-to-back recordings.
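One possible way to avoid the drift described above is to keep the true pre-session volume until the fade-up has actually finished, and reuse it if a new session starts in the meantime. The `DuckBaseline` type and its method names are hypothetical; this is a sketch of the idea, not the fix adopted in the PR.

```swift
/// Hypothetical tracker: remembers the original volume until a restore
/// has fully completed.
struct DuckBaseline {
    /// Original volume of a session whose restore has not finished yet.
    private(set) var pending: Float? = nil

    /// Called when ducking starts. `liveVolume` is whatever the device
    /// reports right now, which may be a transient mid-fade level.
    mutating func begin(liveVolume: Float) -> Float {
        if let original = pending {
            return original // restore still in flight: keep the true baseline
        }
        pending = liveVolume
        return liveVolume
    }

    /// Called only after the fade-up has written its final value.
    mutating func restoreFinished() {
        pending = nil
    }
}

var baseline = DuckBaseline()
let first = baseline.begin(liveVolume: 0.8)  // new session, baseline is 0.8
// The user stops, then presses the hotkey again while the fade-up is
// only at 0.4. The transient level is ignored.
let second = baseline.begin(liveVolume: 0.4) // still 0.8
baseline.restoreFinished()                   // fade-up completed
let third = baseline.begin(liveVolume: 0.6)  // fresh baseline, 0.6
```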
@AndrewBeniston Thanks for the PR, it seems useful to me :) My only ask would be to fix the Codex suggestions and follow the PR template we have, to help me review better. Also, please attach a demo or some pictures showing it working. Thank you!
What this adds
Two independent settings, both wired through the existing pause-media hooks in `ASRService`. Either could be merged on its own; happy to split into two PRs if you prefer.

Music During Transcription (replaces the existing pause toggle)
The current `Pause Media During Transcription` boolean becomes a three-mode picker: Leave Playing, Pause (the existing behaviour, unchanged), or Lower Volume.

Lower Volume snapshots the system output volume, fades it down to 10% over 200ms, and fades it back up the same way when you stop. The reason for the duck rather than the existing pause: psychologically it feels much nicer to have the music keep playing quietly. When the audio cuts entirely, the absence of background sound is jarring, a bit like in a film when the room tone drops out and you suddenly notice the silence. Dipping the volume to 10% keeps the ambience without competing with what you're saying into the mic. 10% turns out to be the sweet spot: quiet enough not to corrupt the transcript, loud enough that the room still feels alive.

Keep Mac Awake While Dictating
Holds an `IOPMAssertion` of type `kIOPMAssertionTypePreventUserIdleDisplaySleep` for the duration of a recording so the screen doesn't dim or lock mid-session. Released the instant recording stops. Useful for long-form dictation where the user is sitting still and the display would otherwise time out and trigger the screen-lock cascade.

Defaults to `true`. Doesn't cover lid-close sleep (that would need `kIOPMAssertionTypePreventSystemSleep` with the lid-close option, which feels like overreach; happy to add it if you want).

Implementation notes
- `SystemVolumeController` (new): thread-safe wrapper around `AudioObjectGet/SetPropertyData` on the default output device's master volume scalar. Falls back to writing channels 1+2 individually for devices that don't expose `kAudioObjectPropertyElementMain`.
- `SleepPreventionService` (new): `IOPMAssertionCreateWithName` / `IOPMAssertionRelease` wrapper.
- `MediaPlaybackService` gains `duckSystemVolume()` / `restoreSystemVolume(previous:)` and a unified `restore(from: MediaSessionAction)` that handles either pause-resume or volume-restore at every exit path.
- The fade ramp runs on a detached `Task` so the main actor isn't blocked between steps. A new fade cancels any in-flight one, which matters when the user releases the hotkey before the duck-down has finished and the fade-up needs to take over cleanly. If the user's volume is already at or below 10%, the duck is skipped entirely (would otherwise raise a quieter volume up to 10%, which is the wrong direction).
- The volume restore fires the moment the audio engine is fully torn down, not at the end of `stop()` after final transcription completes. That way the volume snaps back when the user releases the hotkey rather than 1-3s later when the model finishes.

Backwards compatibility
The legacy `pauseMediaDuringTranscription` UserDefaults key migrates to a new `MediaBehaviorDuringTranscription` enum on first read. The `pauseMediaDuringTranscription` Bool getter/setter remains on `SettingsStore` as a shim (true ↔ `.pause`, false ↔ `.none`) so the `BackupService` payload stays format-compatible without a schema bump. Old backups restore cleanly as `.pause` or `.none`. The duck setting itself doesn't survive a backup/restore round trip; that felt acceptable since it's a UX preference rather than load-bearing config, but happy to bump the backup schema if you'd rather.

Testing
Built and tested locally on Apple Silicon (macOS 15) against `upstream/main`. Verified manually:
- […] `pmset -g assertions | grep FluidVoice`)
- […] `start()` failure path (forced by denying mic mid-test)
- […] `com.FluidApp.app` bundle ID

Not added: integration tests for either feature. Felt awkward to test system-volume mutation and `IOPMAssertion` lifecycle in CI, but if you'd like me to attempt it I'll happily add them.