diff --git a/docs/audio-dsp-upgrade-plan.md b/docs/audio-dsp-upgrade-plan.md index 900401a817..25c955f229 100644 --- a/docs/audio-dsp-upgrade-plan.md +++ b/docs/audio-dsp-upgrade-plan.md @@ -2,436 +2,696 @@ ## Goal -Build a QLC-native, scientifically grounded audio analysis pipeline for RGB scripts and completely remove the bundled dependency on `LedFx` naming, helpers, and compatibility shims. +Replace QLC+'s minimal audio analysis (32 log bands + spectral-flux beat detector, mic-only) with a scientifically grounded pipeline that powers RGB scripts, Audio Trigger widgets, and AI-driven cue generation, and that is musically aware enough for EDM stage lighting. -The final state should feel like a modern VJ/audio-reactive engine: stable in quiet rooms, responsive on club systems, musically meaningful, frame-rate independent, and inspectable when it behaves oddly. +The pipeline must, in build priority order: + +1. **Live first, low-latency, best-possible.** Drive scripts, widgets, and MCP tools from rich live features computed on the audio thread with end-to-end onset latency under ~10 ms and a per-frame budget under ~1 ms per channel. This is the foundation the rest builds on. +2. Expose a single uniform `AudioFeatures` view so consumers don't care where the data came from. +3. Pre-analyze any audio file in the user's library and cache rich features (BPM, beat grid, key/chroma, multi-band envelopes, structural drops, spectral shape) — added once the live path is shipped and stable. +4. Identify what is currently playing on stage via one-shot acoustic fingerprinting, switching the same `AudioFeatures` view from live to cached values for richer (key-aware, structural) lighting. +5. Track the play position continuously via a tiered source — DJ-software protocols when wired, chromagram cross-correlation otherwise — so cached features replay in sync through DJ EQ, pitch shift, and tempo bend. + +Live analysis works on its own and is shippable independently. 
Cached, identified, and tiered position-tracked features each upgrade the same view incrementally. + +This project is young. **No backwards compatibility is required** — old XML, old per-script DSP, and the bundled `ledfx_compat.js` shim can all be removed. Simple and high quality wins over preservation. ## End State | Area | Final direction | | --- | --- | -| Core DSP | **C++ `AudioAnalyzer`** class in engine layer. All envelope, AGC, trigger, spectral feature computation in C++. Available to any consumer. | -| Audio Profiles | **Document-level `AudioProfile`** objects hold DSP configuration (bands, envelopes, AGC, triggers, noise gate). Multiple named profiles per project (e.g., "Kick Sensitive", "Ambient Smooth"). | -| Audio Trigger Widget | **VCAudioTrigger is the primary editor and live monitor** for Audio Profiles — not the owner of script audio config. Shows envelope curves, AGC meter, trigger state lamps, spectral features. | -| Script API | Scripts read pre-computed features from enriched `audio` object (`audio.bands.sub`, `audio.triggers.bass.fired`). JS does selection, not computation. Per-script DSP sliders removed; scripts select a profile and optionally apply lightweight mapping (intensity, band selection). | -| Shared JS files | Replace `ledfx_compat.js` with `RGBUtil` (color/map/noise helpers). No bundled script should call `LedFx.*`. | -| Visual helpers | `RGBUtil` namespace for color/map/noise. | -| Compatibility | Backward-compatible XML with schema versioning. Old per-script slider values converted to generated profile on load. Legacy per-bar triggers preserved alongside new per-band triggers. | -| Verification | Synthetic audio injection, deterministic feature tests, golden comparison tests, and live debug visualization. | - -## Current Ground Truth - -| Fact | Consequence | -| --- | --- | -| `rgbscriptv4.cpp` preloads `ledfx_compat.js`, then `audio_common.js` into one shared `QJSEngine`. 
| Loader order must change when `ledfx_compat.js` is removed. | -| 28 `audio*.js` scripts set `usesAudio = true`. | Migration must be scripted/audited, not hand-waved. | -| `rgbMap(width, height, rgb, step, audio)` receives `{ spectrum, volume, beat, bpm, maxMagnitude }`. | We already have a transport path; we should enrich it, not create a second one. | -| `audio.spectrum` is 32 log-spaced bands from 40 Hz to 5000 Hz, normalized per frame. | It is useful for spectral shape, not absolute loudness. | -| `audio.volume` is attack/release-smoothed signal power. | It is the better basis for AGC and global energy. | -| Current `LedFx.lows_power()`, `mids_power()`, `high_power()`, and `melbank_thirds()` split the log bands into equal thirds. | These helpers are the main primitive math to remove. | -| `AudioCapture` runs on its own QThread, emits `dataProcessed(double*, int, double, quint32)`. | Analyzer must receive richer internal data, not just the signal — `dataProcessed()` lacks raw RMS/peak/FFT bins. | -| `AudioCapture` has per-consumer band tracking via `registerBandsNumber(N)` with ref-counting. | Variable-band support must be preserved for VCAudioTrigger. | -| `VCAudioTrigger` does its own normalize/smooth/threshold in C++ (`slotSpectrumDataChanged()`). | Duplicates DSP that should live in the shared `AudioAnalyzer`. | -| All RGBMatrix instances share one `QJSEngine` (`s_jsThread->engine`). | Module-level JS state aliases across scripts. Per-script state must live in C++ channels. | -| `AudioCapture` computes true RMS internally but only emits smoothed `power`. | New analyzer needs raw RMS/peak before smoothing, not after. | +| Offline analysis | **Essentia** (AGPL-3.0 accepted) computes BPM, beat/bar grid, key, chroma, multi-band envelopes, structural drops, danceability, MFCC. Run once per file. | +| Identification | **Olaf** (GPL-3.0, native C) builds a constellation-hash index. 
Used **one-shot** at session start (and on suspected track change) to identify the current song and an initial play position. Olaf's tempo-shift brittleness is acceptable here because identification needs only one good match, not continuous lock. | +| Position tracking | A tiered `PositionSource` abstraction. Tier 1: DJ-software protocols (OS2L beat counter + cached beat grid, Pro DJ Link, StagelinQ). Tier 2: **chromagram cross-correlation** against cached chroma in a ±2 s window every ~5 s, with a small speed-search yielding tempo-deviation as a bonus. Tier 3: aubio onsets + internal clock when nothing else is wired. | +| Live analysis | **aubio** (GPL-3.0) computes onsets, tempo, multi-band on the live capture stream. Always-on; the only feature source when no track is identified. | +| Storage | A single SQLite database holds tracks, features, fingerprint hashes, and analyzer profiles. No per-file sidecars, no project-embedded blobs. | +| Decode | Qt 6 `QAudioDecoder` (FFmpeg backend) handles MP3/WAV/FLAC/OGG/M4A across platforms. ffmpeg is bundled with the app. | +| Core feature view | One `AudioFeatures` struct is read by all consumers. Computed live by `LiveAudioAnalyzer` or replayed from the cache by `CachedAudioAnalyzer`. | +| Audio Profiles | Document-level `AudioProfile` objects hold per-channel envelope/AGC/trigger configuration. Multiple named profiles per project. | +| Audio Trigger Widget | Rebuilt as the audio control center: library browser, recognition-lock badge, key indicator, drop/build lamps, perceptual band editor, envelope/AGC/trigger/spectral panels. | +| Scripts | Read enriched `audio` object (`audio.bands.*`, `audio.triggers.*`, `audio.music.bpm`, `audio.music.key`, `audio.events.drop`, `audio.match.locked`). No DSP in JS. | +| JS helpers | `RGBUtil` namespace for color/map/noise. `ledfx_compat.js` and `audio_common.js` deleted. | +| MCP | Tools for batch analysis, feature lookup, recognition state, and event subscription. 
| +| Verification | Synthetic injection tests, deterministic feature tests, FMA/Jamendo CC corpus benchmarks for BPM, key, drop accuracy and recognition lock latency. | + +## Architecture -## Scientific Audio Model +``` +┌────────────────────────────────────────────────────────────────────┐ +│ Offline (one-shot per file, batch over the user library) │ +│ │ +│ Audio file ──▶ QAudioDecoder ──▶ PCM │ +│ ├─▶ Essentia ──▶ Features+Chroma│ +│ └─▶ Olaf ─────▶ Hashes │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌──────────────────┐ │ +│ │ SQLite audio.db │ │ +│ └──────────────────┘ │ +└────────────────────────────────────────────────────────────────────┘ + +┌────────────────────────────────────────────────────────────────────┐ +│ Live │ +│ │ +│ Mic / line ──▶ AudioCapture ──▶ AudioFrame ──▶ LiveAnalyzer │ +│ (aubio + spectral)│ +│ │ +│ Identification (one-shot, ~5 s window, re-runs on drift/silence): │ +│ AudioFrame ──▶ Olaf ──▶ (track_id, initial position_ms) │ +│ │ +│ Position tracking (continuous, tiered, picks highest available): │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Tier 1 protocols (when wired) │ │ +│ │ OS2L listener ──▶ beat# + BPM │ │ +│ │ ProDJLink/StagelinQ ──▶ track + position │ │ +│ │ Tier 2 chromagram tracker │ │ +│ │ live chroma ⨯ cached chroma → position + speed │ │ +│ │ Tier 3 internal clock + aubio drift correction │ │ +│ └─────────────────────────┬──────────────────────────────┘ │ +│ ▼ │ +│ PositionSource │ +│ (priority + confidence + drift) │ +│ │ │ +│ ▼ │ +│ CachedAudioAnalyzer reads SQLite │ +│ at the locked position │ +│ │ │ +│ live ──┐ ▼ │ +│ └──▶ AudioFeatures (unified view) │ +│ │ │ +│ ┌──────────────┼───────────────┐ │ +│ ▼ ▼ ▼ │ +│ AudioProfile RGBMatrix VCAudioTrigger │ +│ (channels) (scripts) (UI / triggers) │ +└────────────────────────────────────────────────────────────────────┘ +``` -Treat audio analysis as a feature extraction pipeline with explicit units and stages. 
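The Tier-2 box in the diagram — live chroma cross-correlated against cached chroma to recover position and speed — can be sketched end-to-end. This is an illustrative, self-contained version: the `Chroma`/`ChromaMatch`/`bestMatch` names and the nearest-neighbour resampling are assumptions for the sketch, not the shipped implementation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// One chroma frame = 12 pitch-class energies (illustrative layout).
using Chroma = std::array<float, 12>;

struct ChromaMatch {
    int offsetFrames;   // best alignment of the live window inside the cached slice
    float speed;        // best-matching playback speed factor
    float score;        // normalized correlation peak, <= 1.0
};

// Resample a live chroma window by a speed factor using nearest-neighbour
// frame picking — crude, but adequate for a +-10% search at a ~10 Hz frame rate.
static std::vector<Chroma> resample(const std::vector<Chroma>& in, float speed)
{
    std::vector<Chroma> out;
    for (float pos = 0.f; ; pos += speed) {
        std::size_t i = static_cast<std::size_t>(pos + 0.5f);
        if (i >= in.size())
            break;
        out.push_back(in[i]);
    }
    return out;
}

// Normalized cross-correlation of the warped live window against the cached
// slice at one offset. 1.0 means a perfect match (Cauchy-Schwarz bound).
static float nccAt(const std::vector<Chroma>& cached, std::size_t off,
                   const std::vector<Chroma>& live)
{
    double dot = 0, na = 0, nb = 0;
    for (std::size_t f = 0; f < live.size(); ++f)
        for (int b = 0; b < 12; ++b) {
            double a = cached[off + f][b], c = live[f][b];
            dot += a * c; na += a * a; nb += c * c;
        }
    double denom = std::sqrt(na) * std::sqrt(nb);
    return denom > 0 ? static_cast<float>(dot / denom) : 0.f;
}

// Grid search over speed factors and offsets; returns the global peak.
ChromaMatch bestMatch(const std::vector<Chroma>& cachedSlice,
                      const std::vector<Chroma>& liveWindow)
{
    ChromaMatch best{0, 1.f, -1.f};
    for (float speed = 0.90f; speed <= 1.101f; speed += 0.02f) {
        std::vector<Chroma> warped = resample(liveWindow, speed);
        if (warped.empty() || warped.size() > cachedSlice.size())
            continue;
        for (std::size_t off = 0; off + warped.size() <= cachedSlice.size(); ++off) {
            float s = nccAt(cachedSlice, off, warped);
            if (s > best.score)
                best = {static_cast<int>(off), speed, s};
        }
    }
    return best;
}
```

The search space is tiny — roughly ten speed steps times a few hundred offsets of 12-float frames — so even this naive version fits a multi-second refresh cadence on a worker thread.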
+## Library Responsibilities + +| Library | Mode | Computes | Why | +| --- | --- | --- | --- | +| **Essentia** | Offline | BPM (RhythmExtractor2013), beat/bar grid, key + scale (KeyExtractor / HPCP), 12-bin chroma at ~10 Hz, danceability, multi-band mel, spectral centroid/rolloff/flatness, MFCC, structural segmentation, drop candidates from novelty curve | Single library covers nearly all wanted features including the chroma needed for position tracking. AGPL accepted. | +| **Olaf** | Offline (index) + Live (one-shot ID) | Constellation-hash fingerprints; on the live side runs over a rolling ~5 s window to identify the track and produce an initial position estimate | Native C, embedded-friendly. Used only for identification, not continuous lock — Olaf's well-known brittleness to >3% time-stretch is acceptable when one good match is enough. | +| **Chromagram tracker** | Live | Cross-correlates a live chroma window against the cached chroma stream around the expected position, searching a small speed range (±10%) | Tempo-tolerant continuous position tracking. Runs at a low rate (~5 s cadence). Reuses Essentia code path on the live side for chroma. | +| **OS2L listener** | Live | Receives `beat`/`btn`/`cmd` JSON over TCP from VirtualDJ et al, advertised over Bonjour | Exact beat phase + BPM with no microphone latency, when a supported DJ app is running. Already partly supported in QLC+ v4. | +| **aubio** | Live | Real-time onsets, tempo, pitch, multi-band — sub-10 ms latency | Always-on live feature source; sole driver when no track is identified. Mature C API. | +| **Qt Multimedia** | Offline + Live | MP3/WAV/FLAC/OGG/M4A decoding via FFmpeg | Already in tree. ffmpeg bundled with the app for cross-platform consistency. | +| **SQLite** (Qt `QSqlDatabase` with `QSQLITE` driver) | Storage | Tracks, features, chroma, fingerprint hashes, profiles | Already a transitive Qt dependency. Single file. Inspectable. 
| + +## Single SQLite Schema (`~/.local/share/qlcplus/audio.db`) + +```sql +-- One row per analyzed file +CREATE TABLE tracks ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + path TEXT NOT NULL, -- absolute path or library-relative + sha256 TEXT NOT NULL UNIQUE, -- content hash for re-detection across moves + duration_ms INTEGER NOT NULL, + sample_rate INTEGER NOT NULL, + channels INTEGER NOT NULL, + title TEXT, + artist TEXT, + bpm REAL, + key TEXT, -- e.g., "F# minor" + danceability REAL, + analyzed_at INTEGER NOT NULL, -- unix epoch + analyzer_version INTEGER NOT NULL +); +CREATE INDEX tracks_sha ON tracks(sha256); +CREATE INDEX tracks_path ON tracks(path); + +-- Per-frame features (one row per ~23 ms frame) +CREATE TABLE feature_frames ( + track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE, + frame_index INTEGER NOT NULL, + time_ms INTEGER NOT NULL, + rms_db REAL, + centroid_hz REAL, + rolloff_hz REAL, + flatness REAL, + flux REAL, + bands_blob BLOB, -- 5 perceptual band floats, packed + PRIMARY KEY (track_id, frame_index) +); +CREATE INDEX ff_time ON feature_frames(track_id, time_ms); + +-- Discrete beat events +CREATE TABLE beats ( + track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE, + time_ms INTEGER NOT NULL, + beat_index INTEGER NOT NULL, -- position in track + bar_index INTEGER, -- nullable until downbeats settled + confidence REAL, + PRIMARY KEY (track_id, beat_index) +); +CREATE INDEX beats_time ON beats(track_id, time_ms); + +-- Onsets (kicks, snares, etc.) 
for fast trigger replay +CREATE TABLE onsets ( + track_id INTEGER NOT NULL, + time_ms INTEGER NOT NULL, + band TEXT NOT NULL, -- 'sub'|'bass'|'lowMid'|'mid'|'high' + strength REAL, + PRIMARY KEY (track_id, time_ms, band) +); + +-- Structural events (build start, drop, breakdown, outro) +CREATE TABLE structural_events ( + track_id INTEGER NOT NULL, + time_ms INTEGER NOT NULL, + kind TEXT NOT NULL, -- 'build'|'drop'|'break'|'outro' + confidence REAL, + PRIMARY KEY (track_id, time_ms, kind) +); + +-- Olaf fingerprint hashes (used only for one-shot identification) +CREATE TABLE fingerprints ( + hash INTEGER NOT NULL, -- Olaf 64-bit packed hash + track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE, + time_ms INTEGER NOT NULL, + PRIMARY KEY (hash, track_id, time_ms) +); +CREATE INDEX fp_hash ON fingerprints(hash); + +-- 12-bin chroma at ~10 Hz, used by the chromagram position tracker. +-- Stored as one BLOB row per track (12 floats * frames). Easier to mmap as +-- a contiguous matrix than per-frame rows for cross-correlation queries. +CREATE TABLE chroma ( + track_id INTEGER PRIMARY KEY REFERENCES tracks(id) ON DELETE CASCADE, + frame_rate_hz REAL NOT NULL, -- typically 10 + frame_count INTEGER NOT NULL, + matrix_blob BLOB NOT NULL -- 12*frame_count floats, row-major +); + +-- AudioProfile DSP configuration (replaces XML embedding for shareability; +-- still mirrored to .qxw for project portability) +CREATE TABLE audio_profiles ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + name TEXT NOT NULL, + is_default INTEGER NOT NULL DEFAULT 0, + config_json TEXT NOT NULL -- AudioChannelConfig serialized +); +``` -### C++ Analysis Layer +The `feature_frames` BLOB packs 5 little-endian floats (sub/bass/lowMid/mid/high). Per-track storage at 23 ms frames is ~30 kB / minute, ~150 MB for 5000 four-minute tracks — acceptable on modern disks. -Add or evolve a native analyzer around `AudioCapture` so the engine computes stable, reusable audio features once per frame. 
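The 5-float packing described above can be pinned down with a small helper pair. A minimal sketch — the `packBands`/`unpackBands` names are illustrative, and it assumes a little-endian host (x86/ARM as shipped); a big-endian port would need an explicit byte swap:

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <vector>

// Pack/unpack for the feature_frames.bands_blob column:
// 5 band floats (sub, bass, lowMid, mid, high), 20 bytes per frame.
constexpr std::size_t kBandCount = 5;

std::vector<std::uint8_t> packBands(const std::array<float, kBandCount>& bands)
{
    std::vector<std::uint8_t> blob(kBandCount * sizeof(float));
    // On a little-endian host this yields the little-endian layout the
    // schema promises; memcpy avoids any aliasing/alignment pitfalls.
    std::memcpy(blob.data(), bands.data(), blob.size());
    return blob;
}

std::array<float, kBandCount> unpackBands(const std::vector<std::uint8_t>& blob)
{
    std::array<float, kBandCount> bands{};
    // A malformed blob decodes to silence rather than garbage.
    if (blob.size() == kBandCount * sizeof(float))
        std::memcpy(bands.data(), blob.data(), blob.size());
    return bands;
}
```

Keeping the blob a fixed 20 bytes also makes the per-track storage math trivial to audit against the estimate above.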
+## AudioFeatures (Unified View) -**Critical:** The analyzer must receive raw data from inside `AudioCapture`, not just the `dataProcessed()` signal. `dataProcessed()` only emits 32 log-spaced band magnitudes plus smoothed power — this is insufficient for accurate RMS, peak, spectral centroid, rolloff, or flatness. An internal `AudioFrame` struct passes raw time-domain stats and FFT bin data. +```cpp +struct AudioFeatures { + // Source + enum Source { Live, Cached } source; + qint64 trackId = -1; // valid when source == Cached + double positionMs = 0; // 0 when source == Live + + // Loudness + float rmsDb; + float peakDb; + float crestFactor; + + // Spectrum + std::array bandsLog; + std::array bandsDb; + std::array bandsNormalized; + + // Perceptual bands (post-envelope, AGC-adjusted) + struct PerceptualBands { + float sub, bass, lowMid, mid, high; + } bands; + + // Spectral shape + float spectralFlux; + float spectralCentroidHz; + float spectralRolloffHz; + float spectralFlatness; + + // Music + struct Music { + float bpm; + float beatPhase01; // 0..1 within current beat + float barPhase01; // 0..1 within current bar (cached only) + int beatIndex; // -1 live + int barIndex; // -1 live + float beatConfidence; + QString key; // e.g. "F# minor", empty live + float keyConfidence; + } music; + + // Discrete events for the current frame + struct Events { + bool onset; + bool beat; + bool drop; // structural drop + bool buildStart; + bool breakStart; + } events; + + // Recognition state + struct Match { + bool locked; + qint64 trackId; + double positionMs; + float confidence; + float driftMs; // estimated drift between live and locked timeline + } match; + + // Frame timing + quint64 frameIndex; + double audioDtMs; +}; +``` + +`AudioFeatures` is produced by `LiveAudioAnalyzer` or `CachedAudioAnalyzer` and consumed by `AudioChannel`s, scripts, and the widget. 
+ +## C++ Analysis Layer -#### Internal Frame (AudioCapture → AudioAnalyzer) +### Internal Frame (AudioCapture → analyzers) ```cpp struct AudioFrame { - const float* mono; // time-domain mono samples + const float* mono; int sampleCount; - const float* fftMagnitudes; // raw FFT bin magnitudes + const float* fftMagnitudes; int fftBins; int sampleRate; - double rms; // raw RMS (before any smoothing) - double peak; // raw peak amplitude + double rms; + double peak; quint64 frameIndex; - double dtMs; // time since previous audio frame + double dtMs; }; ``` -#### Shared Features (computed once per audio frame) +### LiveAudioAnalyzer (aubio + Essentia-online subset) -| Feature | Why it matters | -| --- | --- | -| `rmsDb` | Absolute loudness in dBFS for noise gates, AGC, and confidence. Computed from raw RMS, not legacy smoothed `power`. | -| `peakDb` | Clipping/transient awareness. | -| `crestFactor` | Distinguishes punchy transients from dense sustained material. | -| `bandsLog[32]` | Existing log-spaced spectrum, exposed with clear frequency metadata. | -| `bandsDb[32]` | Spectrum in dB, useful for thresholds and calibrated gates. | -| `bandsNormalized[32]` | Visual-friendly normalized spectrum for bars and matrices. | -| `perceptualBands` | Sub/bass/lowMid/mid/high grouped from log bands. | -| `spectralFlux` | Onset strength and buildup/drop detection. | -| `spectralCentroidHz` | Brightness/timbre feature. | -| `spectralRolloffHz` | Energy distribution feature. | -| `spectralFlatness` | Noise-like vs tonal material. | -| `beat`, `bpm`, `beatConfidence`, `beatPhase` | Musically stable beat-driven effects. | -| `noiseFloorDb` | Adaptive silence/noise gating. | -| `audioDtMs` | Time since previous audio frame (for envelope/trigger timing). | - -#### Per-Consumer Channels (`AudioChannel`) - -Each consumer creates a channel with its own configuration for envelopes, AGC, and triggers. Channel state is owned by `AudioAnalyzer` and updated on the audio thread. 
+Runs on the audio thread. Computes shared spectrum/loudness features, feeds aubio for onsets/tempo, and produces a rolling 12-bin chroma at ~10 Hz for the position tracker. Emits `AudioFeatures` with `source = Live`, `match.identified = false` initially. + +### AudioIdentifier (Olaf, one-shot) + +Maintains a rolling ~5 s sample buffer. On request — at session start, on chromagram tracker drift exceeding ~2 s, on extended low-confidence position, or every 60 s as a sanity check — it runs Olaf against the SQLite fingerprint index and returns `(track_id, initial_position_ms, confidence)` or `none`. Runs on a worker thread; never on the audio thread. Track changes detected here invalidate the current position source and re-seed Tier 2. + +### PositionSource (tiered abstraction) + +Single object that all consumers read. Holds the highest-priority source currently confident, plus a `lastUpdateMs` for staleness checks. Tiers, in priority order: + +| Tier | Source | Provides | Priority condition | +| --- | --- | --- | --- | +| 1a | Pro DJ Link / StagelinQ | `track_id`, `position_ms`, `bpm` | Reachable on LAN, reporting active deck | +| 1b | OS2L | `beat#`, `bpm`, `change` flag | TCP connection alive; needs Tier-2 or AudioIdentifier-supplied `track_id` to bind beat# to song timeline | +| 2 | ChromaPositionTracker | `position_ms`, `speed_factor` | Track identified by AudioIdentifier; correlation peak above threshold | +| 3 | Aubio + internal clock | drifting `position_ms` only | Always available | + +Handoff is automatic. Each tier publishes `confidence` and `staleness`; the source picks the highest-priority tier whose values are fresh and confident. Per-source latency offsets are configurable and calibrated by cross-correlating Tier 1/2 timestamps against onset events from Tier 3. + +### ChromaPositionTracker + +Given `(track_id, expected_position_ms)`, slices a window of cached chroma around the expected position (default ±2 s) and the current ~5 s of live chroma. 
Computes normalized cross-correlation across a small grid of speed factors (0.92, 0.94, …, 1.08). Returns the (offset_ms, speed_factor, peak_value) maximum. Updates at ~5 s cadence; ~5–10 ms CPU per query (12-bin chroma, 100 frames vs ~50 frames live, on a worker thread). + +### OS2LPositionSource + +Listens on TCP for OS2L `beat` events. Once the AudioIdentifier has supplied a `track_id`, the cached beat grid lets us map an incoming `beat#` to a `position_ms`. If the OS2L client provides a beat-zero anchor relative to the song start (VirtualDJ does, via the `change:true` flag and known beat counter semantics), use it directly; otherwise calibrate by aligning the first OS2L beat to the nearest cached beat above the AudioIdentifier's reported position. + +### CachedAudioAnalyzer + +Given the current `PositionSource` value, queries SQLite for surrounding feature frames, beats, onsets, and structural events, and reconstructs an `AudioFeatures` view with `source = Cached`. When `speed_factor != 1.0` from the chroma tracker, beats and onsets are time-warped accordingly so triggers fire on the right musical moments. Falls back to `source = Live` when no track is identified. + +### AudioChannel (per-profile) + +Same handle-based API as in the prior plan. Reads `AudioFeatures`, applies envelope/AGC/triggers per `AudioChannelConfig`, exposes immutable `AudioSnapshot`. 
```cpp struct AudioChannelConfig { - EnvelopeConfig envelope; // attackMs, releaseMs per band - AgcConfig agc; // maxGainDb, releaseMs, noiseFloorDb - TriggerConfig triggers; // thresholds, hysteresis, cooldownMs, holdMs - BandLayout bandLayout; // which bands to track (default: perceptual 5) - VolumeConfig volume; // smoothing for volume meter + EnvelopeConfig envelope; // attack/release per band, ms + AgcConfig agc; // maxGainDb, releaseMs, noiseFloorDb + TriggerConfig triggers; // Schmitt thresholds, hold, cooldown + BandLayout bandLayout; // perceptual 5 by default + VolumeConfig volume; }; -// Handle-based API (thread-safe, no raw pointer exposure) -AudioChannelHandle handle = analyzer->createChannel(config); -handle.updateConfig(newConfig); // atomic pending config, applied at next frame boundary -AudioSnapshot snap = handle.snapshot(); // short copy, lock-free or short read lock -handle.close(); // safe deferred unregister +AudioChannelHandle h = analyzer->createChannel(config); +h.updateConfig(newConfig); // applied at next frame boundary +AudioSnapshot snap = h.snapshot(); +h.close(); ``` -**Snapshot** (immutable value object, returned per channel): +### AudioLibraryIndexer (offline) -| Field | Meaning | -| --- | --- | -| `bands` | Smoothed, gain-adjusted perceptual bands (sub/bass/lowMid/mid/high + aliases). | -| `spectrum` | Processed spectrum for bars, waves, and matrices. | -| `triggers` | Per-band trigger state: `value`, `active`, `firedThisFrame`, `releasedThisFrame`, `heldMs`, `cooldownRemainingMs`. | -| `volume` | Raw, smoothed, normalized, and AGC-adjusted loudness. | -| `music` | BPM, beat phase, beat confidence, bar phase. | -| `audioDtMs` | Audio frame delta for envelope timing. | +Walks a directory tree, runs each new file through Essentia + Olaf in a `QThreadPool`, writes results to SQLite in a single transaction per track. Skips files whose `sha256` already exists. Reports progress to UI. 
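The `PositionSource` handoff rule described earlier in this section — pick the highest-priority tier whose reading is both fresh and confident — is small enough to sketch directly. The `TierReading` struct, the thresholds, and the priority encoding are illustrative assumptions, not the final API:

```cpp
#include <vector>

// One candidate position source reading (illustrative shape).
struct TierReading {
    int priority;       // lower = preferred (1a=0, 1b=1, 2=2, 3=3)
    double positionMs;  // reported play position
    float confidence;   // 0..1, published by the tier itself
    double ageMs;       // time since this tier last updated
};

// Pick the highest-priority tier that is both fresh and confident.
// Tier 3 (internal clock + aubio drift correction) always publishes a
// fresh, confident reading, so this never returns nullptr in practice.
const TierReading* pickSource(const std::vector<TierReading>& tiers,
                              float minConfidence = 0.6f,
                              double maxAgeMs = 2000.0)
{
    const TierReading* best = nullptr;
    for (const TierReading& t : tiers) {
        if (t.confidence < minConfidence || t.ageMs > maxAgeMs)
            continue;   // stale or unsure readings never win on priority alone
        if (!best || t.priority < best->priority)
            best = &t;
    }
    return best;
}
```

Because staleness and confidence are checked before priority, a wired but silent Pro DJ Link deck automatically yields to the chromagram tracker, and the internal clock only wins when nothing better is alive.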
-#### Trigger State Machine +## Perceptual Bands -Triggers follow a precise state machine to avoid ambiguity across consumers: +Five named groups, mapped from the existing 32 log-spaced bands and verified against sine sweeps. Names and ranges live in a single header constant, not in scripts. -| Field | Semantics | -| --- | --- | -| `value` | Current smoothed input value (0..1). | -| `active` | True while above low threshold (Schmitt hysteresis). | -| `firedThisFrame` | True on the frame where `active` transitions from false → true. Frame-stable (not consumed on read). | -| `releasedThisFrame` | True on the frame where `active` transitions from true → false. | -| `heldMs` | How long `active` has been true continuously. | -| `cooldownRemainingMs` | Time until next `firedThisFrame` can occur. | +| Group | Index range | Frequency intent | +| --- | --- | --- | +| `sub` | 0..8 | Kick fundamental, sub pressure | +| `bass` | 9..12 | Bass body | +| `lowMid` | 13..18 | Warmth, body | +| `mid` | 19..25 | Vocals, snare body | +| `high` | 26..31 | Hats, clap edge | -### JS Layer (minimal) +## JS Layer -JS scripts read pre-computed features. No DSP computation in JS. - -| API | Purpose | -| --- | --- | -| `RGBUtil.rgb(r, g, b)` | Color packing (replacing `LedFx.rgb`). | -| `RGBUtil.hsv2rgb(h, s, v)` | HSV conversion. | -| `RGBUtil.createMap(w, h)` | Pixel map allocation. | -| `RGBUtil.interpolate(a, b, t)` | Linear interpolation. | -| `RGBUtil.simplex2d(x, y)` | Simplex noise. | -| `RGBUtil.noiseField2d(...)` | Noise field generation. | -| `AudioDSP.Filter(decay, rise)` | Optional JS-side ExpFilter matching `LedFx.ExpFilter` shape for scripts that need per-pixel smoothing. | - -Scripts access pre-computed audio features directly: +JS consumes pre-computed values. No DSP in JS. ```javascript function rgbMap(width, height, rgb, step, audio) { var sub = audio.bands.sub; var fired = audio.triggers.bass.firedThisFrame; - var vol = audio.volume.agc; - // ... 
use values directly, no DSP needed + if (audio.events.drop) { /* hard hit */ } + if (audio.match.locked) { + var key = audio.music.key; // e.g. "F# minor" + var bar = audio.music.barPhase01; + } } ``` -## Perceptual Bands +| API | Purpose | +| --- | --- | +| `RGBUtil.rgb(r,g,b)` | Color packing | +| `RGBUtil.hsv2rgb(h,s,v)` | HSV conversion | +| `RGBUtil.createMap(w,h)` | Pixel map allocation | +| `RGBUtil.interpolate(a,b,t)` | Linear interpolation | +| `RGBUtil.simplex2d(x,y)` | Simplex noise | +| `RGBUtil.noiseField2d(...)` | Noise field | +| `AudioDSP.Filter(decayMs, riseMs)` | Optional per-pixel exponential smoothing | -The existing 32 QLC+ spectrum bands are already logarithmic. Group them by musical purpose rather than equal thirds. +`ledfx_compat.js` and `audio_common.js` are deleted. -| Group | Index range | Approximate intent | -| --- | --- | --- | -| `sub` | `0..8` | Kick fundamental, sub pressure, low-end movement. | -| `bass` | `9..12` | Bass body and low toms. | -| `lowMid` | `13..18` | Warmth, body, mud. | -| `mid` | `19..25` | Vocals, synth body, snare body. | -| `high` | `26..31` | Hats, clap edge, snare snap, brightness. | +## VCAudioTrigger (Full Rewrite) + +The widget is rebuilt as the audio control center. The live-data panels (bands, envelope, AGC, triggers, spectral) ship in M4. The library browser, recognition badge, drop/build/key indicators, and position-source picker are added incrementally in M6–M9 — they sit in the same chrome but stay greyed out until their backend lands. 
Layout: + +``` +┌────────────────────────────────────────────────────────────┐ +│ [Profile ▼] [● Live] [♪ "Strobe" — 1:23.4 lock 92%] │ ← header +├────────────────────────────────────────────────────────────┤ +│ ┌──────────┐ Bands Envelope AGC Triggers Spectral │ ← tabs +│ │ Library │ │ +│ │ ▸ Folder │ ┌── live monitor strip ─────────────────┐ │ +│ │ ▸ Track │ │ [sub bass lowMid mid high] │ │ +│ │ 3 unan │ │ envelope curves · AGC gain · lamps │ │ +│ │ Index ▶ │ └───────────────────────────────────────┘ │ +│ └──────────┘ ┌── tab content ────────────────────────┐ │ +│ │ band edges, envelope sliders, etc. │ │ +│ └───────────────────────────────────────┘ │ +├────────────────────────────────────────────────────────────┤ +│ Drop ◯ Build ◯ Break ◯ Key: F# min BPM: 128 Bar: 3.4 │ ← status +└────────────────────────────────────────────────────────────┘ +``` + +Panels: + +- **Library** — browse and index a folder of audio files. Shows progress, sha mismatches, missing files. "Re-analyze" per track. +- **Bands** — perceptual band edge editors with frequency labels. +- **Envelope** — per-band attack/release sliders, live mini-graph. +- **AGC** — max gain dB, release ms, noise floor dB, live gain meter. +- **Triggers** — per-band Schmitt thresholds, hold/cooldown ms, lamp + fires/sec. +- **Spectral** — live centroid, rolloff, flatness readouts. +- **Status strip** — drop/build/break lamps, key indicator, BPM, bar phase. +- **Recognition badge** — green when `match.locked`, shows track name and position. -These ranges should be verified against generated sine sweeps and then encoded as named constants, not hidden magic numbers. +The legacy per-bar trigger UI is removed. Per-band Schmitt triggers replace it. ## Math Standards | Area | Requirement | | --- | --- | -| Time constants | Use `alpha = 1 - exp(-dtMs / tauMs)`. No frame-count fade math in final scripts. | -| AGC | Use dB or volume envelope, not per-frame normalized spectrum RMS. 
Include max gain, release time, and noise gate. | -| Triggers | Use adaptive baseline, Schmitt hysteresis, hold time, and refractory/cooldown. Return one-shot and gated states separately. | -| Onsets | Use positive spectral flux with adaptive threshold and minimum interval. | -| Compression | Use soft-knee compression or saturating curves before mapping to brightness. | -| Silence | Gate low-confidence frames using `volume`, `rmsDb`, or `maxMagnitude`, so noise does not become visuals. | -| Units | Store constants as ms, dB, Hz, or normalized `0..1`; avoid unlabeled slider math. | +| Time constants | `alpha = 1 - exp(-dtMs / tauMs)`. No frame-count fade math. | +| AGC | dB-domain envelope, capped max gain, configurable noise floor. | +| Triggers | Adaptive baseline + Schmitt hysteresis + hold + cooldown. | +| Onsets | aubio default (HFC or specdiff) live; Essentia OnsetDetectionGlobal offline. | +| Beat | aubio tempo live; Essentia RhythmExtractor2013 offline (more accurate, slower). | +| Key | Essentia KeyExtractor with EDMA profile. | +| Drop | Essentia novelty curve + percussive/tonal split + low-frequency energy ramp. Heuristic: rising flux + rising rms over 4–16 bars, then sudden full-band energy after a partial silence. | +| Compression | Soft-knee compression bounded to `[0, 1]`. | +| Silence | Gate using `rmsDb` and `match.confidence`. | +| Units | ms, dB, Hz, normalized 0..1. No bare slider numbers. | -## Removed Draft Helpers +## Design Decisions -The exploratory helpers previously added to `AudioParams` were removed before any bundled script depended on them. +### DD1. Live features are the foundation, low-latency is non-negotiable -| Removed helper | Why it was removed | Replacement direction | -| --- | --- | --- | -| `AudioParams.adaptiveGain(algo, spectrum)` | It used RMS of `audio.spectrum`, but QLC+ normalizes `audio.spectrum` per frame, so the value mostly describes spectral shape rather than real loudness. 
| C++ `AudioChannel` AGC using raw `rmsDb` from `AudioFrame`, noise floor, and capped gain. | -| `AudioParams.logScaleBands(spectrum)` | It assumed linear FFT bins, while C++ already provides log-spaced bands. The chosen ranges also made low too narrow and high too broad. | C++ `AudioAnalyzer` perceptual band grouping with verified frequency ranges. | -| `AudioParams.frameNormalizedDecay(decayMs, frameMs)` | It returned an interpolation alpha, not a decayed value. The name invited misuse in ported scripts. | C++ `AudioChannel` envelope smoothing using `audioDtMs` and `alpha = 1 - exp(-dt/tau)`. | -| `AudioParams.softSaturate(value, threshold)` | It could return values above `1.0`, which is ambiguous for normalized brightness helpers. | C++ soft-knee compression in `AudioChannel` with documented output range `[0, 1]`. | -| `AudioParams.hysteresisTrigger(algo, state, value)` | It used static thresholds and returned only gate state, not one-shot edges. | C++ trigger state machine in `AudioChannel` with Schmitt hysteresis, hold, cooldown, `firedThisFrame`, and `active`. | +The live path ships first and works on its own. Targets: onset detection latency <10 ms (audio buffer 256 samples at 48 kHz, FFT hop 256, aubio default), per-frame analyzer budget <1 ms per `AudioChannel`, no heap allocation in the hot path, no locks held while user-thread code runs. Cached/identified features are upgrades to the same `AudioFeatures` view — they don't replace the live path, they enrich it. -Keep `AudioParams` focused on existing UI parameter plumbing until the new `AudioDSP` API lands. +### DD2. Single SQLite database +One file at `~/.local/share/qlcplus/audio.db`. Holds tracks, features, fingerprints, and `AudioProfile` configs. No per-file sidecars, no `.qxw` blob inflation. -## Design Decisions (from four rounds of critique) +### DD3. 
Identification one-shot, position via tiered source -These decisions were resolved after structured critique: Opus 4.7 (plan v1), GPT 5.5 (C++ DSP review), Opus 4.7 (AudioTrigger-as-hub), and GPT 5.5 (AudioProfile decoupling). +Olaf is used **only for identification**, not continuous lock. It runs in a worker thread over a rolling ~5 s window, on demand or every ~60 s, and reports `(track_id, initial_position_ms, confidence)`. Continuous position is owned by a `PositionSource` abstraction with three tiers: (1) DJ-software protocols (OS2L, Pro DJ Link, StagelinQ) when wired; (2) chromagram cross-correlation against cached chroma, ±10% speed search; (3) aubio + internal clock fallback. The highest-priority confident-and-fresh tier wins. This sidesteps Olaf's tempo-shift brittleness because Olaf only has to find one good match, not stay locked through a DJ pitch-bend. -### DD1. Audio DSP configuration lives in document-level Audio Profiles +### DD4. AGPL-3.0 acceptable -Audio Profiles are the configuration abstraction. They are document-model objects (like Functions or FixtureGroups) — they exist independent of the Virtual Console. VCAudioTrigger is the **primary editor and live monitor** for profiles, not the owner. +The combined binary becomes AGPL-3.0 once Essentia is linked. The repository LICENSE and About box are updated. Optional build flag `-Daudio_essentia=OFF` exists for downstream redistributors who must avoid AGPL — those builds lose offline analysis but keep live aubio. -Multiple named profiles per project (e.g., "Default", "Kick Sensitive", "Ambient Smooth"). Functions reference profiles by ID. Profiles can exist without visible VC widgets. +### DD5. Document-level Audio Profiles -### DD2. VCAudioTrigger is the Audio Control Center UI +Profiles live in the SQLite `audio_profiles` table. Functions reference profiles by ID. 
The `.qxw` file stores profile IDs only; the project is portable when shipped alongside `audio.db` (or features are re-indexed on import). -VCAudioTrigger edits and monitors Audio Profiles. It gains: perceptual band editor, envelope monitor, AGC meter, trigger state lamps, spectral feature readouts. Multiple AudioTrigger widgets can exist — each monitors/edits a profile. +### DD6. VCAudioTrigger is the audio control center -### DD3. Core DSP in C++ `AudioAnalyzer`, configured by Audio Profiles +Single widget for editing/monitoring profiles, browsing the library, and watching recognition state. Multiple instances may coexist; each binds to one profile. -Envelope, AGC, trigger, and spectral feature computation lives in a C++ `AudioAnalyzer` in the engine layer. Each Audio Profile owns one `AudioChannelHandle`. The analyzer is consumer-agnostic; profiles are the configuration owner. +### DD7. Functions reference profiles, not widgets -### DD4. AudioAnalyzer receives internal `AudioFrame`, not `dataProcessed()` signal +`RGBMatrix` has an `audioProfileId` property. Resolution chain: explicit → default-flagged → first → anonymous fallback (analyzer creates an internal default channel). -`dataProcessed(double*, int, double, quint32)` only emits 32 log-spaced band magnitudes plus smoothed power — insufficient for accurate RMS, peak, centroid, rolloff, or flatness. The analyzer receives an internal `AudioFrame` (raw mono samples, FFT bins, raw RMS, peak, sample rate, dtMs) directly from inside `AudioCapture::processData()`. +### DD8. Five perceptual bands replace low/mid/high -### DD5. Functions reference Audio Profiles, not VC widgets +`sub`, `bass`, `lowMid`, `mid`, `high`. Names are configurable in the profile but defaults are verified against sine sweeps. -`RGBMatrix` has an `audioProfileId` property — NOT a widget ID. This preserves the architectural boundary: Functions are VC-independent, runnable headless, importable across projects. Resolution chain: +### DD9. 
Per-script DSP sliders deleted -1. **Explicit reference**: `audioProfileId` set → use that profile's channel. -2. **Default profile**: If no explicit reference, use the profile flagged `isDefault`. -3. **First found**: If no default flagged, use the lowest-ID profile. -4. **Anonymous fallback**: If no profiles exist, the analyzer creates an internal default channel with sensible defaults. +`AudioParams` retains only non-DSP plumbing. RGBMatrix editor shows: profile selector, intensity scale (0–200%), "Edit profile…" button. -The fallback is a safety net, not the main UX. When an audio script is first added and no profile exists, auto-create a "Default Audio" profile in the document. +### DD10. Per-consumer state via AudioChannel handles -### DD6. Perceptual bands (sub/bass/lowMid/mid/high) replace low/mid/high +Atomic queued config updates applied at frame boundary. Snapshots are immutable value copies. -Five perceptual bands plus convenience aliases (`low = sub+bass`, `mid` unchanged, `high` unchanged) for backward compatibility. Band edges are configurable in the profile/widget UI with verified defaults. Legacy `lowCutBin`/`highCutBin` preserved as read-only derived values. +### DD11. Two `dtMs` values -### DD7. Per-script DSP sliders removed; lightweight mapping controls remain +`audioDtMs` (audio frame) for envelope/AGC/trigger timing. `consumerDtMs` (MasterTimer tick) exposed to JS for visual pacing. -`AudioParams` no longer exposes `gain`, `reactivity`, `floor`, `sensitivity` as DSP controls. Instead, the RGBMatrix editor shows: +### DD12. Trigger state machine — frame-stable -- **Audio Profile selector**: dropdown of available profiles (+ "Create new…") -- **Intensity scale**: lightweight post-DSP brightness multiplier (0–200%) -- **"Edit Audio Profile…"** button: opens the AudioTrigger/Profile editor +`value`, `active`, `firedThisFrame`, `releasedThisFrame`, `heldMs`, `cooldownRemainingMs`. All fields stable across reads in the same frame. 
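
The DD12 field set implies a small per-band state machine: Schmitt thresholds gate entry/exit, hold enforces a minimum active time, and cooldown is a refractory period after release. A minimal sketch of the intended semantics — `TriggerConfig`/`TriggerState`/`triggerUpdate` are illustrative names, not the actual engine API:

```cpp
#include <algorithm>

// Illustrative config for one per-band trigger (names are hypothetical).
struct TriggerConfig {
    double onLevel = 0.6;      // Schmitt upper threshold (normalized 0..1)
    double offLevel = 0.4;     // Schmitt lower threshold
    double holdMs = 50.0;      // minimum time the trigger stays active
    double cooldownMs = 120.0; // refractory period after release
};

// The DD12 fields; all stable until the next frame update.
struct TriggerState {
    double value = 0.0;
    bool active = false;
    bool firedThisFrame = false;
    bool releasedThisFrame = false;
    double heldMs = 0.0;
    double cooldownRemainingMs = 0.0;
};

// Advance one audio frame. Outputs are frame-stable, so any number of
// consumers can read the same state without consuming it.
void triggerUpdate(const TriggerConfig& cfg, TriggerState& st,
                   double level, double audioDtMs)
{
    st.value = level;
    st.firedThisFrame = false;
    st.releasedThisFrame = false;
    st.cooldownRemainingMs = std::max(0.0, st.cooldownRemainingMs - audioDtMs);

    if (!st.active) {
        // Fire only above the upper threshold and outside the cooldown.
        if (level >= cfg.onLevel && st.cooldownRemainingMs <= 0.0) {
            st.active = true;
            st.firedThisFrame = true;
            st.heldMs = 0.0;
        }
    } else {
        st.heldMs += audioDtMs;
        // Release only below the lower threshold AND after the hold time.
        if (level <= cfg.offLevel && st.heldMs >= cfg.holdMs) {
            st.active = false;
            st.releasedThisFrame = true;
            st.cooldownRemainingMs = cfg.cooldownMs;
        }
    }
}
```

`firedThisFrame` is the one-shot edge and `active` the gate; because cooldown starts at release, a decaying kick tail hovering around the thresholds cannot chatter the trigger.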
-Old per-script slider values are **converted to a generated profile** on XML load (not silently ignored). Deprecated stubs log a one-shot warning for one release cycle.

+### DD13. AudioAnalyzer on audio thread, fixed budget

-### DD8. Per-consumer state via AudioChannel handles, atomic config updates

+No heap allocation per frame. <1 ms per channel. Olaf streaming matcher must also stay <2 ms per frame; if not, move to a dedicated low-priority thread.

-Each Audio Profile holds an `AudioChannelHandle`. Config changes from QML are queued via `handle.updateConfig(newConfig)` and applied at the next audio frame boundary. State is owned by `AudioAnalyzer`, exposed via immutable `AudioSnapshot` value objects.

+### DD14. No backwards compatibility

-### DD9. Two `dtMs` values

+`ledfx_compat.js`, `audio_common.js`, legacy per-bar triggers, old `AudioParams` slider semantics, and `BeatTracker` are deleted in M5. No migration code, no XML version shims, no compatibility warnings. The project is young enough to absorb the break.

-- `audioDtMs`: time since previous audio frame (~23ms at 43Hz). Used for envelope/AGC/trigger timing in C++.
-- `consumerDtMs`: time since previous consumer frame (MasterTimer tick). Available to JS for visual animation pacing.

+### DD15. Test corpus from FMA / Jamendo CC

-### DD10. Don't layer on legacy smoothed `power`

+EDM-leaning Creative Commons tracks pulled into `tests/audio/corpus/` (gitignored, fetched by a script). Used for BPM/key/drop accuracy benchmarks and recognition-lock latency measurements.

-New analyzer computes raw RMS/peak from `AudioFrame` and applies its own documented smoothing. Legacy `volume` preserved for backward compat but not used as input to new AGC.

+### DD16. Optional Essentia, mandatory aubio + Olaf

-### DD11. Legacy per-bar triggers preserved alongside new per-band triggers

+Essentia is the only AGPL dependency. aubio (GPL-3) and Olaf (GPL-3) are required at build time. 
Essentia is gated by `-Daudio_essentia=ON` (default ON). Without Essentia, `analyzed_at`-style fields and structural events stay null; live analysis is unaffected.

-Two trigger systems coexist:

+### DD17. Decode via Qt's FFmpeg backend, FFmpeg bundled

-- **Per-bar triggers** (legacy): per-spectrum-bar DMX/Function/Widget actions with min/max thresholds.
-- **Per-band triggers** (new): perceptual-band Schmitt-hysteresis triggers with hold/cooldown, consumed by scripts via `audio.triggers.*`.

+Qt 6.5+ MP3 support is provided by FFmpeg. The macOS/Windows installers bundle FFmpeg shared libs via `macdeployqt`/`windeployqt`. On Linux we link against system FFmpeg and document the dependency.

-Old bar triggers are not collapsed into five bands. Users opt into new trigger model by editing the profile.

+### DD18. Live-features-first sequencing

-### DD12. Trigger state machine — frame-stable, not consumed-on-read

+The first milestones build the live analyzer, profile/channel system, RGBMatrix wiring, scripts, and VCAudioTrigger UI on **live audio only**. This is shippable on its own and gives users a noticeably better audio-reactive engine before any cached/identified/positioned work begins. Offline analysis (M6), identification (M7), Tier-2 chromagram tracking (M8), and Tier-1 DJ protocols (M9) each plug in behind the unchanged `AudioFeatures` view.

-Triggers expose `value`, `active`, `firedThisFrame`, `releasedThisFrame`, `heldMs`, `cooldownRemainingMs`. All fields are frame-stable so multiple consumers can read the same state.

+### DD19. Tiered PositionSource with calibrated latency offsets

-### DD13. Backward-compatible XML with schema versioning

+`PositionSource` exposes the highest-priority confident tier. Each tier publishes `confidence` and `lastUpdateMs`; staleness or confidence drop demotes it. 
Per-source latency offsets are configured in the UI and self-calibrated by cross-correlating Tier-1/2 timestamps against Tier-3 onsets, since DJ-software clocks run ahead of the room PA's actual audio by 5–100 ms.

-Audio Profile XML uses a `Version` attribute. Old documents load with default profile configs. Schema version stored in XML — no hidden "has edited" tracking state. Old per-script slider values converted to generated profile on load.

+### DD20. Fingerprint engine is replaceable

-### DD14. AudioAnalyzer on audio thread with budget constraint

+`AudioIdentifier` is an interface; the default implementation is Olaf. A Panako backend ships as a build option for environments where DJ pitch-bend during the identification window is common (Panako handles ±10% time-stretch but pulls in a JVM). Users can switch backends in settings without re-indexing, provided both backends have already indexed the library.

-No heap allocation per frame, fixed-size arrays. Budget <1ms per channel. Instrumented from day one.

+### DD21. Live latency target

-### DD15. Golden tests gate the migration

+End-to-end onset latency target: <10 ms (input buffer + FFT hop + analyzer + signal emit). Per-`AudioChannel` analyzer budget: <1 ms. Per-`AudioAnalyzer` shared-feature budget: <0.5 ms. No heap allocation per frame; fixed-size arrays only. Instrumented from M1.

-Golden tests capture old VCAudioTrigger output for deterministic inputs. New pipeline must match within tolerance for legacy defaults.

+---

-### DD16. Pilot scripts before full port

+## Implementation Milestones

-Port 2–3 representative scripts and validate visual parity before full migration.

+Sequencing: live ships first (M0–M5). Cached/identified/position-tracked features extend the same `AudioFeatures` view in M6–M9. Verification across the corpus is M10. Anything past M5 is optional from a "QLC+ has great audio reactivity" standpoint; everything past M5 is "QLC+ knows what's playing."

-### DD17. 
Keep `ExpFilter` shape for mechanical migration

-`AudioDSP.Filter(decay, rise)` in JS matches `LedFx.ExpFilter` for per-pixel smoothing.

+### M0. Foundation — live dependencies only

+- [ ] Vendor `aubio` as a git submodule under `thirdparty/`. Pin version.
+- [ ] CMake option `audio_aubio=ON` (default). `audio_essentia` and `audio_olaf` flags exist but stay OFF until M6/M7.
+- [ ] Add `qlcplusaudioanalysis` static library target.
+- [ ] `engine/audio/test/` harness scaffolding with a synthetic-frame fixture and a tone/sweep/impulse generator.
+- [ ] `scripts/audiobench` CLI that injects synthetic frames into the analyzer and prints features and timing — used to lock down latency targets before any UI work.

-### DD18. Phase 2 split for risk reduction

+### M1. Live AudioAnalyzer — best-possible, low-latency

-VCAudioTrigger evolution split into: backend swap (parity), persistence model, new UI panels, script integration readiness.

+- [ ] Define `AudioFrame` (mono samples, FFT bins, raw RMS, peak, sample rate, `audioDtMs`, frame index) in `engine/audio/src/audioframe.h`.
+- [ ] Modify `AudioCapture::processData()` to populate `AudioFrame` and pass directly to `LiveAudioAnalyzer`. Tune buffer to 256 samples / 48 kHz (~5 ms) where the platform backend allows.
+- [ ] Define `AudioFeatures` and supporting structs (live-only fields populated; `match` and `events.drop` left default).
+- [ ] Implement `LiveAudioAnalyzer` shared features: 32 log bands, `bandsDb`, `bandsNormalized`, perceptual bands, `rmsDb`, `peakDb`, `crestFactor`, `spectralFlux`, `spectralCentroidHz`, `spectralRolloffHz`, `spectralFlatness`, `noiseFloorDb`, 12-bin live chroma.
+- [ ] Integrate aubio: onset detection (HFC default, configurable), tempo estimation, optional pitch.
+- [ ] No heap allocation per frame. Fixed-size arrays. Lock-free SPSC ring for snapshots to consumers.
+- [ ] Frame-budget instrumentation: per-frame analyzer time histogram. Assert <0.5 ms shared, <1 ms per channel. 
+- [ ] Synthetic tests: silence, white noise, sine sweeps, kick impulse, hat impulse, ramp, threshold hover, variable frame interval. +- [ ] End-to-end live latency measurement (input click → onset signal) on Linux/macOS/Windows. Target <10 ms. + +### M2. AudioProfile + AudioChannel + RGBMatrix wiring ---- +- [ ] `AudioProfile` document-model class: ID, name, isDefault, `AudioChannelConfig`. Persisted under `Doc` for now (SQLite move comes in M6 alongside the library DB). +- [ ] `AudioAnalyzer::createChannel()` / `updateConfig()` / `snapshot()` / `close()` handle API. Atomic queued config updates applied at frame boundary. Immutable `AudioSnapshot` value copies. +- [ ] Per-channel envelope (per-band attack/release), AGC (max gain dB, release ms, noise floor dB), triggers (Schmitt + hold + cooldown), volume smoothing. +- [ ] `RGBMatrix.audioProfileId` property; resolution chain (explicit → default-flagged → first → anonymous fallback). +- [ ] Auto-create "Default Audio" profile on first audio script use. +- [ ] RGBMatrix editor: profile selector, intensity scale (0–200%), "Edit profile…" button. Old per-script DSP sliders removed. +- [ ] Frame budget instrumentation: assert <1 ms with five active channels. -## Implementation Phases - -### Phase 0: Foundation - -- [ ] Audit `AudioCapture::processData()` — identify insertion point for `AudioFrame` data. -- [ ] Audit `AudioParams` — list DSP vs non-DSP properties for removal. -- [ ] Inventory all 28 `audio*.js` scripts — tag AudioParams usage for impact analysis. -- [ ] Fix docs/comments calling spectrum linear or band cuts fixed. -- [ ] Confirm VCAudioTrigger XML round-trips so Version attribute can be added safely. -- [ ] Design `AudioProfile` document-model class: ID, name, isDefault, channel config, XML schema. - -### Phase 1: C++ AudioAnalyzer + AudioChannel - -- [ ] Create `AudioFrame` internal struct (mono samples, FFT bins, raw RMS, peak, sample rate, `audioDtMs`, frame index). 
-- [ ] Modify `AudioCapture::processData()` to populate `AudioFrame` and pass to `AudioAnalyzer` directly. -- [ ] Create `AudioAnalyzer` with shared features: 32 log bands, `bandsDb`, `bandsNormalized`, `perceptualBands`, `rmsDb`, `peakDb`, `crestFactor`, `spectralFlux`, `spectralCentroidHz`, `spectralRolloffHz`, `spectralFlatness`, `noiseFloorDb`, beat features. -- [ ] Define `AudioChannelConfig` (envelope per band, AGC, triggers, band layout, volume smoothing, noise gate). -- [ ] Implement handle-based API: `createChannel(config)`, `updateConfig()`, `snapshot()`, `close()`. -- [ ] Implement per-channel processing: envelopes, AGC, triggers (Schmitt + hold + cooldown). -- [ ] Define `AudioSnapshot` immutable value object. -- [ ] Build synthetic audio test harness — inject `AudioFrame` directly. -- [ ] Unit test shared features: silence, noise, sweep, impulse, ramp, varying intervals. -- [ ] Unit test per-channel: envelopes, AGC, trigger state machine. -- [ ] Instrument per-frame time per channel; assert <1ms. -- [ ] Implement anonymous default channel fallback. - -### Phase 2A: Audio Profiles in Document Model - -- [ ] Create `AudioProfile` class: ID, name, isDefault, `AudioChannelConfig`, `AudioChannelHandle`. -- [ ] Register `AudioProfile` in `Doc` (map, create/delete/lookup by ID). -- [ ] Implement XML load/save with Version attribute and children: ``, ``, ``, ``, ``. -- [ ] Implement auto-creation of "Default Audio" profile when first audio script is added. -- [ ] Implement migration: old per-script slider values → generated profile on XML load. -- [ ] Unit test: profile creation, config round-trip, migration from old XML. - -### Phase 2B: VCAudioTrigger Backend Swap - -- [ ] Associate VCAudioTrigger with an Audio Profile (create or select). -- [ ] Replace internal `slotSpectrumDataChanged()` normalize/smooth/threshold with `handle.snapshot()` reads. -- [ ] Keep existing per-bar DMX/Function/Widget trigger behavior (legacy path). 
-- [ ] Golden tests: deterministic inputs produce expected bar values, trigger fires, DMX writes matching old behavior. -- [ ] Expose profile resolution to `Doc` for script engine. - -### Phase 2C: VCAudioTrigger New UI Panels - -- [ ] **Bands panel**: band edge editors with frequency labels, presets (default/strict/wide). -- [ ] **Envelope panel**: per-band attack/release sliders (ms), live mini-graph of envelope vs raw level. -- [ ] **AGC panel**: max gain (dB), release (ms), noise floor (dB), live gain meter. -- [ ] **Triggers panel**: per-band Schmitt thresholds, hold/cooldown ms, live state lamp, fires/sec counter. -- [ ] **Spectral features panel**: live centroid, rolloff, flatness, flux readouts. -- [ ] **Runtime monitor strip**: compact live view (envelope curves, AGC gain, trigger lamps). -- [ ] Keep existing bars panel and per-bar config. - -### Phase 3: Wire RGBScript to Audio Profiles - -- [ ] Implement profile resolution in `rgbscriptv4.cpp` per DD5: explicit → default → first → anonymous. Log once per script start. -- [ ] Add `audioProfileId` property on `RGBMatrix` (saved to function XML). -- [ ] Surface profile selector in RGBMatrix editor: dropdown + "Create new…" + intensity scale + "Edit Audio Profile…". -- [ ] Update `buildAudioDataObject()` to read from `AudioSnapshot`: `audio.bands.*`, `audio.triggers.*`, `audio.volume.*`, `audio.music.*`, `audio.features.*`, `audio.audioDtMs`, `audio.consumerDtMs`. -- [ ] Keep legacy fields (`spectrum`, `volume`, `beat`, `bpm`, `maxMagnitude`) for compat. -- [ ] Strip DSP from `AudioParams`. Deprecated stubs with one-shot warning. -- [ ] Remove per-script audio slider UI from RGBMatrix editor. -- [ ] **Thin vertical slice**: prove end-to-end with ONE script before proceeding (AudioCapture → Analyzer → Profile → VCAudioTrigger monitor → RGBMatrix → enriched audio → script reads bands/triggers). 
- -### Phase 4: RGBUtil + Non-Audio JS Cleanup - -- [ ] Add `RGBUtil`: `rgb`, `hsv2rgb`, `createMap`, `interpolate`, `simplex2d`, `noiseField2d`. -- [ ] Verify `RGBUtil.rgb()` byte order matches engine pixel format. -- [ ] Add `AudioDSP.Filter(decay, rise)` matching `LedFx.ExpFilter` for JS per-pixel smoothing. -- [ ] Keep temporary `LedFx` shim during transition. -- [ ] Update `audio_common.js` to drop DSP helpers. - -### Phase 5: Port Bundled Scripts (simpler — no per-script DSP) - -Scripts become thin: read pre-computed values, decide visuals. - -- [ ] **Pilot checkpoint**: port 3 representative scripts (trigger, blend, spectrum) and visually compare before continuing. - -| Pattern | Scripts | Main migration | -| --- | --- | --- | -| Trigger-first | `audiostrobe`, `audioshot`, `audiobasslaser`, `audioshockwave` | `audio.triggers.*.firedThisFrame` and `active`. | -| Three-band blend | `audioaurora`, `audiochaser`, `audioenergy`, `audiolava`, `audiofireworks`, `audiohueshift` | `audio.bands.*` from C++ perceptual groups. | -| Single low-energy | `audiomelt`, `audioplasma`, `audiosoap`, `audiotunnel`, `audiovortex`, `audioscan`, `audiocrawler`, `audioglitch` | `audio.bands.sub`/`bass`. | -| Spectrum visuals | `audiospectrum`, `audioequalizer`, `audiosplittower`, `audiowavelength`, `audiopower`, `audiofire`, `audioscroll`, `audioblocks` | `audio.spectrum` from channel. | -| State machine | `audiobuildup` | `audio.features.flux` + triggers. | -| Spatial sim | `audiowater` | Perceptual bands + `audioDtMs`. | - -- [ ] Port trigger-first scripts. -- [ ] Port three-band blend scripts. -- [ ] Port single low-energy driver scripts. -- [ ] Port spectrum visual scripts. -- [ ] Port state machine script. -- [ ] Port spatial simulation script. -- [ ] Replace `LedFx.*` → `RGBUtil.*` / `audio.bands.*` / `audio.triggers.*` / `AudioDSP.Filter`. - -### Phase 6: Delete `ledfx_compat.js` - -- [ ] `rg "LedFx\." resources/rgbscripts` returns no usage. 
-- [ ] `audio_common.js` has no `LedFx` dependency. -- [ ] `rgbscriptv4.cpp` no longer preloads `ledfx_compat.js`. -- [ ] CMakeLists no longer installs it. -- [ ] Docs updated. Shim removed. File deleted. - -### Phase 7: Verification - -- [ ] All synthetic audio tests pass: - -| Test | Expected signal | +### M3. RGBUtil + thin script vertical slice + +- [ ] Add `RGBUtil`: `rgb`, `hsv2rgb`, `createMap`, `interpolate`, `simplex2d`, `noiseField2d`. Verify byte order matches engine pixel format. +- [ ] Add `AudioDSP.Filter(decayMs, riseMs)` for optional per-pixel exponential smoothing in scripts. +- [ ] Wire `buildAudioDataObject()` in `rgbscriptv4.cpp` to read `AudioSnapshot`: `audio.bands.*`, `audio.triggers.*`, `audio.volume.*`, `audio.music.bpm`, `audio.features.*`, `audio.audioDtMs`, `audio.consumerDtMs`. +- [ ] Port 3 pilot scripts representative of the major patterns (one trigger-first, one three-band blend, one spectrum visual). Visual side-by-side compare against the legacy version. +- [ ] Decision gate: if pilots look or feel worse on live audio than the legacy versions, fix the live analyzer before proceeding. + +### M4. VCAudioTrigger live UI rewrite + +- [ ] Delete the old per-bar trigger UI and `slotSpectrumDataChanged()`-driven DSP. +- [ ] Header: profile selector, live status badge (recognition badge stays placeholder until M7). +- [ ] Bands tab: perceptual band edge editors, frequency labels, presets. +- [ ] Envelope tab: per-band attack/release sliders with live mini-graph. +- [ ] AGC tab: max gain / release / noise-floor + live gain meter. +- [ ] Triggers tab: per-band Schmitt thresholds, hold/cooldown, lamp + fires/sec. +- [ ] Spectral tab: live centroid, rolloff, flatness readouts. +- [ ] Live monitor strip: envelope curves and trigger lamps, ~30 Hz refresh. + +### M5. Port remaining scripts + delete legacy + +- [ ] Port the remaining 25 audio scripts (the 28 audio scripts minus M3 pilots). 
+- [ ] Delete `ledfx_compat.js`, `audio_common.js`, `BeatTracker`, the old `AudioParams` DSP fields, and the legacy `dataProcessed()` consumer path. +- [ ] `rg "LedFx\." resources/rgbscripts` returns empty. +- [ ] CMakeLists no longer installs the deleted JS files. +- [ ] **Live shippable here.** Tag and ship if the rest of the work slips. + +## M0-M5 Implementation Verification Snapshot + +Verified on 2026-05-05 with GPT-5.5 subagents against the repository state after the initial `AudioFeatures` / `LiveAudioAnalyzer` scaffold. + +Build verification: + +- `cmake -S . -B build -Dqmlui=ON` could not configure in the current runner because Qt development package config files are not installed or not discoverable (`Qt5Config.cmake`, `qt5-config.cmake`, `Qt6Config.cmake`, `qt6-config.cmake` missing). +- `cmake --build build --target qlcplusaudio -j2` was therefore not reached. +- This is an environment gap, not proof that the current source builds. + +Current implementation status: + +| Milestone | Status | Evidence | Remaining gap | +| --- | --- | --- | --- | +| M0 Foundation | Missing | No `thirdparty/` aubio submodule, no `audio_aubio` / `audio_essentia` / `audio_olaf` CMake options, no `qlcplusaudioanalysis` target, no `engine/audio/test/`, no `scripts/audiobench`. | Land the dependency/build/test foundation before expanding analyzer behavior. | +| M1 Live AudioAnalyzer | Partial | `engine/audio/src/audiofeatures.h`, `liveaudioanalyzer.*`, and `AudioCapture::audioFeatures()` exist; `AudioCapture::processData()` now fills a live feature snapshot. | Still missing `AudioFrame`, aubio, 256-frame / 48 kHz latency tuning, noise floor, 12-bin chroma, no-allocation proof, lock-free/SPSC snapshot handoff, timing instrumentation, synthetic tests, and end-to-end latency measurement. 
| +| M2 AudioProfile + AudioChannel + RGBMatrix wiring | Missing | No `AudioProfile`, `AudioChannelConfig`, `AudioSnapshot`, channel-handle API, `RGBMatrix.audioProfileId`, or profile editor wiring found. | Implement the document model and channel snapshot API before script/UI rewrites. | +| M3 RGBUtil + thin script vertical slice | Missing / blocked | `resources/rgbscripts/ledfx_compat.js` still provides `LedFx`; `rgbscriptv4.cpp` still builds legacy `audio.spectrum`, `volume`, `beat`, `bpm`, `maxMagnitude` data. | Add `RGBUtil`, `AudioDSP.Filter(ms)`, wire `buildAudioDataObject()` to the new snapshot shape, then port three representative pilot scripts. | +| M4 VCAudioTrigger live UI rewrite | Missing | Existing v4/v5 VCAudioTrigger paths still connect to `dataProcessed()` and use spectrum-bar / threshold behavior. | Replace per-bar UI with profile, live bands, envelope, AGC, trigger, spectral, and monitor panels. | +| M5 Port scripts + delete legacy | Missing | `audio_common.js`, `ledfx_compat.js`, `AudioParams`, `LedFx.*`, `BeatTracker`, and legacy `dataProcessed()` consumers remain. | Port all audio scripts and remove legacy DSP/helper/install paths only after M2-M4 are functional. | + +Next steps to make M0-M5 shippable: + +1. **Fix build environment first.** Install or expose Qt 6 development config paths in the runner, then run `cmake -S . -B build -Dqmlui=ON` and `cmake --build build --target qlcplusaudio -j2` before further code changes. +2. **Complete M0.** Add pinned aubio dependency wiring, the audio feature library target, synthetic test scaffolding, and the `audiobench` CLI. +3. **Harden M1 around `AudioFrame`.** Move the current direct `AudioCapture` → `LiveAudioAnalyzer` path to `AudioFrame`, add aubio onset/tempo, live chroma, noise-floor tracking, latency/budget instrumentation, and tests. +4. 
**Implement M2 model/API.** Add `AudioProfile`, `AudioChannelConfig`, immutable `AudioSnapshot`, channel handles, and `RGBMatrix.audioProfileId` persistence/editor hooks. +5. **Do M3 as a vertical slice.** Add `RGBUtil` / `AudioDSP`, expose the nested audio object to scripts, and port exactly three pilot scripts before touching all scripts. +6. **Rewrite M4 UI after the model is real.** Keep old VCAudioTrigger behavior until profiles and snapshots are usable, then replace it with the live control-center panels. +7. **Finish M5 deletion last.** Remove `ledfx_compat.js`, `audio_common.js`, `BeatTracker`, `AudioParams`, and legacy `dataProcessed()` consumers only after scripts and widgets are fully on `AudioFeatures` / `AudioSnapshot`. + +### M6. Offline feature pipeline — Essentia + SQLite + +- [ ] Add `audio_essentia=ON` build flag. Vendor Essentia. Update LICENSE / About for AGPL. +- [ ] Single SQLite at `~/.local/share/qlcplus/audio.db` with `tracks`, `feature_frames`, `beats`, `onsets`, `structural_events`, `chroma`, `audio_profiles`. Migrate `AudioProfile` storage from `Doc` XML to SQLite. +- [ ] `AudioLibraryIndexer` running Essentia: BPM, beats, key, danceability, multi-band envelopes, spectral shape, MFCC, structural events (drops/builds), 12-bin chroma at 10 Hz. +- [ ] CLI `qlcplus-audio-index` for batch operation. +- [ ] `CachedAudioAnalyzer` that publishes the same `AudioFeatures` shape, populated from SQLite at a given position. +- [ ] FMA/Jamendo corpus fetch script and accuracy spot-checks: BPM ±0.5, key Camelot-neighbor, drops ±1 s on 5 hand-annotated tracks. + +### M7. AudioIdentifier — Olaf one-shot + +- [ ] Add `audio_olaf=ON` build flag. Vendor Olaf. +- [ ] Extend the indexer to write Olaf hashes into `fingerprints` alongside Essentia features. +- [ ] `AudioIdentifier` interface; default `OlafAudioIdentifier` implementation. Worker-thread rolling-buffer match. On-demand + every 60 s + on chroma-tracker drift > 2 s. 
+- [ ] When identification succeeds, switch `AudioFeatures.source` to `Cached` with `match.identified=true`. On loss, fall back to live.
+- [ ] Benchmark: cold ID lock <1.5 s on 90% of corpus; ±100 ms initial position; behavior under EQ and +5% pitch shift.
+- [ ] If Olaf commonly misses under DJ pitch-bend during the identification window, add `PanakoAudioIdentifier` behind `audio_panako=ON` (JVM dependency documented).
+
+### M8. ChromaPositionTracker — Tier 2
+
+- [ ] Implement `ChromaPositionTracker`: live 12-bin chroma window vs cached chroma matrix, normalized cross-correlation, ±10% speed search grid (0.90, 0.92, …, 1.10).
+- [ ] Update at ~5 s cadence on a worker thread. Publish `(position_ms, speed_factor, confidence)`.
+- [ ] Time-warp cached beat/onset events by `speed_factor` so triggers fire on the right musical moments.
+- [ ] Tests: tracker reacquires within ~5 s after a 100 ms manual seek; survives ±5% pitch-bend without losing lock.
+
+### M9. Tier-1 protocols — OS2L, Pro DJ Link, StagelinQ
+
+- [ ] `PositionSource` abstraction with priority + confidence + staleness; per-source latency offsets.
+- [ ] OS2L listener: TCP JSON over Bonjour/Avahi. Decode `beat`/`btn`/`cmd`. Bind incoming `beat#` to song timeline using cached beat grid + AudioIdentifier track ID.
+- [ ] Pro DJ Link client (use existing dysentery/beat-link protocol notes). Provides track + position when CDJ-3000s are on the LAN.
+- [ ] StagelinQ client. Provides track + position from Denon/Numark Prime.
+- [ ] UI: PositionSource active-tier indicator with calibratable latency offset.
+
+### M10. MCP tools + verification
+
+MCP tools (under `mcp/tools/audio_tools.cpp`):
+
+- [ ] `analyze_audio_file`, `analyze_audio_library`, `get_audio_features`, `get_audio_match_state`, `get_position_source_state`.
+- [ ] `list_audio_profiles` / `create_audio_profile` / `update_audio_profile`.
+- [ ] Annotations: read-only on getters; idempotent on profile create/update; long-running on indexing. 
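
The tier arbitration behind DD19/M9 reduces to "highest priority among confident, fresh tiers, minus a calibrated latency offset". A sketch under assumed types — `TierReport`, `selectTier`, and `correctedPositionMs` are hypothetical, not the planned `PositionSource` API:

```cpp
#include <cstdint>
#include <vector>

// Illustrative per-tier report. Real sources (OS2L, Pro DJ Link,
// StagelinQ, chroma tracker, live clock) would each publish one.
struct TierReport {
    int priority = 0;             // higher wins when confident and fresh
    double confidence = 0.0;      // 0..1, published by the source
    int64_t lastUpdateMs = 0;     // timestamp of the last report
    double latencyOffsetMs = 0.0; // calibrated source-ahead-of-PA offset
    double positionMs = 0.0;      // raw reported play position
};

// Pick the highest-priority tier that is both confident and fresh.
// Returns nullptr when nothing qualifies (caller falls back to live).
const TierReport* selectTier(const std::vector<TierReport>& tiers,
                             int64_t nowMs,
                             double minConfidence = 0.5,
                             int64_t maxAgeMs = 2000)
{
    const TierReport* best = nullptr;
    for (const TierReport& t : tiers) {
        if (t.confidence < minConfidence) continue;      // demote: low confidence
        if (nowMs - t.lastUpdateMs > maxAgeMs) continue; // demote: stale
        if (!best || t.priority > best->priority)
            best = &t;
    }
    return best;
}

// DJ-software clocks run ahead of the room PA, so subtract the
// per-source calibrated offset before replaying cached features.
double correctedPositionMs(const TierReport& t)
{
    return t.positionMs - t.latencyOffsetMs;
}
```

Staleness handling is what makes the handoff automatic: a Tier-1 source that stops reporting simply ages out and the chroma tracker (or live fallback) takes over on the next selection.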
+
+Verification matrix:
+
+| Test | Expected |
 | --- | --- |
-| Silence | Features near zero, AGC clamps, no trigger chatter. |
-| White noise | High flatness, no false beat. |
-| Sine sweeps | Energy in expected perceptual band. |
-| Kick impulse | `triggers.bass.firedThisFrame` once, cooldown prevents chatter. |
-| Hat impulse | `triggers.high.firedThisFrame` without bass. |
-| Quiet→loud ramp | AGC adapts smoothly. |
-| 20ms vs 60ms frames | Envelopes decay by `audioDtMs`. |
-| Threshold hover | Schmitt holds cleanly. |
-| Multiple profiles | Independent snapshots from same audio. |
-| Profile resolution | Explicit → default → first → anonymous chain works. |
-| Old-XML round-trip | Pre-upgrade docs load/save; old slider values converted to profile. |
-
-- [ ] Run parameterized synthetic tests.
-- [ ] Run multi-profile and resolution tests.
-- [ ] Run XML round-trip and migration tests.
-- [ ] Run VCAudioTrigger golden tests.
-- [ ] Confirm frame budget <1ms with multiple channels.
-- [ ] Confirm runtime monitor updates smoothly.
-- [ ] Live test: 3+ ported scripts, varied music.
-- [ ] Document migration in release notes.
+| Silence | Features near zero, no trigger chatter |
+| White noise | High flatness, no false beat |
+| Sine sweeps | Energy in expected perceptual band |
+| Kick impulse | `triggers.bass.firedThisFrame` once, cooldown holds |
+| Hat impulse | `triggers.high.firedThisFrame` without bass |
+| Quiet→loud ramp | AGC adapts smoothly |
+| Variable frame interval | Envelopes decay by `audioDtMs` |
+| Schmitt hover | No chatter |
+| Multiple profiles | Independent snapshots from same audio |
+| Live latency | <10 ms input-click to onset signal |
+| Live frame budget | <1 ms shared, <0.5 ms per channel |
+| FMA corpus BPM | ±0.5 BPM on 90% of tracks |
+| FMA corpus key | Camelot-neighbor on 80% of tracks |
+| FMA corpus drops | ±1 s of hand-annotated drops on 5 tracks |
+| Identification cold lock | <1.5 s on 90% of corpus |
+| Chroma tracker drift | <50 ms under +5% pitch shift after acquisition |
+| OS2L sync | Beat events align with cached beat grid within calibrated offset |
 
 ---
 
 ## Done Criteria
 
-- [ ] Document-level `AudioProfile` objects hold all DSP config.
-- [ ] VCAudioTrigger is primary editor/monitor with envelope curves, AGC meter, trigger lamps, spectral readouts.
-- [ ] One `AudioChannelHandle` per profile; multiple profiles coexist.
-- [ ] C++ `AudioAnalyzer` computes shared features and per-channel state from `AudioFrame`.
-- [ ] Scripts resolve audio via profile chain and log the source.
-- [ ] `RGBMatrix` exposes `audioProfileId` with profile selector + intensity scale + "Edit Audio Profile…" in UI.
-- [ ] Per-script DSP sliders removed. Old values converted to profile on load. Deprecated stubs warn.
-- [ ] No bundled script computes own DSP. No script references `LedFx.*`.
-- [ ] `ledfx_compat.js` deleted. `RGBUtil` + `AudioDSP.Filter` available.
-- [ ] Audio Profile XML versioned. Legacy per-bar triggers preserved alongside new per-band triggers.
-- [ ] Envelopes use `audioDtMs`. Triggers frame-stable.
-- [ ] All tests pass. Frame budget <1ms. Docs updated. Release notes written.
+### Live milestone (M0–M5, shippable on its own)
+
+- [ ] `AudioFeatures` is the single shared view consumed by scripts, widgets, and MCP tools.
+- [ ] `LiveAudioAnalyzer` produces all live features below 1 ms shared / 0.5 ms per channel.
+- [ ] End-to-end live onset latency under 10 ms on Linux, macOS, and Windows.
+- [ ] `AudioProfile` document objects own all DSP config; multiple profiles coexist; `RGBMatrix` references one by ID.
+- [ ] VCAudioTrigger rebuilt with bands/envelope/AGC/triggers/spectral panels working off live data.
+- [ ] All 28 audio scripts ported to read `audio.bands.*`, `audio.triggers.*`, `audio.music.*`.
+- [ ] `ledfx_compat.js`, `audio_common.js`, `BeatTracker`, and old `AudioParams` DSP fields deleted.
+- [ ] Synthetic test matrix passes. Frame budget instrumentation in CI.
+
+### Cached + identified + tiered position (M6–M10)
+
+- [ ] Single SQLite `audio.db` holds tracks, features, chroma, fingerprints, and profiles.
+- [ ] `qlcplus-audio-index` indexes a directory with Essentia + Olaf in one pass.
+- [ ] `AudioIdentifier` identifies a known track within 1.5 s of session start.
+- [ ] `ChromaPositionTracker` keeps position drift under 50 ms during +5% pitch-bend.
+- [ ] `PositionSource` correctly hands off between OS2L / Pro DJ Link / StagelinQ / chroma / live tiers under simulated venue conditions.
+- [ ] MCP tools exposed for batch analysis, feature lookup, identification state, and position-source state.
+- [ ] LICENSE and About box reflect AGPL-3.0 combined work; build flag for Essentia-free builds documented.
+- [ ] FMA corpus benchmarks pass.
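To make the M9 "priority + confidence + staleness" arbitration concrete, here is a minimal sketch of how the tier handoff could work. All names (`PositionSource`, `selectSource`) and the threshold values are illustrative assumptions, not the final API: the real abstraction would also carry the reported position and per-source latency calibration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One candidate position source, e.g. OS2L, Pro DJ Link, StagelinQ,
// the chroma tracker, or live-only analysis as the last resort.
struct PositionSource
{
    std::string name;
    int priority;           // lower value = preferred tier
    double confidence;      // 0..1, reported by the source itself
    int64_t lastUpdateMs;   // timestamp of the last position report
    double latencyOffsetMs; // per-source calibratable offset
};

// Pick the highest-priority source that is both fresh and confident.
// Returns a pointer into `sources`; nullptr means "fall back to live".
const PositionSource *selectSource(const std::vector<PositionSource> &sources,
                                   int64_t nowMs,
                                   int64_t staleAfterMs = 2000,
                                   double minConfidence = 0.5)
{
    const PositionSource *best = nullptr;
    for (const PositionSource &s : sources)
    {
        if (nowMs - s.lastUpdateMs > staleAfterMs)
            continue; // stale: this tier silently drops out
        if (s.confidence < minConfidence)
            continue; // unsure: let a lower tier take over
        if (best == nullptr || s.priority < best->priority)
            best = &s;
    }
    return best;
}
```

Run at every feature frame, this makes the handoff automatic: unplugging the OS2L link simply makes that tier stale, and the chroma tracker takes over on the next selection without any explicit mode switch.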
diff --git a/engine/audio/src/CMakeLists.txt b/engine/audio/src/CMakeLists.txt
index 7b90e006c6..4089b1fb14 100644
--- a/engine/audio/src/CMakeLists.txt
+++ b/engine/audio/src/CMakeLists.txt
@@ -5,10 +5,12 @@ add_library(${module_name}
     audio.cpp audio.h
     audiocapture.cpp audiocapture.h
     audiodecoder.cpp audiodecoder.h
+    audiofeatures.h
     audioparameters.cpp audioparameters.h
     audioplugincache.cpp audioplugincache.h
     audiorenderer.cpp audiorenderer.h
     beattracker.cpp beattracker.h
+    liveaudioanalyzer.cpp liveaudioanalyzer.h
 )
 set_property(TARGET ${module_name} PROPERTY POSITION_INDEPENDENT_CODE ON)
 target_include_directories(${module_name} PUBLIC
diff --git a/engine/audio/src/audiocapture.cpp b/engine/audio/src/audiocapture.cpp
index 288628624d..8c55a1e393 100644
--- a/engine/audio/src/audiocapture.cpp
+++ b/engine/audio/src/audiocapture.cpp
@@ -18,6 +18,8 @@
   limitations under the License.
 */
 
+#include <algorithm>
+
 #include
 #include
 #include
@@ -31,6 +33,9 @@
 
 #define M_2PI 6.28318530718 /* 2*pi */
 
+static_assert(AUDIO_FEATURE_BANDS == FREQ_SUBBANDS_MAX_NUMBER,
+              "AudioFeatures and AudioCapture must use the same live band count");
+
 AudioCapture::AudioCapture (QObject* parent)
   : QThread (parent)
   , m_userStop(true)
@@ -123,6 +128,12 @@ double AudioCapture::bandMaxMagnitude(int numBands) const
     return maxVal;
 }
 
+AudioFeatures AudioCapture::audioFeatures() const
+{
+    QMutexLocker locker(&m_mutex);
+    return m_audioFeatures;
+}
+
 int AudioCapture::lowCutBin(int N)
 {
     if (N < 3) return 0;
 }
@@ -202,6 +213,36 @@ void AudioCapture::stop()
 
 double AudioCapture::fillBandsData(int number)
 {
+    auto it = m_fftMagnitudeMap.find(number);
+    if (it == m_fftMagnitudeMap.end())
+        return 0.0;
+
+    QVector<double> &bands = it.value().m_fftMagnitudeBuffer;
+    if (bands.size() != number)
+        bands = QVector<double>(number);
+
+    return fillLogBands(number, bands.data());
+}
+
+double AudioCapture::fillLogBands(int number, QVector<double> &bands) const
+{
+    if (number <= 0)
+    {
+        bands.clear();
+        return 0.0;
+    }
+
+    if (bands.size() != number)
+        bands = QVector<double>(number);
+
+    return fillLogBands(number, bands.data());
+}
+
+double AudioCapture::fillLogBands(int number, double *bands) const
+{
+    if (number <= 0 || bands == nullptr)
+        return 0.0;
+
     // m_fftOutputBuffer contains the real and imaginary data of a spectrum
     // representing all the frequencies from 0 to m_sampleRate Hz.
     // Consider the configured spectrum range and calculate average magnitude
@@ -215,10 +256,9 @@
     const double maxFreq = qMin(double(SPECTRUM_MAX_FREQUENCY), nyquist);
     const double logRange = (maxFreq > minFreq) ? qLn(maxFreq / minFreq) : 0.0;
 
-    if (number <= 0 || maxBin <= 1 || logRange <= 0.0)
+    if (maxBin <= 1 || logRange <= 0.0)
     {
-        if (number > 0 && m_fftMagnitudeMap.contains(number))
-            m_fftMagnitudeMap[number].m_fftMagnitudeBuffer.fill(0.0);
+        std::fill_n(bands, number, 0.0);
         return 0.0;
     }
@@ -243,16 +283,21 @@
         int bandWidth = endBin - startBin;
         const double bandMagnitude = magnitudeSum / (double(bandWidth) * M_2PI);
 
-        m_fftMagnitudeMap[number].m_fftMagnitudeBuffer[b] = bandMagnitude;
+        bands[b] = bandMagnitude;
         if (maxMagnitude < bandMagnitude)
             maxMagnitude = bandMagnitude;
     }
 #else
-    Q_UNUSED(number)
+    std::fill_n(bands, number, 0.0);
 #endif
     return maxMagnitude;
 }
 
+double AudioCapture::fillAudioFeatureBands(std::array<double, AUDIO_FEATURE_BANDS> &bands) const
+{
+    return fillLogBands(AUDIO_FEATURE_BANDS, bands.data());
+}
+
 void AudioCapture::processData()
 {
     unsigned int i, j;
@@ -288,10 +333,12 @@
     // Remove DC, compute RMS in one pass (normalize to [-1,1])
     double sumSq = 0.0;
+    double peak = 0.0;
     for (i = 0; i < m_bufferSize; ++i)
     {
         const double x = (double(m_audioMixdown[i]) - mean) / 32768.0;
         sumSq += x * x;
+        peak = qMax(peak, qAbs(x));
         m_fftInputBuffer[i] = x; // will be windowed right below
     }
     const double rms = qSqrt(sumSq / double(m_bufferSize));
@@ -304,6 +351,7 @@
     double maxMagnitude = 0.0;
     quint32 power = smoothPower(0.0);
     m_signalPower = power;
+    m_audioFeatures = m_liveAnalyzer.analyzeSilence();
     for (int barsNumber : m_fftMagnitudeMap.keys())
     {
         // Ensure the buffer exists and is zeroed
@@ -356,6 +404,9 @@
     // 5) Fill per-band magnitudes and compute power
     double pwrSum = 0.;
     double maxMagnitude = 0.;
+    std::array<double, AUDIO_FEATURE_BANDS> featureBands {};
+    const double featureMaxMagnitude = fillAudioFeatureBands(featureBands);
+    m_audioFeatures = m_liveAnalyzer.analyze(rms, peak, featureBands, featureMaxMagnitude);
     for (int barsNumber : m_fftMagnitudeMap.keys())
     {
         maxMagnitude = fillBandsData(barsNumber); // fills & returns max per-band
@@ -389,10 +440,28 @@ void AudioCapture::run()
 {
     if (readAudio(m_captureSize) == true)
     {
-        QMutexLocker locker(&m_mutex);
-        processData();
-
-        if (m_beatTracker->processAudio(m_audioBuffer, m_captureSize))
+        bool hasBeat = false;
+        {
+            QMutexLocker locker(&m_mutex);
+            processData();
+
+            hasBeat = m_beatTracker->processAudio(m_audioBuffer, m_captureSize);
+            // BeatTracker may keep returning a stable BPM even during silence.
+            // If processData() gated the frame as silent, force beat/BPM back to zero.
+            if (m_audioFeatures.rmsDb <= -90.0f)
+            {
+                m_audioFeatures.beat = false;
+                m_audioFeatures.bpm = 0.0;
+            }
+            else
+            {
+                m_audioFeatures.beat = hasBeat;
+                m_audioFeatures.bpm = m_beatTracker->getCurrentBpm();
+            }
+        }
+        emit audioFeaturesChanged();
+
+        if (hasBeat)
             emit beatDetected();
     }
     else
diff --git a/engine/audio/src/audiocapture.h b/engine/audio/src/audiocapture.h
index 82e7c157b4..36ccf10265 100644
--- a/engine/audio/src/audiocapture.h
+++ b/engine/audio/src/audiocapture.h
@@ -22,11 +22,15 @@
 #define AUDIOCAPTURE_H
 
 #include
+#include <array>
 #include
 #include
 #include
 
+#include "audiofeatures.h"
+#include "liveaudioanalyzer.h"
+
 #ifdef HAS_FFTW3
 #include "fftw3.h"
 #endif
@@ -142,6 +146,9 @@ class AudioCapture : public QThread
     /** Get the maximum magnitude across all bands in a registered band set. */
     double bandMaxMagnitude(int numBands) const;
 
+    /** Get the latest live audio feature frame. */
+    AudioFeatures audioFeatures() const;
+
 protected:
     void stop();
@@ -155,6 +162,13 @@ class AudioCapture : public QThread
     /** This is called at every processData to fill a single BandsData structure */
     double fillBandsData(int number);
 
+    /** Fill logarithmic FFT bands without registering a legacy consumer. */
+    double fillLogBands(int number, QVector<double> &bands) const;
+    double fillLogBands(int number, double *bands) const;
+
+    /** Fill the fixed-size bands used by AudioFeatures. */
+    double fillAudioFeatureBands(std::array<double, AUDIO_FEATURE_BANDS> &bands) const;
+
     /** This is the method where captured audio data is processed in this order
      * 1) calculates the signal power, which will be the volume bar
      * 2) perform the FFT
@@ -164,11 +178,12 @@ class AudioCapture : public QThread
 signals:
     void dataProcessed(double *spectrumBands, int size, double maxMagnitude, quint32 power);
+    void audioFeaturesChanged();
     void volumeChanged(int volume);
     void beatDetected();
 
 protected:
-    QMutex m_mutex;
+    mutable QMutex m_mutex;
     bool m_userStop, m_pause;
     unsigned int m_bufferSize, m_captureSize, m_sampleRate, m_channels;
@@ -192,6 +207,10 @@ class AudioCapture : public QThread
 
     /** Reference to the beat tracking processor */
     BeatTracker *m_beatTracker;
+
+    /** Unified live audio feature state */
+    LiveAudioAnalyzer m_liveAnalyzer;
+    AudioFeatures m_audioFeatures;
 };
 
 /** @} */
diff --git a/engine/audio/src/audiofeatures.h b/engine/audio/src/audiofeatures.h
new file mode 100644
index 0000000000..95867563d5
--- /dev/null
+++ b/engine/audio/src/audiofeatures.h
@@ -0,0 +1,71 @@
+/*
+  Q Light Controller Plus
+  audiofeatures.h
+
+  Copyright (c) Massimo Callegari
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0.txt
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+*/
+
+#ifndef AUDIOFEATURES_H
+#define AUDIOFEATURES_H
+
+#include <array>
+#include <QtGlobal>
+
+static constexpr int AUDIO_FEATURE_BANDS = 32;
+
+struct AudioFeatures final
+{
+    enum Source
+    {
+        Live,
+        Cached
+    };
+
+    struct PerceptualBands
+    {
+        float sub = 0.0f;
+        float bass = 0.0f;
+        float lowMid = 0.0f;
+        float mid = 0.0f;
+        float high = 0.0f;
+    };
+
+    Source source = Live;
+    qint64 trackId = -1;
+    double positionMs = 0.0;
+
+    float rmsDb = -96.0f;
+    float peakDb = -96.0f;
+    float crestFactor = 0.0f;
+
+    std::array<float, AUDIO_FEATURE_BANDS> bandsLog {};
+    std::array<float, AUDIO_FEATURE_BANDS> bandsDb {};
+    std::array<float, AUDIO_FEATURE_BANDS> bandsNormalized {};
+    PerceptualBands bands;
+
+    float spectralCentroidHz = 0.0f;
+    float spectralRolloffHz = 0.0f;
+    float spectralFlatness = 0.0f;
+    float spectralFlux = 0.0f;
+
+    bool onset = false;
+    bool beat = false;
+    double bpm = 0.0;
+
+    bool matchLocked = false;
+    double matchConfidence = 0.0;
+};
+
+#endif // AUDIOFEATURES_H
diff --git a/engine/audio/src/liveaudioanalyzer.cpp b/engine/audio/src/liveaudioanalyzer.cpp
new file mode 100644
index 0000000000..902a3e1006
--- /dev/null
+++ b/engine/audio/src/liveaudioanalyzer.cpp
@@ -0,0 +1,166 @@
+/*
+  Q Light Controller Plus
+  liveaudioanalyzer.cpp
+
+  Copyright (c) Massimo Callegari
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0.txt
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+*/
+
+#include <QtMath>
+
+#include "audiocapture.h"
+#include "liveaudioanalyzer.h"
+
+namespace
+{
+    float amplitudeToDb(double value)
+    {
+        return float(20.0 * qLn(qMax(value, 0.00001585)) / qLn(10.0));
+    }
+
+    double bandCenterHz(int index, int count)
+    {
+        if (count <= 0)
+            return 0.0;
+
+        const double minFreq = double(AudioCapture::minFrequency());
+        const double maxFreq = double(AudioCapture::maxFrequency());
+        const double logRange = qLn(maxFreq / minFreq);
+        const double start = minFreq * qExp(logRange * (double(index) / double(count)));
+        const double end = minFreq * qExp(logRange * (double(index + 1) / double(count)));
+        return qSqrt(start * end);
+    }
+
+    void addBand(int &count, float *target, float value)
+    {
+        *target += value;
+        count++;
+    }
+
+    static constexpr float kFluxAverageKeep = 0.9f;
+    static constexpr float kFluxAverageNew = 1.0f - kFluxAverageKeep;
+    static constexpr float kOnsetNoiseFloorDb = -54.0f;
+    static constexpr float kOnsetMinimumFlux = 0.35f;
+    static constexpr float kOnsetAdaptiveMultiplier = 1.8f;
+}
+
+AudioFeatures LiveAudioAnalyzer::analyze(double rms,
+                                         double peak,
+                                         const std::array<double, AUDIO_FEATURE_BANDS> &logBands,
+                                         double maxMagnitude)
+{
+    AudioFeatures features;
+    const int bandCount = AUDIO_FEATURE_BANDS;
+    features.rmsDb = amplitudeToDb(rms);
+    features.peakDb = amplitudeToDb(peak);
+    features.crestFactor = (rms > 0.0) ? float(peak / rms) : 0.0f;
+
+    double magnitudeSum = 0.0;
+    double weightedFrequencySum = 0.0;
+    double rolloffTarget = 0.0;
+    double rolloffSum = 0.0;
+    double geometricSum = 0.0;
+    double arithmeticSum = 0.0;
+
+    int subCount = 0;
+    int bassCount = 0;
+    int lowMidCount = 0;
+    int midCount = 0;
+    int highCount = 0;
+
+    for (int i = 0; i < bandCount; i++)
+    {
+        const double magnitude = qMax(0.0, logBands[i]);
+        const float normalized = (maxMagnitude > 0.0) ? float(qBound(0.0, magnitude / maxMagnitude, 1.0)) : 0.0f;
+        const double centerHz = bandCenterHz(i, bandCount);
+
+        features.bandsLog[i] = float(magnitude);
+        features.bandsNormalized[i] = normalized;
+        features.bandsDb[i] = amplitudeToDb(normalized);
+
+        magnitudeSum += magnitude;
+        weightedFrequencySum += magnitude * centerHz;
+        arithmeticSum += magnitude;
+        geometricSum += qLn(qMax(magnitude, 0.000000001));
+
+        if (centerHz < 80.0)
+            addBand(subCount, &features.bands.sub, normalized);
+        else if (centerHz < 250.0)
+            addBand(bassCount, &features.bands.bass, normalized);
+        else if (centerHz < 500.0)
+            addBand(lowMidCount, &features.bands.lowMid, normalized);
+        else if (centerHz < 2000.0)
+            addBand(midCount, &features.bands.mid, normalized);
+        else
+            addBand(highCount, &features.bands.high, normalized);
+    }
+
+    if (subCount > 0)
+        features.bands.sub /= float(subCount);
+    if (bassCount > 0)
+        features.bands.bass /= float(bassCount);
+    if (lowMidCount > 0)
+        features.bands.lowMid /= float(lowMidCount);
+    if (midCount > 0)
+        features.bands.mid /= float(midCount);
+    if (highCount > 0)
+        features.bands.high /= float(highCount);
+
+    if (magnitudeSum > 0.0)
+    {
+        features.spectralCentroidHz = float(weightedFrequencySum / magnitudeSum);
+        rolloffTarget = magnitudeSum * 0.85;
+        for (int i = 0; i < bandCount; i++)
+        {
+            rolloffSum += qMax(0.0, logBands[i]);
+            if (rolloffSum >= rolloffTarget)
+            {
+                features.spectralRolloffHz = float(bandCenterHz(i, bandCount));
+                break;
+            }
+        }
+    }
+
+    if (bandCount > 0 && arithmeticSum > 0.0)
+    {
+        const double geometricMean = qExp(geometricSum / double(bandCount));
+        const double arithmeticMean = arithmeticSum / double(bandCount);
+        features.spectralFlatness = float(qBound(0.0, geometricMean / arithmeticMean, 1.0));
+    }
+
+    float flux = 0.0f;
+    for (int i = 0; i < bandCount; i++)
+    {
+        const float rise = features.bandsNormalized[i] - m_previousBands[i];
+        if (rise > 0.0f)
+            flux += rise;
+        m_previousBands[i] = features.bandsNormalized[i];
+    }
+
+    features.spectralFlux = flux;
+    m_fluxAverage = (kFluxAverageKeep * m_fluxAverage) + (kFluxAverageNew * flux);
+    features.onset = (features.rmsDb > kOnsetNoiseFloorDb &&
+                      flux > qMax(kOnsetMinimumFlux, m_fluxAverage * kOnsetAdaptiveMultiplier));
+
+    return features;
+}
+
+AudioFeatures LiveAudioAnalyzer::analyzeSilence()
+{
+    AudioFeatures features;
+    features.bandsDb.fill(-96.0f);
+    m_previousBands.fill(0.0f);
+    m_fluxAverage = 0.0f;
+    return features;
+}
diff --git a/engine/audio/src/liveaudioanalyzer.h b/engine/audio/src/liveaudioanalyzer.h
new file mode 100644
index 0000000000..b1b6448cd6
--- /dev/null
+++ b/engine/audio/src/liveaudioanalyzer.h
@@ -0,0 +1,42 @@
+/*
+  Q Light Controller Plus
+  liveaudioanalyzer.h
+
+  Copyright (c) Massimo Callegari
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0.txt
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+*/
+
+#ifndef LIVEAUDIOANALYZER_H
+#define LIVEAUDIOANALYZER_H
+
+#include <array>
+
+#include "audiofeatures.h"
+
+class LiveAudioAnalyzer final
+{
+public:
+    AudioFeatures analyze(double rms,
+                          double peak,
+                          const std::array<double, AUDIO_FEATURE_BANDS> &logBands,
+                          double maxMagnitude);
+
+    AudioFeatures analyzeSilence();
+
+private:
+    std::array<float, AUDIO_FEATURE_BANDS> m_previousBands {};
+    float m_fluxAverage = 0.0f;
+};
+
+#endif // LIVEAUDIOANALYZER_H
diff --git a/engine/test/CMakeLists.txt b/engine/test/CMakeLists.txt
index 39c39e0641..3a47ad2662 100644
--- a/engine/test/CMakeLists.txt
+++ b/engine/test/CMakeLists.txt
@@ -60,3 +60,4 @@ add_subdirectory(universe)
 add_subdirectory(universeperf)
 add_subdirectory(video)
 add_subdirectory(iopluginstub)
+add_subdirectory(liveaudioanalyzer)
diff --git a/engine/test/liveaudioanalyzer/CMakeLists.txt b/engine/test/liveaudioanalyzer/CMakeLists.txt
new file mode 100644
index 0000000000..203b78cd70
--- /dev/null
+++ b/engine/test/liveaudioanalyzer/CMakeLists.txt
@@ -0,0 +1,16 @@
+add_executable(liveaudioanalyzer_test WIN32
+    liveaudioanalyzer_test.cpp liveaudioanalyzer_test.h
+)
+
+target_include_directories(liveaudioanalyzer_test PRIVATE
+    ../../../plugins/interfaces
+    ../../src
+    ../../audio/src
+)
+
+target_link_libraries(liveaudioanalyzer_test PRIVATE
+    Qt${QT_MAJOR_VERSION}::Core
+    Qt${QT_MAJOR_VERSION}::Gui
+    Qt${QT_MAJOR_VERSION}::Test
+    qlcplusaudio
+)
diff --git a/engine/test/liveaudioanalyzer/data/m0.json b/engine/test/liveaudioanalyzer/data/m0.json
new file mode 100644
index 0000000000..22ff07872f
--- /dev/null
+++ b/engine/test/liveaudioanalyzer/data/m0.json
@@ -0,0 +1,6 @@
+{
+    "rms": 0.0,
+    "peak": 0.0,
+    "maxMagnitude": 0.0,
+    "bands": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+}
diff --git a/engine/test/liveaudioanalyzer/data/m1.json b/engine/test/liveaudioanalyzer/data/m1.json
new file mode 100644
index 0000000000..a6bba67588
--- /dev/null
+++ b/engine/test/liveaudioanalyzer/data/m1.json
@@ -0,0 +1,6 @@
+{
+    "rms": 0.1,
+    "peak": 0.2,
+    "maxMagnitude": 1.0,
+    "bands": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
+}
diff --git a/engine/test/liveaudioanalyzer/data/m2.json b/engine/test/liveaudioanalyzer/data/m2.json
new file mode 100644
index 0000000000..6d9cbef8f0
--- /dev/null
+++ b/engine/test/liveaudioanalyzer/data/m2.json
@@ -0,0 +1,6 @@
+{
+    "rms": 0.1,
+    "peak": 0.1,
+    "maxMagnitude": 1.0,
+    "bands": [0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0, 0.5, 0]
+}
diff --git a/engine/test/liveaudioanalyzer/data/m3.json b/engine/test/liveaudioanalyzer/data/m3.json
new file mode 100644
index 0000000000..8efc4f5a81
--- /dev/null
+++ b/engine/test/liveaudioanalyzer/data/m3.json
@@ -0,0 +1,6 @@
+{
+    "rms": 0.2,
+    "peak": 0.4,
+    "maxMagnitude": 4.0,
+    "bands": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+}
diff --git a/engine/test/liveaudioanalyzer/liveaudioanalyzer_test.cpp b/engine/test/liveaudioanalyzer/liveaudioanalyzer_test.cpp
new file mode 100644
index 0000000000..61f84396ac
--- /dev/null
+++ b/engine/test/liveaudioanalyzer/liveaudioanalyzer_test.cpp
@@ -0,0 +1,155 @@
+/*
+  Q Light Controller Plus
+  liveaudioanalyzer_test.cpp
+*/
+
+#include <QtTest>
+#include <QFile>
+#include <QJsonArray>
+#include <QJsonDocument>
+#include <QJsonObject>
+#include <QtMath>
+
+#include "audiocapture.h"
+#include "liveaudioanalyzer_test.h"
+#include "liveaudioanalyzer.h"
+
+namespace
+{
+    struct TestVector
+    {
+        double rms = 0.0;
+        double peak = 0.0;
+        double maxMagnitude = 0.0;
+        std::array<double, AUDIO_FEATURE_BANDS> bands {};
+    };
+
+    // QVERIFY/QCOMPARE expand to a bare "return;", so they only compile in
+    // functions returning void. Fill the vector through an out-parameter
+    // instead of returning it by value.
+    void loadVector(const char *relativePath, TestVector &vec)
+    {
+        const QString path = QFINDTESTDATA(QString::fromLatin1(relativePath));
+        QVERIFY2(!path.isEmpty(), "Test data file not found via QFINDTESTDATA");
+
+        QFile file(path);
+        QVERIFY2(file.open(QIODevice::ReadOnly), qPrintable(QString("Failed to open %1").arg(path)));
+
+        const QJsonDocument doc = QJsonDocument::fromJson(file.readAll());
+        QVERIFY2(doc.isObject(), qPrintable(QString("Invalid JSON object in %1").arg(path)));
+
+        const QJsonObject obj = doc.object();
+        QVERIFY(obj.contains("rms"));
+        QVERIFY(obj.contains("peak"));
+        QVERIFY(obj.contains("maxMagnitude"));
+        QVERIFY(obj.contains("bands"));
+
+        vec.rms = obj.value("rms").toDouble();
+        vec.peak = obj.value("peak").toDouble();
+        vec.maxMagnitude = obj.value("maxMagnitude").toDouble();
+
+        const QJsonArray bands = obj.value("bands").toArray();
+        QCOMPARE(bands.size(), AUDIO_FEATURE_BANDS);
+        for (int i = 0; i < AUDIO_FEATURE_BANDS; i++)
+            vec.bands[size_t(i)] = bands.at(i).toDouble();
+    }
+
+    double bandCenterHz(int index, int count)
+    {
+        if (count <= 0)
+            return 0.0;
+
+        const double minFreq = double(AudioCapture::minFrequency());
+        const double maxFreq = double(AudioCapture::maxFrequency());
+        const double logRange = qLn(maxFreq / minFreq);
+        const double start = minFreq * qExp(logRange * (double(index) / double(count)));
+        const double end = minFreq * qExp(logRange * (double(index + 1) / double(count)));
+        return qSqrt(start * end);
+    }
+}
+
+void LiveAudioAnalyzer_Test::testUniformFrameBasics()
+{
+    LiveAudioAnalyzer analyzer;
+    TestVector vec;
+    loadVector("data/m1.json", vec);
+
+    const AudioFeatures features = analyzer.analyze(vec.rms, vec.peak, vec.bands, vec.maxMagnitude);
+
+    QVERIFY(qAbs(features.rmsDb - (-20.0f)) < 0.2f);
+    QVERIFY(qAbs(features.peakDb - (-13.979f)) < 0.2f);
+    QVERIFY(qAbs(features.crestFactor - 2.0f) < 0.001f);
+
+    for (int i = 0; i < AUDIO_FEATURE_BANDS; i++)
+    {
+        QVERIFY(qAbs(features.bandsLog[size_t(i)] - 1.0f) < 0.0001f);
+        QVERIFY(qAbs(features.bandsNormalized[size_t(i)] - 1.0f) < 0.0001f);
+        QVERIFY(qAbs(features.bandsDb[size_t(i)] - 0.0f) < 0.05f);
+    }
+
+    QVERIFY(qAbs(features.bands.sub - 1.0f) < 0.0001f);
+    QVERIFY(qAbs(features.bands.bass - 1.0f) < 0.0001f);
+    QVERIFY(qAbs(features.bands.lowMid - 1.0f) < 0.0001f);
+    QVERIFY(qAbs(features.bands.mid - 1.0f) < 0.0001f);
+    QVERIFY(qAbs(features.bands.high - 1.0f) < 0.0001f);
+
+    QVERIFY(features.spectralCentroidHz > float(AudioCapture::minFrequency()));
+    QVERIFY(features.spectralCentroidHz < float(AudioCapture::maxFrequency()));
+    QVERIFY(features.spectralRolloffHz > float(AudioCapture::minFrequency()));
+    QVERIFY(features.spectralRolloffHz < float(AudioCapture::maxFrequency()));
+    QVERIFY(features.spectralFlatness > 0.99f);
+
+    QCOMPARE(features.spectralFlux, 32.0f);
+    QVERIFY(features.onset);
+}
+
+void LiveAudioAnalyzer_Test::testSpikeCentroidAndRolloff()
+{
+    LiveAudioAnalyzer analyzer;
+    TestVector vec;
+    loadVector("data/m3.json", vec);
+
+    const AudioFeatures features = analyzer.analyze(vec.rms, vec.peak, vec.bands, vec.maxMagnitude);
+
+    const int spikeIndex = 10;
+    const float expectedHz = float(bandCenterHz(spikeIndex, AUDIO_FEATURE_BANDS));
+    QVERIFY(qAbs(features.spectralCentroidHz - expectedHz) < 0.01f);
+    QVERIFY(qAbs(features.spectralRolloffHz - expectedHz) < 0.01f);
+}
+
+void LiveAudioAnalyzer_Test::testHalfNormalizationAndDb()
+{
+    LiveAudioAnalyzer analyzer;
+    TestVector vec;
+    loadVector("data/m2.json", vec);
+
+    const AudioFeatures features = analyzer.analyze(vec.rms, vec.peak, vec.bands, vec.maxMagnitude);
+
+    // Even indices are 0.5, odd indices are 0.0
+    QVERIFY(qAbs(features.bandsNormalized[0] - 0.5f) < 0.0001f);
+    QVERIFY(qAbs(features.bandsNormalized[1] - 0.0f) < 0.0001f);
+    QVERIFY(qAbs(features.bandsDb[0] - (-6.0206f)) < 0.2f);
+    QVERIFY(qAbs(features.bandsDb[1] - (-96.0f)) < 0.01f);
+}
+
+void LiveAudioAnalyzer_Test::testAnalyzeSilenceResetsHistory()
+{
+    LiveAudioAnalyzer analyzer;
+    TestVector uniform;
+    loadVector("data/m1.json", uniform);
+
+    const AudioFeatures first = analyzer.analyze(uniform.rms, uniform.peak, uniform.bands, uniform.maxMagnitude);
+    QCOMPARE(first.spectralFlux, 32.0f);
+
+    const AudioFeatures second = analyzer.analyze(uniform.rms, uniform.peak, uniform.bands, uniform.maxMagnitude);
+    QCOMPARE(second.spectralFlux, 0.0f);
+    QVERIFY(!second.onset);
+
+    const AudioFeatures silence = analyzer.analyzeSilence();
+    QCOMPARE(silence.rmsDb, -96.0f);
+    for (int i = 0; i < AUDIO_FEATURE_BANDS; i++)
+        QCOMPARE(silence.bandsDb[size_t(i)], -96.0f);
+
+    const AudioFeatures afterSilence = analyzer.analyze(uniform.rms, uniform.peak, uniform.bands, uniform.maxMagnitude);
+    QCOMPARE(afterSilence.spectralFlux, 32.0f);
+    QVERIFY(afterSilence.onset);
+}
+
+QTEST_MAIN(LiveAudioAnalyzer_Test)
diff --git a/engine/test/liveaudioanalyzer/liveaudioanalyzer_test.h b/engine/test/liveaudioanalyzer/liveaudioanalyzer_test.h
new file mode 100644
index 0000000000..91981fd02e
--- /dev/null
+++ b/engine/test/liveaudioanalyzer/liveaudioanalyzer_test.h
@@ -0,0 +1,22 @@
+/*
+  Q Light Controller Plus
+  liveaudioanalyzer_test.h
+*/
+
+#ifndef LIVEAUDIOANALYZER_TEST_H
+#define LIVEAUDIOANALYZER_TEST_H
+
+#include <QObject>
+
+class LiveAudioAnalyzer_Test : public QObject
+{
+    Q_OBJECT
+
+private slots:
+    void testUniformFrameBasics();
+    void testSpikeCentroidAndRolloff();
+    void testHalfNormalizationAndDb();
+    void testAnalyzeSilenceResetsHistory();
+};
+
+#endif // LIVEAUDIOANALYZER_TEST_H
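Reviewer note: the onset gate introduced in `liveaudioanalyzer.cpp` above reduces to a small standalone form, shown here for clarity. The constants are copied from the diff; `OnsetGate` itself is only an illustrative name.

```cpp
#include <algorithm>
#include <cassert>

// An exponential moving average of spectral flux forms an adaptive
// threshold; an onset fires only when the frame is above the noise
// floor AND its flux clearly exceeds both an absolute minimum and a
// multiple of the running average (matching LiveAudioAnalyzer, which
// updates the average before the comparison).
class OnsetGate
{
public:
    bool update(float rmsDb, float flux)
    {
        m_fluxAverage = kKeep * m_fluxAverage + (1.0f - kKeep) * flux;
        return rmsDb > kNoiseFloorDb &&
               flux > std::max(kMinimumFlux, m_fluxAverage * kAdaptiveMultiplier);
    }

private:
    static constexpr float kKeep = 0.9f;               // kFluxAverageKeep
    static constexpr float kNoiseFloorDb = -54.0f;     // kOnsetNoiseFloorDb
    static constexpr float kMinimumFlux = 0.35f;       // kOnsetMinimumFlux
    static constexpr float kAdaptiveMultiplier = 1.8f; // kOnsetAdaptiveMultiplier
    float m_fluxAverage = 0.0f;
};
```

The design intent is that a sustained high-flux passage stops registering as onsets once the average catches up, while a transient over a quiet history always fires.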