Add WAV/MP3 input with automatic 48 kHz resampling and stereo upmix by Copilot · Pull Request #15 · audiohacking/acestep.cpp

Copilot · 2026-03-07T18:49:22Z

The --src-audio (cover mode) and neural-codec --encode paths only accepted WAV at exactly 48 kHz. This adds transparent WAV + MP3 support at any sample rate, auto-resampled to 48 kHz and always delivered as stereo — exactly what the VAE encoder requires — with no ffmpeg pre-conversion needed.

New: `src/audio.h`

Single header providing read_audio(path, T_audio, n_channels):

- Format detected by extension: .mp3 → dr_mp3, anything else → dr_wav
- Linear resampler (audio_resample_linear) is channel-agnostic and runs only when the input sample rate is not 48000 Hz
- Always returns interleaved stereo [T × 2]: mono input is upmixed (L = R), and input with more than two channels keeps only the first two; *n_channels is always 2 on success
- Uses free() to release dr_libs buffers (both dr_wav and dr_mp3 use the system allocator by default)
- Returns a malloc'd buffer that the caller must free()
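
For readers who want to see the two decoder-independent steps in isolation, here is a minimal sketch of extension-based format detection and channel-agnostic linear resampling. It is not the code from src/audio.h: the names has_mp3_extension and resample_linear, the use of std::vector instead of malloc'd buffers, the case-insensitive extension match, and the rounding of the output length are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true when `path` ends in ".mp3".
// Matching case-insensitively is an assumption, not confirmed by the PR.
bool has_mp3_extension(const std::string &path) {
    const std::string ext = ".mp3";
    if (path.size() < ext.size()) return false;
    const size_t off = path.size() - ext.size();
    for (size_t i = 0; i < ext.size(); i++) {
        if (std::tolower((unsigned char) path[off + i]) != ext[i]) return false;
    }
    return true;
}

// Hypothetical stand-in for audio_resample_linear: interleaved input and
// output, any channel count. Returns the input unchanged when rates match.
std::vector<float> resample_linear(const std::vector<float> &in, int channels,
                                   int sr_in, int sr_out) {
    assert(channels >= 1 && sr_in > 0 && sr_out > 0);
    const size_t n_ch = (size_t) channels;
    const size_t T_in = in.size() / n_ch;
    if (sr_in == sr_out || T_in == 0) return in;

    // Output length scales with the rate ratio (rounded to nearest frame).
    const size_t T_out = (size_t) (((double) T_in * sr_out) / sr_in + 0.5);
    std::vector<float> out(T_out * n_ch);

    for (size_t t = 0; t < T_out; t++) {
        // Position of output frame t on the input time axis.
        const double pos = (double) t * sr_in / sr_out;
        size_t i0 = (size_t) pos;
        if (i0 >= T_in) i0 = T_in - 1;
        const size_t i1 = (i0 + 1 < T_in) ? i0 + 1 : i0;  // clamp at the last frame
        const float frac = (float) (pos - (double) i0);
        // The same interpolation weight is applied to every channel.
        for (size_t c = 0; c < n_ch; c++) {
            const float a = in[i0 * n_ch + c];
            const float b = in[i1 * n_ch + c];
            out[t * n_ch + c] = a + (b - a) * frac;
        }
    }
    return out;
}
```

Calling resample_linear(samples, 1, 24000, 48000) on a two-frame mono buffer yields four frames, with the midpoint interpolated and the tail held at the last input value. Linear interpolation applies no low-pass filter, so inputs above 48 kHz may alias when downsampled this way.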

New: `thirdparty/`

- dr_wav.h v0.14.5: WAV decode (public domain / MIT-0, mackron/dr_libs)
- dr_mp3.h v0.7.3: MP3 decode via minimp3 (public domain / MIT-0)

Zero new link-time dependencies — both are single-header, included once per translation unit via #define DR_*_IMPLEMENTATION inside audio.h.

Tool changes

- neural-codec.cpp: encode path switches read_wav() → read_audio(); help text updated
- dit-vae.cpp: --src-audio switches to read_audio(); help text updated
- CMakeLists.txt: thirdparty/ added as SYSTEM include in the shared link_ggml_backends macro (vendor warnings suppressed)
- No existing source files (vae-enc.h etc.) were modified

Example

# Before: required exact 48 kHz stereo WAV, manual ffmpeg conversion otherwise
# After: any of these just work
./dit-vae --src-audio reference.mp3   ...
./dit-vae --src-audio reference.wav   ...  # any sample rate, mono or stereo
./neural-codec --vae vae.gguf --encode -i song.mp3 -o song.latent

New example

examples/cover.sh + examples/cover.json — demonstrates cover-mode generation from a WAV or MP3 reference track with inline usage notes.



lmangani · 2026-03-08T13:11:59Z

@copilot review the following report. remember we should convert stereo->stereo and mono->stereo and NOT touch the acestep-cpp existing files which work when invoked manually.

Known issues in `src/audio.h` (pending upstream fix)

Bug: mono audio is not upmixed to stereo before encoding

read_audio() returns a native-channel-count buffer ([T x n_channels] floats),
but vae_enc_compute() in vae-enc.h always reads two channels:

// vae-enc.h (hardcodes stereo access — UB when n_channels == 1)
for (int c = 0; c < 2; c++) {
    for (int t = 0; t < T_audio; t++) {
        m->scratch_in[c * T_audio + t] = audio[t * 2 + c];
    }
}

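For stereo inputs (most user uploads) this works correctly.
For mono inputs the buffer holds only T_audio floats, so every access at index T_audio or beyond (up to 2 × T_audio − 1) reads out-of-bounds memory, and the in-bounds reads treat consecutive mono samples as if they were L/R pairs.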
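
To make the size requirement concrete, the loop above can be wrapped in a small function that refuses buffers shorter than 2 × T_audio floats. This is only a sketch: deinterleave_stereo is a hypothetical name, vae-enc.h performs no such check, and std::vector stands in for the raw pointers and scratch buffer used there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the deinterleave loop: interleaved stereo
// [T x 2] in, planar [2 x T] out. Returns an empty vector when the input
// holds fewer than 2 * T_audio floats (e.g. a mono buffer of T_audio floats).
std::vector<float> deinterleave_stereo(const std::vector<float> &audio, int T_audio) {
    assert(T_audio >= 0);
    const size_t T = (size_t) T_audio;
    if (audio.size() < 2 * T) return {};

    std::vector<float> planar(2 * T);
    for (size_t c = 0; c < 2; c++) {
        for (size_t t = 0; t < T; t++) {
            // Highest index read is 2 * T - 1, hence the size check above.
            planar[c * T + t] = audio[t * 2 + c];
        }
    }
    return planar;
}
```

A two-frame stereo buffer {L0, R0, L1, R1} comes back as {L0, L1, R0, R1}, while a two-frame mono buffer {m0, m1} is rejected instead of being read past its end.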

Required fix in src/audio.h — always return interleaved stereo [T x 2].
Add this block after resampling completes (before the final return out):

// Upmix mono -> stereo, or use first two channels of N-ch audio.
if ((int) channels != 2) {
    int    n_ch_src = (int) channels;
    float *stereo   = (float *) malloc((size_t) T_raw * 2 * sizeof(float));
    if (!stereo) {
        fprintf(stderr, "[Audio] Out of memory converting to stereo\n");
        free(out);
        return NULL;
    }
    for (int t = 0; t < T_raw; t++) {
        // Index with size_t on both sides so long inputs cannot overflow int.
        float L = out[(size_t) t * n_ch_src + 0];
        float R = (n_ch_src > 1) ? out[(size_t) t * n_ch_src + 1] : L;
        stereo[(size_t) t * 2 + 0] = L;
        stereo[(size_t) t * 2 + 1] = R;
    }
    free(out);
    out = stereo;
    fprintf(stderr, "[Audio] Converted %dch -> stereo\n", n_ch_src);
}
*n_channels = 2;

Also: replace drwav_free(raw, NULL) with free(raw) in the resampling
branch — both dr_wav and dr_mp3 use the system allocator by default, so
free() is always safe regardless of which decoder produced the buffer.


Copilot · 2026-03-08T13:15:30Z

> @copilot review the following report. remember we should convert stereo->stereo and mono->stereo and NOT touch the acestep-cpp existing files which work when invoked manually.
>
> ##...

Fixed in 7ea3602.

read_audio() now always returns interleaved stereo [T × 2] and sets *n_channels = 2:

- Mono → stereo: L = R = channel[0]. The upmix runs after resampling, so mono input works at any sample rate.
- N > 2 channels → stereo: only the first two channels are kept
- drwav_free(raw, NULL) → free(raw): both dr_libs use the system allocator by default, so free() is correct for either decoder
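
The channel handling described above can be checked in isolation with a small standalone function. The sketch below is an illustration rather than the code committed in 7ea3602: to_stereo is a hypothetical name, and it uses std::vector where audio.h uses malloc'd buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the listed behaviour: interleaved input with
// `channels` channels in, interleaved stereo [T x 2] out.
std::vector<float> to_stereo(const std::vector<float> &in, int channels) {
    assert(channels >= 1);
    const size_t n_ch = (size_t) channels;
    const size_t T = in.size() / n_ch;

    std::vector<float> out(T * 2);
    for (size_t t = 0; t < T; t++) {
        const float L = in[t * n_ch];                       // first channel
        const float R = (n_ch > 1) ? in[t * n_ch + 1] : L;  // second channel, or copy of L for mono
        out[t * 2 + 0] = L;
        out[t * 2 + 1] = R;
    }
    return out;
}
```

Mono {a, b} becomes {a, a, b, b}, stereo input is copied through unchanged, and a three-channel frame {l, r, x} is reduced to {l, r}.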

No existing files (vae-enc.h etc.) were touched.


Initial plan

28857a5

Copilot AI assigned Copilot and lmangani Mar 7, 2026

Copilot started work on behalf of lmangani March 7, 2026 18:49 View session

Copilot AI and others added 2 commits March 7, 2026 19:02

Add WAV/MP3 input with 48kHz resampling: audio.h, dr_libs, updated to…

a891de0

…ols and cover example Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>

Remove build-ci/ artifacts from tracking, add build-*/ to .gitignore

2695913

Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add support for WAV and MP3 input conversion to 48kHz~~ Add WAV/MP3 input with automatic 48 kHz resampling Mar 7, 2026

Copilot finished work on behalf of lmangani March 7, 2026 19:06

lmangani added 3 commits March 7, 2026 20:08

Revise cover.sh to streamline audio processing steps

0d2999f

Updated the cover script to remove LLM step and clarify audio processing.

Modify cover.json with new caption and attributes

5fff90c

Updated the cover JSON to reflect new attributes and changes.

Merge pull request #16 from ServeurpersoCom/master

360e324

resync with master

Copilot started work on behalf of lmangani March 8, 2026 13:12 View session

audio.h: always return stereo [T*2], upmix mono, fix free(raw)

7ea3602

Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>

Copilot AI changed the title ~~Add WAV/MP3 input with automatic 48 kHz resampling~~ Add WAV/MP3 input with automatic 48 kHz resampling and stereo upmix Mar 8, 2026

Copilot finished work on behalf of lmangani March 8, 2026 13:16

Merge pull request #17 from ServeurpersoCom/master

54d868f

resync with master

lmangani closed this Mar 8, 2026

Copilot wants to merge 8 commits into master from copilot/add-wav-mp3-conversion
