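The error GGML_ASSERT(ggml_can_repeat(b, a)) failed most likely occurs because the XL model’s architecture (e.g., larger hidden dimensions) produces tensor shapes that the current ace-synth binary does not expect. ggml_can_repeat(b, a) is purely a shape check. It passes only if each dimension of a is a whole multiple of the matching dimension of b, so that b can be broadcast across a. The backtrace shows the check failing inside ggml's element-wise add (ggml_add_inplace), called from dit_ggml_generate. In other words, two tensors with incompatible shapes were combined while the DiT graph was being built. Standard models apparently produce the shapes this code path expects; the XL model does not.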
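To make the assertion concrete, here is a minimal Python sketch of the check. It mirrors the logic of ggml_can_repeat in ggml's C source and does not call ggml itself. Shapes use ggml's ne order, where ne[0] is the innermost dimension. H=2048 and S=2474 are taken from the log below. The 2560-wide tensor is a hypothetical stand-in for a mismatched XL dimension, not a value from the model.

```python
def can_repeat(b_ne, a_ne):
    """Python mirror of ggml_can_repeat(b, a) for 4-D ggml shapes.

    b can be repeated (broadcast) onto a when every dimension of a is a
    whole multiple of the matching dimension of b. An empty b (any
    dimension equal to 0) always passes, as in ggml.
    """
    if len(b_ne) != 4 or len(a_ne) != 4:
        raise ValueError("ggml shapes have exactly 4 dimensions")
    if 0 in b_ne:
        return True
    return all(a % b == 0 for b, a in zip(b_ne, a_ne))


# Hidden states of shape [H, S, 1, 1] in ggml ne order (H and S from the log).
H, S = 2048, 2474
hidden = (H, S, 1, 1)

print(can_repeat((H, 1, 1, 1), hidden))     # per-channel vector: True
print(can_repeat((H, S, 1, 1), hidden))     # identical shape: True
print(can_repeat((2560, 1, 1, 1), hidden))  # hypothetical mismatched width: False
```

A failing add in dit_ggml_generate would look like the third case: one operand has a dimension that does not evenly divide the other's. Which dimension that is in the XL run cannot be determined from this log alone.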
Key reasons:
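Architectural Incompatibility: This ace-synth build does not appear to support the XL model's tensor/layer layout. The weights load without error (478 DiT tensors, 1770.1 MB), so the mismatch only surfaces when dit_ggml_generate assembles the generation graph.
Backend Constraints (unlikely): Memory or graph-size limits on the M1 Pro would normally show up as an allocation or Metal error, not as a shape assertion. ggml_can_repeat compares tensor shapes while the graph is being built, before the DiT graph is executed, so the hardware is probably not the cause.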
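As a rough check on the memory hypothesis, you can sum the weight-buffer sizes printed in the log below and compare the total with the Metal working-set limit the log reports. This counts weights only. The log does not report compute (activation) buffers, so the real peak usage is higher than this figure.

```python
# Weight-buffer sizes (MB) as printed in the ace-synth log.
weights_mb = {
    "DiT": 1770.1,
    "VAE": 161.1,
    "TextEncoder": 742.7,
    "CondEncoder": 359.7,
    "Detokenizer": 64.7,
    "VAE-Enc": 160.8,
}
working_set_mb = 12713.12  # recommendedMaxWorkingSetSize from ggml_metal_device_init

total_mb = sum(weights_mb.values())
share = total_mb / working_set_mb
print(f"weights: {total_mb:.1f} MB of {working_set_mb:.1f} MB ({share:.0%})")
```

The weights take roughly a quarter of the recommended working set. That leaves headroom for activations but does not prove they fit. Either way, the shape assertion fires before the DiT graph runs.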
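Recommended: Update ace-synth (here, the copy bundled inside ACE-Step UI.app) to a version that explicitly supports the acestep-v15-xl architecture. Until such a version is available, a standard (non-XL) model is the likely workaround.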
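For a bug report, it helps to list exactly which tensors differ between the XL file and a standard model that works. The sketch below assumes the gguf Python package (gguf-py from the llama.cpp repository) is installed and uses its GGUFReader class. The script name and the standard-model path in the usage line are hypothetical.

```python
import sys
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]


def read_shapes(path: str) -> Dict[str, Shape]:
    """Return {tensor name: shape} for a GGUF file.

    Assumes the `gguf` package is installed. It is imported lazily so that
    diff_shapes() below can be used without it.
    """
    from gguf import GGUFReader  # pip install gguf

    reader = GGUFReader(path)
    return {t.name: tuple(int(d) for d in t.shape) for t in reader.tensors}


def diff_shapes(
    base: Dict[str, Shape], other: Dict[str, Shape]
) -> Dict[str, Tuple[Optional[Shape], Optional[Shape]]]:
    """Tensors whose shape differs, or that exist in only one file (None)."""
    out = {}
    for name in sorted(base.keys() | other.keys()):
        a, b = base.get(name), other.get(name)
        if a != b:
            out[name] = (a, b)
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    base_path, xl_path = sys.argv[1], sys.argv[2]
    for name, (a, b) in diff_shapes(read_shapes(base_path), read_shapes(xl_path)).items():
        print(f"{name}: {a} -> {b}")
```

Example run (hypothetical script name and standard-model path):
python compare_gguf_shapes.py /path/to/standard-model.gguf /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf
Attaching this output to the report, together with the backtrace, gives the developers the concrete shapes their code needs to handle.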
Would you like me to help you draft a bug report to the developers to resolve this incompatibility?
log file:
=== Job _46tlwcd [3:11:32 PM] — FAILED ===
=== Job job_1779693092461_46tlwcd started — mode: cover ===
Request JSON: {
"caption": "pop music",
"lyrics": "[Verse 1]\nWe don't need no education\nWe don't need no thought control\nNo dark sarcasm in the classroom\nTeachers, leave them kids alone\n\n[Chorus]\nHey! Teacher! Leave them kids alone!\nAll in all, it's just another brick in the wall\nAll in all, you're just another brick in the wall\n\n[Verse 2]\nWe don't need no education\nWe don't need no thought control\nNo dark sarcasm in the classroom\nTeachers, leave them kids alone\n\n[Chorus]\nHey! Teacher! Leave them kids alone!\nAll in all, it's just another brick in the wall\nAll in all, you're just another brick in the wall\n\n[Guitar Solo]\n\n[Bridge]\n(Chorus: Hey! Teacher! Leave them kids alone!)\n(Chorus: Hey! Teacher! Leave them kids alone!)\n\n[Outro]\nJust another brick in the wall\nJust another brick in the wall",
"seed": -1,
"inference_steps": 12,
"guidance_scale": 9,
"shift": 3,
"audio_cover_strength": 0.5
}
--- Running ace-synth ---
$ /Applications/ACE-Step UI.app/Contents/Resources/bin/ace-synth --request /Users/xbaby/Music/ACEStep/_tmp_job_1779693092461_46tlwcd/request.json --text-encoder /Users/xbaby/Documents/AiModel/ACE/Qwen3-Embedding-0.6B-Q8_0.gguf --dit /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf --vae /Users/xbaby/Documents/AiModel/ACE/vae-BF16.gguf --src-audio /Users/xbaby/Music/ACEStep/reference-tracks/a8c136cd-8c75-4a1d-af17-f73e6ef5979f/1779625529334.mp3
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.024 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
[Load] DiT backend: MTL0 (CPU threads: 5)
[Load] Backend init: 88.0 ms
[GGUF] /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf: 830 tensors, data at offset 69088
[DiT] Self-attn: Q+K fused, V separate
[DiT] Cross-attn: all separate
[DiT] MLP: gate+up fused
[Load] null_condition_emb found (CFG available)
[WeightCtx] Loaded 478 tensors, 1770.1 MB into backend
[Load] DiT: 24 layers, H=2048, Nh=16/8, D=128
[Load] DiT weight load: 3595.8 ms
[GGUF] /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf: 830 tensors, data at offset 69088
[Load] silence_latent: [15000, 64] from GGUF
[GGUF] /Users/xbaby/Documents/AiModel/ACE/vae-BF16.gguf: 365 tensors, data at offset 30048
[Load] VAE backend: MTL0 (shared)
[VAE] Backend: MTL0, Weight buffer: 161.1 MB
[VAE] Loaded: 5 blocks, upsample=1920x, F32 activations
[Load] VAE weights: 937.1 ms
[BPE] Loaded from GGUF: 151643 vocab, 151387 merges
[Load] BPE tokenizer: 56.9 ms
[Load] TextEncoder backend: MTL0 (shared)
[GGUF] /Users/xbaby/Documents/AiModel/ACE/Qwen3-Embedding-0.6B-Q8_0.gguf: 310 tensors, data at offset 5337664
[Load] TextEncoder: 28L, H=1024, Nh=16/8
[Qwen3] Attn: Q+K+V fused
[Qwen3] MLP: gate+up fused
[WeightCtx] Loaded 310 tensors, 742.7 MB into backend
[Load] TextEncoder: 2326.8 ms
[Load] CondEncoder backend: MTL0 (shared)
[GGUF] /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf: 830 tensors, data at offset 69088
[Load] LyricEncoder: 8L
[Qwen3] Attn: Q+K fused, V separate
[Qwen3] MLP: gate+up fused
[Load] TimbreEncoder: 4L
[Qwen3] Attn: Q+K fused, V separate
[Qwen3] MLP: gate+up fused
[WeightCtx] Loaded 140 tensors, 359.7 MB into backend
[Load] CondEncoder: lyric(8L), timbre(4L), text_proj, null_cond
[Load] ConditionEncoder: 1378.5 ms
[GGUF] /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf: 830 tensors, data at offset 69088
[WeightCtx] Loaded 30 tensors, 64.7 MB into backend
[Load] Detokenizer: FSQ(6->2048) + 2L encoder(S=5, 2048->64)
[Load] Detokenizer: 213.3 ms
[Ace-Synth] All models loaded, turbo=yes
[MP3] read /Users/xbaby/Music/ACEStep/reference-tracks/a8c136cd-8c75-4a1d-af17-f73e6ef5979f/1779625529334.mp3: 9501696 samples, 48000 Hz, 2 ch
[Cover] Source audio: 197.95s @ 48kHz
[Request 1/1] /Users/xbaby/Music/ACEStep/_tmp_job_1779693092461_46tlwcd/request.json (batch=1)
[Request] parsed /Users/xbaby/Music/ACEStep/_tmp_job_1779693092461_46tlwcd/request.json (7 fields)
[Request] seed=-1
[Request] caption: pop music
[Request] lyrics: 750 bytes
[Request] bpm=0 dur=0 key= ts= lang=
[Request] lm: temp=0.85 cfg=2.0 top_p=0.90 top_k=0
[Request] dit: steps=12 guidance=9.0 shift=3.0
[Request] audio_codes: (none)
[GGUF] /Users/xbaby/Documents/AiModel/ACE/vae-BF16.gguf: 365 tensors, data at offset 30048
[Load] VAE-Enc backend: MTL0 (shared)
[VAE-Enc] Backend: MTL0, Weight buffer: 160.8 MB
[VAE-Enc] Loaded: 5 blocks, downsample=1920x, F32 activations
[VAE-Enc] Tiled encode: 39 tiles (chunk=491520, overlap=122880, stride=245760 audio samples)
[VAE-Enc] Graph: 347 nodes, T_audio=368640
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_im2col_1d_f16', name = 'kernel_im2col_1d_f16'
ggml_metal_library_compile_pipeline: loaded kernel_im2col_1d_f16 0x100a31430 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mv_f16_f16_short', name = 'kernel_mul_mv_f16_f16_short_nsg=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mv_f16_f16_short_nsg=1 0x100a31c30 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_bin_fuse_f32_f32_f32', name = 'kernel_bin_fuse_f32_f32_f32_op=0_nf=1_rb=0'
ggml_metal_library_compile_pipeline: loaded kernel_bin_fuse_f32_f32_f32_op=0_nf=1_rb=0 0x100a32730 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_snake_f32', name = 'kernel_snake_f32'
ggml_metal_library_compile_pipeline: loaded kernel_snake_f32 0x100a32f30 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f16', name = 'kernel_mul_mm_f16_f16_bci=0_bco=0'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f16_bci=0_bco=0 0x100a33a30 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_bin_fuse_f32_f32_f32_4', name = 'kernel_bin_fuse_f32_f32_f32_4_op=0_nf=1_rb=0'
ggml_metal_library_compile_pipeline: loaded kernel_bin_fuse_f32_f32_f32_4_op=0_nf=1_rb=0 0x100a34030 | th_max = 1024 | th_width = 32
[VAE-Enc] Downsample factor: 0.000521 (expected ~1/1920)
[VAE-Enc] Graph: 347 nodes, T_audio=491520
[VAE-Enc] Graph: 347 nodes, T_audio=285696
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f16_f16', name = 'kernel_mul_mm_f16_f16_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f16_f16_bci=0_bco=1 0x100a2fbb0 | th_max = 896 | th_width = 32
[VAE-Enc] Tiled encode done: 39 tiles -> T_latent=4948 (197.95s @ 48kHz)
[Cover] Encoded: T_cover=4948 (197.92s), 71335.5 ms
[Pipeline] WARNING: turbo model, forcing guidance_scale=1.0 (was 9.0)
[Pipeline] T=4948, S=2474
[Pipeline] seed=7507090758389762514, steps=12, guidance=1.0, shift=3.0, duration=197.9s
[Pipeline] caption: 56 tokens, lyrics: 217 tokens
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_bf16', name = 'kernel_get_rows_bf16'
ggml_metal_library_compile_pipeline: loaded kernel_get_rows_bf16 0xa65288000 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rms_norm_mul_f32_4', name = 'kernel_rms_norm_mul_f32_4'
ggml_metal_library_compile_pipeline: loaded kernel_rms_norm_mul_f32_4 0xa65288300 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q8_0_f32', name = 'kernel_mul_mm_q8_0_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q8_0_f32_bci=0_bco=1 0xa65288600 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32 0xa65288900 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32_imrope=0'
ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32_imrope=0 0xa65288c00 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_pad', name = 'kernel_flash_attn_ext_pad_mask=1_ncpsg=64'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_pad_mask=1_ncpsg=64 0xa65288f00 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_blk', name = 'kernel_flash_attn_ext_blk_nqptg=8_ncpsg=64'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_blk_nqptg=8_ncpsg=64 0xa65289200 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_f32_dk128_dv128', name = 'kernel_flash_attn_ext_f32_dk128_dv128_mask=1_sinks=0_bias=0_scap=0_kvpad=1_bcm=0_ns10=1024_ns20=1024_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_f32_dk128_dv128_mask=1_sinks=0_bias=0_scap=0_kvpad=1_bcm=0_ns10=1024_ns20=1024_nsg=4 0xa65289500 | th_max = 512 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_swiglu_f32', name = 'kernel_swiglu_f32'
ggml_metal_library_compile_pipeline: loaded kernel_swiglu_f32 0xa65289800 | th_max = 1024 | th_width = 32
[Encode] TextEncoder (56 tokens): 230.8 ms
[Encode] Lyric vocab lookup (217 tokens): 1.7 ms
[Timbre] Using source latents (750 frames, 30.0s)
[CondEnc] Lyric sliding mask: 217x217, window=128
[CondEnc] Timbre sliding mask: 750x750, window=128
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q6_K_f32', name = 'kernel_mul_mm_q6_K_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q6_K_f32_bci=0_bco=1 0xa65289b00 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q4_K_f32', name = 'kernel_mul_mm_q4_K_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q4_K_f32_bci=0_bco=1 0xa65289e00 | th_max = 896 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_f32_dk128_dv128', name = 'kernel_flash_attn_ext_f32_dk128_dv128_mask=1_sinks=0_bias=0_scap=0_kvpad=1_bcm=1_ns10=1024_ns20=1024_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_f32_dk128_dv128_mask=1_sinks=0_bias=0_scap=0_kvpad=1_bcm=1_ns10=1024_ns20=1024_nsg=4 0xa6528a100 | th_max = 512 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_pad', name = 'kernel_flash_attn_ext_pad_mask=0_ncpsg=64'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_pad_mask=0_ncpsg=64 0xa6528a400 | th_max = 1024 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_flash_attn_ext_f32_dk128_dv128', name = 'kernel_flash_attn_ext_f32_dk128_dv128_mask=0_sinks=0_bias=0_scap=0_kvpad=1_bcm=0_ns10=1024_ns20=1024_nsg=4'
ggml_metal_library_compile_pipeline: loaded kernel_flash_attn_ext_f32_dk128_dv128_mask=0_sinks=0_bias=0_scap=0_kvpad=1_bcm=0_ns10=1024_ns20=1024_nsg=4 0xa6528a700 | th_max = 576 | th_width = 32
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_bf16_f32', name = 'kernel_mul_mm_bf16_f32_bci=0_bco=1'
ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_bf16_f32_bci=0_bco=1 0xa6528aa00 | th_max = 1024 | th_width = 32
[Encode] Packed: lyric=217 + timbre=1 + text=56 = 274 tokens
[Encode] ConditionEncoder: 628.4 ms, enc_S=274
[Cover] audio_cover_strength=0.50 -> switch at step 6/12
[Context Batch0] Philox noise seed=7507090758389762514, [4948, 64]
[DiT] Starting: T=4948, S=2474, enc_S=274, steps=12, batch=1 (cover)
[DiT] Batch N=1, T=4948, S=2474, enc_S=274
/Users/runner/work/acestep.cpp/acestep.cpp/ggml/src/ggml.c:1989: GGML_ASSERT(ggml_can_repeat(b, a)) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: ggml-org/llama.cpp#17869
0 libggml-base.0.9.7.dylib 0x00000001004cd36c ggml_print_backtrace + 276
1 libggml-base.0.9.7.dylib 0x00000001004cd558 ggml_abort + 156
2 libggml-base.0.9.7.dylib 0x00000001004cf86c ggml_add_inplace + 0
3 ace-synth 0x0000000100107cb4 _ZL17dit_ggml_generateP7DiTGGMLPKfS2_S2_iiiiS2_PffPK11DebugDumperS2_i + 1004
4 ace-synth 0x0000000100102e64 _Z18ace_synth_generateP8AceSynthPK10AceRequestPKfiiP8AceAudio + 19368
5 ace-synth 0x00000001000e4bc0 main + 3500
6 dyld 0x000000018903be00 start + 6992
=== Job job_1779693092461_46tlwcd FAILED: ace-synth failed: ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.024 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: use fusion = true
ggml_metal_init: use concurrency = true
ggml_metal_init: use graph optimize = true
[Load] DiT backend: MTL0 (CPU threads: 5)
[Load] Backend init: 88.0 ms
[GGUF] /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf: 830 tensors, data at offset 69088
[DiT] Self-attn: Q+K fused, V separate
[DiT] Cross-attn: all separate
[DiT] MLP: gate+up fused
[Load] null_condition_emb found (CFG available)
[WeightCtx] Loaded 478 tensors, 1770.1 MB into backend
[Load] DiT: 24 layers, H=2048, Nh=16/8, D=128
[Load] DiT weight load: 3595.8 ms
[GGUF] /Users/xbaby/Documents/AiModel/ACE/acestep-v15-xl-turbo-Q4_K_M.gguf: 830 tensors, data at offset 69088
[Load] silence_latent: [15000, 64] from GGUF
[GGUF] /Users/xbaby/Documents/AiModel/ACE/vae-BF16.gguf: 365 tensors, data at offset 30048
[Load] VAE backend: MTL0 (shared)
[VAE] Backend: MTL0, Weight buffer: 161.1 MB
[VAE] Loaded: 5 blocks, upsample=1920