
{2025.06}[2025b] GROMACS 2025.4 with CUDA-12.9.1#1482

Draft
bedroge wants to merge 2 commits into EESSI:main from bedroge:gromacs_2025.3_cuda

Conversation


@bedroge bedroge commented Apr 22, 2026

Requires:

9 out of 87 required modules missing:

* Catch2/2.13.10-GCCcore-14.3.0 (Catch2-2.13.10-GCCcore-14.3.0.eb)
* gfbf/2025b (gfbf-2025b.eb)
* hypothesis/6.136.6-GCCcore-14.3.0 (hypothesis-6.136.6-GCCcore-14.3.0.eb)
* spin/0.14-GCCcore-14.3.0 (spin-0.14-GCCcore-14.3.0.eb)
* pybind11/3.0.0-GCC-14.3.0 (pybind11-3.0.0-GCC-14.3.0.eb)
* SciPy-bundle/2025.07-gfbf-2025b (SciPy-bundle-2025.07-gfbf-2025b.eb)
* networkx/3.5-gfbf-2025b (networkx-3.5-gfbf-2025b.eb)
* mpi4py/4.1.0-gompi-2025b (mpi4py-4.1.0-gompi-2025b.eb)
* GROMACS/2025.3-foss-2025b-CUDA-12.9.1 (GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb)

@bedroge bedroge added accel:nvidia 2025.06-software.eessi.io 2025.06 version of software.eessi.io labels Apr 22, 2026

bedroge commented Apr 23, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120


eessi-bot-rug Bot commented Apr 23, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: amd-zen5 and accelerator nvidia/cc120
Building for: x86_64/amd/zen5 and accelerator nvidia/cc120
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28614014

date job status comment
Apr 23 08:55:57 UTC 2026 submitted job id 28614014 awaits release by job manager
Apr 23 08:56:57 UTC 2026 released job awaits launch by Slurm scheduler
Apr 23 09:19:02 UTC 2026 running job 28614014 is running
Apr 23 09:49:33 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-28614014.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-accel-nvidia-cc120-17769376410.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120
no other files in tarball
Apr 23 09:49:33 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /b88eedf0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /8c8bf48b @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /6d7a17a9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /e5a16ba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /634d019c @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /e9b09ad8 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /b1ea69c1 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /a317b8da @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /a102bba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /7bd54429 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /84994f87 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /d58e51e9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28614014.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


bedroge commented Apr 23, 2026

The build succeeded, but it failed in the CUDA sanity check:

== 2026-04-23 11:47:13,762 easyblock.py:3849 INFO CUDA sanity check detailed report:
12 files missing one or more CUDA compute capabilities:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so
12 files with device code for more CUDA Compute Capabilities than requested:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so
12 files missing PTX code for the highest configured CUDA Compute Capability:
  lib/libgromacs.so.10.0.0
  lib/libgromacs.so.10
  lib/libgromacs.so
  lib/libgromacs_mpi.so.10.0.0
  lib/libgromacs_mpi.so.10
  lib/libgromacs_mpi.so
  lib64/libgromacs.so.10.0.0
  lib64/libgromacs.so.10
  lib64/libgromacs.so
  lib64/libgromacs_mpi.so.10.0.0
  lib64/libgromacs_mpi.so.10
  lib64/libgromacs_mpi.so

I guess it may be related to the 120f (family-specific) compute capability that we're using, as the binaries do seem to have support for sm_120:

Fatbin elf code:
================
arch = sm_120
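
For anyone wanting to reproduce this check, a quick sketch of how the embedded SASS and PTX targets can be listed (assuming `cuobjdump` from the CUDA toolkit is on `PATH`, run from the GROMACS installation directory; the library path is taken from the sanity check report above):

```shell
# List the device (SASS) architectures embedded in the library; the [af]?
# suffix also catches arch/family-specific targets such as sm_120f.
cuobjdump --list-elf lib/libgromacs.so | grep -oE 'sm_[0-9]+[af]?' | sort -u
# ...and the PTX targets, which the sanity check also looks for.
cuobjdump --list-ptx lib/libgromacs.so | grep -oE 'compute_[0-9]+[af]?' | sort -u
```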

@bedroge bedroge changed the title {2025.06}[2025b] GROMACS 2025.3 with CUDA-12.9.1 {2025.06}[2025b] GROMACS 2025.4 with CUDA-12.9.1 Apr 24, 2026

bedroge commented Apr 24, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/amd/zen5,accel=nvidia/cc120


eessi-bot-rug Bot commented Apr 24, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: amd-zen5 and accelerator nvidia/cc120
Building for: x86_64/amd/zen5 and accelerator nvidia/cc120
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28630885

date job status comment
Apr 24 07:43:34 UTC 2026 submitted job id 28630885 awaits release by job manager
Apr 24 07:44:50 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 07:46:53 UTC 2026 running job 28630885 is running
Apr 24 08:17:23 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-28630885.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-accel-nvidia-cc120-17770184860.tar.zst
size: 32 MiB (33916242 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_081438UTC
other under 2025.06/software/linux/x86_64/amd/zen5/accel/nvidia/cc120
no other files in tarball
Apr 24 08:17:23 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /b88eedf0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /8c8bf48b @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /6d7a17a9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node %device_type=gpu /e5a16ba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node %device_type=gpu /634d019c @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node %device_type=gpu /e9b09ad8 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /b1ea69c1 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /a317b8da @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /a102bba0 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_node /7bd54429 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_node /84994f87 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_node /d58e51e9 @BotBuildTests:gpu_rtx_pro_6000+default [Skipping GPU test : only 1 GPU available for this test case]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28630885.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


bedroge commented Apr 24, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-rug for:arch=x86_64/intel/skylake_avx512,accel=nvidia/cc70


eessi-bot-surf Bot commented Apr 24, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1482/22223686

date job status comment
Apr 24 11:43:06 UTC 2026 submitted job id 22223686 will be eligible to start in about 20 seconds
Apr 24 11:43:20 UTC 2026 received job awaits launch by Slurm scheduler
Apr 24 11:45:00 UTC 2026 running job 22223686 is running
Apr 24 13:01:50 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-22223686.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17770356230.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
Apr 24 13:01:50 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /15d6e239 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /5471f15a @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /1dc400ef @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /9715dde6 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /ed938ed4 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /8d24cea9 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /946648aa @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /9eb3f1e9 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-22223686.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


eessi-bot-surf Bot commented Apr 24, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1482/22223689

date job status comment
Apr 24 11:43:12 UTC 2026 submitted job id 22223689 will be eligible to start in about 20 seconds
Apr 24 11:43:24 UTC 2026 received job awaits launch by Slurm scheduler
Apr 24 11:43:47 UTC 2026 running job 22223689 is running
Apr 25 11:44:10 UTC 2026 finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job22223689.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Apr 25 11:44:10 UTC 2026 test result
🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job22223689.test does not exist in job directory, or parsing it failed.


gpu-bot-ugent Bot commented Apr 24, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: intel-cascadelake and accelerator nvidia/cc70
Building for: x86_64/intel/cascadelake and accelerator nvidia/cc70
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1482/40819786

date job status comment
Apr 24 11:43:12 UTC 2026 submitted job id 40819786 awaits release by job manager
Apr 24 11:44:50 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 11:46:54 UTC 2026 running job 40819786 is running
Apr 24 13:09:46 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-40819786.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-cascadelake-accel-nvidia-cc70-17770361340.tar.zst
size: 31 MiB (32681382 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_130833UTC
other under 2025.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70
no other files in tarball
Apr 24 13:09:46 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-40819786.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


eessi-bot-jsc Bot commented Apr 24, 2026

New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace and accelerator nvidia/cc90
Building for: aarch64/nvidia/grace and accelerator nvidia/cc90
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.04/pr_1482/14684643

date job status comment
Apr 24 11:43:15 UTC 2026 submitted job id 14684643 awaits release by job manager
Apr 24 11:44:07 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 11:45:10 UTC 2026 running job 14684643 is running
Apr 24 12:39:58 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-14684643.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc90-17770337320.tar.gz
size: 32 MiB (34349669 bytes)
entries: 760
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_122736UTC
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90
no other files in tarball
Apr 24 12:39:58 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 18/30 test case(s) from 30 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14684643.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case


eessi-bot-rug Bot commented Apr 24, 2026

New job on instance eessi-bot-rug for repository eessi.io-2025.06-software
Building on: intel-skylake_avx512 and accelerator nvidia/cc70
Building for: x86_64/intel/skylake_avx512 and accelerator nvidia/cc70
Job dir: /scratch/hb-eessibot/SHARED/jobs/2026.04/pr_1482/28636437

date job status comment
Apr 24 11:43:16 UTC 2026 submitted job id 28636437 awaits release by job manager
Apr 24 11:44:00 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 12:08:04 UTC 2026 running job 28636437 is running
Apr 24 13:09:06 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-28636437.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-skylake_avx512-accel-nvidia-cc70-17770359480.tar.zst
size: 31 MiB (32689771 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_130523UTC
other under 2025.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc70
no other files in tarball
Apr 24 13:09:06 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node %device_type=gpu /495ccd0c @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node %device_type=gpu /61fda20d @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node %device_type=gpu /e3d4ae3b @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node %device_type=gpu /ce7fe725 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node %device_type=gpu /5c339fc9 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node %device_type=gpu /b4bd1071 @BotBuildTests:gpu_v100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node /c3881e1d @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node /5f02f86c @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node /530b49da @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_2_node /f49f730d @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_2_node /c412ac42 @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_2_node /18861056 @BotBuildTests:gpu_v100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-28636437.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


gpu-bot-ugent Bot commented Apr 24, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: amd-zen3 and accelerator nvidia/cc80
Building for: x86_64/amd/zen3 and accelerator nvidia/cc80
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1482/15689215

date job status comment
Apr 24 11:43:18 UTC 2026 submitted job id 15689215 awaits release by job manager
Apr 24 11:44:46 UTC 2026 released job awaits launch by Slurm scheduler
Apr 24 11:48:58 UTC 2026 running job 15689215 is running
Apr 24 12:55:31 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-15689215.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen3-accel-nvidia-cc80-17770352040.tar.zst
size: 34 MiB (36599032 bytes)
entries: 760
modules under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
GROMACS/2025.4-foss-2025b-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
GROMACS/2025.4-foss-2025b-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/reprod
GROMACS/2025.4-foss-2025b-CUDA-12.9.1/20260424_125258UTC
other under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Apr 24 12:55:31 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-15689215.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


bedroge commented Apr 24, 2026

@casparvl The icelake cc80 build with the Surf bot failed because of:

[1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
[1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
[1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)

Have you encountered this before?


boegel commented Apr 24, 2026

> @casparvl The icelake cc80 build with the Surf bot failed because of:
>
> [1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
> [1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
> [1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
>
> Have you encountered this before?

ulimit -l being set to 8MB causing trouble doesn't seem too crazy to me...

Maybe we just need to add ulimit -l unlimited to bot build (job) script?

@bedroge
Collaborator Author

bedroge commented May 1, 2026

> @casparvl The icelake cc80 build with the Surf bot failed because of:
>
> [1777032154.196060] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
> [1777032154.196107] [gcn12:54944:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=rc_recv_desc) chunk: Input/output error
> [1777032154.196266] [gcn12:54944:0]           ib_md.c:287  UCX  ERROR ibv_reg_mr(address=0x7febfda00000, length=37748736, access=0xf) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
>
> Have you encountered this before?
>
> ulimit -l being set to 8MB causing trouble doesn't seem too crazy to me...
>
> Maybe we just need to add ulimit -l unlimited to bot build (job) script?

I tried various things with an interactive job on Snellius, but ulimit -l always printed unlimited. Looking at the logs and the job details again, I found that the job itself is also reported as OUT_OF_MEMORY, so maybe that's the real issue here. sacct shows that the job requested 120GB though, which should be more than enough for GROMACS? Also, MaxRSS is 15916602K, i.e. ~16GB, so I don't understand this... From what I can see in the logs, it doesn't seem to use /dev/shm either.

edit: the zen4 job also ran out of memory according to Slurm, but somehow kept running and then timed out after a day.

@casparvl do you have any idea what's going on?

@casparvl
Collaborator

casparvl commented May 4, 2026

The only thing I can think of: these nodes don't have local disks, so /tmp is essentially also in memory. Could it be that we are writing a lot there? I mean, it'd have to be a whole lot.

@casparvl
Collaborator

casparvl commented May 4, 2026

> edit: the zen4 job also ran out of memory according to Slurm, but somehow kept running and then timed out after a day.

I've seen this happen before. If you have, say, 3 processes running, OOM killer might kill one, leave 2 stray processes that just wait for the other one to do something. And that then runs indefinitely. SLURM doesn't end the job, since you still have running processes.

@bedroge
Collaborator Author

bedroge commented May 8, 2026

I've done an interactive build on an A100 node on Snellius with my personal account and on top of EESSI (without a container), and that worked fine:

== Build succeeded for 1 out of 1 (total: 30 mins 20 secs)
== Summary:
   * [SUCCESS] GROMACS/2025.4-foss-2025b-CUDA-12.9.1

No memory issues, and the max memory usage was like ~4GB. I'll do another one with the container.

@bedroge
Collaborator Author

bedroge commented May 8, 2026

Let me just try this again as well:

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80

@eessi-bot-surf

eessi-bot-surf Bot commented May 8, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.05/pr_1482/22588119

date job status comment
May 08 14:27:42 UTC 2026 submitted job id 22588119 will be eligible to start in about 20 seconds
