add citor runtime #14
Adds the four canonical benchmarks (fib, skynet, nqueens, matmul) wired against citor v0.4.5 via CPM. Mirrors libfork's source shape: same N, same recursion structure, same validators, same warmup + timing scope. Linux + Windows presets only; citor does not currently support ARM or macOS.
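The shared benchmark shape (untimed warmup, a timing scope around the compute only, then validation) can be illustrated generically. This is a minimal sketch under stated assumptions, not libfork's or this PR's actual harness. `run_fib_bench`, `fib_expected` and `BenchResult` are hypothetical names. The `fib` body is plain serial recursion standing in for the runtime's fork-join version. It also assumes validation runs after the clock stops.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in body: plain serial recursion. In the real benches this would be the
// runtime's fork-join fib; here it only gives the harness something to time.
std::uint64_t fib(unsigned n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

// Validator computed independently of the code under test (iterative loop).
std::uint64_t fib_expected(unsigned n) {
  assert(n <= 93 && "fib(n) overflows std::uint64_t for n > 93");
  std::uint64_t a = 0, b = 1;
  for (unsigned i = 0; i < n; ++i) {
    std::uint64_t next = a + b;
    a = b;
    b = next;
  }
  return a;  // after n steps, a == fib(n)
}

struct BenchResult {
  double millis;  // wall time of the timed region only
  bool valid;     // did the timed run's output match the validator?
};

// Hypothetical harness: `warmup` untimed runs, then one timed run.
// Validation happens after the clock stops so it never counts toward the timing.
BenchResult run_fib_bench(unsigned n, int warmup) {
  volatile std::uint64_t sink = 0;  // keeps warmup results from being optimized away
  for (int i = 0; i < warmup; ++i) sink = fib(n);

  auto t0 = std::chrono::steady_clock::now();
  std::uint64_t result = fib(n);
  auto t1 = std::chrono::steady_clock::now();

  return {std::chrono::duration<double, std::milli>(t1 - t0).count(),
          result == fib_expected(n)};
}
```

Keeping the validator outside the timed region and computing it by a different method (iteration rather than recursion) means a broken parallel body is caught without skewing the measurement.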
Tried on a Ryzen 9 9950X3D, clang 21 -O3 -march=znver5 -mavx2 -mfma, libtcmalloc_minimal:
[Figure: Peak Memory Usage (Max RSS)]
@tzcnt please review
Note: citor's default affinity caps at physical cores, so on an SMT box …
So the benchmarks sweep from 1 to 32 threads on the 5950X, for example. We'd want to set …
Actually, since you don't support ARM at the moment, I think we can just check if …
Overall, the performance is quite impressive. I'll do a full run on all my hardware and update the results ASAP. One thing to note: it hits a stack overflow when running …
Pushed the affinity fix you suggested. Each bench picks …
On fib(45): confirmed, it's a real citor bug, and you diagnosed it correctly: a worker in the fork-join drain runs a stolen task that forks and re-enters the drain, so deep trees descend without unwinding. As a temporary workaround, you can bump citor's worker stack with … I'll land the proper depth-cap fix in a separate PR.
Select PerCpuSmtPair when the requested thread count equals hardware_concurrency() so the full-thread sweep point uses every logical CPU; citor's default PerCpu caps workers at the physical-core count. Set CITOR_WORKER_STACK_KIB=65536 so deep recursive fork-join does not overflow the worker stack on high core counts.
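The selection rule in this commit can be sketched as a small pure function. The sketch assumes the sweep visits every thread count from 1 to the logical CPU count (e.g. 1 to 32 on a 16-core/32-thread 5950X). `Placement` is a hypothetical enum that only mirrors the two policy names from the commit message. It is not citor's real type or API. `choose_placement`, `sweep_points` and `logical_cpu_count` are illustrative helpers.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the two policies named in the commit message;
// this is NOT citor's actual type or API.
enum class Placement { PerCpu, PerCpuSmtPair };

// Only the full-thread sweep point (requested == logical CPU count) switches to
// the SMT-pair policy, because the default per-CPU policy caps workers at the
// physical-core count and would otherwise leave half the logical CPUs idle.
Placement choose_placement(unsigned requested, unsigned logical_cpus) {
  return requested == logical_cpus ? Placement::PerCpuSmtPair : Placement::PerCpu;
}

// Assumed sweep: one point per thread count, 1..logical_cpus inclusive.
std::vector<unsigned> sweep_points(unsigned logical_cpus) {
  assert(logical_cpus > 0);
  std::vector<unsigned> points;
  for (unsigned t = 1; t <= logical_cpus; ++t) points.push_back(t);
  return points;
}

// std::thread::hardware_concurrency() may return 0 when the count is unknown.
unsigned logical_cpu_count() {
  unsigned hc = std::thread::hardware_concurrency();
  return hc == 0 ? 1 : hc;
}
```

Keying the switch on equality with `hardware_concurrency()` leaves every lower sweep point on the default policy, so only the top point changes behavior.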
I've set …