refactor: stop allocating on tuple and list hashing#14542

Draft
robinbb wants to merge 1 commit into ocaml:main from robinbb:robinbb-stdune-hash-nonalloc

Conversation


@robinbb robinbb commented May 15, 2026

Idea: see if we can avoid allocating in hash functions.

Warning: I do not understand whether there are ramifications for deviating from Poly.hash like this. (Maybe certain hashes need to match somehow, for example.)

I wonder if we should merge this or something like this if only to set the precedent that hash functions must not allocate.

Summary

Tuple.T2.hash, Tuple.T3.hash, and List.hash each allocated per call — a fresh inner tuple for the tuple variants, a fresh int list of the input's length for List.hash — before handing the value to Stdlib.Hashtbl.hash. Replace each with a non-allocating multiplicative combiner ((acc * 31) + x, the same shape as Java's Objects.hash).

Same 'a -> 'a -> int API; callers don't change. Hash values shift, but these functions feed Hashtbl / Memo bucket selectors only — no persistent serialization, no cross-process invariants.

Inspired by #14534.

Measured impact: −868k allocated words ≈ −7 MB per dune-on-dune null build (−0.115%), consistent across 5 paired runs. minor_collections and major_collections unchanged. Details in this comment.
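For concreteness, the before/after shapes look roughly like this. This is a sketch, not the PR's code: `hash2`, `hash3`, and `hash_list` are illustrative stand-ins for `Tuple.T2.hash`, `Tuple.T3.hash`, and `List.hash`, which differ in naming and module layout.

```ocaml
(* Old shape: builds a fresh tuple that Stdlib.Hashtbl.hash then walks. *)
let hash2_old f g (a, b) = Stdlib.Hashtbl.hash (f a, g b)

(* New shape: combine component hashes arithmetically, with no
   intermediate allocation. *)
let hash2 f g (a, b) = ((1 * 31) + f a) * 31 + g b
let hash3 f g h (a, b, c) = (((1 * 31) + f a) * 31 + g b) * 31 + h c

(* Lists: fold the same combiner over element hashes instead of
   building an intermediate int list. *)
let rec hash_list_loop f acc = function
  | [] -> acc
  | x :: xs -> hash_list_loop f ((acc * 31) + f x) xs

let hash_list f xs = hash_list_loop f 1 xs
```

Note that `hash_list` over the identity agrees with `hash2` on pairs of the same elements, since both unroll to the same `(1 * 31 + h1) * 31 + h2` expression.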

@robinbb robinbb self-assigned this May 15, 2026
@robinbb robinbb requested a review from Copilot May 15, 2026 02:38

Copilot AI left a comment


Pull request overview

This PR refactors Stdune tuple and list hash helpers to avoid building intermediate tuples/lists before hashing, replacing polymorphic hashing with arithmetic combiners.

Changes:

  • Replaced Tuple.T2.hash and Tuple.T3.hash tuple allocation with multiplicative integer combination.
  • Replaced List.hash’s mapped-list allocation with a left fold.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File descriptions:
  otherlibs/stdune/src/tuple.ml: updates tuple hash implementations to combine component hashes directly.
  otherlibs/stdune/src/list.ml: updates list hashing to fold over element hashes without constructing an intermediate list.

Comment thread otherlibs/stdune/src/list.ml Outdated
Comment thread otherlibs/stdune/src/list.ml Outdated
@robinbb robinbb force-pushed the robinbb-stdune-hash-nonalloc branch from cc6bccb to e193cc5 Compare May 15, 2026 02:54
@robinbb robinbb requested a review from Copilot May 15, 2026 02:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread otherlibs/stdune/src/list.ml
`Tuple.T2.hash`, `Tuple.T3.hash`, and `List.hash` each allocated
per call:

- `Poly.hash (f a, g b)` allocates a fresh inner tuple before
  `Stdlib.Hashtbl.hash` walks it.
- `Stdlib.Hashtbl.hash (map ~f xs)` allocates a fresh `int list`
  of length |xs| before walking it.

Replace each with a non-allocating multiplicative combiner
(`(acc * 31) + new_element`) — the same shape Java's
`Objects.hash` uses and standard for ad-hoc hash combination.

Same `'a -> 'a -> int` API; callers don't change. Hash values
shift, but these functions feed `Hashtbl` / `Memo` bucket
selectors only — no persistent serialization, no cross-process
invariants, so the shift is invisible to consumers.

Signed-off-by: Robin Bate Boerop <me@robinbb.com>
@robinbb robinbb force-pushed the robinbb-stdune-hash-nonalloc branch from e193cc5 to c566d44 Compare May 15, 2026 03:08
@robinbb robinbb requested a review from Copilot May 15, 2026 03:09

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

let rec hash_loop f acc = function
  | [] -> acc
  | x :: xs -> hash_loop f ((acc * 31) + f x) xs
;;

let hash f xs = hash_loop f 1 xs

@robinbb robinbb May 15, 2026


The math is right: 31 ≡ -1 (mod 32) collapses 31^k into 2 buckets mod 32, and the old Hashtbl.hash (map …) had a Murmur finalizer we don't preserve. Concrete pathology requires bool list/unit list (or other domains where every element hashes to 0). Grepping the codebase, every existing List.hash callsite hashes Lib/Path/Lib_name/Dune_project/String lists, so the failure mode isn't reachable from current callers.

Java's Arrays.hashCode uses this same recurrence with init=1 and no finalizer; widely deployed without anyone treating it as a critical defect.

A simple h lxor (h lsr 16) finalizer happens to be a no-op for the small values our tests pin (high bits zero), so a meaningful fix would need a Murmur-style mixer, which is real scope creep for a refactor-titled PR. Deferring; happy to do this in a separate PR if profiling surfaces a real hot spot.
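The mod-32 collapse and the finalizer no-op claimed above are easy to check directly. An illustrative self-check, not code from the PR (it re-derives the combiner locally):

```ocaml
(* With the (acc * 31 + h) recurrence seeded at 1, a list whose elements
   all hash to 0 yields 31^n; since 31 = -1 (mod 32), that value mod 32
   alternates between 1 and 31, so only 2 of 32 buckets are ever used. *)
let rec hash_loop f acc = function
  | [] -> acc
  | x :: xs -> hash_loop f ((acc * 31) + f x) xs

let hash f xs = hash_loop f 1 xs

let () =
  (* model a domain where every element hashes to 0, e.g. a unit list *)
  let zero _ = 0 in
  for n = 0 to 16 do
    let bucket = hash zero (List.init n (fun _ -> ())) mod 32 in
    assert (bucket = 1 || bucket = 31)
  done;
  (* and h lxor (h lsr 16) really is the identity on 16-bit values *)
  for h = 0 to 65_535 do
    assert (h lxor (h lsr 16) = h)
  done
```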


robinbb commented May 15, 2026

Allocation measurement

To check the magnitude of the effect on dune itself, I ran a paired dune-on-dune null-build experiment.

Setup: dune-old built from upstream/main at 080d3e680d (the parent of this PR's commits); dune-new built from this PR's tip c566d4445a.
Workload: a clone of the dune source tree, fully built, so that dune build is a no-op rebuild exercising the memo cache scan and rule-evaluation path without doing compilation work.
Measurement: OCAMLRUNPARAM=v=0x400 <dune> build, picking the parent-process allocation block by maximum allocated_words. 5 paired runs alternating old/new.

Results.

run   old allocated_words   new allocated_words            Δ
1             757,503,985           756,686,557     −817,428
2             757,587,297           756,756,752     −830,545
3             757,503,216           756,687,417     −815,799
4             757,655,668           756,604,684   −1,050,984
5             757,587,355           756,759,976     −827,379

Mean Δ: −868,427 words ≈ −7.0 MB per null build (−0.115%). All 5 paired deltas have the same sign; range of deltas (~235k) is well below the magnitude (~870k). minor_collections and major_collections were unchanged between binaries, so the effect is purely on minor-heap throughput, not GC pressure.

Treating this as a defensive/clarity change rather than a perf win — the saving is real but unlikely to be observable in wall time on a realistic workload.
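The per-call difference can also be seen in isolation with Gc.allocated_bytes. A micro-harness sketch, separate from the dune-on-dune methodology above; `words_per_call`, `old_hash`, and `new_hash` are hypothetical names, and the exact words/call figure depends on compiler and flags:

```ocaml
(* Measure minor-heap words allocated per call of [f], amortized
   over many iterations so the harness's own allocation is noise. *)
let words_per_call f =
  let iters = 100_000 in
  let before = Gc.allocated_bytes () in
  for _ = 1 to iters do
    ignore (Sys.opaque_identity (f ()))
  done;
  let after = Gc.allocated_bytes () in
  (after -. before) /. float_of_int iters /. float_of_int (Sys.word_size / 8)

(* opaque_identity keeps the components from being constant-folded *)
let pair = (Sys.opaque_identity 1, Sys.opaque_identity 2)

(* Old style: materializes a fresh tuple for Stdlib.Hashtbl.hash. *)
let old_hash () = Stdlib.Hashtbl.hash (fst pair, snd pair)

(* New style: arithmetic combination, no intermediate tuple. *)
let new_hash () = ((1 * 31) + fst pair) * 31 + snd pair

let () =
  Printf.printf "old: %.2f words/call\nnew: %.2f words/call\n"
    (words_per_call old_hash) (words_per_call new_hash)
```

On a typical build the old version should report a few words per call (the boxed tuple) and the new version roughly zero, mirroring in miniature the allocated_words delta measured above.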
