refactor: stop allocating on tuple and list hashing#14542

Draft
robinbb wants to merge 1 commit into ocaml:main from robinbb:robinbb-stdune-hash-nonalloc

Conversation


@robinbb robinbb commented May 15, 2026

Idea: see if we can avoid allocating in hash functions.

Warning: I do not understand whether there are ramifications for deviating from Poly.hash like this. (Maybe certain hashes need to match somehow, for example.)

I wonder if we should merge this or something like this if only to set the precedent that hash functions must not allocate.

Summary

Tuple.T2.hash, Tuple.T3.hash, and List.hash each allocated per call — a fresh inner tuple for the tuple variants, a fresh int list of the input's length for List.hash — before handing the value to Stdlib.Hashtbl.hash. Replace each with a non-allocating multiplicative combiner ((acc * 31) + x, the same shape as Java's Objects.hash).

Same 'a -> 'a -> int API; callers don't change. Hash values shift, but these functions feed Hashtbl / Memo bucket selectors only — no persistent serialization, no cross-process invariants.

Inspired by #14534.

Measured impact: −868k allocated words ≈ −7 MB per dune-on-dune null build (−0.115%), consistent across 5 paired runs. minor_collections and major_collections unchanged. Details in this comment.
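For concreteness, the before/after shapes look roughly like this. This is a sketch, not the PR's code: `hash2`, `hash3`, and `hash_list` are illustrative stand-ins for `Tuple.T2.hash`, `Tuple.T3.hash`, and `List.hash`, which differ in naming and module layout.

```ocaml
(* Old shape: builds a fresh tuple that Stdlib.Hashtbl.hash then walks. *)
let hash2_old f g (a, b) = Stdlib.Hashtbl.hash (f a, g b)

(* New shape: combine component hashes arithmetically, with no
   intermediate allocation. *)
let hash2 f g (a, b) = ((1 * 31) + f a) * 31 + g b
let hash3 f g h (a, b, c) = (((1 * 31) + f a) * 31 + g b) * 31 + h c

(* Lists: fold the same combiner over element hashes instead of
   building an intermediate int list. *)
let rec hash_list_loop f acc = function
  | [] -> acc
  | x :: xs -> hash_list_loop f ((acc * 31) + f x) xs

let hash_list f xs = hash_list_loop f 1 xs
```

Note that `hash_list` over the identity agrees with `hash2` on pairs of the same elements, since both unroll to the same `(1 * 31 + h1) * 31 + h2` expression.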

@robinbb robinbb self-assigned this May 15, 2026
@robinbb robinbb requested a review from Copilot May 15, 2026 02:38

Copilot AI left a comment


Pull request overview

This PR refactors Stdune tuple and list hash helpers to avoid building intermediate tuples/lists before hashing, replacing polymorphic hashing with arithmetic combiners.

Changes:

  • Replaced Tuple.T2.hash and Tuple.T3.hash tuple allocation with multiplicative integer combination.
  • Replaced List.hash’s mapped-list allocation with a left fold.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File descriptions:
  otherlibs/stdune/src/tuple.ml: updates tuple hash implementations to combine component hashes directly.
  otherlibs/stdune/src/list.ml: updates list hashing to fold over element hashes without constructing an intermediate list.

Comment thread otherlibs/stdune/src/list.ml Outdated
Comment thread otherlibs/stdune/src/list.ml Outdated
@robinbb robinbb force-pushed the robinbb-stdune-hash-nonalloc branch from cc6bccb to e193cc5 Compare May 15, 2026 02:54
@robinbb robinbb requested a review from Copilot May 15, 2026 02:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread otherlibs/stdune/src/list.ml
`Tuple.T2.hash`, `Tuple.T3.hash`, and `List.hash` each allocated
per call:

- `Poly.hash (f a, g b)` allocates a fresh inner tuple before
  `Stdlib.Hashtbl.hash` walks it.
- `Stdlib.Hashtbl.hash (map ~f xs)` allocates a fresh `int list`
  of length |xs| before walking it.

Replace each with a non-allocating multiplicative combiner
(`(acc * 31) + new_element`) — the same shape Java's
`Objects.hash` uses and standard for ad-hoc hash combination.

Same `'a -> 'a -> int` API; callers don't change. Hash values
shift, but these functions feed `Hashtbl` / `Memo` bucket
selectors only — no persistent serialization, no cross-process
invariants, so the shift is invisible to consumers.

Signed-off-by: Robin Bate Boerop <me@robinbb.com>
@robinbb robinbb force-pushed the robinbb-stdune-hash-nonalloc branch from e193cc5 to c566d44 Compare May 15, 2026 03:08
@robinbb robinbb requested a review from Copilot May 15, 2026 03:09

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

let rec hash_loop f acc = function
  | [] -> acc
  | x :: xs -> hash_loop f ((acc * 31) + f x) xs
;;

let hash f xs = hash_loop f 1 xs

@robinbb robinbb May 15, 2026


The math is right: 31 ≡ -1 (mod 32) collapses 31^k into 2 buckets mod 32, and the old Hashtbl.hash (map …) had a Murmur finalizer we don't preserve. Concrete pathology requires bool list/unit list (or other domains where every element hashes to 0). Grepping the codebase, every existing List.hash callsite hashes Lib/Path/Lib_name/Dune_project/String lists, so the failure mode isn't reachable from current callers.

Java's Arrays.hashCode uses this same recurrence with init=1 and no finalizer; widely deployed without anyone treating it as a critical defect.

A simple h lxor (h lsr 16) finalizer happens to be a no-op for the small values our tests pin (high bits zero), so a meaningful fix would need a Murmur-style mixer, which is real scope creep for a refactor-titled PR. Deferring; happy to do this in a separate PR if profiling surfaces a real hot spot.
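The mod-32 collapse and the finalizer no-op claimed above are easy to check directly. An illustrative self-check, not code from the PR (it re-derives the combiner locally):

```ocaml
(* With the (acc * 31 + h) recurrence seeded at 1, a list whose elements
   all hash to 0 yields 31^n; since 31 = -1 (mod 32), that value mod 32
   alternates between 1 and 31, so only 2 of 32 buckets are ever used. *)
let rec hash_loop f acc = function
  | [] -> acc
  | x :: xs -> hash_loop f ((acc * 31) + f x) xs

let hash f xs = hash_loop f 1 xs

let () =
  (* model a domain where every element hashes to 0, e.g. a unit list *)
  let zero _ = 0 in
  for n = 0 to 16 do
    let bucket = hash zero (List.init n (fun _ -> ())) mod 32 in
    assert (bucket = 1 || bucket = 31)
  done;
  (* and h lxor (h lsr 16) really is the identity on 16-bit values *)
  for h = 0 to 65_535 do
    assert (h lxor (h lsr 16) = h)
  done
```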


robinbb commented May 15, 2026

Allocation measurement

To check the magnitude of the effect on dune itself, I ran a paired dune-on-dune null-build experiment.

Setup: dune-old built from upstream/main at 080d3e680d (the parent of this PR's commits); dune-new built from this PR's tip c566d4445a.
Workload: a clone of the dune source tree, fully built, so that dune build is a no-op rebuild exercising the memo cache scan and rule-evaluation path without doing compilation work.
Measurement: OCAMLRUNPARAM=v=0x400 <dune> build, picking the parent-process allocation block by maximum allocated_words. 5 paired runs alternating old/new.

Results.

run   old allocated_words   new allocated_words            Δ
1             757,503,985           756,686,557     −817,428
2             757,587,297           756,756,752     −830,545
3             757,503,216           756,687,417     −815,799
4             757,655,668           756,604,684   −1,050,984
5             757,587,355           756,759,976     −827,379

Mean Δ: −868,427 words ≈ −7.0 MB per null build (−0.115%). All 5 paired deltas have the same sign; range of deltas (~235k) is well below the magnitude (~870k). minor_collections and major_collections were unchanged between binaries, so the effect is purely on minor-heap throughput, not GC pressure.

Treating this as a defensive/clarity change rather than a perf win — the saving is real but unlikely to be observable in wall time on a realistic workload.
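The per-call difference can also be seen in isolation with Gc.allocated_bytes. A micro-harness sketch, separate from the dune-on-dune methodology above; `words_per_call`, `old_hash`, and `new_hash` are hypothetical names, and the exact words/call figure depends on compiler and flags:

```ocaml
(* Measure minor-heap words allocated per call of [f], amortized
   over many iterations so the harness's own allocation is noise. *)
let words_per_call f =
  let iters = 100_000 in
  let before = Gc.allocated_bytes () in
  for _ = 1 to iters do
    ignore (Sys.opaque_identity (f ()))
  done;
  let after = Gc.allocated_bytes () in
  (after -. before) /. float_of_int iters /. float_of_int (Sys.word_size / 8)

(* opaque_identity keeps the components from being constant-folded *)
let pair = (Sys.opaque_identity 1, Sys.opaque_identity 2)

(* Old style: materializes a fresh tuple for Stdlib.Hashtbl.hash. *)
let old_hash () = Stdlib.Hashtbl.hash (fst pair, snd pair)

(* New style: arithmetic combination, no intermediate tuple. *)
let new_hash () = ((1 * 31) + fst pair) * 31 + snd pair

let () =
  Printf.printf "old: %.2f words/call\nnew: %.2f words/call\n"
    (words_per_call old_hash) (words_per_call new_hash)
```

On a typical build the old version should report a few words per call (the boxed tuple) and the new version roughly zero, mirroring in miniature the allocated_words delta measured above.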
