goodhistogram: add QueryQuantiles for alloc-free live quantile reads #5

Merged
kyle-a-wong merged 4 commits into cockroachdb:main from kyle-a-wong:kwong/query-quantiles
May 18, 2026

Conversation

@kyle-a-wong
Contributor

Summary

Adds Histogram.QueryQuantiles(dst, qs []float64) []float64, an alloc-free quantile-read path that reads atomic counters in place rather than materializing a Snapshot. The result reflects the same eventual consistency Snapshot() already accepts.

This targets hot-path consumers that read quantiles per recorded sample (e.g. an "is this query slow?" detector keyed by SQL fingerprint), where the existing Snapshot + ValuesAtQuantiles path's per-call allocations dominate the cost.

Approach

  • Two-pass walk over the bucket array. Pass 1 sums in-range counts to a total; pass 2 walks forward with a 3-bucket sliding window of (prev, curr, next) counts to compute trapezoidal boundary densities on the fly, avoiding the avgDensity / boundaryDensity scratch slices used by ValuesAtQuantiles.
  • qs must be sorted ascending. With a stack-backed dst (e.g. var buf [4]float64; h.QueryQuantiles(buf[:0], qs)) the call is fully alloc-free.
  • Boundary-density behavior at the rightmost bucket (dR=0) matches the existing ValuesAtQuantiles for parity. This is likely a separate, pre-existing bug; preserving it here keeps results bit-for-bit identical, and it is best fixed in its own PR.
  • Record is unchanged; no contention regression.
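
The two-pass, write-into-dst shape described above can be sketched as follows. This is a minimal illustration, not the code from this PR: liveHist, newLiveHist, record, and quantilesInto are hypothetical names, and the sketch interpolates uniformly within a bucket rather than using the trapezoidal boundary densities.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// liveHist is a hypothetical stand-in for the real Histogram: fixed bucket
// edges plus one atomic counter per bucket.
type liveHist struct {
	bounds []float64       // ascending edges, len(counts)+1 entries
	counts []atomic.Uint64 // counts[i] covers [bounds[i], bounds[i+1])
}

func newLiveHist(bounds []float64) *liveHist {
	return &liveHist{bounds: bounds, counts: make([]atomic.Uint64, len(bounds)-1)}
}

// record increments the bucket containing v. Out-of-range values are simply
// dropped in this sketch.
func (h *liveHist) record(v float64) {
	for i := range h.counts {
		if v >= h.bounds[i] && v < h.bounds[i+1] {
			h.counts[i].Add(1)
			return
		}
	}
}

// quantilesInto appends one estimate per entry of qs (sorted ascending) to
// dst and returns the extended slice. It reads the counters in place and
// allocates nothing itself as long as cap(dst) >= len(qs).
func (h *liveHist) quantilesInto(dst, qs []float64) []float64 {
	// Pass 1: sum the counters to get the total.
	var total uint64
	for i := range h.counts {
		total += h.counts[i].Load()
	}
	if total == 0 {
		for range qs {
			dst = append(dst, 0)
		}
		return dst
	}
	// Pass 2: walk forward once, resolving quantiles in ascending order.
	var cum uint64
	qi := 0
	for i := range h.counts {
		if qi >= len(qs) {
			break
		}
		c := h.counts[i].Load()
		if c == 0 {
			continue
		}
		for qi < len(qs) {
			rank := qs[qi] * float64(total)
			if rank > float64(cum+c) {
				break // this quantile lies in a later bucket
			}
			frac := (rank - float64(cum)) / float64(c)
			dst = append(dst, h.bounds[i]+frac*(h.bounds[i+1]-h.bounds[i]))
			qi++
		}
		cum += c
	}
	// Anything left unresolved is clamped to the top edge.
	for ; qi < len(qs); qi++ {
		dst = append(dst, h.bounds[len(h.bounds)-1])
	}
	return dst
}

func main() {
	h := newLiveHist([]float64{0, 10, 20, 30, 40})
	for v := 0; v < 40; v++ {
		h.record(float64(v))
	}
	var buf [4]float64 // caller-provided backing array
	fmt.Println(h.quantilesInto(buf[:0], []float64{0.5, 0.9, 0.99}))
}
```

Because the quantiles are sorted, pass 2 never has to rewind. Whether buf actually stays on the stack depends on escape analysis at the call site; printing the result, as main does here, may force it to the heap.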

Numbers (Apple M3 Pro)

3 quantiles, n=10k populated:

Method                          ns/op   B/op   allocs/op
Snapshot + ValuesAtQuantiles      815   2912          10
QueryQuantiles                    288      0           0

With 2 quantiles, at both n=1k and n=100k: 779 → 282 ns/op, 2840 → 0 B/op, 9 → 0 allocs/op.

Test plan

  • TestQueryQuantilesAgreesWithSnapshot — 12 distributions × 10 quantiles, exact agreement with Snapshot.ValuesAtQuantiles
  • TestQueryQuantilesEdges — empty, only-underflow, only-overflow, empty-qs
  • TestQueryQuantilesAllocFree — verifies 0 allocs/op when dst has cap
  • BenchmarkQueryPath and BenchmarkQueryPathThreeQuantiles for the numbers above
  • Full go test ./... passes

@cockroachlabs-cla-agent

cockroachlabs-cla-agent Bot commented May 11, 2026

CLA assistant check
All committers have signed the CLA.

Add Histogram.QueryQuantiles(dst, qs), which estimates quantile values by
reading atomic counters in place rather than copying them into a Snapshot.
The result reflects the same eventual consistency Snapshot already accepts
(counters read independently and may observe a slightly inconsistent total).

This targets hot-path consumers that read quantiles per recorded sample
(e.g. an "is this query slow?" detector keyed by SQL fingerprint), where
the existing Snapshot + ValuesAtQuantiles path's per-call allocations
dominate the cost.

The walk uses a 3-bucket sliding window of (prev, curr, next) counts to
compute trapezoidal boundary densities on the fly, avoiding the n-sized
avgDensity / boundaryDensity scratch slices used by ValuesAtQuantiles.
The qs slice must be sorted ascending; with a stack-backed dst, the call
is fully alloc-free.

Boundary-density behavior at the rightmost bucket (dR=0) matches the
existing ValuesAtQuantiles for parity; this is preserved deliberately so
results agree across the two methods. The agreement test covers all 12
distributions in the existing benchmark suite.

Per-call benchmark on Apple M3 Pro, n=10k populated, 3 quantiles:

  Snapshot+ValuesAtQuantiles    815 ns/op   2912 B/op   10 allocs/op
  QueryQuantiles                288 ns/op      0 B/op    0 allocs/op

@kyle-a-wong force-pushed the kwong/query-quantiles branch from f26b5e3 to 8b8db39 on May 11, 2026 13:29
@angles-n-daemons
Contributor

This is a question I haven't thought much about - but do you think there's any reason we need snapshots to do quantile estimation?

If snapshots are inherently inconsistent, and there are no locks on the structure, should we just move fully over to direct quantile querying? Does this affect the API that Prometheus users are familiar with in any meaningful way?

@kyle-a-wong
Contributor Author

but do you think there's any reason we need snapshots to do quantile estimation?

I'm not sure whether we need snapshots to do quantile estimation, but I would assume we still want to support quantile estimation on snapshots, right?

@angles-n-daemons
Contributor

Yeah that seems to make sense to me

Contributor

@angles-n-daemons left a comment

Nice cleanup — sliding window is a clean improvement over the scratch-slice version. A few thoughts inline. Two whole-PR notes:

  • gofmt -l . flags both new files (spaces instead of tabs) — gofmt -w should sort it.
  • No -race-friendly test exercising Record and QueryQuantiles concurrently. Even a smoke test would help lock in the lock-free contract against future regressions.

Comment thread quantile_live.go Outdated
// qs MUST be sorted in ascending order. dst must have cap >= len(qs); pass a
// stack-backed slice (e.g. var buf [4]float64; h.QueryQuantiles(buf[:0], qs))
// to make the call fully alloc-free.
func (h *Histogram) QueryQuantiles(dst, qs []float64) []float64 {
Contributor

Could you rename to fit alongside the existing ValueAtQuantile / ValuesAtQuantiles? Something like ValuesAtQuantilesInto(dst, qs) keeps the relationship to the snapshot version obvious and follows the Go convention of *Into for caller-provided buffers. Two different verbs for the same conceptual op makes the API harder to discover.

Contributor Author

Done in 47a1510 — renamed to ValuesAtQuantilesInto. Agreed the verb mismatch made the relationship hard to find in godoc.

Comment thread quantile_live.go Outdated
}
dR = (currD + nextD) / 2.0
}
// dR remains 0 at i==n-1 to match existing behavior.
Contributor

Worth fixing in the same PR rather than mirroring it. The dR=0 quirk biases the reported value low by ~20% of bucket width or more in the rightmost bucket — and p99 in long-tailed latency distributions almost always lands exactly there. If the natural near-term consumer is something like AnomalyDetector (per-statement comparison against reported p99), low-biased p99 means more false-positive slow-query flags in production.

One-line fix in quantile.go (add boundaryDensity[n] = avgDensity[n-1] after the loop), plus mirroring it here (set dR = currD when i == n-1, parallel to the existing i == 0 → dL = currD case). The "exact parity" tests would need to update, but they're effectively pinned to the wrong invariant.
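
The sliding-window edge rule proposed here can be sketched as a small pure function. This is an illustration under assumptions, not the code in quantile_live.go: edgeDensities is a hypothetical helper, and the interior left-edge rule (average of prevD and currD) is inferred by symmetry with the dR = (currD + nextD) / 2.0 line quoted in this thread.

```go
package main

import "fmt"

// edgeDensities returns the density estimates at the left and right edges of
// the current bucket from the average densities of its neighbours.
// Hypothetical helper: first and last mark the outermost buckets, where the
// missing neighbour is replaced by the bucket's own density.
func edgeDensities(prevD, currD, nextD float64, first, last bool) (dL, dR float64) {
	dL = (prevD + currD) / 2.0
	if first {
		dL = currD // existing i == 0 case
	}
	dR = (currD + nextD) / 2.0
	if last {
		dR = currD // proposed i == n-1 case, replacing dR = 0
	}
	return dL, dR
}

func main() {
	// Average densities (count / width) of three adjacent buckets, made up
	// for the example.
	d := []float64{1, 3, 2}
	for i := range d {
		var prevD, nextD float64
		if i > 0 {
			prevD = d[i-1]
		}
		if i < len(d)-1 {
			nextD = d[i+1]
		}
		dL, dR := edgeDensities(prevD, d[i], nextD, i == 0, i == len(d)-1)
		fmt.Printf("bucket %d: dL=%v dR=%v\n", i, dL, dR)
	}
}
```

With this rule the right edge of bucket i and the left edge of bucket i+1 get the same value, so the piecewise-linear density stays continuous across bucket boundaries.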

Contributor Author

Fixed in f4b1ed8 (root fix in quantile.go for both ValueAtQuantile and ValuesAtQuantiles, mirrored in ValuesAtQuantilesInto). Worth noting the existing snapshot quantile tests in histogram_test.go didn't actually pin the old behavior — they all still pass — so the "tests pinned to the wrong invariant" concern didn't materialize. Nothing in the tree was depending on the bias.
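
To get a feel for the size of the bias discussed in this thread, here is a rough sketch. It assumes the density inside a bucket varies linearly from dL to dR and is normalized to the bucket's mass; positionInBucket is a hypothetical helper and is not the library's trapezoidalSolve, whose exact form is not shown in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// positionInBucket returns t in [0, 1], the fractional position inside a
// bucket at which a fraction f of the bucket's mass has accumulated, assuming
// the density runs linearly from dL at the left edge to dR at the right edge.
// It solves (dR-dL)/2 * t^2 + dL*t = f * (dL+dR)/2 for t.
func positionInBucket(dL, dR, f float64) float64 {
	if dL == dR {
		return f // flat density: position equals mass fraction
	}
	disc := (1-f)*dL*dL + f*dR*dR // discriminant, always >= 0
	return (-dL + math.Sqrt(disc)) / (dR - dL)
}

func main() {
	// Halfway through the bucket's mass, with the left edge at density 1.
	fmt.Println("dR = 0:    ", positionInBucket(1, 0, 0.5))
	fmt.Println("dR = currD:", positionInBucket(1, 1, 0.5))
}
```

Under these assumptions the dR = 0 case gives t = 1 - sqrt(0.5) ≈ 0.29 against 0.5 for a flat density, a shift of about 0.2 bucket widths toward the left edge. That is consistent with the "~20% of bucket width" figure quoted above, though the real left-edge density in the rightmost bucket will usually differ from currD.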

Comment thread quantile_live_test.go Outdated
	got := h.QueryQuantiles(buf[:0], qs)

	for i, q := range qs {
		if math.Abs(got[i]-want[i]) > 1e-6*math.Max(1, math.Abs(want[i])) {
Contributor

Tolerance is 1e-6 here, but the PR description claims bit-for-bit parity. Both paths feed the same arguments to trapezoidalSolve in the same order, so equality should hold exactly — would tighten this to got[i] != want[i]. (Or if you keep the tolerance, soften the parity claim in the description.)

Contributor Author

Done in 40fad13 — tightened to got[i] != want[i]. Confirmed both paths still agree exactly after the dR fix landed in f4b1ed8.
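
The exact-equality assertion in this thread is safe only because both paths perform the same floating-point operations in the same order. A small generic illustration of why order matters (sumLeftToRight and sumRightToLeft are hypothetical names, unrelated to the histogram code):

```go
package main

import "fmt"

// sumLeftToRight adds the values in slice order.
func sumLeftToRight(xs []float64) float64 {
	var s float64
	for _, x := range xs {
		s += x
	}
	return s
}

// sumRightToLeft adds the same values in reverse order.
func sumRightToLeft(xs []float64) float64 {
	var s float64
	for i := len(xs) - 1; i >= 0; i-- {
		s += xs[i]
	}
	return s
}

func main() {
	xs := []float64{0.1, 0.2, 0.3}
	a, b := sumLeftToRight(xs), sumRightToLeft(xs)
	// Same inputs, same order: bit-for-bit identical, so == is a valid check.
	fmt.Println(a == sumLeftToRight(xs))
	// Same inputs, different order: rounding differs in the last bit.
	fmt.Println(a == b, a, b)
}
```

If a later change reorders the arithmetic in only one of the two quantile paths, an exact comparison will fail even though both results are equally accurate, which is the trade-off of choosing got[i] != want[i] over a tolerance.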

kyle-a-wong and others added 3 commits May 18, 2026 10:13
…ation

The boundary-density loops in ValueAtQuantile/ValuesAtQuantiles never set
boundaryDensity[n] — the case n: branch was unreachable because for i :=
range n only iterates 0..n-1. As a result, the right edge of the
rightmost bucket was treated as having density zero, biasing interpolated
values low (~20% of bucket width or more) right where p99 of long-tailed
latency distributions lands.

Set boundaryDensity[n] = avgDensity[n-1] explicitly, parallel to the
existing boundaryDensity[0] = avgDensity[0]. Mirror the same fix in
ValuesAtQuantilesInto (dR = currD at i == n-1) so the two paths stay in
parity.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
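
The unreachable branch described in this commit message can be reproduced in isolation. The loop below is a reconstruction from the description, not the actual quantile.go source; boundaryDensity as a function and the fixRightEdge flag are hypothetical, and the classic three-clause loop stands in for `for i := range n`.

```go
package main

import "fmt"

// boundaryDensity computes n+1 edge densities from n per-bucket average
// densities. With fixRightEdge false it mimics the old behaviour, where the
// last entry is never written and stays 0.
func boundaryDensity(avgDensity []float64, fixRightEdge bool) []float64 {
	n := len(avgDensity)
	out := make([]float64, n+1)
	// Same iteration space as `for i := range n`: i takes the values 0..n-1.
	for i := 0; i < n; i++ {
		switch i {
		case 0:
			out[0] = avgDensity[0]
		case n:
			// Never reached: the loop stops at n-1.
			out[n] = avgDensity[n-1]
		default:
			out[i] = (avgDensity[i-1] + avgDensity[i]) / 2.0
		}
	}
	if fixRightEdge && n > 0 {
		// The explicit assignment, parallel to out[0] above.
		out[n] = avgDensity[n-1]
	}
	return out
}

func main() {
	avg := []float64{1, 3, 2}
	fmt.Println("old:  ", boundaryDensity(avg, false))
	fmt.Println("fixed:", boundaryDensity(avg, true))
}
```

The Go compiler accepts the dead `case n` without complaint because n is not a constant, which may be why the original slip went unnoticed.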

Fits the existing Snapshot API surface (ValueAtQuantile,
ValuesAtQuantiles) and follows the Go convention of an *Into suffix for
functions that write into a caller-provided buffer. Two different verbs
for the same conceptual operation made the relationship harder to find
in godoc.

Also gofmt the test file (spaces -> tabs) — the rename touches enough
of it that bundling the format fix here keeps later commits clean.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Both ValuesAtQuantilesInto and Snapshot.ValuesAtQuantiles call
trapezoidalSolve with identical arguments in the same order, so the
parity test can assert exact equality instead of an epsilon tolerance.

Add TestValuesAtQuantilesIntoConcurrentWithRecord: 4 writer goroutines
and 4 reader goroutines running for 100ms, intended for -race. Pins the
lock-free contract so a future regression (e.g. accidentally sharing
scratch state across callers) gets caught by CI.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@kyle-a-wong
Contributor Author

Both whole-PR notes addressed. gofmt -w landed naturally as part of f4b1ed8 / 47a1510 (the renames and edits touched the whole files). Added TestValuesAtQuantilesIntoConcurrentWithRecord in 40fad13 — 4 writers + 4 readers for 100ms, passes under go test -race ./....

Note: gofmt -l . still flags benchmark_test.go, but that's pre-existing on main and outside this PR's scope — happy to fix separately.

Contributor

@angles-n-daemons left a comment

Nice, walked through the commits — all four look good, tests pass under -race.

Ah, good to know on the dR test invariant: I was assuming the parity tests had been pinned to the buggy behavior, but you're right that they weren't tight enough to lock it in. The fix landed clean.

Opened #6 for the benchmark_test.go gofmt fix to keep it out of this PR's scope.

Approving.

@kyle-a-wong merged commit 92f4812 into cockroachdb:main on May 18, 2026
3 checks passed
@kyle-a-wong deleted the kwong/query-quantiles branch on May 18, 2026 15:02