goodhistogram: add windowed histogram with Prometheus integration #4

Merged
angles-n-daemons merged 1 commit into bdillmann/prometheus-collector from bdillmann/windowed-histogram
May 4, 2026

Conversation

@angles-n-daemons
Contributor

Summary

Stacks on #2.

Cumulative histograms are the right primitive for Prometheus scraping, but operators and dashboards need rolling-window quantiles to see recent behavior. Windowed maintains a single cumulative histogram with two baseline snapshots rotated on a configurable interval. The windowed view is computed by subtracting the older baseline from the current cumulative state, covering 1-2x the window duration. Recording cost is identical to a plain Histogram (~20ns, lock-free); the mutex only protects baseline rotation.

WindowedCollector and WindowedVec provide Prometheus integration for single and labeled windowed histograms, following the same adapter pattern as PrometheusCollector and HistogramVec.
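
The baseline-subtraction scheme described in this summary can be sketched as follows. This is a minimal illustration, not the code from this PR: the `windowed` and `snapshot` types, the fixed `numBuckets`, and all method names are assumptions, and the sketch reads the cumulative state while holding the mutex for simplicity.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// numBuckets is a hypothetical fixed bucket count for this sketch.
const numBuckets = 4

// snapshot is a point-in-time copy of the cumulative bucket counts.
type snapshot [numBuckets]uint64

// sub returns s minus base, bucket by bucket.
func (s snapshot) sub(base snapshot) snapshot {
	var out snapshot
	for i := range s {
		out[i] = s[i] - base[i]
	}
	return out
}

// windowed keeps one cumulative set of counters plus two baselines.
type windowed struct {
	counts [numBuckets]atomic.Uint64 // cumulative, never reset

	mu           sync.Mutex
	prevBaseline snapshot // cumulative state two rotations ago
	curBaseline  snapshot // cumulative state at the last rotation
}

// record is lock-free: it only touches the cumulative counters.
func (w *windowed) record(bucket int) { w.counts[bucket].Add(1) }

// cumulative copies the counters one bucket at a time.
func (w *windowed) cumulative() snapshot {
	var s snapshot
	for i := range w.counts {
		s[i] = w.counts[i].Load()
	}
	return s
}

// tick rotates the baselines: the current one becomes the older one,
// and a fresh copy of the cumulative state becomes the current one.
func (w *windowed) tick() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.prevBaseline = w.curBaseline
	w.curBaseline = w.cumulative()
}

// windowedSnapshot subtracts the older baseline from the cumulative
// state, so the result spans between one and two rotation intervals.
func (w *windowed) windowedSnapshot() snapshot {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.cumulative().sub(w.prevBaseline)
}
```

Immediately after a rotation the older baseline is one interval old, and just before the next rotation it is almost two intervals old, which is where the 1-2x coverage comes from.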

@kyle-a-wong kyle-a-wong self-requested a review April 27, 2026 15:00
Comment thread windowed.go Outdated
}

// tick performs the actual baseline rotation. Must be called with w.mu held.
func (w *Windowed) tick() {
Contributor

nit: Can we call this tickLocked, to make it clear that the caller is expected to already hold the lock?

Contributor Author

sure, doing this now

Comment thread windowed.go Outdated
w.prevBaseline = empty
w.curBaseline = empty
w.nextTick = time.Now().Add(w.interval)
w.mu.Unlock()
Contributor

Any reason why we don't defer this right after taking the lock?

Contributor Author

no reason, changing this now
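
One reason to prefer the deferred unlock is that the mutex is released on every exit path, including a panic. The sketch below illustrates that with a hypothetical `guarded` type; it is not taken from windowed.go.

```go
package main

import "sync"

// guarded is a hypothetical type with one lock-protected field.
type guarded struct {
	mu    sync.Mutex
	value int // guarded by mu
}

// update applies fn under the lock. Because Unlock is deferred right
// after Lock, the mutex is released on every return path, including
// a panic inside fn.
func (g *guarded) update(fn func(int) int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.value = fn(g.value)
}

// tryUpdate runs update and reports whether fn panicked.
func (g *guarded) tryUpdate(fn func(int) int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	g.update(fn)
	return false
}
```

If `update` called `g.mu.Unlock()` manually as its last statement instead, a panic in `fn` would leave the mutex held and the next caller would block forever.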

Comment thread windowed_vec.go Outdated
labelNames []string

mu sync.RWMutex
histograms map[string]*labeledWindowed
Contributor

nit: If a lock needs to be acquired to access a field, I like to put the field in an embedded struct alongside the mutex, something like:

mu struct {
  sync.RWMutex
  histograms map[string]*labeledWindowed
}

That way, histograms is accessed as w.mu.histograms, which makes it a little clearer that a lock needs to be acquired. Not gonna block on it, but just food for thought if you wanna make the change.

Contributor Author

yeah, i'll do that now
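
A self-contained sketch of the pattern suggested in this thread, applied to a labeled vector. The field layout follows the reviewer's snippet; the `labeledWindowed` stand-in, the constructor, and the `get` method are assumptions made for illustration.

```go
package main

import "sync"

// labeledWindowed is a hypothetical stand-in for the per-label child.
type labeledWindowed struct{}

type windowedVec struct {
	labelNames []string

	// Everything inside mu requires the lock; the w.mu.histograms
	// spelling at each call site is the reminder.
	mu struct {
		sync.RWMutex
		histograms map[string]*labeledWindowed
	}
}

func newWindowedVec(labelNames ...string) *windowedVec {
	v := &windowedVec{labelNames: labelNames}
	v.mu.histograms = make(map[string]*labeledWindowed)
	return v
}

// get returns the child for key, creating it on first use.
func (v *windowedVec) get(key string) *labeledWindowed {
	// Fast path: a read lock is enough when the child already exists.
	v.mu.RLock()
	h, ok := v.mu.histograms[key]
	v.mu.RUnlock()
	if ok {
		return h
	}

	v.mu.Lock()
	defer v.mu.Unlock()
	// Re-check: another goroutine may have created the child between
	// releasing the read lock and acquiring the write lock.
	if h, ok := v.mu.histograms[key]; ok {
		return h
	}
	h = &labeledWindowed{}
	v.mu.histograms[key] = h
	return h
}
```

The grouping is purely a readability aid: the compiler does not enforce it, but an access such as `v.mu.histograms` with no nearby `v.mu.RLock()` or `v.mu.Lock()` is easy to spot.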

@angles-n-daemons angles-n-daemons force-pushed the bdillmann/prometheus-collector branch from 5748700 to 495e8da Compare April 29, 2026 14:18
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/windowed-histogram branch from 8967ea7 to 4056c53 Compare April 29, 2026 14:19
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/prometheus-collector branch from 495e8da to 0b75663 Compare April 29, 2026 14:25
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/windowed-histogram branch from 4056c53 to eb18b9c Compare April 29, 2026 14:26
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/prometheus-collector branch from 0b75663 to 824c291 Compare April 29, 2026 14:28
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/windowed-histogram branch from eb18b9c to 5732bff Compare April 29, 2026 14:29
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/prometheus-collector branch from 824c291 to 765e86b Compare April 29, 2026 14:41
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/windowed-histogram branch from 5732bff to b3ed9f0 Compare April 29, 2026 14:42
Comment thread windowed.go Outdated
type Windowed struct {
h *Histogram // cumulative — the only histogram, never swapped or reset

mu sync.Mutex
Contributor

Sorry I missed this in the last review: can you do the same thing here that you did with WindowedVec, where the fields that require the lock live in the mutex struct?

Contributor Author

Good point, doing this now.

Comment thread windowed.go Outdated
Comment on lines +71 to +73
cur := w.h.Snapshot()
w.mu.Lock()
base := w.prevBaseline
Contributor

One thing claude pointed out:

  Subtle race in WindowedSnapshot (worth fixing)

  w.maybeTick()
  cur := w.h.Snapshot()        // T1
  w.mu.Lock()
  base := w.prevBaseline       // T2
  w.mu.Unlock()
  return cur.Sub(&base)

  If another goroutine calls Tick() while h.Snapshot() is running at T1, the new curBaseline may capture per-bucket counts that are higher than the corresponding counts in our partially-snapshotted cur (snapshots are not atomic across buckets). If a second Tick() then runs before we read prevBaseline at T2, that newer snapshot becomes our base, and cur.Sub(&base) underflows uint64 per-bucket counts and TotalCount. With frequent Tick()s this is reachable, and the test TestWindowedConcurrentRecordAndTick doesn't exercise WindowedSnapshot (it does check it, but only via the cumulative side, not the diff side).

Contributor Author

Ah nice, yeah that seems to make sense. We'll change that now
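
One possible fix for the ordering problem described in this thread is to read the cumulative state and the baseline inside the same critical section. The thread does not show the change that was actually made, so the sketch below is an assumption. It collapses the histogram to a single counter, and the `hist` type and the `unsafeWindowedCount` helper are hypothetical; the helper exists only to replay the bad interleaving deterministically.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// hist is a single-counter simplification of the windowed histogram.
type hist struct {
	count atomic.Uint64 // cumulative, never reset

	mu struct {
		sync.Mutex
		prevBaseline uint64
		curBaseline  uint64
	}
}

func (h *hist) Record() { h.count.Add(1) }

// Tick rotates the baselines under the lock.
func (h *hist) Tick() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.mu.prevBaseline = h.mu.curBaseline
	h.mu.curBaseline = h.count.Load()
}

// unsafeWindowedCount replays the racy ordering: the cumulative value
// is read first, then between() runs (standing in for concurrent
// Record and Tick calls), and only then is the baseline read.
func (h *hist) unsafeWindowedCount(between func()) uint64 {
	cur := h.count.Load()
	between()
	h.mu.Lock()
	base := h.mu.prevBaseline
	h.mu.Unlock()
	return cur - base // wraps around if base > cur
}

// windowedCount reads the cumulative value and the baseline in one
// critical section. Every baseline was captured under the same lock
// at an earlier point from a counter that only grows, so base <= cur.
func (h *hist) windowedCount() uint64 {
	h.mu.Lock()
	defer h.mu.Unlock()
	cur := h.count.Load()
	return cur - h.mu.prevBaseline
}
```

The same argument carries over to multiple buckets: each bucket's baseline value is loaded before the matching bucket is loaded for `cur`, and counts never decrease, so no per-bucket subtraction can go negative. Recording stays lock-free in this sketch; only readers of the windowed view contend with rotation.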

Cumulative histograms are the right primitive for Prometheus scraping,
but operators and dashboards need rolling-window quantiles to see recent
behavior. Windowed maintains a single cumulative histogram with two
baseline snapshots rotated on a configurable interval. The windowed
view is computed by subtracting the older baseline from the current
cumulative state, covering 1-2x the window duration. Recording cost
is identical to a plain Histogram (~20ns, lock-free); the mutex only
protects baseline rotation.

WindowedCollector and WindowedVec provide Prometheus integration for
single and labeled windowed histograms, following the same adapter
pattern as PrometheusCollector and HistogramVec.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@angles-n-daemons angles-n-daemons force-pushed the bdillmann/windowed-histogram branch from b3ed9f0 to 7e72115 Compare May 4, 2026 14:12
@angles-n-daemons angles-n-daemons merged commit 796027d into bdillmann/prometheus-collector May 4, 2026
1 check passed
@angles-n-daemons angles-n-daemons deleted the bdillmann/windowed-histogram branch May 4, 2026 17:08

2 participants