Conversation
@smudge smudge commented Jan 30, 2026

Say you have a jobs table with 1M rows, 4 named priority ranges, and 2 queues.

Rather than:

- `GROUP BY (CASE ... END) AS priority, queue, COUNT(*)` (1M rows)

This does:

- [outer] `GROUP BY (CASE ... END) AS priority, queue, SUM(count)` (~8 rows)
  - [inner] `GROUP BY priority, queue, COUNT(*)` (1M rows)

The latter _looks_ more complex, but it benefits from pulling `priority` directly out of the index without computing the `CASE ... END` 1 million times. (Instead, it performs the more complex `CASE` evaluation on a much, much smaller set!)

In practice, I am seeing this shave ~2-3 seconds off of each `delayed:monitor` query in a production table with 10M+ rows. For tables with far fewer rows, the difference is negligible. (And, as always, your mileage may vary!)

/no-platform
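The equivalence of the two shapes can be sketched outside the gem. This is a toy illustration only (sqlite3 standing in for Postgres, hypothetical table contents and range names), showing that the nested pre-aggregation returns the same rows as the single-pass `CASE` grouping:

```python
# Sketch, not the gem's actual SQL: compare single-pass CASE grouping
# against the nested pre-aggregation. Range boundaries are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delayed_jobs (priority INTEGER, queue TEXT)")
conn.executemany(
    "INSERT INTO delayed_jobs VALUES (?, ?)",
    [(p, q) for p in range(40) for q in ("default", "mailers") for _ in range(3)],
)

# Hypothetical named ranges, 10 priorities wide each:
CASE = """CASE WHEN priority < 10 THEN 'interactive'
               WHEN priority < 20 THEN 'user_visible'
               WHEN priority < 30 THEN 'eventual'
               ELSE 'reporting' END"""

# One pass: CASE evaluated once per row (1M times at scale).
naive = conn.execute(f"""
    SELECT {CASE} AS named_priority, queue, COUNT(*)
    FROM delayed_jobs
    GROUP BY named_priority, queue
    ORDER BY named_priority, queue
""").fetchall()

# Two levels: inner pass groups on raw (priority, queue) straight off the
# index; the outer CASE only runs on the handful of pre-aggregated rows.
nested = conn.execute(f"""
    SELECT {CASE} AS named_priority, queue, SUM(n)
    FROM (SELECT priority, queue, COUNT(*) AS n
          FROM delayed_jobs
          GROUP BY priority, queue) grouped
    GROUP BY named_priority, queue
    ORDER BY named_priority, queue
""").fetchall()

assert naive == nested  # same ~8 rows either way
```

With 4 ranges and 2 queues, both queries produce the same 8 `(named_priority, queue, count)` rows; only the amount of `CASE` work differs.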

@smudge smudge requested a review from effron January 30, 2026 22:04
@smudge smudge force-pushed the monitor-perf/priority-grouping branch from b56ffed to 7c9d2ab on January 30, 2026 22:17
->  Sort  (cost=...)
      Output: delayed_jobs.priority, delayed_jobs.queue
      Sort Key: delayed_jobs.priority, delayed_jobs.queue
      ->  Index Scan using delayed_jobs_priority on public.delayed_jobs  (cost=...)
Several of these cases cover the behavior of these new queries against the "legacy" index that shipped prior to 2.0.0

Interestingly, this was previously a seq scan but now decides to use the (legacy) delayed_jobs_priority index in the inner loop!

Group Key: delayed_jobs.priority, delayed_jobs.queue
->  Sort  (cost=...)
      Output: delayed_jobs.priority, delayed_jobs.queue
      Sort Key: delayed_jobs.priority, delayed_jobs.queue
@smudge smudge Jan 30, 2026
In a real-world context, I do not see this "Sort", because PG chooses HashAggregate (which does not require pre-sorted data). Here, I disabled hash aggregates within the test suite purely to force determinism.

That said, looking at the plan for `failed_count` above, it doesn't actually apply a sort even with a GroupAggregate:

              ->  GroupAggregate  (cost=...)
                    Output: delayed_jobs.priority, delayed_jobs.queue, count(*)
                    Group Key: delayed_jobs.priority, delayed_jobs.queue
                    ->  Index Only Scan using idx_delayed_jobs_failed on public.delayed_jobs  (cost=...)
                          Output: delayed_jobs.priority, delayed_jobs.queue

This is because `idx_delayed_jobs_failed` arrives pre-sorted by `(priority, queue)`, whereas `idx_delayed_jobs_live` is sorted by `(priority, run_at, queue)` and therefore requires another sort before the GroupAggregate. If I were to force this GroupAggregate strategy in production, that Sort would be extremely inefficient and spill to disk, so there is something to be said for rethinking the column order in our indexes, even if PG is smart enough to avoid this strategy for larger tables.

And FWIW, the reason I did not put `queue` first in the indexes is that it's possible to run variants of these queries that do not filter by `queue` at all, and those are broadly unable to make use of a queue-first index ordering (newer PG versions make some exceptions, but I wasn't observing this in real-world testing). So you would either need a second index, or would need to decide up front what type of index to build, potentially requiring a rebuild if you later change your worker/monitor queue assignments.

TL;DR, though: outside the scope of this particular PR, I am strongly considering adjustments to the `idx_delayed_jobs_live` column order alongside a somewhat more opinionated pickup strategy (to avoid the problem of needing to decide up front whether your workers/monitor will filter on queue).
