Conversation
@smudge smudge commented Jan 30, 2026

Say you have a jobs table with 1M rows, 4 named priority ranges, and 2 queues.

Rather than:

- `GROUP BY (CASE ... END) AS priority, queue, COUNT(*)` (1M rows)

This does:

- [outer] `GROUP BY (CASE ... END) AS priority, queue, SUM(count)` (~8 rows)
  - [inner] `GROUP BY priority, queue, COUNT(*)` (1M rows)

The latter _looks_ more complex, but it benefits from pulling `priority` directly out of the index without computing the `CASE ... END` 1 million times. (Instead, it performs the more complex `CASE` evaluation on a much, much smaller set!)

In practice, I am seeing this shave ~2-3 seconds off of each `delayed:monitor` query in a production table with 10M+ rows. For tables with far fewer rows, the difference is negligible. (And, as always, your mileage may vary!)

/no-platform
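The equivalence of the two shapes can be sketched outside the gem. This is a toy illustration only (sqlite3 standing in for Postgres, hypothetical table contents and range names), showing that the nested pre-aggregation returns the same rows as the single-pass `CASE` grouping:

```python
# Sketch, not the gem's actual SQL: compare single-pass CASE grouping
# against the nested pre-aggregation. Range boundaries are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delayed_jobs (priority INTEGER, queue TEXT)")
conn.executemany(
    "INSERT INTO delayed_jobs VALUES (?, ?)",
    [(p, q) for p in range(40) for q in ("default", "mailers") for _ in range(3)],
)

# Hypothetical named ranges, 10 priorities wide each:
CASE = """CASE WHEN priority < 10 THEN 'interactive'
               WHEN priority < 20 THEN 'user_visible'
               WHEN priority < 30 THEN 'eventual'
               ELSE 'reporting' END"""

# One pass: CASE evaluated once per row (1M times at scale).
naive = conn.execute(f"""
    SELECT {CASE} AS named_priority, queue, COUNT(*)
    FROM delayed_jobs
    GROUP BY named_priority, queue
    ORDER BY named_priority, queue
""").fetchall()

# Two levels: inner pass groups on raw (priority, queue) straight off the
# index; the outer CASE only runs on the handful of pre-aggregated rows.
nested = conn.execute(f"""
    SELECT {CASE} AS named_priority, queue, SUM(n)
    FROM (SELECT priority, queue, COUNT(*) AS n
          FROM delayed_jobs
          GROUP BY priority, queue) grouped
    GROUP BY named_priority, queue
    ORDER BY named_priority, queue
""").fetchall()

assert naive == nested  # same ~8 rows either way
```

With 4 ranges and 2 queues, both queries produce the same 8 `(named_priority, queue, count)` rows; only the amount of `CASE` work differs.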

@smudge smudge requested a review from effron January 30, 2026 22:04
@smudge smudge force-pushed the monitor-perf/priority-grouping branch from b56ffed to 7c9d2ab on January 30, 2026 22:17
->  Sort  (cost=...)
      Output: delayed_jobs.priority, delayed_jobs.queue
      Sort Key: delayed_jobs.priority, delayed_jobs.queue
      ->  Index Scan using delayed_jobs_priority on public.delayed_jobs  (cost=...)
Several of these cases cover the behavior of these new queries against the "legacy" index that shipped prior to 2.0.0

Interestingly, this was previously a seq scan but now decides to use the (legacy) delayed_jobs_priority index in the inner loop!

Group Key: delayed_jobs.priority, delayed_jobs.queue
->  Sort  (cost=...)
      Output: delayed_jobs.priority, delayed_jobs.queue
      Sort Key: delayed_jobs.priority, delayed_jobs.queue
@smudge smudge Jan 30, 2026
In a real-world context, I do not see this "Sort", because PG chooses HashAggregate (which does not require pre-sorted data). Here, I disabled hash aggregates within the test suite purely to force determinism.

That said, looking at the plan for `failed_count` above, it doesn't actually apply a sort even with a GroupAggregate:

              ->  GroupAggregate  (cost=...)
                    Output: delayed_jobs.priority, delayed_jobs.queue, count(*)
                    Group Key: delayed_jobs.priority, delayed_jobs.queue
                    ->  Index Only Scan using idx_delayed_jobs_failed on public.delayed_jobs  (cost=...)
                          Output: delayed_jobs.priority, delayed_jobs.queue

This is because `idx_delayed_jobs_failed` arrives pre-sorted by `(priority, queue)`, whereas `idx_delayed_jobs_live` is sorted by `(priority, run_at, queue)` and therefore requires another sort before the GroupAggregate. If I were to force this GroupAggregate strategy in production, that Sort would be extremely inefficient and spill to disk, so there is something to be said for rethinking the column order in our indexes, even if PG is smart enough to avoid this strategy for larger tables.

And FWIW, the reason I did not put `queue` first in the indexes is that it's possible to run variants of these queries that do not filter by `queue` at all, and those are broadly unable to make use of a queue-first index ordering (newer PG versions make some exceptions, but I wasn't observing this in real-world testing). So you would either need a second index, or would need to decide up front what type of index to build, potentially requiring a rebuild if you later change your worker/monitor queue assignments.

TL;DR, though: outside the scope of this particular PR, I am strongly considering adjustments to the `idx_delayed_jobs_live` column order alongside a somewhat more opinionated pickup strategy (to avoid the problem of needing to decide up front whether your workers/monitor will filter on queue).
