release: v1.2.0 — Performance at Scale #28

Merged
vishaltps merged 7 commits into main from fix/performance-large-datasets
Mar 7, 2026

@vishaltps (Owner)

Summary

  • Resolves gateway timeouts on overview page with large datasets (millions of rows in solid_queue_jobs)
  • Eliminates all expensive queries on the jobs table — stats derived entirely from execution tables
  • Adds config.show_chart toggle to optionally disable chart queries

Problem

Issue #27 — Users with ~4M rows in solid_queue_jobs experienced gateway timeouts (502/504) when loading the overview page. Root causes:

  1. Stats calculation: 3× COUNT(*) on solid_queue_jobs taking 52+ seconds each
  2. Chart data: Plucking all timestamps into Ruby memory, then iterating in a loop
  3. Filters: .pluck(:job_id) loading millions of IDs into Ruby arrays for WHERE IN
  4. Queue stats: N+1 queries — 3 COUNT queries per queue row in QueuesPresenter

Solution

1. Stats from execution tables only

Replaced COUNT(*) on solid_queue_jobs with counts from small execution tables (ready_executions, scheduled_executions, claimed_executions, failed_executions). Dashboard now shows "Active Jobs" instead of "Total Jobs" and "Completed".
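
A minimal sketch of the idea, not the gem's actual StatsCalculator. The class name `ExecutionStatsSketch`, the `count_for` callable, and the hash keys are hypothetical; `count_for` stands in for a COUNT on one execution table. The point is that "Active Jobs" is a sum of four small counts and the jobs table is never consulted.

```ruby
# Hypothetical sketch: dashboard stats derived only from execution tables.
# `count_for` stands in for a COUNT(*) on a single execution table.
class ExecutionStatsSketch
  TABLES = %i[ready scheduled in_progress failed].freeze

  def initialize(count_for)
    @count_for = count_for
  end

  # Returns one count per execution table plus their sum as :active_jobs.
  def calculate
    stats = TABLES.to_h { |table| [table, @count_for.call(table)] }
    stats.merge(active_jobs: stats.values.sum)
  end
end

# Fake counts in place of real database queries (illustrative numbers).
fake_counts = { ready: 12, scheduled: 5, in_progress: 3, failed: 2 }
stats = ExecutionStatsSketch.new(->(table) { fake_counts.fetch(table) }).calculate
puts stats[:active_jobs]
```

Because the callable is only ever asked about the four execution tables, a spec can assert that nothing else is counted.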

2. SQL GROUP BY chart bucketing

Replaced in-memory timestamp bucketing with a SQL GROUP BY on a computed bucket index. The expressions are adapter-aware, so this works on both PostgreSQL and SQLite.
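
One way the bucket index could be computed per adapter is sketched below. The helper names `bucket_index_sql` and `fill_buckets`, the `created_at` column, and the exact SQL expressions are assumptions for illustration; the actual ChartDataService may differ. The first helper builds the expression to group by, and the second turns the grouped result (bucket index to count) into a dense series with zeros for empty buckets.

```ruby
# Hypothetical helper: SQL expression for a row's bucket index.
# `column` must be a trusted identifier, never user input.
def bucket_index_sql(adapter, column, start_epoch, bucket_seconds)
  start = Integer(start_epoch)
  width = Integer(bucket_seconds)
  case adapter
  when :postgresql
    # FLOOR is needed because casting a fraction to INTEGER rounds in PostgreSQL.
    "CAST(FLOOR((EXTRACT(EPOCH FROM #{column}) - #{start}) / #{width}) AS INTEGER)"
  when :sqlite
    # Integer division truncates in SQLite once both operands are integers.
    "(CAST(strftime('%s', #{column}) AS INTEGER) - #{start}) / #{width}"
  else
    raise ArgumentError, "unsupported adapter: #{adapter.inspect}"
  end
end

# Expands a { bucket_index => count } hash into one count per bucket.
def fill_buckets(grouped_counts, bucket_count)
  Array.new(bucket_count) { |index| grouped_counts.fetch(index, 0) }
end

puts bucket_index_sql(:postgresql, "created_at", 1_700_000_000, 3600)
puts bucket_index_sql(:sqlite, "created_at", 1_700_000_000, 3600)
p fill_buckets({ 0 => 2, 3 => 1 }, 4)
```

With this shape the database returns at most one row per bucket, so Ruby never has to hold the raw timestamps.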

3. Subquery filters

Changed all .pluck(:job_id) to .select(:job_id) so filtering stays as a DB subquery instead of loading arrays into memory.
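
The difference is easiest to see in the shape of the resulting SQL. The sketch below is plain Ruby string building, not ActiveRecord; the helper names are hypothetical and `solid_queue_failed_executions` is used only as an example table. It contrasts an IN list built from IDs already loaded into Ruby with an IN clause that leaves the lookup to the database.

```ruby
# Illustration only: the SQL shapes the two approaches typically produce.

# IDs plucked into Ruby first: every ID becomes a literal in the IN list.
def where_in_literal_sql(ids)
  id_list = ids.map { |id| Integer(id) }.join(", ")
  "SELECT * FROM solid_queue_jobs WHERE id IN (#{id_list})"
end

# Relation passed through with select: the database resolves the IDs itself.
def where_in_subquery_sql(execution_table)
  "SELECT * FROM solid_queue_jobs WHERE id IN (SELECT job_id FROM #{execution_table})"
end

literal  = where_in_literal_sql((1..100_000).to_a)
subquery = where_in_subquery_sql("solid_queue_failed_executions")

puts "literal IN list: #{literal.bytesize} bytes"
puts "subquery:        #{subquery.bytesize} bytes"
```

The literal form grows with the number of matching rows, while the subquery form stays the same size no matter how many IDs match.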

4. Pre-aggregated queue stats

Replaced per-queue COUNT queries with 3 GROUP BY queue_name queries in the controller, passed to presenter.
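
A rough sketch of the pre-aggregation pattern, using in-memory rows in place of real queries. `counts_by_queue` stands in for one GROUP BY queue_name query, and `QueueStatsSketch` is a hypothetical presenter; the three categories shown (ready, scheduled, failed) are placeholders, since the description does not name the three grouped queries.

```ruby
# Stand-in for one "SELECT queue_name, COUNT(*) ... GROUP BY queue_name" query:
# returns a hash of queue name => row count.
def counts_by_queue(rows)
  rows.map { |row| row[:queue_name] }.tally
end

# Hypothetical presenter that only does hash lookups, never queries.
class QueueStatsSketch
  def initialize(ready:, scheduled:, failed:)
    @counts = { ready: ready, scheduled: scheduled, failed: failed }
  end

  # Queues missing from a grouped result simply have a count of zero.
  def stats_for(queue_name)
    @counts.transform_values { |by_queue| by_queue.fetch(queue_name, 0) }
  end
end

ready_rows     = [{ queue_name: "default" }, { queue_name: "default" }, { queue_name: "mailers" }]
scheduled_rows = [{ queue_name: "mailers" }]
failed_rows    = []

presenter = QueueStatsSketch.new(
  ready: counts_by_queue(ready_rows),
  scheduled: counts_by_queue(scheduled_rows),
  failed: counts_by_queue(failed_rows)
)

p presenter.stats_for("default")
p presenter.stats_for("mailers")
```

The number of queries is fixed at three regardless of how many queue rows the page renders.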

5. config.show_chart toggle

New configuration option to disable chart queries entirely for users who don't need the visualization.
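
The toggle can be pictured with a small stand-alone sketch. `MonitorConfigSketch` and `chart_data_for` are hypothetical names, and the default of `true` is an assumption; only the `show_chart` attribute itself comes from this PR. When the flag is off, the chart builder is never called, so no chart queries run.

```ruby
# Hypothetical stand-in for the gem's configuration object.
module MonitorConfigSketch
  class Configuration
    attr_accessor :show_chart

    def initialize
      @show_chart = true # assumed default
    end
  end

  def self.config
    @config ||= Configuration.new
  end

  def self.configure
    yield config
  end
end

# Only invokes the (potentially expensive) chart builder when enabled.
def chart_data_for(config, build_chart)
  return nil unless config.show_chart

  build_chart.call
end

chart_calls = 0
build_chart = lambda do
  chart_calls += 1
  { labels: [], created: [] }
end

MonitorConfigSketch.configure { |config| config.show_chart = false }
result = chart_data_for(MonitorConfigSketch.config, build_chart)
puts "chart built #{chart_calls} times, result: #{result.inspect}"
```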

Benchmark Results (100,000 jobs on PostgreSQL)

| Pain Point | Before | After | Speedup |
|---|---|---|---|
| Stats calculation | 75.18ms | 3.10ms | 24.3x |
| Chart data (1d) | 147.96ms | 22.02ms | 6.7x |
| Chart data (1w) | 771.35ms | 89.27ms | 8.6x |
| Filter (pluck → subquery) | 7.43ms | 1.57ms | 4.7x |
| Queue stats (N+1) | 36.51ms | 7.22ms | 5.1x |

Note: The speedup grows with table size. At 4M rows (the reporter's scale), the old stats calculation took 52+ seconds, causing gateway timeouts. The new stats queries read only the execution tables, so their cost does not depend on the size of solid_queue_jobs.

Changes

  • app/services/solid_queue_monitor/stats_calculator.rb — Rewritten to use execution tables only
  • app/presenters/solid_queue_monitor/stats_presenter.rb — "Active Jobs" replaces "Total Jobs"/"Completed"
  • app/controllers/solid_queue_monitor/base_controller.rb — All .pluck(:job_id) → .select(:job_id)
  • app/controllers/solid_queue_monitor/queues_controller.rb — Pre-aggregate queue stats with GROUP BY
  • app/controllers/solid_queue_monitor/in_progress_jobs_controller.rb — Fix pluck in filters
  • app/presenters/solid_queue_monitor/queues_presenter.rb — Use pre-aggregated stats hash
  • app/services/solid_queue_monitor/chart_data_service.rb — SQL GROUP BY bucketing (cross-DB)
  • app/controllers/solid_queue_monitor/overview_controller.rb — Conditional chart loading
  • lib/solid_queue_monitor.rb — Add show_chart config attribute
  • lib/generators/solid_queue_monitor/templates/initializer.rb — Add show_chart config option

Breaking Change

Dashboard "Total Jobs" and "Completed" stats are replaced with "Active Jobs" (sum of ready + scheduled + in-progress + failed). This avoids expensive COUNT(*) on the jobs table that caused timeouts at scale.

Testing

  • 255 examples, 0 failures, 5 pending
  • New spec: no_unbounded_pluck_spec.rb — source-code scanning test preventing pluck regression
  • Rewritten chart_data_service_spec.rb — behaviour-based tests on SQLite
  • Updated stats_calculator_spec.rb — verifies SolidQueue::Job never receives :count
  • Updated overview_spec.rb — tests show_chart = false skips chart queries
  • Manual testing on PostgreSQL with 100K+ jobs (benchmark script)
  • Cross-DB: tests pass on SQLite, benchmarked on PostgreSQL
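
A source-scanning check of the kind described for no_unbounded_pluck_spec.rb could look roughly like this. The pattern and the `pluck_offenses` helper are assumptions for illustration, not the contents of the actual spec; a real spec would read the files under app/ instead of an inline string.

```ruby
# Assumed pattern: flags any direct pluck of job IDs.
UNBOUNDED_PLUCK = /\.pluck\(:job_id\)/

# Returns the 1-based line numbers in `source` that match the pattern.
def pluck_offenses(source)
  source.each_line.with_index(1).filter_map do |line, number|
    number if line.match?(UNBOUNDED_PLUCK)
  end
end

# Inline sample in place of real source files (hypothetical code).
sample = <<~SOURCE
  ids = FailedExecution.pluck(:job_id)
  jobs = Job.where(id: FailedExecution.select(:job_id))
SOURCE

p pluck_offenses(sample)
```

A check like this fails as soon as someone reintroduces the pattern, without needing a large dataset to reproduce the slowdown.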

Checklist

  • Code follows project patterns
  • Self-review completed
  • Tests pass (255 examples, 0 failures)
  • No debug statements left
  • Version bumped to 1.2.0
  • CHANGELOG updated
  • README updated with Performance at Scale section
  • ROADMAP updated

Closes #27

Replace COUNT(*) queries on solid_queue_jobs with counts from
ready_executions, scheduled_executions, claimed_executions, and
failed_executions. Replaces "Total Jobs" and "Completed" dashboard
stats with "Active Jobs" (sum of ready + scheduled + in-progress + failed).

Resolves gateway timeouts on overview page with millions of rows.
Fixes #27

- Change .pluck(:job_id) to .select(:job_id) in all filter methods to
  keep filtering as DB subqueries instead of loading IDs into memory
- Pre-aggregate queue stats with 3 GROUP BY queries, eliminating
  per-queue COUNT queries in QueuesPresenter
- Add spec to prevent unbounded pluck regression

Replace in-memory timestamp bucketing (pluck all timestamps, iterate
in Ruby) with SQL GROUP BY using computed bucket index. Works on both
PostgreSQL and SQLite with adapter-aware expressions.

When show_chart is false, ChartDataService is not instantiated and
the chart section is not rendered, eliminating chart queries entirely
for users who don't need the visualization.

- Add "Performance at Scale" section to README
- Add [Unreleased] section to CHANGELOG with breaking change note
- Add Large Dataset Performance to ROADMAP as done
- Include implementation plan in docs/plans/

Bump version to 1.2.0, set CHANGELOG release date, update README
gem version reference.

- Fix RSpec/MessageSpies: use have_received instead of receive
- Fix Layout/HashAlignment, Layout/ArgumentAlignment auto-corrections
- Fix Style/ConditionalAssignment in base_controller
- Disable false-positive Style/HashTransformKeys (pluck returns Array)
- Disable RSpec/DescribeClass for source-scanning spec
- Update Gemfile.lock with version 1.2.0

@vishaltps merged commit c9039bf into main on Mar 7, 2026
3 checks passed
Development

Successfully merging this pull request may close these issues.

Performance issues on large Solid Queue datasets (overview counts, chart aggregation, and queue page N+1 counters)