perf: test report cache priming, pagination, and query optimization #3505

Open
stbenjam wants to merge 15 commits into openshift:main from stbenjam:perf-tests

Conversation

@stbenjam
Member

@stbenjam stbenjam commented May 6, 2026

Cache priming for test results

The test report cache loader now primes both collapsed and non-collapsed results in Redis, eliminating cold-cache latency for the most common test report views. Previously only collapsed results were primed, leaving non-collapsed queries (the detailed per-variant NURP+ view) to hit the database on first access.

To avoid unnecessary work, the cache primer now only targets OCP development releases (identified by having the payloadTags capability and no GA date set). GA releases and non-OCP products (OKD, etc.) are skipped.
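
The selection rule above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: the `Release` struct, its field names, the `payloadTagsCapability` key, and the sample data are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Release is a hypothetical, trimmed-down stand-in for a release record.
type Release struct {
	Name         string
	GADate       *time.Time      // nil while the release is still in development
	Capabilities map[string]bool // capability name -> enabled
}

// payloadTagsCapability is an assumed key; the description only names the capability.
const payloadTagsCapability = "payloadTags"

// developmentReleases keeps releases that have the payloadTags capability
// and no GA date, mirroring the rule stated in the description.
func developmentReleases(all []Release) []string {
	var names []string
	for _, r := range all {
		if r.GADate == nil && r.Capabilities[payloadTagsCapability] {
			names = append(names, r.Name)
		}
	}
	return names
}

// sampleReleases returns made-up data: one development release, one GA
// release, and one product without the payloadTags capability.
func sampleReleases() []Release {
	ga := time.Date(2020, time.January, 1, 0, 0, 0, 0, time.UTC)
	return []Release{
		{Name: "4.22", Capabilities: map[string]bool{payloadTagsCapability: true}},
		{Name: "example-ga", GADate: &ga, Capabilities: map[string]bool{payloadTagsCapability: true}},
		{Name: "example-okd"},
	}
}

func main() {
	fmt.Println(developmentReleases(sampleReleases())) // prints [4.22]
}
```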

Server-side pagination

The /api/tests endpoint previously returned all matching rows (up to 50k+ for uncollapsed views), causing the tests page to barely load. This adds server-side pagination following the existing pattern used by the job runs endpoint.

When perPage/page query parameters are present, the backend:

  • Applies ORDER BY, COUNT, LIMIT, and OFFSET at the SQL level
  • Returns a PaginationResult envelope with rows, total_rows, page_size, page
  • Bypasses the cache (paginated queries are fast with LIMIT/OFFSET)

When pagination params are absent, existing behavior is preserved for backward compatibility.
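
As a rough sketch of the two response shapes described above, the following assumes a zero-based `page`, a hypothetical `respond` helper, and Go field names chosen for illustration; only the JSON keys (`rows`, `total_rows`, `page_size`, `page`) come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PaginationResult mirrors the envelope keys named in the description.
// The Go field names and types are assumptions.
type PaginationResult struct {
	Rows      any   `json:"rows"`
	TotalRows int64 `json:"total_rows"`
	PageSize  int   `json:"page_size"`
	Page      int   `json:"page"`
}

// limitOffset converts a zero-based page and a page size into the values
// a LIMIT/OFFSET clause would use.
func limitOffset(page, perPage int) (limit, offset int) {
	return perPage, page * perPage
}

// respond returns the legacy bare array when paginated is false and the
// pagination envelope otherwise.
func respond(rows []string, total int64, paginated bool, page, perPage int) ([]byte, error) {
	if !paginated {
		return json.Marshal(rows)
	}
	return json.Marshal(PaginationResult{Rows: rows, TotalRows: total, PageSize: perPage, Page: page})
}

func main() {
	limit, offset := limitOffset(2, 25)
	fmt.Println(limit, offset) // prints 25 50

	body, err := respond([]string{"test-a"}, 120, true, 2, 25)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Keeping the legacy branch as a bare array is what lets existing clients that never send `perPage` continue to parse the response unchanged.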

Frontend changes switch the DataGrid to paginationMode="server" and send perPage/page params, following the JobRunsTable pattern.

Filter pushdown (~830x improvement for filtered queries)

When collapse=false, TestsByNURPAndStandardDeviation builds a query that self-joins prow_test_report_7d_matview 3 times. Name/variant filters were only applied to the outermost query, causing subqueries to scan all rows for the release. Filters are now pushed into the stats and pass_rates subqueries, allowing index use on cache misses and paginated queries.

Replaces #3290 (rebased on main; original e2e failure was an unrelated sippy-load-job timeout).

Summary by CodeRabbit

Release Notes

  • New Features

    • Added server-side pagination to the Tests API endpoint with perPage and page parameters for efficient data retrieval
    • Test results table now supports server-side pagination for improved performance with large datasets
    • Pagination responses include total row count information for navigation
  • Documentation

    • Updated Tests API documentation with pagination parameters and new response structure

stbenjam and others added 2 commits May 6, 2026 17:06
When collapse=false, TestsByNURPAndStandardDeviation builds a query
that self-joins prow_test_report_7d_matview 3 times:

  1. Outer query - gets the raw rows
  2. pass_rates subquery - computes per-variant percentages
  3. stats subquery - computes AVG/STDDEV across variants

The name/variant filters were only applied to the outermost query.
Subqueries 2 and 3 scanned all rows for the release to compute
aggregates for every test, even when only a single test was requested.

For release 4.22 with a name filter, this meant:

  |                    | Before (outer only)     | After (pushed down) |
  |--------------------|-------------------------|---------------------|
  | Stats subquery     | Seq Scan, 1.28M rows    | Index Scan, 142     |
  | Estimated cost     | 802,603 - 1,137,530     | 7.53 - 1,371        |
  | Speedup            | -                       | ~830x               |

TestsByNURPAndStandardDeviation now accepts optional filter functions
(variadic, backward-compatible) that are applied to both the stats and
pass_rates subqueries. The filter is still also applied to the outer
query, so results are identical.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Variant-specific filters (e.g., NOT has entry "never-stable") must not
be pushed into the stats subquery, which computes AVG/STDDEV across all
variants for a test. Filtering out variants there would skew the
delta_from_*_average and standard deviation calculations.

Split SubqueryFilter into a struct with a VariantOnly flag and an
isVariantFilter helper. At the call site, the rawFilter is further
split: name filters go to both stats and passRates subqueries (safe,
just narrows to the matching test), while variant filters go only to
passRates (preserving cross-variant stats semantics).
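
The routing described in this commit might look roughly like the sketch below. To stay self-contained it replaces `*gorm.DB` with a hypothetical `Query` type that only records WHERE clauses; `applySubqueryFilters`, `sampleFilters`, and the clause strings (including the `variants` column name) are assumptions for illustration, not code from the PR.

```go
package main

import "fmt"

// Query is a hypothetical stand-in for *gorm.DB that only records WHERE clauses.
type Query struct {
	Wheres []string
}

// Where appends a clause and returns the query, loosely imitating a builder.
func (q *Query) Where(clause string) *Query {
	q.Wheres = append(q.Wheres, clause)
	return q
}

// SubqueryFilter pairs a filter function with a flag saying whether it
// constrains variants only.
type SubqueryFilter struct {
	Apply       func(*Query) *Query
	VariantOnly bool
}

// applySubqueryFilters sends every filter to the passRates subquery, but
// only non-variant filters to the stats subquery, so AVG/STDDEV are still
// computed across all variants of the matching tests.
func applySubqueryFilters(stats, passRates *Query, filters ...SubqueryFilter) (*Query, *Query) {
	for _, f := range filters {
		passRates = f.Apply(passRates)
		if !f.VariantOnly {
			stats = f.Apply(stats)
		}
	}
	return stats, passRates
}

// sampleFilters builds one name filter and one variant filter with made-up clauses.
func sampleFilters() []SubqueryFilter {
	return []SubqueryFilter{
		{Apply: func(q *Query) *Query { return q.Where("name = 'example-test'") }},
		{Apply: func(q *Query) *Query { return q.Where("NOT ('never-stable' = ANY(variants))") }, VariantOnly: true},
	}
}

func main() {
	stats, passRates := applySubqueryFilters(&Query{}, &Query{}, sampleFilters()...)
	fmt.Println(len(stats.Wheres), len(passRates.Wheres)) // prints 1 2
}
```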

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds server-side pagination, sorting, and filtering to the Tests API (Postgres and BigQuery), exports SubqueryFilter for per-variant vs global DB filters, introduces cache priming utilities and a test-report-cache loader, validates pagination input (with tests), and wires frontend TestTable for server-driven pagination and cache-aware results.

Changes

Tests API, DB Query Filters, Pagination, and Cache Priming

| Layer | File(s) | Summary |
|---|---|---|
| Type Definition & Signature | pkg/db/query/test_queries.go | Adds exported SubqueryFilter (Apply func(*gorm.DB) *gorm.DB, VariantOnly bool) and updates TestsByNURPAndStandardDeviation(..., subqueryFilters ...SubqueryFilter) signature. |
| Query Logic | pkg/db/query/test_queries.go | Applies SubqueryFilters selectively: VariantOnly filters apply to passRates subquery; non-variant filters apply to both stats and passRates. |
| API Types & Helpers | pkg/api/tests.go | Extends TestResultsSpec with Pagination *apitype.Pagination, SortField string, Sort apitype.Sort; adds TestsAPIResult.filter, TestResultsSpec.matview(), TestResultsCacheDuration constant, and PrimeTestResultsCache. |
| Postgres Pagination & Result Shaping | pkg/api/tests.go | DB path updated to count total rows, apply ORDER/LIMIT/OFFSET when Pagination provided, set TotalRows, and suppress overall summary when paginating. |
| BigQuery Pagination Envelope | pkg/api/tests.go | BigQuery path now supports sorting and optional pagination envelope (rows, total_rows, page_size, page) when pagination requested; preserves legacy non-paginated response when omitted. |
| Subquery Filter Wiring | pkg/api/tests.go, pkg/db/query/test_queries.go | Constructs subqueryFilters from incoming filters and passes them into TestsByNURPAndStandardDeviation. |
| Server Routing & Validation | pkg/sippyserver/server.go, pkg/sippyserver/parameters.go | /api/tests handlers require non-empty release, parse/validate pagination via getPaginationParams (adds maxPerPage = 1000, perPage and page validation), return 400 on invalid params, and forward parsed pagination to printer functions. |
| Pagination Parsing Tests | pkg/sippyserver/parameters_test.go | Adds TestGetPaginationParams covering absent/default/valid/error scenarios for pagination parsing and validation. |
| Frontend: Request & State | sippy-ng/src/tests/TestTable.js | Replaces local rows with apiResult ({ rows, total_rows }), binds page query param, adds pageFlip state, includes perPage and page in API requests, and stores server response into apiResult. |
| Frontend: UI Hooks & Grid Wiring | sippy-ng/src/tests/TestTable.js | Resets page to 0 on view change, search, filter or sort; clears apiResult on location changes; StyledDataGrid uses apiResult.rows and apiResult.total_rows for server-side pagination; downloadDataFunc returns apiResult.rows. |
| Docs | pkg/api/README.md | Documents perPage and page parameters, explains pagination envelope and shows example paginated response; retains legacy example when perPage omitted. |
| Cache Priming & Loader | pkg/dataloader/testreportcacheloader/..., cmd/sippy/load.go | Adds testreportcacheloader with New, Load, developmentReleases, and Errors; test for developmentReleases; wires loader into cmd/sippy load. |
| Imports & Minor Wiring | pkg/api/tests.go, cmd/sippy/load.go | Adds gorm import for subquery building and imports for new loader; uses TestResultsCacheDuration in cache orchestration. |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Frontend as TestTable
  participant Server
  participant DB
  participant Cache

  Client->>Frontend: change page/filter/sort
  Frontend->>Server: GET /api/tests?release=X&page=Y&perPage=Z&...
  Server->>Server: parse & validate pagination/spec
  Server->>Cache: check cached paginated/unfiltered results
  alt cache miss
    Server->>DB: call TestsByNURPAndStandardDeviation(..., subqueryFilters...)
    DB-->>Server: filtered rows (+ total_rows if counted)
    Server->>Cache: store results (use TestResultsCacheDuration)
  end
  Server-->>Frontend: JSON { rows, total_rows, page_size, page }
  Frontend-->>Client: render rows and pagination UI

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Go Error Handling | ⚠️ Warning | Violations found: Nil pointer dereferences without checks in testreportcacheloader.Load(), error wrapping using %v instead of %w in pkg/api/tests.go, and unwrapped errors in parameters.go. | Add nil checks in Load(). Use %w in fmt.Errorf for errors. |
| Test Coverage For New Features | ⚠️ Warning | Pure functions lack test coverage: TestResultsSpec.matview() and TestsAPIResult.filter() should be tested. testreportcacheloader.Load() has no tests. TestTable.js pagination logic lacks tests. | Add unit tests for pure functions: matview(), filter(). Add mocked integration tests for Load(). Add React component tests for TestTable.js pagination logic (pageFlip, changePage). |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Sql Injection Prevention | ✅ Passed | Pagination validated 1-1000. Sorting in-memory. Parameters use SafeRead regex. Filters use parameterized queries. Tables hardcoded. No unsafe SQL concatenation. |
| Excessive Css In React Should Use Styles | ✅ Passed | All inline styles in TestTable.js have 1 property each, below the 3-4 threshold. Dynamic styles use helper functions; complex styling uses useStyles(). Component complies. |
| Single Responsibility And Clear Naming | ✅ Passed | PR maintains SRP. Types have ≤5 fields, methods use action-oriented names, functions have justified domain-specific parameters, and packages are properly separated. No generic naming patterns. |
| Stable And Deterministic Test Names | ✅ Passed | The PR does not contain Ginkgo tests. Test files added use standard Go testing framework with table-driven tests that have stable, deterministic names containing no dynamic values. |
| Test Structure And Quality | ✅ Passed | Custom check for Ginkgo test structure is not applicable. PR adds standard Go tests using testing.T and testify, not Ginkgo. Repository contains no Ginkgo imports or patterns. |
| Microshift Test Compatibility | ✅ Passed | This is the Sippy CI analytics project, not an OpenShift e2e test suite. No Ginkgo tests are added; only Go unit tests. Check does not apply. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests added. This PR modifies sippy (test reporting system) with standard Go unit tests using testing package, not Ginkgo. Check not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR contains only application code changes. No manifests, operators, controllers, or scheduling constraints modified. |
| Ote Binary Stdout Contract | ✅ Passed | OTE Binary Stdout Contract is not applicable. Sippy is a prow job analysis tool, not an OTE binary. The PR modifies Sippy APIs and cache priming without any stdout writes in process-level code. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR does not add Ginkgo e2e tests. Two unit test files added use standard Go testing.T, not Ginkgo. Check is not applicable. |
| Title check | ✅ Passed | The title accurately summarizes the three main changes: cache priming, pagination, and query optimization. It is concise, specific, and clearly conveys the primary purpose of the pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from deepsm007 and smg247 May 6, 2026 17:08
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/db/query/test_queries.go (1)

247-260: ⚡ Quick win

Split the godoc so it documents the right exported symbols.

Inserting SubqueryFilter here makes the existing TestsByNURPAndStandardDeviation doc block attach to SubqueryFilter, so the type now starts with the function description and the exported function no longer has its own godoc. Please give SubqueryFilter a short type comment and move the analytics-query description back above TestsByNURPAndStandardDeviation.

As per coding guidelines, "Name each function succinctly but accurately indicating its purpose relative to its package or receiver. When adding new functions, types, or fields, include a brief godoc if the name alone would not make the purpose obvious to someone unfamiliar with the feature."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/db/query/test_queries.go` around lines 247 - 260, The file's godoc for
TestsByNURPAndStandardDeviation was accidentally attached to the SubqueryFilter
type; add a short type comment above SubqueryFilter (one sentence describing it
as a wrapper for filter functions with metadata) and move the existing longer
analytics-query comment back so it immediately precedes the
TestsByNURPAndStandardDeviation function declaration; ensure SubqueryFilter has
its own concise godoc and TestsByNURPAndStandardDeviation retains the original
multi-line doc block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 24a16f3d-8b6b-4be1-9e37-fab115d7b00a

📥 Commits

Reviewing files that changed from the base of the PR and between 60b66e6 and ee1ea29.

📒 Files selected for processing (2)
  • pkg/api/tests.go
  • pkg/db/query/test_queries.go

@openshift-merge-bot openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 6, 2026
@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e

The /api/tests endpoint previously returned all matching rows (up to
50k+ for uncollapsed views), causing the frontend to barely load. This
adds server-side pagination following the existing pattern used by the
job runs endpoint.

When perPage/page query parameters are present, the backend now:
- Applies ORDER BY, COUNT, LIMIT, and OFFSET at the SQL level
- Returns a PaginationResult envelope with rows, total_rows, page_size, page
- Bypasses the cache (paginated queries are fast with LIMIT/OFFSET)

When pagination params are absent, existing behavior is preserved for
backward compatibility.

Frontend changes switch the DataGrid to paginationMode="server" and
send perPage/page params, following the JobRunsTable pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@stbenjam stbenjam changed the title from "perf: push filters into test report subqueries (~830x improvement)" to "TRT-2575: perf: push filters into test report subqueries + server-side pagination" May 6, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 6, 2026
@openshift-ci-robot

openshift-ci-robot commented May 6, 2026

@stbenjam: This pull request references TRT-2575 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

When collapse=false, TestsByNURPAndStandardDeviation builds a query that self-joins prow_test_report_7d_matview 3 times:

  1. Outer query - gets the raw rows
  2. pass_rates subquery - computes per-variant percentages
  3. stats subquery - computes AVG/STDDEV across variants

The name/variant filters were only applied to the outermost query. Subqueries 2 and 3 scanned all rows for the release to compute aggregates for every test, even when only a single test was requested.

For release 4.22 with a name filter, this meant:

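  |                    | Before (outer only)     | After (pushed down) |
  |--------------------|-------------------------|---------------------|
  | Stats subquery     | Seq Scan, 1.28M rows    | Index Scan, 142     |
  | Estimated cost     | 802,603 - 1,137,530     | 7.53 - 1,371        |
  | Speedup            | -                       | ~830x               |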

TestsByNURPAndStandardDeviation now accepts optional filter functions (variadic, backward-compatible) that are applied to both the stats and pass_rates subqueries. The filter is still also applied to the outer query, so results are identical.

Replaces #3290 (rebased on main; original e2e failure was an unrelated sippy-load-job timeout).

Summary by CodeRabbit

  • New Features
  • Enhanced test results filtering and query capabilities to support more granular analytics.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e

- Validate sort direction against allowlist (asc/desc) instead of
  interpolating raw user input into ORDER BY clause
- Add bounds validation for perPage (1-1000) and page (>= 0) in
  getPaginationParams to prevent DoS via unbounded queries
- Check COUNT query error instead of silently ignoring it
- Fix BigQuery path to return PaginationResult envelope when
  pagination params are present (frontend expects this format)
- Move COUNT before ORDER BY to avoid unnecessary overhead
- Inline isVariantFilter (trivial single-field access)
- Separate SubqueryFilter doc comment from function doc block
- Update API docs in pkg/api/README.md with new pagination params
  and response format
- Add unit tests for getPaginationParams bounds validation
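
The bounds validation and sort-direction allowlist listed above could be sketched as below. This is an illustrative approximation under stated assumptions, not the PR's getPaginationParams: the function names, the `Pagination` struct, the default of page 0 when only perPage is sent, and the second error message are hypothetical. The 1-1000 bound and the "perPage must be between 1 and 1000" message do appear in the PR discussion.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
)

const maxPerPage = 1000

// Pagination is an assumed shape for the parsed parameters.
type Pagination struct {
	PerPage int
	Page    int
}

// parsePagination returns nil when perPage is absent (legacy, unpaginated
// behavior) and a validated Pagination otherwise.
func parsePagination(q url.Values) (*Pagination, error) {
	rawPerPage := q.Get("perPage")
	if rawPerPage == "" {
		return nil, nil
	}
	perPage, err := strconv.Atoi(rawPerPage)
	if err != nil || perPage < 1 || perPage > maxPerPage {
		return nil, fmt.Errorf("perPage must be between 1 and %d", maxPerPage)
	}
	page := 0 // assumed default when only perPage is supplied
	if rawPage := q.Get("page"); rawPage != "" {
		page, err = strconv.Atoi(rawPage)
		if err != nil || page < 0 {
			return nil, errors.New("page must be a non-negative integer")
		}
	}
	return &Pagination{PerPage: perPage, Page: page}, nil
}

// parseQuery is a small convenience wrapper for raw query strings.
func parseQuery(raw string) (*Pagination, error) {
	values, err := url.ParseQuery(raw)
	if err != nil {
		return nil, err
	}
	return parsePagination(values)
}

// sortDirection maps user input onto a fixed allowlist so that no raw
// string is ever interpolated into an ORDER BY clause.
func sortDirection(raw string) (string, error) {
	switch raw {
	case "asc":
		return "ASC", nil
	case "desc":
		return "DESC", nil
	default:
		return "", fmt.Errorf("unsupported sort direction %q", raw)
	}
}

func main() {
	p, err := parseQuery("perPage=25&page=2")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.PerPage, p.Page) // prints 25 2

	_, err = sortDirection("asc; DROP TABLE tests")
	fmt.Println(err != nil) // prints true
}
```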

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@stbenjam
Member Author

stbenjam commented May 6, 2026

Review Panel Verdict

Disposition: APPROVE — all BLOCKING findings from initial review resolved


Specialist Findings

Architecture Reviewer: Cross-file impact is clean. SubqueryFilter is a justified abstraction for pushing filters into subqueries. Pagination path correctly bypasses cache (each page/sort combo has low cache reuse). downloadDataFunc now exports only the current page — acceptable tradeoff. No god functions, no circular dependencies.

Security & Supply Chain Reviewer: Sort direction uses an allowlist (SortAscending/SortDescending only — arbitrary strings cannot reach SQL). Sort field uses pq.QuoteIdentifier() which double-quote-escapes identifiers, preventing SQL injection. perPage bounded to 1–1000, page must be non-negative. No new dependencies added (lib/pq already vendored). No lockfile, build pipeline, or supply chain changes.

UX & API Reviewer: Backward compatible — when perPage is omitted, the legacy array response is preserved. When present, response is wrapped in PaginationResult envelope (rows, total_rows, page_size, page). Error messages for invalid pagination are clear (e.g., "perPage must be between 1 and 1000"). BigQuery path also wraps response in PaginationResult when pagination params are present. API docs updated in pkg/api/README.md.

Codebase Consistency Reviewer: getPaginationParams follows the same pattern as other parameter parsers in parameters.go. Test file parameters_test.go uses table-driven tests with testify assertions, consistent with the codebase. SubqueryFilter type is consistent with existing GORM query patterns. PaginationResult reuses the existing type from pkg/apis/api/types.go. isVariantFilter was inlined per suggestion.

QA Engineer: parameters_test.go covers 10 edge cases for getPaginationParams: no params, valid params, perPage without page, zero/negative perPage, negative page, exceeds max, non-numeric values, and boundary (max=1000). Sort direction validation is implicitly tested via the allowlist pattern (only SortAscending passes the guard). Backend pagination integration with the database requires integration tests (out of scope for this PR).

Devil's Advocate: Sort direction injection — resolved, allowlist prevents arbitrary tokens. BigQuery response shape — resolved, wraps in PaginationResult. COUNT + OFFSET TOCTOU — acceptable for materialized views (refreshed periodically, not subject to phantom rows). Double-fetch race in frontend (no AbortController) — pre-existing pattern across all tables in the codebase, not introduced by this PR. ROW_NUMBER() OVER() without ORDER BY gives non-deterministic IDs — harmless for React keys. Could not construct a failure scenario with the current validation.

Technical Writer: API docs in pkg/api/README.md updated: perPage (1–1000) and page (0+) parameters added to the Tests endpoint table. Pagination envelope format documented with example JSON. limit marked as legacy. Example response section renamed to clarify it shows the legacy format. No stale docs remain.

DBA Expert (300 years PostgreSQL): Filter pushdown into stats/passRates subqueries is the key performance win (~830x for filtered queries). pq.QuoteIdentifier is the correct defense for identifier injection. Sort direction allowlist prevents syntax injection. COUNT(*) error is now checked and propagated. perPage capped at 1000 prevents full-table-scan via pagination API. LIMIT/OFFSET on materialized views is safe from phantom-row issues. Deep pagination degrades linearly with OFFSET, but acceptable for UI use (max page 1000 * 1000 = 1M offset, well within matview size). PostgreSQL cannot push LIMIT through the nested subquery joins, but the filter pushdown ensures the subqueries themselves are fast.


Panel Synthesis

All eight specialists converged on the same five BLOCKING findings in the initial review, and all five have been resolved:

  1. Sort direction SQL injection → Fixed with allowlist validation (only asc/desc reach SQL)
  2. BigQuery response incompatibility → Fixed, both paths return PaginationResult when pagination params present
  3. Unbounded perPage/page → Fixed, perPage validated to 1–1000, page >= 0
  4. COUNT error silently ignored → Fixed, error checked and propagated
  5. API docs not updated → Fixed, README.md updated with new parameters and response format

No remaining BLOCKING findings. The SUGGESTION items (inline isVariantFilter, separate SubqueryFilter doc comment) were also addressed. The DBA Expert confirmed the pagination approach is sound for materialized views and the filter pushdown is the correct optimization.


Required Actions Before Merge

None.


Optional Follow-ups

  • Consider adding an AbortController to frontend fetch calls to cancel in-flight requests on rapid page changes (pre-existing pattern across all tables, not specific to this PR)
  • Deep pagination could use keyset/cursor pagination if matview sizes grow significantly (current OFFSET approach is fine for UI use)
  • COUNT(*) could be cached or use pg_class.reltuples approximation for very large result sets if count queries become a bottleneck

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/tests.go`:
- Around line 401-409: The BigQuery branch sets TotalRows to len(testsResult)
after pagination, returning the page count rather than the full dataset count;
update the logic in the handler (the BigQuery path in pkg/api/tests.go where
RespondWithJSON is called for pagination) to compute and return the true total
row count by either running a separate COUNT(*) query before applying
limit/offset (mirror the Postgres path) or by using any available pre-limit
count variable (e.g., a totalCount/rowsBeforeLimit value if present) and set
apitype.PaginationResult.TotalRows to that value instead of len(testsResult);
ensure this runs only when pagination != nil and does not change the existing
paged Rows/Page/PageSize fields.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: adf153c4-f9ab-461c-9a0e-18064adad5ef

📥 Commits

Reviewing files that changed from the base of the PR and between ee1ea29 and 5ef2468.

📒 Files selected for processing (7)
  • pkg/api/README.md
  • pkg/api/tests.go
  • pkg/db/query/test_queries.go
  • pkg/sippyserver/parameters.go
  • pkg/sippyserver/parameters_test.go
  • pkg/sippyserver/server.go
  • sippy-ng/src/tests/TestTable.js

Comment thread pkg/api/tests.go
The BQ path was setting TotalRows to len(testsResult) after limit,
which gave the page count instead of the dataset total. Now captures
the total before applying pagination slice.
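
A minimal sketch of the fix described here: record the dataset size first, then slice out the requested page. The `paginate` helper and its `[]string` rows are hypothetical; it assumes `page` is zero-based and that `page >= 0` and `perPage >= 1` were already validated upstream.

```go
package main

import "fmt"

// paginate returns one zero-based page of rows together with the total row
// count, which is captured before any slicing happens.
// It assumes page >= 0 and perPage >= 1 were validated by the caller.
func paginate(rows []string, page, perPage int) ([]string, int) {
	total := len(rows) // capture the full count before applying the page window

	// Clamp the start index so out-of-range pages yield an empty slice
	// (the comparison also avoids overflowing page*perPage).
	start := total
	if page <= total/perPage {
		start = page * perPage
	}
	end := start + perPage
	if end > total {
		end = total
	}
	return rows[start:end], total
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	pageRows, total := paginate(rows, 1, 2)
	fmt.Println(pageRows, total) // prints [c d] 5
}
```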

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@stbenjam
Member Author

stbenjam commented May 6, 2026

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 6, 2026
stbenjam and others added 3 commits May 6, 2026 14:43
The paginated tests API path was bypassing the cache and running the
expensive three-layer nested subquery on every page/sort change. Now
both paginated and non-paginated paths share the same cached result
set (1 hour TTL), with sorting and pagination applied in memory. The
collapsed result set is ~5k rows, making in-memory operations trivial.

This also removes the separate COUNT(*) query that was doubling the
DB work per paginated request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cache the full unfiltered test result set and apply filters in memory.
This means any filter, sort, or page change is served instantly from
cache without hitting the database. The cache TTL is increased to 4
hours to match the cache primer schedule.
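
Serving filter and sort changes from a cached, unfiltered result set might look roughly like this. The `TestRow` struct, the `view` helper, the substring name filter, and the sample rows are assumptions for illustration; the real code works on the API's own row type and filter model.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// TestRow is a hypothetical, reduced version of a test result row.
type TestRow struct {
	Name           string
	PassPercentage float64
}

// view derives a filtered, sorted copy from the cached rows. The cached
// slice itself is never reordered, so it can be shared across requests.
func view(cached []TestRow, nameContains string, descending bool) []TestRow {
	out := make([]TestRow, 0, len(cached))
	for _, r := range cached {
		if strings.Contains(r.Name, nameContains) {
			out = append(out, r)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		if descending {
			return out[i].PassPercentage > out[j].PassPercentage
		}
		return out[i].PassPercentage < out[j].PassPercentage
	})
	return out
}

// sampleRows returns made-up rows standing in for a cached result set.
func sampleRows() []TestRow {
	return []TestRow{
		{Name: "install should not time out", PassPercentage: 92.0},
		{Name: "upgrade should succeed", PassPercentage: 97.0},
		{Name: "install should succeed", PassPercentage: 99.5},
	}
}

func main() {
	for _, r := range view(sampleRows(), "install", true) {
		fmt.Println(r.Name, r.PassPercentage)
	}
}
```

Copying matching rows into a new slice before sorting keeps the cached data in its original order, which matters if several requests read the same cached slice concurrently.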

Also adds a test-report-cache data loader that can be used with the
cache primer cronjob (--loader=test-report-cache) to warm the test
results cache for all releases on both default and twoDay periods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sults

Only prime cache for OCP development releases (no GA date, has
payloadTags capability) to avoid wasting time on GA/OKD releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@stbenjam stbenjam changed the title from "TRT-2575: perf: push filters into test report subqueries + server-side pagination" to "TRT-2575: perf: test report query optimization, pagination, and cache priming" May 6, 2026
@stbenjam stbenjam changed the title from "TRT-2575: perf: test report query optimization, pagination, and cache priming" to "TRT-2575: perf: test report cache priming, pagination, and query optimization" May 6, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
pkg/api/tests.go (1)

485-490: 💤 Low value

Consider adding a brief godoc for the matview() method.

While the method is simple, a short comment clarifying its purpose (e.g., // matview returns the materialized view name based on the spec's period.) would improve readability for unfamiliar readers. As per coding guidelines, include a brief godoc if the name alone would not make the purpose obvious.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/tests.go` around lines 485 - 490, Add a brief godoc comment above the
TestResultsSpec.matview() method explaining what it returns and when (e.g.,
"matview returns the materialized view name based on the spec's Period, choosing
2-day or 7-day view"). Update the comment to reference the behavior using the
Period field and the returned constants testReport2dMatView and
testReport7dMatView, keeping it concise and placed immediately above the func
declaration for matview().
pkg/dataloader/testreportcacheloader/testreportcacheloader.go (1)

14-27: 💤 Low value

Consider adding brief godoc for the exported New function.

While the implementation is straightforward and follows established loader patterns, the coding guidelines suggest including a brief godoc if the name alone would not make the purpose obvious. A short comment like // New creates a testReportCacheLoader that primes cache for development releases. would help unfamiliar readers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dataloader/testreportcacheloader/testreportcacheloader.go` around lines
14 - 27, Add a brief godoc comment above the exported New function describing
its purpose and behavior (for example: it creates a testReportCacheLoader that
primes the cache for development releases); place the comment immediately above
the New function declaration in the testreportcacheloader package and reference
the returned type testReportCacheLoader and the parameters (dbc, cacheClient,
releases) so readers understand what the constructor does.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/api/tests.go`:
- Around line 485-490: Add a brief godoc comment above the
TestResultsSpec.matview() method explaining what it returns and when (e.g.,
"matview returns the materialized view name based on the spec's Period, choosing
2-day or 7-day view"). Update the comment to reference the behavior using the
Period field and the returned constants testReport2dMatView and
testReport7dMatView, keeping it concise and placed immediately above the func
declaration for matview().

In `@pkg/dataloader/testreportcacheloader/testreportcacheloader.go`:
- Around line 14-27: Add a brief godoc comment above the exported New function
describing its purpose and behavior (for example: it creates a
testReportCacheLoader that primes the cache for development releases); place the
comment immediately above the New function declaration in the
testreportcacheloader package and reference the returned type
testReportCacheLoader and the parameters (dbc, cacheClient, releases) so readers
understand what the constructor does.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 943dbb47-6c35-4c48-9d01-e609f69cc228

📥 Commits

Reviewing files that changed from the base of the PR and between 50f2488 and ae7d76f.

📒 Files selected for processing (4)
  • cmd/sippy/load.go
  • pkg/api/tests.go
  • pkg/dataloader/testreportcacheloader/testreportcacheloader.go
  • pkg/dataloader/testreportcacheloader/testreportcacheloader_test.go

PrimeTestResultsCache now bypasses the cache read path entirely,
always regenerating from the database and writing the fresh result.
Previously it went through GetDataFromCacheOrMatview which would
return stale cached data if the matview hadn't been refreshed yet.

Also adds hack/bench-test-api.sh for comparing prod vs local API
response times across various query patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
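
To illustrate why the primer should not go through a read-through helper, here is a small sketch. The `memCache` type and both function names are hypothetical; the real code uses Redis and `GetDataFromCacheOrMatview`, whose signature is not shown in this conversation.

```go
package main

import "fmt"

// memCache is a hypothetical in-memory stand-in for the Redis-backed cache.
type memCache map[string]string

// readThrough returns the cached value when present and only calls
// generate on a miss, so a stale entry is served as-is.
func readThrough(c memCache, key string, generate func() (string, error)) (string, error) {
	if v, ok := c[key]; ok {
		return v, nil
	}
	v, err := generate()
	if err != nil {
		return "", err
	}
	c[key] = v
	return v, nil
}

// prime skips the read entirely: it always regenerates and overwrites.
func prime(c memCache, key string, generate func() (string, error)) error {
	v, err := generate()
	if err != nil {
		return err
	}
	c[key] = v
	return nil
}

func main() {
	c := memCache{"tests": "stale"}
	fresh := func() (string, error) { return "fresh", nil }

	v, _ := readThrough(c, "tests", fresh)
	fmt.Println(v) // prints "stale": the generator was never called

	if err := prime(c, "tests", fresh); err != nil {
		panic(err)
	}
	v, _ = readThrough(c, "tests", fresh)
	fmt.Println(v) // prints "fresh"
}
```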
stbenjam and others added 2 commits May 6, 2026 20:06
The production API is too slow to complete within curl's timeout,
making the comparison benchmark impractical.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The cache primer runs every 4 hours, so a 4-hour TTL risked cache
expiry on the boundary before the next primer run. Extending to 5
hours ensures primed entries never expire between runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
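
The boundary problem can be checked with simple duration arithmetic. In this sketch the 4-hour interval and the 4-hour and 5-hour TTLs come from the commit message; `coversGap`, `exampleDelay`, and the idea of modelling a late next write as a single delay value are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	primerInterval = 4 * time.Hour    // how often the primer runs
	oldTTL         = 4 * time.Hour    // TTL before this commit
	newTTL         = 5 * time.Hour    // TTL after this commit
	exampleDelay   = 30 * time.Minute // hypothetical lateness of the next write
)

// coversGap reports whether an entry is still alive when the next primer
// write lands, where delay is how much later than scheduled that write is.
func coversGap(ttl, interval, delay time.Duration) bool {
	return ttl > interval+delay
}

func main() {
	fmt.Println(coversGap(oldTTL, primerInterval, 0))            // false: expires on the boundary
	fmt.Println(coversGap(newTTL, primerInterval, exampleDelay)) // true
}
```

Under this model the 5-hour TTL covers the gap only while the next write lands less than an hour late.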
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
pkg/api/tests.go (2)

421-437: 💤 Low value

Consider extracting pagination helper to reduce duplication.

The pagination logic (lines 421-437) is nearly identical to the Postgres path (lines 377-393). Per coding guidelines, check pkg/util/ for existing helpers or consider extracting a shared pagination function.

♻️ Example helper extraction
// In pkg/util or locally in this file:
func paginate[T any](items []T, pagination *apitype.Pagination) ([]T, int64) {
    totalRows := int64(len(items))
    start := pagination.Page * pagination.PerPage
    end := start + pagination.PerPage
    if start > int(totalRows) {
        start = int(totalRows)
    }
    if end > int(totalRows) {
        end = int(totalRows)
    }
    return items[start:end], totalRows
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/tests.go` around lines 421 - 437, The pagination slicing logic in the
in-memory branch duplicates the Postgres path; extract a reusable helper (e.g.,
paginate[T any](items []T, pagination *apitype.Pagination) ([]T, int64)) into
pkg/util or top of this file and replace the block in the function that builds
the apitype.PaginationResult (where we currently compute totalRows, start, end
and slice sorted[start:end]) to call that helper, then pass the returned rows
and totalRows into RespondWithJSON for consistency with the Postgres path.

230-241: 💤 Low value

Filter errors are silently ignored.

When f.Filter(t) returns an error (e.g., due to an invalid filter field), the test is silently excluded from results. Consider logging at debug level to aid troubleshooting.

♻️ Suggested improvement
 func (tests TestsAPIResult) filter(f *filter.Filter) TestsAPIResult {
 	if f == nil || len(f.Items) == 0 {
 		return tests
 	}
 	var result TestsAPIResult
 	for _, t := range tests {
-		if match, err := f.Filter(t); err == nil && match {
+		match, err := f.Filter(t)
+		if err != nil {
+			log.WithError(err).Debugf("filter error for test %s", t.Name)
+			continue
+		}
+		if match {
 			result = append(result, t)
 		}
 	}
 	return result
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/tests.go` around lines 230 - 241, In TestsAPIResult.filter, errors
returned by f.Filter(t) are currently ignored; update the loop in the
TestsAPIResult.filter method to log any non-nil error from f.Filter(t) at debug
level (including the error and identifying info about t) before continuing so
failures to evaluate a filter are visible for troubleshooting while preserving
the current behavior of skipping non-matching items; i.e., when handling the
result of f.Filter(t), if err != nil call your package's debug logger (e.g.,
log.Debugf or the existing logger) with the error and t, then continue.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/api/tests.go`:
- Around line 421-437: The pagination slicing logic in the in-memory branch
duplicates the Postgres path; extract a reusable helper (e.g., paginate[T
any](items []T, pagination *apitype.Pagination) ([]T, int64)) into pkg/util or
top of this file and replace the block in the function that builds the
apitype.PaginationResult (where we currently compute totalRows, start, end and
slice sorted[start:end]) to call that helper, then pass the returned rows and
totalRows into RespondWithJSON for consistency with the Postgres path.
- Around line 230-241: In TestsAPIResult.filter, errors returned by f.Filter(t)
are currently ignored; update the loop in the TestsAPIResult.filter method to
log any non-nil error from f.Filter(t) at debug level (including the error and
identifying info about t) before continuing so failures to evaluate a filter are
visible for troubleshooting while preserving the current behavior of skipping
non-matching items; i.e., when handling the result of f.Filter(t), if err != nil
call your package's debug logger (e.g., log.Debugf or the existing logger) with
the error and t, then continue.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2eeee8b4-20e8-4802-b8ec-4371f48873fb

📥 Commits

Reviewing files that changed from the base of the PR and between ae7d76f and 118186f.

📒 Files selected for processing (1)
  • pkg/api/tests.go

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e

stbenjam and others added 2 commits May 6, 2026 20:18
The API defaults IncludeOverall to true when collapse is false, but
the primer was leaving it as false. This caused a cache key mismatch,
resulting in cache misses for uncollapsed requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
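
A sketch of how such a mismatch arises when the cache key is derived from the request options. The `spec` struct, the key format, and `apiSpec` are hypothetical; only the default (IncludeOverall is true when collapse is false) comes from the commit message.

```go
package main

import "fmt"

// spec is a hypothetical, reduced version of the options that feed the cache key.
type spec struct {
	Release        string
	Collapse       bool
	IncludeOverall bool
}

// cacheKey includes every option, so any difference yields a different key.
func cacheKey(s spec) string {
	return fmt.Sprintf("tests~%s~collapse=%t~overall=%t", s.Release, s.Collapse, s.IncludeOverall)
}

// apiSpec applies the default described above. The collapsed case is left
// at the zero value here, which is an assumption of this sketch.
func apiSpec(release string, collapse bool) spec {
	s := spec{Release: release, Collapse: collapse}
	if !collapse {
		s.IncludeOverall = true
	}
	return s
}

func main() {
	request := apiSpec("5.0", false)

	buggyPrimer := spec{Release: "5.0", Collapse: false} // IncludeOverall left false
	fixedPrimer := apiSpec("5.0", false)                 // same defaults as the API

	fmt.Println(cacheKey(buggyPrimer) == cacheKey(request)) // false: primed entry is never hit
	fmt.Println(cacheKey(fixedPrimer) == cacheKey(request)) // true
}
```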
The old code path through GetDataFromCacheOrMatview handled nil cache
gracefully, but the direct write path panicked on nil. Return an
error instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
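
The guard described here might look like the following. The `Cache` interface, `mapCache`, `primeDirect`, and the error text are all hypothetical; the point is only that a direct write path has no fallback and so checks for a missing cache up front.

```go
package main

import (
	"errors"
	"fmt"
)

// Cache is a hypothetical minimal cache interface.
type Cache interface {
	Set(key string, value []byte) error
}

// errNoCache is returned instead of panicking when no cache is configured.
var errNoCache = errors.New("cache priming requires a configured cache client")

// primeDirect writes straight to the cache. Unlike a read-through helper,
// it cannot fall back to the database, so it rejects a nil cache.
func primeDirect(c Cache, key string, value []byte) error {
	if c == nil {
		return errNoCache
	}
	return c.Set(key, value)
}

// mapCache is an in-memory Cache used only for this demonstration.
type mapCache map[string][]byte

func (m mapCache) Set(key string, value []byte) error {
	m[key] = value
	return nil
}

func main() {
	fmt.Println(primeDirect(nil, "tests", []byte("{}")))        // prints the error
	fmt.Println(primeDirect(mapCache{}, "tests", []byte("{}"))) // prints <nil>
}
```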
@stbenjam
Member Author

stbenjam commented May 6, 2026

Benchmark: Local API with cache priming (release 5.0)

| Query | Cached | Uncached |
|---|---|---|
| collapsed (default) | 0.130s | 3.426s |
| uncollapsed | 10.146s | 2m2s |
| collapsed + twoDay | 0.114s | - |
| uncollapsed + twoDay | 9.696s | - |
| collapsed + filter name contains sig-node | 0.225s | - |
| uncollapsed + filter name contains sig-node | 17.426s | - |
| collapsed + filter runs > 14 | 0.219s | - |
| uncollapsed + paginated page 0 | 8.358s | - |

Collapsed results see a ~26x improvement from caching. Uncollapsed results are ~12x faster cached vs uncached (2+ minutes down to ~10s). The uncollapsed cached time is dominated by deserializing the ~1GB JSON blob; the uncollapsed uncached query is a full self-joining matview scan that takes over 2 minutes.

Add WithCompression() option to CacheSet that gzip-compresses the
JSON before writing to Redis. Both cache read paths auto-detect gzip
via magic header bytes and decompress transparently.

Only the test results cache primer uses compression for now, as the
uncollapsed result set is ~1GB of JSON. All other callers are
unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
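
A minimal sketch of the compress-on-write, detect-on-read idea using Go's standard `compress/gzip`. The function names `compress` and `maybeDecompress` are hypothetical and are not the `CacheSet`/`WithCompression()` API from this PR. Detection relies on the two gzip magic bytes (0x1f, 0x8b); JSON text starts with a character such as `{` or `[`, so uncompressed entries are passed through unchanged.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipMagic is the two-byte header that starts every gzip stream.
var gzipMagic = []byte{0x1f, 0x8b}

// compress gzips data before it is written to the cache.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed bytes and the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// maybeDecompress returns data unchanged unless it starts with the gzip
// magic bytes, in which case it is decompressed.
func maybeDecompress(data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, gzipMagic) {
		return data, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	raw := []byte(`{"name":"sig-node","runs":14}`)

	packed, err := compress(raw)
	if err != nil {
		panic(err)
	}
	plain, err := maybeDecompress(packed)
	if err != nil {
		panic(err)
	}
	passthrough, err := maybeDecompress(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(plain, raw), bytes.Equal(passthrough, raw)) // true true
}
```

Because readers detect the format from the payload itself, entries written without compression by other callers remain readable.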
@dgoodwin
Contributor

dgoodwin commented May 7, 2026

This is huge, and that benchmark comment answers the main question I had. Having the option to see the uncollapsed list again would be handy; I had to give up on that some time ago.

@dgoodwin
Contributor

dgoodwin commented May 7, 2026

/lgtm
/hold

Just making sure I don't step on TRT's toes if they want a crack at this. Thanks for improving this, will help daily.

@openshift-ci openshift-ci Bot added the lgtm label (indicates that a PR is ready to be merged) on May 7, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stbenjam stbenjam changed the title from "TRT-2575: perf: test report cache priming, pagination, and query optimization" to "perf: test report cache priming, pagination, and query optimization" on May 7, 2026
@openshift-ci-robot openshift-ci-robot removed the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on May 7, 2026
@openshift-ci-robot

@stbenjam: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

@stbenjam
Member Author

stbenjam commented May 7, 2026

I am not sure this is really the right approach; the caching is clunky and bad. @smg247 is going to look at it. If we can't figure out something better we could try to take this, but it is really putting a tuxedo on a toad.

@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

@stbenjam: all tests passed!

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • lgtm: Indicates that a PR is ready to be merged.
  • ready-for-human-review: Indicates a PR has been reviewed by automated tools and is ready for human review.
