perf: change singleevents default order to occurreddate desc with supporting indices DHIS2-20991 #23093

Merged: teleivo merged 1 commit into master from DHIS2-20991 on Mar 12, 2026

@teleivo teleivo commented Feb 27, 2026

Change the default single event order from eventid desc to occurreddate desc, eventid desc and introduce OrderJdbcClause for consistent order clause building across all tracker JDBC stores.

The previous default order (eventid desc) is the primary key, which is not an orderable field in the API -- clients could not explicitly request it. The Capture app and Android SDK always send order=occurredAt:desc, so every request paid for an explicit order that should have been the default. Making occurreddate desc, eventid desc the default means requests without an explicit order param get the same indexed scan. The eventid tie-breaker ensures deterministic pagination.
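
The role of the eventid tie-breaker can be illustrated with a minimal in-memory sketch. `EventRow` and `DefaultOrderSketch` are hypothetical names used only for this illustration, not DHIS2 classes. The point is that occurreddate alone leaves rows with equal dates in an unspecified relative order, while (occurreddate, eventid) defines a total order, so page boundaries do not depend on how the rows happen to arrive.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a singleevent row; not a DHIS2 class.
record EventRow(long eventId, LocalDate occurredDate) {}

final class DefaultOrderSketch {
  // occurreddate desc, eventid desc: the id breaks ties between equal dates,
  // so any two distinct rows have exactly one relative order.
  static final Comparator<EventRow> DEFAULT_ORDER =
      Comparator.comparing(EventRow::occurredDate)
          .thenComparingLong(EventRow::eventId)
          .reversed();

  // Returns one page (1-based) of the rows sorted by the default order.
  static List<EventRow> page(List<EventRow> rows, int page, int pageSize) {
    List<EventRow> sorted = new ArrayList<>(rows);
    sorted.sort(DEFAULT_ORDER);
    int from = Math.min((page - 1) * pageSize, sorted.size());
    int to = Math.min(from + pageSize, sorted.size());
    return List.copyOf(sorted.subList(from, to));
  }
}
```

Feeding the same rows in a different input order yields the same pages, which is the property the SQL tie-breaker provides for paginated API requests.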

Default order SQL

Before (master):

order by ev_id desc

After (PR):

order by ev_occurreddate desc, ev_id desc

Tie-breaker direction matching (OrderJdbcClause)

OrderJdbcClause is a shared utility used by all four tracker JDBC stores. When a user specifies an order like occurredAt:asc, the tie-breaker column (eventid) now matches that direction (asc). This enables a single forward or backward index scan on the composite index.

Before (order=occurredAt:asc):

order by ev_occurreddate asc, ev_id desc   -- mixed directions, cannot use single index scan

After:

order by ev_occurreddate asc, ev_id asc    -- same direction, single forward index scan
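
The tie-breaker rule can be sketched as follows. This is an illustration only and not the actual `OrderJdbcClause` API from the PR: the class name, the `Order` record, and the `build` signature are assumptions. It falls back to the default order when the user specifies none, and otherwise appends `ev_id` in the direction of the first user-specified order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative only: not the actual OrderJdbcClause API from the PR.
final class OrderClauseSketch {
  enum Direction { ASC, DESC }

  record Order(String column, Direction direction) {}

  static String build(List<Order> userOrders) {
    List<Order> orders = new ArrayList<>(userOrders);
    if (orders.isEmpty()) {
      // No explicit order: use the new default, occurreddate desc.
      orders.add(new Order("ev_occurreddate", Direction.DESC));
    }
    // The tie-breaker follows the direction of the first order column,
    // so the whole clause runs in one direction.
    orders.add(new Order("ev_id", orders.get(0).direction()));
    return "order by "
        + orders.stream()
            .map(o -> o.column() + " " + o.direction().name().toLowerCase(Locale.ROOT))
            .collect(Collectors.joining(", "));
  }
}
```

With an empty list this produces `order by ev_occurreddate desc, ev_id desc`; with a single `ev_occurreddate` ascending order it produces `order by ev_occurreddate asc, ev_id asc`.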

PostgreSQL B-tree indices support scanning in both directions. An ascending index on (a, b) can serve both order by a asc, b asc (forward scan) and order by a desc, b desc (backward scan). But mixed directions like a asc, b desc require a sort -- the index cannot deliver rows in that order.
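
The direction rule can be written down as a small classifier. This is a simplified sketch, not how the PostgreSQL planner is implemented: it assumes every index column is ascending with default null ordering, and that the order columns must match a leading prefix of the index. All names are hypothetical.

```java
import java.util.List;

// Simplified model of which ORDER BY clauses an all-ascending B-tree index can serve.
final class IndexScanSketch {
  enum Scan { FORWARD, BACKWARD, SORT_REQUIRED }

  record OrderKey(String column, boolean ascending) {}

  static Scan scanFor(List<String> indexColumns, List<OrderKey> orderBy) {
    if (orderBy.isEmpty() || orderBy.size() > indexColumns.size()) {
      return Scan.SORT_REQUIRED;
    }
    boolean firstAscending = orderBy.get(0).ascending();
    for (int i = 0; i < orderBy.size(); i++) {
      OrderKey key = orderBy.get(i);
      // Columns must follow the index order, and all directions must agree.
      if (!key.column().equals(indexColumns.get(i)) || key.ascending() != firstAscending) {
        return Scan.SORT_REQUIRED;
      }
    }
    return firstAscending ? Scan.FORWARD : Scan.BACKWARD;
  }
}
```

For an index on (a, b), the order a asc, b asc classifies as FORWARD, a desc, b desc as BACKWARD, and a asc, b desc as SORT_REQUIRED.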

Index change (V2_43_56)

Extends the V2_43_50 occurreddate index with eventid as trailing column to cover the new default order without a sort:

drop index if exists in_singleevent_programstageid_occurreddate;
create index in_singleevent_programstageid_occurreddate
    on singleevent (programstageid, occurreddate, eventid);

singleevent indices after this PR:

| Index | Columns | Changed |
| --- | --- | --- |
| singleevent_pkey | (eventid) | |
| unique_singleevent_uid | (uid) | |
| in_singleevent_programstageid_occurreddate | (programstageid, occurreddate, eventid) | V2_43_56 -- was (programstageid, occurreddate) |
| in_singleevent_programstageid_organisationunitid | (programstageid, organisationunitid) | |
| in_singleevent_programstageid_assigneduserid | (programstageid, assigneduserid) | |

Test Data

Sierra Leone database with generated test data:

| Table | Rows | Table size | Total (incl. indices) |
| --- | --- | --- | --- |
| singleevent | 10,317,816 | 4.3 GB | 5.5 GB |

Generated by inserting additional data into the Sierra Leone base database for the Malaria programme (VBqh0ynB2wv, single events).

Test Users

| User | Org unit scope | Facilities | Events in scope |
| --- | --- | --- | --- |
| admin | root (level 1), F_TRACKED_ENTITY_INSTANCE_SEARCH_IN_ALL_ORGUNITS authority, COC sharing enforced | 1,166 | 10,317,816 |
| super | root (level 1), ALL authority (superuser), COC sharing skipped | 1,166 | 10,317,816 |
| restricted | capture: Ngelehun CHC (level 4), search: 2 facilities (level 4) | 2 | 17,574 |
| district | capture: Ngelehun CHC (level 4), search: Bo district (level 2) | 125 | 1,092,678 |

Performance

EXPLAIN ANALYZE execution time (ms), warmup=2. 10M single events in one programme stage.

Summary

Default order (the change in this PR) -- fast for all users. Admin/super have no org unit filter (10M events), restricted/district have a search scope filter (17K/1M events). The composite index delivers rows in occurreddate, eventid order without a sort:

| Request params | User | Events in scope | Master (ms) | PR (ms) | Diff |
| --- | --- | --- | --- | --- | --- |
| (no order, no filters) | admin | 10M | 10.8 | 10.0 | -7% |
| (no order, no filters) | super | 10M | 15.7 | 9.9 | -37% |
| (no order, no filters) | restricted | 17K | 13.3 | 17.1 | +29% |
| (no order, no filters) | district | 1M | 10.2 | 11.2 | +10% |
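
The Diff column is consistent with the relative change of the PR timing against master, rounded to a whole percent. The PR does not state the formula, so the sketch below is an assumption that happens to reproduce the four rows above; `DiffSketch` is a hypothetical name.

```java
// Assumed definition of the Diff column: (PR - master) / master, in percent.
final class DiffSketch {
  static long diffPercent(double masterMs, double prMs) {
    return Math.round((prMs - masterMs) / masterMs * 100.0);
  }
}
```

For example, `diffPercent(15.7, 9.9)` gives -37 and `diffPercent(13.3, 17.1)` gives 29. At these magnitudes a +29% regression is under 4 ms in absolute terms.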

Default order + date range -- improved. occurredAfter=2024-07-01&occurredBefore=2024-12-31 narrows to ~8,800 events. The composite index covers both the range filter and the sort:

| Request params | User | Master (ms) | PR (ms) | Diff |
| --- | --- | --- | --- | --- |
| orgUnit=DiszpKrYNg8&orgUnitMode=SELECTED&occurredAfter=2024-07-01&occurredBefore=2024-12-31 | admin | 68.4 | 45.9 | -33% |
| same | restricted | 71.2 | 46.9 | -34% |
| same | district | 72.5 | 46.5 | -36% |

Non-indexed orders, broad scope -- pre-existing, unchanged. No index covers createdAt, updatedAt, data element, or assignedUserDisplayName ordering. PG must scan and sort all events in scope. Restricted user (17K events) is fine:

| Request params | User | Events in scope | Master (ms) | PR (ms) | Diff |
| --- | --- | --- | --- | --- | --- |
| order=createdAt:desc | admin | 10M | 6,412 | 6,485 | +1% |
| order=updatedAt:desc | admin | 10M | 6,503 | 6,623 | +2% |
| order=qrur9Dvnyt5:asc (data element) | admin | 10M | 8,042 | 7,998 | -1% |
| orgUnitMode=ACCESSIBLE&order=updatedAt:desc | restricted | 17K | 185 | 181 | -2% |
| orgUnitMode=ACCESSIBLE&order=updatedAt:desc | district | 1M | 3,547 | 3,706 | +4% |

Org unit filter, few/zero events + broad user access -- pre-existing, unchanged. Index competition: planner picks the occurreddate index expecting to find matches quickly, scans all 10M rows. See Limitations > Index competition:

| Request params | User | Master (ms) | PR (ms) | Diff |
| --- | --- | --- | --- | --- |
| orgUnit=O6uvpzGd5pu&orgUnitMode=SELECTED&order=occurredAt:desc | admin | 11,757 | 11,865 | +1% |
| orgUnit=O6uvpzGd5pu&orgUnitMode=CHILDREN&order=occurredAt:desc | admin | 16,030 | 16,184 | +1% |

Full benchmark results (93 queries)

All timings: EXPLAIN ANALYZE execution time (ms), warmup=2.

Single Events (/api/tracker/events?program=VBqh0ynB2wv, 10M events)

| # | Scenario | Master (ms) | PR (ms) | Diff |
| --- | --- | --- | --- | --- |
| 01 | base order=occurredAt:desc (admin) | 14.4 | 13.0 | -10% |
| 01 | base order=occurredAt:desc (super) | 15.7 | 9.9 | -37% |
| 01 | base order=occurredAt:desc (restricted) | 14.9 | 18.0 | +21% |
| 01 | base order=occurredAt:desc (district) | 12.2 | 10.0 | -18% |
| 02 | DESCENDANTS order=occurredAt:desc (admin) | 12.8 | 10.0 | -22% |
| 02 | DESCENDANTS order=occurredAt:desc (restricted) | 19.4 | 15.2 | -22% |
| 02 | DESCENDANTS order=occurredAt:desc (district) | 16.3 | 17.3 | +6% |
| 03 | CHILDREN order=occurredAt:desc (admin) | 16,030 | 16,184 | +1% |
| 03 | CHILDREN order=occurredAt:desc (restricted) | 19.3 | 15.3 | -21% |
| 03 | CHILDREN order=occurredAt:desc (district) | 17.3 | 14.4 | -17% |
| 04 | SELECTED order=occurredAt:desc (admin) | 11,757 | 11,865 | +1% |
| 04 | SELECTED order=occurredAt:desc (restricted) | 16.7 | 17.6 | +5% |
| 04 | SELECTED order=occurredAt:desc (district) | 16.2 | 14.6 | -10% |
| 05 | ALL order=occurredAt:desc (admin) | 16.0 | 10.1 | -37% |
| 06 | status=ACTIVE order=occurredAt:desc (admin) | 16.0 | 12.9 | -19% |
| 07 | status=COMPLETED order=occurredAt:desc (admin) | 140.5 | 151.3 | +8% |
| 08 | status=VISITED order=occurredAt:desc (admin) | 16.5 | 10.0 | -39% |
| 09 | date range order=occurredAt:desc (admin) | 11.3 | 10.0 | -12% |
| 10 | updatedWithin=30d order=occurredAt:desc (admin) | 142.0 | 144.2 | +2% |
| 11 | updated date range order=occurredAt:desc (admin) | 155.1 | 149.2 | -4% |
| 12 | assignedUserMode=NONE order=occurredAt:desc (admin) | 16.0 | 11.1 | -31% |
| 13 | assignedUserMode=ANY order=occurredAt:desc (admin) | 144.9 | 148.3 | +2% |
| 14 | assignedUserMode=PROVIDED order=occurredAt:desc (admin) | 160.7 | 155.4 | -3% |
| 15 | events filter order=occurredAt:desc (admin) | 9.9 | 10.2 | +3% |
| 16 | includeDeleted=true order=occurredAt:desc (admin) | 16.2 | 10.0 | -38% |
| 17 | data element filter order=occurredAt:desc (admin) | 15.8 | 10.1 | -36% |
| 18 | order=occurredAt:desc (admin) | 20.0 | 12.3 | -39% |
| 19 | order=createdAt:desc (admin) | 6,412 | 6,485 | +1% |
| 20 | order=updatedAt:desc (admin) | 6,503 | 6,623 | +2% |
| 21 | order=qrur9Dvnyt5:asc (admin) | 8,042 | 7,998 | -1% |
| 22 | totalPages=true order=occurredAt:desc (admin) | 15.6 | 11.0 | -29% |
| 23 | dataElementIdScheme=CODE order=occurredAt:desc (admin) | 16.5 | 11.0 | -33% |
| 24 | combined filters order=occurredAt:desc (admin) | 12.3 | 10.4 | -15% |
| 29 | status=ACTIVE + date range order=occurredAt:desc (admin) | 17.3 | 9.9 | -43% |
| 30 | ACCESSIBLE order=occurredAt:desc (admin) | 15.6 | 10.9 | -30% |
| 30 | ACCESSIBLE order=occurredAt:desc (restricted) | 17.4 | 15.6 | -10% |
| 30 | ACCESSIBLE order=occurredAt:desc (district) | 12.7 | 11.1 | -13% |
| 32 | default order (admin) | 10.8 | 10.0 | -7% |
| 32 | default order (restricted) | 13.3 | 17.1 | +29% |
| 32 | default order (district) | 10.2 | 11.2 | +10% |
| 33 | order=occurredAt:asc (admin) | 11.3 | 10.3 | -9% |

Working Lists (programme stage pTo4uMt3xur, Ngelehun CHC)

| # | Scenario | Order | Master (ms) | PR (ms) | Diff |
| --- | --- | --- | --- | --- | --- |
| 25d | SELECTED (admin) | default | 21.8 | 35.8 | +64% |
| 25d | SELECTED (restricted) | default | 17.6 | 34.8 | +98% |
| 25d | SELECTED (district) | default | 18.8 | 33.5 | +78% |
| 25a | SELECTED (admin) | occurredAt:desc | 30.8 | 36.6 | +19% |
| 25a | SELECTED (restricted) | occurredAt:desc | 31.3 | 29.7 | -5% |
| 25a | SELECTED (district) | occurredAt:desc | 30.6 | 34.8 | +14% |
| 25b | SELECTED (admin) | updatedAt:desc | 57.8 | 59.3 | +3% |
| 25b | SELECTED (restricted) | updatedAt:desc | 56.6 | 61.1 | +8% |
| 25b | SELECTED (district) | updatedAt:desc | 57.6 | 60.5 | +5% |
| 25c | SELECTED (admin) | assignedUserDisplayName:desc | 64.8 | 63.5 | -2% |
| 25c | SELECTED (restricted) | assignedUserDisplayName:desc | 61.7 | 65.1 | +6% |
| 25c | SELECTED (district) | assignedUserDisplayName:desc | 64.2 | 62.7 | -2% |
| 26d | SELECTED + date range (admin) | default | 68.4 | 45.9 | -33% |
| 26d | SELECTED + date range (restricted) | default | 71.2 | 46.9 | -34% |
| 26d | SELECTED + date range (district) | default | 72.5 | 46.5 | -36% |
| 26a | SELECTED + date range (admin) | occurredAt:desc | 48.4 | 48.7 | +1% |
| 26a | SELECTED + date range (restricted) | occurredAt:desc | 48.1 | 48.2 | 0% |
| 26a | SELECTED + date range (district) | occurredAt:desc | 47.3 | 46.7 | -1% |
| 26b | SELECTED + date range (admin) | updatedAt:desc | 49.3 | 47.8 | -3% |
| 26b | SELECTED + date range (restricted) | updatedAt:desc | 47.3 | 47.3 | 0% |
| 26b | SELECTED + date range (district) | updatedAt:desc | 47.6 | 46.1 | -3% |
| 26c | SELECTED + date range (admin) | assignedUserDisplayName:desc | 48.7 | 46.9 | -4% |
| 26c | SELECTED + date range (restricted) | assignedUserDisplayName:desc | 47.8 | 50.4 | +5% |
| 26c | SELECTED + date range (district) | assignedUserDisplayName:desc | 46.1 | 48.3 | +5% |
| 27d | DESCENDANTS (admin) | default | 25.7 | 39.3 | +53% |
| 27d | DESCENDANTS (restricted) | default | 25.7 | 34.4 | +34% |
| 27d | DESCENDANTS (district) | default | 24.6 | 33.4 | +36% |
| 27a | DESCENDANTS (admin) | occurredAt:desc | 42.6 | 32.8 | -23% |
| 27a | DESCENDANTS (restricted) | occurredAt:desc | 35.3 | 32.9 | -7% |
| 27a | DESCENDANTS (district) | occurredAt:desc | 33.6 | 34.9 | +4% |
| 27b | DESCENDANTS (admin) | updatedAt:desc | 64.5 | 67.7 | +5% |
| 27b | DESCENDANTS (restricted) | updatedAt:desc | 60.1 | 57.0 | -5% |
| 27b | DESCENDANTS (district) | updatedAt:desc | 65.0 | 60.5 | -7% |
| 27c | DESCENDANTS (admin) | assignedUserDisplayName:desc | 66.9 | 73.5 | +10% |
| 27c | DESCENDANTS (restricted) | assignedUserDisplayName:desc | 60.0 | 65.1 | +9% |
| 27c | DESCENDANTS (district) | assignedUserDisplayName:desc | 63.8 | 63.2 | -1% |
| 28d | ALL (admin) | default | 9.9 | 10.2 | +3% |
| 28a | ALL (admin) | occurredAt:desc | 15.8 | 10.5 | -34% |
| 28b | ALL (admin) | updatedAt:desc | 6,579 | 6,480 | -2% |
| 28c | ALL (admin) | assignedUserDisplayName:desc | 8,792 | 8,686 | -1% |
| 31d | ACCESSIBLE (admin) | default | 10.3 | 13.7 | +33% |
| 31d | ACCESSIBLE (restricted) | default | 19.8 | 34.1 | +72% |
| 31d | ACCESSIBLE (district) | default | 10.7 | 11.6 | +8% |
| 31a | ACCESSIBLE (admin) | occurredAt:desc | 16.3 | 12.0 | -26% |
| 31a | ACCESSIBLE (restricted) | occurredAt:desc | 36.6 | 35.0 | -4% |
| 31a | ACCESSIBLE (district) | occurredAt:desc | 13.5 | 13.8 | +2% |
| 31b | ACCESSIBLE (admin) | updatedAt:desc | 6,831 | 7,080 | +4% |
| 31b | ACCESSIBLE (restricted) | updatedAt:desc | 185.2 | 181.3 | -2% |
| 31b | ACCESSIBLE (district) | updatedAt:desc | 3,547 | 3,706 | +4% |
| 31c | ACCESSIBLE (admin) | assignedUserDisplayName:desc | 8,989 | 9,377 | +4% |
| 31c | ACCESSIBLE (restricted) | assignedUserDisplayName:desc | 185.0 | 190.9 | +3% |
| 31c | ACCESSIBLE (district) | assignedUserDisplayName:desc | 3,859 | 3,864 | 0% |

Limitations

Index competition

Two indices compete for the planner's attention:

  • (programstageid, occurreddate, eventid) -- avoids a sort on occurreddate but must scan all rows in the programme stage, filtering by org unit after the fact.
  • (programstageid, organisationunitid) -- filters by org unit first, then sorts the result.

The first index is better for queries where the org unit filter is unselective (ALL, ACCESSIBLE with a large scope) -- scan in order, stop after limit rows. The second is better when the org unit filter is selective (SELECTED on a single facility, DESCENDANTS of a small subtree) -- narrow the scan first, then sort within the result.

The planner doesn't always choose correctly. When the target org units have few or zero events, the planner may still pick the occurreddate index expecting to find matching rows early. It then scans all 10M rows, discarding each one, before returning an empty result. This is the root cause of the slow SELECTED and CHILDREN queries for admin (Bo district) -- a large org unit where the index scan must visit many rows.
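
A back-of-the-envelope model makes the trade-off concrete. It is a deliberate simplification and not the PostgreSQL cost model: it assumes matching rows are spread uniformly through the index order and counts only rows visited. The class name and the page size of 50 used in the example are assumptions for illustration.

```java
// Rough row-visit counts for the two competing plans; not the PostgreSQL cost model.
final class IndexCompetitionSketch {
  // Ordered index scan: walk rows in sort order until `limit` rows pass the org unit filter.
  static long orderedScanRows(long totalRows, long matchingRows, long limit) {
    if (limit <= 0) {
      return 0;
    }
    if (matchingRows < limit) {
      // The page never fills, so the scan only stops at the end of the index.
      return totalRows;
    }
    // ceil(limit * totalRows / matchingRows), assuming uniformly spread matches.
    long expected = (limit * totalRows + matchingRows - 1) / matchingRows;
    return Math.min(totalRows, expected);
  }

  // Filter-first plan: read every matching row, then sort them.
  static long filterFirstRows(long matchingRows) {
    return matchingRows;
  }
}
```

With 10,317,816 rows, a page of 50 and no org unit filter, the ordered scan visits 50 rows. With zero matching rows it visits all 10,317,816, while the filter-first plan visits none. When the planner's selectivity estimate is off, it picks the first plan in exactly the case where it is worst.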

Non-indexed ordering

All orderable fields except occurredAt require a full sort of all matching rows regardless of page size -- no index covers these orderings. createdAt, updatedAt, and data element orders are 6-9s on 10M events.

Total pages

totalPages=true always runs a full count query. Cost is O(N) in matching rows.

What we tried

The original goal of DHIS2-20991 was to fix the slow SELECTED/CHILDREN queries for single events (admin: SELECTED 11.7s, CHILDREN 16s on 10M events). Several approaches were explored and rejected because they all hit the index competition problem described above:

Denormalize organisationunitpath onto singleevent

Store organisationunit.path directly on singleevent so org unit filters (SELECTED, DESCENDANTS) become predicates on the event row without joining organisationunit. Maintained by two triggers: one on singleevent insert/update, one cascading organisationunit.path changes. Added a composite index (programstageid, organisationunitpath, occurreddate, eventid).

Regression: the planner must choose between the path-based index (filters by org unit first, then sorts) and the (programstageid, occurreddate, eventid) index (scans in order, stops at limit). For org units with few or zero events in the programme, the planner picks the occurreddate index expecting to find matches quickly, then scans all 10M rows. The path denormalization just shifts which index the planner over-estimates. In benchmarking it did not improve the slow cases and added trigger/sync complexity.

Denormalize hierarchylevel onto singleevent

Extended the path denormalization to also store organisationunit.hierarchylevel on singleevent, eliminating the OU join for CHILDREN mode (which needs the level check). Same index competition problem as above.

Composite index (programstageid, organisationunitid, occurreddate, eventid)

Extend the existing (programstageid, organisationunitid) index with occurreddate, eventid trailing columns so org unit-scoped queries can both filter and sort from the same index.

Regression: the planner now has three competing indices and makes worse choices in some cases. For enrollments the same approach was tried with a materialized CTE to force plan selection (PR #22982, closed) -- the CTE defeats LIMIT pushdown, causing multi-second regressions when the org unit scope is large (e.g. national hospital owning 50% of events).

Why this PR is the safe subset

All approaches that try to fix the SELECTED/CHILDREN slow path introduce the same fundamental tension: the planner must choose between an index that filters by org unit (good for narrow scopes) and an index that scans in sort order (good for broad scopes or no org unit filter). There is no single plan that handles both well. The planner's cost estimates for LIMIT queries are unreliable when data distribution varies between org units.

This PR avoids that problem entirely by only changing the default order and tie-breaker direction -- changes that affect the ORDER BY clause but not the WHERE clause. The SELECTED/CHILDREN slow path remains pre-existing on master, unchanged by this PR.

teleivo force-pushed the DHIS2-20991 branch 8 times, most recently from 39ed422 to 43eef90 (February 27, 2026 14:08)
teleivo marked this pull request as ready for review (March 5, 2026 09:16)
teleivo requested a review from a team as a code owner (March 5, 2026 09:16)
@ameenhere

What are our options?

Is the cost of additional round trip to the db worth thinking about? Like prefetching the orgunit for SELECTED/CHILDREN cases?

muilpp commented Mar 5, 2026

Even if PG actually respected our join_collapse_limit setting, we’d still hit the same problem whenever the org unit filter isn’t selective enough, right?

I did a bit of digging and found the planner parameter random_page_cost, which lets Postgres know how “expensive” it thinks an index scan is. You can set it per transaction, and by increasing it, Postgres might decide to avoid the index-first plan with LIMIT. It would probably influence the other joins in the query as well, but it might be worth experimenting with.
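
A per-transaction experiment along these lines could look like the sketch below. `SET LOCAL` and `random_page_cost` are real PostgreSQL features: the setting is the planner's cost estimate for a non-sequentially fetched disk page (default 4.0), and `SET LOCAL` lasts only until the end of the current transaction. The class and method names are hypothetical, the value 8.0 is an arbitrary example, and nothing like this was part of the PR.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Locale;

// Hypothetical experiment helper; not code from the PR.
final class PlannerSettingSketch {
  static String setLocalRandomPageCost(double cost) {
    return String.format(Locale.ROOT, "set local random_page_cost = %.1f", cost);
  }

  // Runs one query with the setting applied and returns the number of rows read.
  // SET LOCAL reverts automatically at commit or rollback.
  static int countRowsWithRandomPageCost(Connection conn, double cost, String sql)
      throws SQLException {
    boolean previousAutoCommit = conn.getAutoCommit();
    conn.setAutoCommit(false);
    try (Statement st = conn.createStatement()) {
      st.execute(setLocalRandomPageCost(cost));
      int rows = 0;
      try (ResultSet rs = st.executeQuery(sql)) {
        while (rs.next()) {
          rows++;
        }
      }
      conn.commit();
      return rows;
    } catch (SQLException e) {
      conn.rollback();
      throw e;
    } finally {
      conn.setAutoCommit(previousAutoCommit);
    }
  }
}
```

Because the setting changes the cost of every index access in the query, it would need to be benchmarked against the full scenario list before drawing conclusions.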

teleivo force-pushed the DHIS2-20991 branch 3 times, most recently from 814eb0d to e801e12 (March 7, 2026 06:12)
teleivo changed the title from "perf: change default order to created desc and add supporting indices DHIS2-20991" to "perf: change singleevents default order to occurreddate desc with supporting indices DHIS2-20991" (Mar 11, 2026)
teleivo force-pushed the DHIS2-20991 branch 10 times, most recently from 12bd13b to 0e175bb (March 11, 2026 07:56)
teleivo force-pushed the DHIS2-20991 branch 13 times, most recently from a28e351 to a7fdc10 (March 12, 2026 08:18)
…-20991

Change default order from ev_id desc to occurreddate desc, eventid desc
to match what clients always request. Introduce OrderJdbcClause utility
for clean order clause building with tie-breaker direction matching the
first user-specified order. Extend the V2_43_50 singleevent occurreddate
index with eventid as trailing column.

@teleivo teleivo merged commit f022e1b into master Mar 12, 2026
16 checks passed
@teleivo teleivo deleted the DHIS2-20991 branch March 12, 2026 15:01