Skip to content

CS-11182 follow-up: widen cache invalidation to every reindex entry point#4900

Open
lukemelia wants to merge 1 commit into
mainfrom
cs-11182-broaden-cache-drop-to-worker
Open

CS-11182 follow-up: widen cache invalidation to every reindex entry point#4900
lukemelia wants to merge 1 commit into
mainfrom
cs-11182-broaden-cache-drop-to-worker

Conversation

@lukemelia
Copy link
Copy Markdown
Contributor

Summary

  • The original CS-11182 fix only fired the L2 module_transpile_cache bulk tombstone from Realm.startReindex's post-completion .then, which covers POST <realm>/_full-reindex and <realm>/_reindex but nothing else.
  • Production reindexes routed through /_grafana-reindex, /_grafana-full-reindex, /_post-deployment, the publish-realm flow (Realm.fullIndex), and direct enqueueReindexRealmJob calls all bypass startReindex. On staging this morning a Grafana-button reindex of the base realm completed cleanly but never tombstoned the L2 rows — clients kept being served pre-deploy bytes (the originating symptom for CS-11182: "All Files" still not visible after the deploy + reindex).
  • Move the cross-replica invalidation to the worker side of the from-scratch task: emit notifyAllFileChanges(dbAdapter, realmURL) after IndexRunner.fromScratch returns. Every replica's existing realm_file_changes wildcard listener picks it up and calls realm.clearLocalSourceCaches() (sync L1 wipe + fire-and-forget L2 bulk tombstone). One chokepoint covers every from-scratch trigger uniformly — current and future.

The original Realm.startReindex .then is now strictly belt-and-suspenders for the POST /_full-reindex path, which is fine to leave in place.

Test plan

  • pnpm test-module module-cache-race-test.ts — new test asserts L2 rows are tombstoned after a reindex triggered via realm.realmIndexUpdater.fullIndex (the bypass path that doesn't wire up startReindex's .then).
  • Verified the test fails without the fix (1 live L2 row vs. 0 expected) by stashing the indexer.ts change and re-running just the new test.
  • After deploy to staging: hit the Grafana reindex button on base; confirm cards-grid.gts flips from 13579 bytes (stale) to ~15573 bytes (new "All Files" code) and module_transpile_cache.body IS NULL for the realm's rows.

🤖 Generated with Claude Code

The original CS-11182 fix only fired the L2 `module_transpile_cache`
bulk tombstone from `Realm.startReindex`'s post-completion `.then` —
which covered POST `<realm>/_full-reindex` and `<realm>/_reindex` but
nothing else.

Production reindexes triggered via the operator-action endpoints
(`/_grafana-reindex`, `/_grafana-full-reindex`, `/_post-deployment`),
the publish-realm flow (`Realm.fullIndex`), and direct
`enqueueReindexRealmJob` calls all bypassed `startReindex`. On staging
today, a Grafana-button reindex of the base realm completed without
ever tombstoning the L2 rows, so clients continued to be served
pre-deploy bytes.

Emit `notifyAllFileChanges(dbAdapter, realmURL)` from the worker side
of `fromScratchIndex`, right after `IndexRunner.fromScratch` returns.
The existing `realm_file_changes` wildcard listener on every replica
then calls `realm.clearLocalSourceCaches()` — synchronous L1 wipe plus
the fire-and-forget L2 bulk tombstone. One chokepoint covers every
from-scratch trigger uniformly, including future ones.

The regression test in `module-cache-race-test.ts` drives a reindex
through `realm.realmIndexUpdater.fullIndex` (the bypass path that
never wires up the original `startReindex` callback) and asserts the
L2 rows are tombstoned. Verified to fail without the fix and pass
with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Host Test Results

    1 files      1 suites   1h 49m 0s ⏱️
2 712 tests 2 697 ✅ 15 💤 0 ❌
2 731 runs  2 716 ✅ 15 💤 0 ❌

Results for commit 20dd768.

Realm Server Test Results

    1 files      1 suites   8m 29s ⏱️
1 453 tests 1 452 ✅ 0 💤 1 ❌
1 544 runs  1 543 ✅ 0 💤 1 ❌

Results for commit 20dd768.

For more details on these errors, see this check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant