PagePool: materialize the reserved module/command tab; plumb priority through definition lookups by habdelra · Pull Request #4863 · cardstack/boxel

habdelra · 2026-05-18T18:30:04Z

Why

Staging investigation of stuck indexing on ctse/ambitious-piranha (914 rows) and ctse/prudent-octopus (255 rows) found every errored row hitting the same fingerprint in timing_diagnostics:

renderStage = waiting-stability
queryLoadsInFlight > 0, all with source: search-resource:*
cardDocLoadsInFlight = [], inFlightModuleImports = []
affinitySnapshot.sameAffinityActivity carrying {kind: module, queue: module, state: queued} entries on the same affinity as the stuck card render

That is the documented self-referential prerender deadlock from the indexing-diagnostics skill rubric:

A card render (e.g. PolicyTrackingDashboard.json) renders and the template fires <Search @query={{filter: {type: {module: '.../cohort.gts', name: 'Cohort'}}}}> against _federated-search.
The search needs the Cohort card definition to plan SQL.
CachingDefinitionLookup.lookupDefinition misses (the modules table doesn't have a row for cohort.gts).
It spawns a prerenderModule call for cohort.gts — same affinity.
The affinity already has its tabs held by the dashboard render that's awaiting the search. The sub-module-prerender queues on the per-tab queue.
The dashboard render waits on the search → which waits on the definition → which waits on the queued module render → which waits on the dashboard's tab. Deadlock.
After 90s, the render-timeout fires and the row gets written as has_error.

The cycle was reproduced in 6 distinct erroring rows in ambitious-piranha and 2 in prudent-octopus. Every recent indexing job for ambitious-piranha has been rejected for ~similar reasons — the deadlock can also kill the whole batch when it gets caught by the manager-side 150s /prerender-visit timeout (job 410276 today fits that shape).

What this PR changes

Two narrowly-scoped fixes that together close the deadlock. Conceptually independent enough to review separately, but they belong in one PR because they fix the same underlying bug from two angles.

1. `PagePool.#selectEntryForAffinity`: materialize the reserved module/command tab

The file-admission cap (`#fileAdmissionCap = max(1, affinityTabMax - 1)`) is supposed to keep one tab's worth of global-pool headroom per affinity for module/command work. But the cap only limits file callers — it doesn't produce the reserved tab. The existing spawn branch in #selectEntryForAffinity had only three ways to actually create a tab for a non-file caller:

Claim an orphan context (only if the affinity had a previously-evicted shared BrowserContext lying around).
Commandeer a dormant standby (only if the standby pool happened to have a spare).
The canExpandPastAffinityCap high-priority tier expansion (dormant in default config — requires PRERENDER_PAGE_POOL_HIGH_PRIORITY_MAX and PRERENDER_HIGH_PRIORITY_THRESHOLD both set).

When (1) and (2) both miss and (3) is dormant, the call falls through to #selectLeastPendingTab(entryList) and queues on the existing busy tab. That is the deadlock state.

The fix adds a fourth path between the failed commandeer and the canExpandPastAffinityCap block: if the caller is non-file (queue !== 'file') and the affinity is under cap (`entryList.length < affinityTabMax`), synchronously await #ensureStandbyPool and retry commandeer. Same shape as the brand-new-affinity branch (`entryList.length === 0`) immediately below — generalized to existing affinities with at least one already-busy tab.

The fix is independent of priority. The reservation invariant says module/command work always gets a tab when one is available under the per-affinity cap; gating that on priority would mean the invariant only holds for some callers, which is the bug, not the contract.

Why this is safe at scale:

#ensureStandbyPool respects #maxPages via #prepareSlotForStandby (caps total contexts at #maxPages + 1, LRU-evicts if needed). It cannot oversubscribe the global pool.
#commandeerDormantTab({ standbyOnly: true }) only succeeds against a real standby — no cross-affinity steal.
File callers still acquire #fileAdmissionCap upstream of #selectEntryForAffinity, so cross-realm fairness for file workload is unchanged.

2. `CachingDefinitionLookup`: plumb priority through definition sub-renders

Today getModuleDefinitionsViaPrerenderer calls prerenderer.prerenderModule({ affinityType, affinityValue, realm, url, auth }) — no priority argument. That means user-initiated reindex work (priority 10) silently downgrades to priority 0 for its definition-lookup sub-prerenders. The high-priority tier expansion (the third tab-creation path above) is gated on caller priority, so this silent downgrade also prevents the existing escape hatch from firing when it could have.

Added a narrow public option type DefinitionLookupOptions = { priority?: number } and threaded it through:

DefinitionLookup.getModuleCacheEntry(url, opts?) (interface + both implementations)
loadModuleCacheEntry / loadModuleCacheEntryUncached / loadModuleCacheEntryCoordinated (private args)
getModuleDefinitionsViaPrerenderer(url, realmURL, userId, priority?)
prerenderer.prerenderModule({ ..., priority })

prerenderModule and downstream already accepted and honored priority end-to-end (render-runner.ts passes it to getPage which uses it for per-tab queue / admission semaphore / global render semaphore priority-aware dequeue). The chain was plumbing-only on the lookup side.

LookupContext (the existing internal type with { requestingRealm }) was the wrong public surface — it carries internal lookup state callers shouldn't see. The new DefinitionLookupOptions is the narrow public option type.

Callers updated:

IndexRunner.preWarmModulesTable now passes { priority: this.#jobPriority }. this.#jobPriority is already threaded from the worker job's JobInfo through to IndexRunner (existing infrastructure; we just stopped throwing it away on the last hop).
RealmIndexQueryEngine.searchCards HTTP path: no notion of request priority on the realm-server side; passes undefined (defaults to 0), preserving current behavior.

What this does NOT do

Does not address the broader "pre-warm walks the wrong modules" problem — that's coming in PR B. PR B uses reader.mtimes() to enumerate every .gts/.gjs in the realm and pre-warms all of them (not just the ones in per-row deps). PR B reduces how often the page-pool deadlock-prevention path even has to fire.
Does not parallelize pre-warm. That's PR C, which lands after PR B's measurement establishes a baseline.
Does not touch clearRealmCache's post-reindex realm-wide wipe (which re-empties the modules table after every full reindex). Out-of-scope follow-up.

Tests

`tests/page-pool-standby-refill-test.ts`

New test: 'module call on an existing affinity with all owned tabs busy materializes a fresh tab'.

Holds a file render on affinity A (one tab, not released), then fires a module call on A. Asserts the module call resolves promptly with a different pageId than the held tab — both that no deadlock occurred and that the module render landed on its own tab. Bounded by a 2s timer so a regression fails the test cleanly instead of hanging.

`tests/definition-lookup-test.ts`

Two new tests:

'getModuleCacheEntry forwards priority to prerenderModule' — captures the priority arg the mock prerenderer receives. Asserts:
- An explicit { priority: 10 } forwards as 10.
- A subsequent call on a cached entry short-circuits at the cache (priority isn't even consulted).
- An invalidation + omitted priority forwards as undefined (which the prerender server treats as 0).
'getModuleCacheEntry caches non-card modules as no-card markers' — direct contract test for the no-card-marker behavior PR B's cost analysis depends on. Stubs the mock prerenderer to return { status: 'ready', definitions: {} }. Asserts:
- The empty-definitions row persists to the modules table (verified by direct SQL).
- A second getModuleCacheEntry short-circuits at the cache without re-invoking the prerenderer.

Both tests live in the existing definition-lookup-test.ts module structure and run in <100ms each.

Verification

Local (full realm-server suite takes too long for in-loop validation — relying on CI for full regression):

```
$ pnpm --filter @cardstack/realm-server run test --filter 'DefinitionLookup'
ok 1 .. 28 (all 28 DefinitionLookup tests pass, including 2 new)
$ pnpm --filter @cardstack/realm-server run test --filter 'module call on an existing'
ok 1 page-pool-standby-refill-test.ts > module call on an existing affinity with all owned tabs busy materializes a fresh tab
```

Both pnpm --filter @cardstack/runtime-common run lint and pnpm --filter @cardstack/realm-server run lint:js clean for the files this PR touches. (The pre-existing prettier baseline in generated realms/.../bxl-chunks/*.ts files is untouched by this PR.)

After merge + staging deploy, verification will be:

Trigger _full-reindex on ctse/prudent-octopus and ctse/ambitious-piranha.
Check boxel_index rows for those realms: `SELECT count(*) FILTER (WHERE has_error) FROM boxel_index WHERE realm_url = ?` — expect zero (vs. 8 today across the two realms).
Spot-check timing_diagnostics->'affinitySnapshot'->'sameAffinityActivity' on the heaviest rows (dashboards, cohorts): no {state: queued} entries.
Grep prerender-server logs for the dashboard's requestId=…: confirm any sub-module-prerender ran on a fresh pageId distinct from the dashboard's pageId.

Test plan

CI passes on the new branch
Staging deploy + _full-reindex of ctse/prudent-octopus lands with zero errored rows
Staging deploy + _full-reindex of ctse/ambitious-piranha lands with zero errored rows
Spot-check affinitySnapshot.sameAffinityActivity on the heaviest rows post-fix — no state: queued entries

🤖 Generated with Claude Code

… through definition lookups Two narrowly-scoped fixes for the self-referential prerender deadlock that surfaces when a card render fires `_federated-search`, the search needs a card definition not yet in the modules table, `CachingDefinitionLookup` spawns a `prerenderModule` for the missing module, and that sub-prerender queues behind the very tab the file render is holding. 1. PagePool#selectEntryForAffinity: when a non-file caller enters the spawn branch and both orphan-claim + standby-commandeer miss, await `#ensureStandbyPool` synchronously and retry commandeer — regardless of priority and regardless of whether the high-priority tier is configured. The file-admission cap reserves global-pool headroom for one module/command slot per affinity; this change actually produces the tab that headroom was being held for. Mirrors the existing brand- new-affinity branch (entryList.length === 0) one branch further down. `#ensureStandbyPool` respects `#maxPages` via `#prepareSlotForStandby`, so this can't oversubscribe the global pool. 2. `CachingDefinitionLookup.getModuleDefinitionsViaPrerenderer`: thread priority through `prerenderModule(...)`. Today the call drops priority on the floor — user-initiated reindex work (priority 10) silently downgrades to priority 0 for its definition-lookup sub-prerenders. `LookupContext` was the wrong public surface (it carries `requestingRealm`, internal); added a narrower `DefinitionLookupOptions` type with `{ priority?: number }` instead and threaded it through `getModuleCacheEntry` → `loadModuleCacheEntry{,Uncached,Coordinated}` → `getModuleDefinitionsViaPrerenderer` → `prerenderModule`. IndexRunner's pre-warm call site passes `this.#jobPriority`. Tests: - `page-pool-standby-refill-test.ts`: held file render on affinity A + a module call on A produces a fresh tab synchronously, not queued on the busy tab. Bounded with a 2s timeout so a regression fails cleanly instead of hanging. - `definition-lookup-test.ts`: priority forwards end-to-end; cached entry skips re-prerender even when priorities differ; default priority forwards as `undefined` (which the prerender server treats as 0). Non-card-module path: `definitions: {}` persists to the modules table and short-circuits subsequent lookups, confirming the no-card- marker behavior the pre-warm cost analysis depends on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1cf2626ea3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot

Pull request overview

This PR addresses a prerender/indexing deadlock where same-affinity module/command sub-renders can get queued behind a busy file render that is itself waiting on the sub-render (self-referential wait). It does this by ensuring a reserved module/command tab is actually materialized when available, and by propagating job priority through definition-lookup-triggered sub-prerenders so higher-priority work doesn’t silently downgrade.

Changes:

PagePool: add a synchronous standby-refill + retry-commandeer path for non-file callers under the per-affinity cap, to ensure the reserved module/command tab is actually created.
DefinitionLookup: introduce DefinitionLookupOptions { priority?: number } and thread it through to prerenderer.prerenderModule({ priority }).
Tests: add coverage for the standby-refill behavior and for priority plumbing + empty-definitions (“no-card marker”) caching.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
packages/runtime-common/index-runner.ts	Passes the indexing job priority into module pre-warm definition lookups.
packages/runtime-common/definition-lookup.ts	Adds `DefinitionLookupOptions` and forwards `priority` through module cache population/prerender.
packages/realm-server/prerender/page-pool.ts	Adds a refill+retry path to materialize a standby for non-file callers under the affinity cap.
packages/realm-server/tests/page-pool-standby-refill-test.ts	Adds a regression test intended to ensure a module call on a busy affinity gets a fresh tab.
packages/realm-server/tests/definition-lookup-test.ts	Adds tests for priority forwarding and caching empty-definitions module rows.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The `current < desired` guard around the new synchronous refill in #selectEntryForAffinity skipped the await when a refill was already in flight. `#currentStandbyCount() = #standbys.size + #creatingStandbys`, so when a file render just consumed the only standby and its post- acquire kickStandbyRefill is mid-creation, creatingStandbys > 0 inflates current to match desired even though no real standby exists yet. The follow-up commandeer then fails and the caller falls through to the busy-tab branch — exactly the deadlock the change was meant to prevent. Drop the guard. #ensureStandbyPool deduplicates internally via #ensuringStandbys, so an unconditional await is a no-op when there is nothing to do and waits on the in-flight refill when there is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Symmetry with the previous commit. `#selectEntryForAffinity`'s brand-new-affinity branch (entryList.length === 0) had the same `current < desired` guard around its `await #ensureStandbyPool()`. Same race applies: `#currentStandbyCount` includes `#creatingStandbys`, so a refill in flight on another affinity (or this one's pre-acquire kick) can inflate `current` to match `desired` before any real standby exists. The brand-new caller would then skip the wait and fall through to cross-affinity steal rather than landing on the freshly-warmed standby a peer just kicked off — defeating the "prefer fresh standby over busy-tab queueing" intent documented in the surrounding comment. Drop the guard. Same reasoning as the non-file branch above: the `#ensuringStandbys` dedup makes the unconditional await a no-op when the pool is healthy and a wait when it isn't. Also broadens the comment on the non-file branch to clarify the no-op vs. wait behavior and note that the at-cap-without-expansion- budget case still falls through to busy-tab — that residual deadlock requires either operator capacity tuning or the high-priority tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T19:30:28Z

Preview deployments

Host staging preview

Host production preview

Host Test Results

1 files 1 suites 1h 48m 29s ⏱️
2 673 tests 2 658 ✅ 15 💤 0 ❌
2 692 runs 2 677 ✅ 15 💤 0 ❌

Results for commit de0548f.

Realm Server Test Results

1 files ± 0 1 suites +1 8m 44s ⏱️ + 8m 44s
1 414 tests +1 414 1 414 ✅ +1 414 0 💤 ±0 0 ❌ ±0
1 501 runs +1 501 1 501 ✅ +1 501 0 💤 ±0 0 ❌ ±0

Results for commit de0548f. ± Comparison against earlier commit 572f941.

…der); fix queues-same-realm regression Two changes addressing review feedback on the earlier PR A commits. 1. Priority threading through the lookupDefinition path The PR's previous priority plumbing covered `getModuleCacheEntry` (the pre-warm path) but not `lookupDefinition`, which is the path that fires sub-`prerenderModule` calls from `_federated-search` during a card render — i.e. the actual deadlock site. This closes the gap. Chain: - `runtime-common/prerender-headers.ts`: new `X_BOXEL_JOB_PRIORITY_HEADER` constant + `sanitizeJobPriorityHeader` sanitizer. Same scale as worker-job priority (0/10 etc), non-negative safe integer. - `realm-server/prerender/prerender-constants.ts`: re-exports as `PRERENDER_JOB_PRIORITY_HEADER` alongside `PRERENDER_JOB_ID_HEADER`. - `remote-prerenderer.ts`: producer sends the header on every outbound prerender request (visit / module / command / screenshot) when the attributes include `priority`. - `render-runner.ts`: prerenderVisitAttempt's CDP page.evaluate now injects `globalThis.__boxelJobPriority` alongside the existing `__boxelJobId`, scoped to the render. - `host/app/services/store.ts`: `jobPriorityHeader()` reads `__boxelJobPriority` and stamps `X-Boxel-Job-Priority` on outbound `_federated-search` fetches alongside the existing `X-Boxel-Job-Id` / `X-Boxel-Consuming-Realm` plumbing. - `realm-server/handlers/handle-search.ts`: extracts and sanitizes the header, threads `priority` into the search opts passed to `searchRealms`. - `realm-index-query-engine.ts`: `Options` carries `priority`; `lookupDefinitionForOpts` and the `_addCardConditions` populate- query-fields paths thread it into `lookupDefinition(codeRef, opts)`. - `definition-lookup.ts`: `DefinitionLookup.lookupDefinition` now accepts the existing `DefinitionLookupOptions` shape; `LookupContext` carries `priority`; `lookupDefinitionWithContext` threads it into the `loadModuleCacheEntry` args alongside the existing fields. The `RealmScopedDefinitionLookup` wrapper forwards opts to `lookupDefinitionForRealm`. User / API callers don't stamp the header (they aren't inside a prerender tab), so live traffic continues to fall back to priority 0 — same observable behavior as today. Indexer-driven card renders fire their sub-prerenders at the originating job's priority, which is useful for observability (timing_diagnostics.priority on errored rows) and for the high-priority-tier expansion path when those env vars are configured. 2. Revert brand-new-affinity branch's unconditional await The earlier commit removed the `current < desired` gate on the brand-new-affinity branch's `await #ensureStandbyPool()` in `#selectEntryForAffinity`, mirroring the same fix applied to the new non-file spawn-branch. Code-review feedback suggested it was symmetric — but it isn't, and CI caught the regression: `prerendering-test.ts > queues same-realm request when tab is transitioning` now times out. Root cause: the unconditional await yields one extra microtask even when `current >= desired` already. In the brand-new-affinity case that microtask shifts the ordering against a concurrent same-affinity file caller arriving on an idle tab — the same caller bumps `#queue.pendingCount` past 1 before the cross-affinity-steal scan, the scan's `pendingCount > 1` filter rejects, and the brand-new caller throws `'No standby page available for prerender'` instead of queueing. The gate is asymmetric on purpose: - Non-file spawn-branch (entryList.length > 0, queue != 'file'): await unconditionally — the cost of skipping is unbounded (the caller queues on the very tab whose work blocks it). - Brand-new-affinity branch (entryList.length === 0): keep the gate — the cost of skipping is bounded (a cross-affinity-steal hop, itself a designed fallback). Adding the extra microtask actively breaks an existing test that relies on the synchronous ordering. Comment updated to explain the asymmetry; the test that caught this is named in the comment for future reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@query

`preWarmModulesTable` previously walked each invalidation's `boxel_index.deps` and added any executable URLs it found. That misses modules referenced by *string* in card templates — a typical pattern in dashboard cards: <Search @query={{filter: {type: {module: '.../cohort.gts', name: 'Cohort'}}}}> `cohort.gts` is referenced as a string parameter to the search filter, never imported into the dashboard's runtime module graph. So it never appears in the dashboard row's `deps`, never gets pre-warmed, and when the dashboard renders during indexing the `_federated-search` call fires a same-affinity `prerenderModule` for `cohort.gts` mid-card- render. That sub-prerender queues behind the dashboard's tab — the self-referential prerender deadlock. Add a realm-wide sweep on top of the existing per-row deps walk. Source is the filesystem-mtimes map: from-scratch already pays for the walk via `discoverInvalidations`, so we just filter and reuse it; incremental adds a fresh `reader.mtimes()` call before pre-warm (typical ~200 ms on a 500-file realm, one call per job, completely dominated by the indexing work that follows). Filter to `.gts` + `.gjs` only — cards can only live in template-bearing modules, so `.ts`/`.js` helpers are correctly excluded from the realm-wide layer. Helpers that ARE needed by a card already get pre-warmed through the existing per-row deps walk; the realm-wide layer is purely additive on top. Adds `hasCardExtension` / `cardExtensions` next to `hasExecutableExtension` / `executableExtensions` in `runtime-common/index.ts`, same shape — `.gts` and `.gjs` are the extensions a `CardDef` / `FieldDef` can live in because they're the ones with a Glimmer template surface. The modules cache treats `definitions: {}` (an empty result from a non-card module — e.g. a rare `.gts` helper) as a valid cache hit at `definition-lookup.ts:561-563`. So the worst case is one wasted prerender per non-card `.gts` per `clearRealmCache` cycle; subsequent lookups skip the prerender entirely. That contract is pinned by `getModuleCacheEntry caches non-card modules as no-card markers` in PR A. Cost analysis for ambitious-piranha (the staging realm where the deadlock fired): 9 `.gts` files + 9 `.ts` files = 18 executables, but only 9 `.gts` files in the realm-wide sweep (the `.ts` files are `bxl/*` helpers). Pre-warm is still serial (concurrency comes in a follow-up PR after baseline measurement). Roughly ~200 ms per file × 9 files = ~1.8 s additional pre-warm cost per from-scratch reindex. Stacked on top of PR #4863 (PagePool tab-materialization + priority plumbing). The two changes attack the same bug from different angles: PR #4863 makes sure the deadlock cannot happen when a sub-prerender DOES fire; this PR makes sure it almost never has to fire in the first place. PR #4863 is the load-bearing fix; this is the preventive layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

main renamed the modules-cache surface (clearRealmCache → clearRealmDefinitions, ModuleCacheEntry → DefinitionCacheEntry, getModuleCacheEntry → getCachedDefinitions, etc). Re-applied the priority-threading additions (`DefinitionLookupOptions` arg, `LookupContext.priority` field, plumb-through to `loadDefinitionCacheEntry`) on top of the renamed surface. Tests updated to call `getCachedDefinitions` rather than `getModuleCacheEntry`. No behavioral changes from the rename — the priority plumbing now uses the new method/type names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…adlock-pre-warm-realm-modules

Companion change to PR #4863. That PR plumbed `x-boxel-job-priority` through every layer of the prerender / search stack, but only the producer-side prerender tab was stamping the header — user-facing fetches (the host SPA in a real user's browser, external API callers) sent no priority, so any sub-`prerenderModule` fired by `_federated-search` for a `lookupDefinition` cache miss queued behind concurrent background indexing at priority 0. User clicks that drive search are user-initiated work and should outrank background fan-out. Policy on the outbound `jobPriorityHeader()`: 1. Inside a prerender tab (`__boxelDuringPrerender === true`): forward `__boxelJobPriority` as-is, including 0. The render-runner sets the global from the worker job's priority; a 0 is meaningful (the originating job is system-initiated indexing) and must not be upgraded. Falls back to 0 if the global is absent. 2. Outside a prerender tab: stamp `userInitiatedPriority` (10) unless the caller has explicitly set `globalThis.__boxelJobPriority` to something else. Why gate on `__boxelDuringPrerender` rather than the presence of `__boxelJobPriority`: a prerender tab whose render-runner build predates the priority global, or a test fixture that didn't set it, should still be treated as prerender-context (default 0), not as live user traffic (default 10). Stacked on top of #4863 — that PR adds `X_BOXEL_JOB_PRIORITY_HEADER` and the `userInitiatedPriority` constant import is already wired through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drops the `hasCardExtension` helper added in the previous commit. Copilot's review caught a real case I'd missed: `packages/catalog-realm/commands/collect-submission-files.ts` defines `class CollectSubmissionFilesInput extends CardDef` — a card-defining `.ts` file, no Glimmer template. The `.gts`/`.gjs` extension filter would have skipped it on every realm-wide pre-warm sweep, and any `_federated-search` filter referencing it as a `type:` would have fired a same-affinity `prerenderModule` mid-card-render — the exact deadlock pattern this PR aims to prevent. Fix: realm-wide sweep now uses the existing `hasExecutableExtension` (`.gts` / `.gjs` / `.ts` / `.js`, excluding `.d.ts`). Non-card modules pre-warmed by the sweep persist to the modules table as empty- definitions rows (no-card markers — `definition-lookup.ts:561-563`), so subsequent `getCachedDefinitions` / `lookupDefinition` calls short-circuit at the cache without re-firing the prerenderer. That contract is pinned by the `getCachedDefinitions caches non-card modules as no-card markers` test in PR #4863. Variable renamed `allRealmCardModules` → `allRealmExecutables` and comments updated. The `hasCardExtension` helper + `cardExtensions` constant are removed entirely. Cost analysis update: ambitious-piranha goes from 9 candidates (just the `.gts` files) to 18 (9 `.gts` + 9 `.ts` helpers in `bxl/`). At ~200 ms per prerender × 18 modules with serial pre-warm = ~3.6 s additional cost on top of the from-scratch reindex. After the first sweep, the modules table caches the helper rows; subsequent reindexes only pay for modules that actually have card defs. Bounded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three issues from the Copilot review + Hassan's comment on PR #4866. 1. (Copilot) Reword doc comment to not imply external HTTP callers can affect this global. The helper only runs inside the host SPA's JS runtime. External HTTP callers — anything not in the host SPA — bypass the helper entirely and set `X-Boxel-Job-Priority` directly on their request if they care. 2. (Copilot) Tighten gating from truthy `if (duringPrerender)` to `if (duringPrerender === true)`. The global is typed as a boolean; a stray truthy string from a future code path shouldn't silently flip the policy from "user priority" to "preserve 0" — a real user-facing fetch would then queue behind background indexing. Strict-equality is the safer default. 3. (Hassan) Pin the policy with unit tests. Refactored the inline policy logic into an exported pure helper `resolveOutboundJobPriority({duringPrerender, jobPriority})` and wrote `tests/unit/job-priority-header-test.ts` covering: Outside prerender (user / API caller): - No globals → returns `userInitiatedPriority` - `__boxelDuringPrerender = false` → `userInitiatedPriority` - Explicit `__boxelJobPriority` override is honored (3 → 3, 0 → 0; 0 is NOT coerced to user priority) - Truthy but non-`true` `__boxelDuringPrerender` (`'yes'`, `1`) does NOT trigger forward-as-0 — guards against the issue above Inside prerender: - Explicit 10 → 10 - Explicit 0 → 0 (must not upgrade; preserves system-initiated) - Missing `__boxelJobPriority` → 0 (safe default for prerender- context work) - Malformed values (-1, 1.5, '10') fall through to the active branch's default `jobPriorityHeader` (the wrapper that reads the globals) is now a thin shim that calls `resolveOutboundJobPriority` with the actual globals. No behavioral change at the call sites. Note: I considered tightening `duringPrerenderHeaders()` similarly per Copilot's "and similarly in related helpers" suggestion, but left it alone — changing the existing `X-Boxel-During-Prerender` gating is broader scope than this PR and would change observable behavior at the realm-server's cache-only-definitions gate. Worth a separate cleanup if we want strict-boolean across all the related helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reverts the previous "broaden to all executables" commit. The case that broadening fixed — `.ts` files that host CardDef (rare; the only example in the repo today is command-input cards) — is handled correctly by the on-demand `lookupDefinition` read-through during the visit. The PagePool's new tab-materialization for module/command callers makes that on-demand path safe (the sub-prerender gets its own tab instead of queueing behind the render that triggered it), so the cost trade favors the narrow sweep: - Realm-wide pre-warm of `.gts` / `.gjs` only: ~9 prerenders × ~200 ms on a typical card-heavy realm. - Broader sweep including `.ts` / `.js` helpers: ~18+ prerenders, most of them producing no-card-marker rows for files that almost never define a card. This is an optimization on every reindex, not a correctness change. Comments updated to call out the trade-off honestly: `.ts` / `.js` CAN host card defs, the narrow filter exists because the on-demand path is now safe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The new x-boxel-job-priority request header needs to be listed in Access-Control-Allow-Headers so the browser's CORS preflight passes when prerender contexts on origin :4200 fetch :4201's _federated-search endpoint. Without this, the OPTIONS preflight succeeds (no header restriction) but the follow-up QUERY is blocked by Chrome with "Request header field x-boxel-job-priority is not allowed by Access-Control-Allow-Headers in preflight response", so every search fired by host code in a prerendering context silently fails. The priority routing feature this PR introduces depends on those searches landing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…adlock-pre-warm-realm-modules

…rity' into prerender-deadlock-user-api-priority

backspace · 2026-05-19T12:37:23Z

The test plan suggests this:

Staging deploy + _full-reindex of ctse/prudent-octopus lands with zero errored rows

Staging deploy + _full-reindex of ctse/ambitious-piranha lands with zero errored rows

but it doesn’t look like that happened?

habdelra · 2026-05-19T13:00:48Z

The test plan suggests this:

Staging deploy + _full-reindex of ctse/prudent-octopus lands with zero errored rows

Staging deploy + _full-reindex of ctse/ambitious-piranha lands with zero errored rows

but it doesn’t look like that happened?

with all the merges that happen (especially the day before sprint demo), i'm unsure I could get thru a full reindex without an independent merge reverting my deploy. i think its easier to merge, test, and then adjust. our workers are setup such that if indexing gets interrupted by a deploy, it should pick up where it left off after the deploy is finished

…gepool-priority # Conflicts: # packages/host/app/services/store.ts # packages/realm-server/server.ts

…rity' into prerender-deadlock-user-api-priority # Conflicts: # packages/host/app/services/store.ts

…riority Host: stamp user priority on outbound _federated-search by default

…ealm-modules Indexer: pre-warm every realm .gts/.gjs module, not just per-row deps

habdelra requested a review from Copilot May 18, 2026 18:30

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

Comment thread packages/realm-server/prerender/page-pool.ts Outdated

Copilot started reviewing on behalf of habdelra May 18, 2026 18:32 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread packages/realm-server/prerender/page-pool.ts

Comment thread packages/realm-server/tests/page-pool-standby-refill-test.ts

habdelra mentioned this pull request May 18, 2026

Indexer: pre-warm every realm .gts/.gjs module, not just per-row deps #4864

Merged

4 tasks

habdelra and others added 5 commits May 18, 2026 15:41

Merge branch 'prerender-deadlock-pagepool-priority' into prerender-de…

978b020

…adlock-pre-warm-realm-modules

habdelra mentioned this pull request May 18, 2026

Host: stamp user priority on outbound _federated-search by default #4866

Merged

2 tasks

habdelra and others added 3 commits May 18, 2026 15:59

habdelra requested a review from a team May 18, 2026 20:51

habdelra and others added 3 commits May 18, 2026 17:33

Merge branch 'prerender-deadlock-pagepool-priority' into prerender-de…

a89c949

…adlock-pre-warm-realm-modules

Merge remote-tracking branch 'origin/prerender-deadlock-pagepool-prio…

18f5504

…rity' into prerender-deadlock-user-api-priority

This was referenced May 18, 2026

flaky test - wait-for-host-standby: retry puppeteer.launch with diagnostics #4872

Merged

Indexer: bounded-parallel pre-warm (concurrency=4) #4871

Closed

habdelra added 2 commits May 19, 2026 09:06

Merge remote-tracking branch 'origin/main' into prerender-deadlock-pa…

afe40c1

…gepool-priority # Conflicts: # packages/host/app/services/store.ts # packages/realm-server/server.ts

Merge remote-tracking branch 'origin/prerender-deadlock-pagepool-prio…

90bea13

…rity' into prerender-deadlock-user-api-priority # Conflicts: # packages/host/app/services/store.ts

burieberry approved these changes May 19, 2026

View reviewed changes

Merge pull request #4866 from cardstack/prerender-deadlock-user-api-p…

572f941

…riority Host: stamp user priority on outbound _federated-search by default

Merge pull request #4864 from cardstack/prerender-deadlock-pre-warm-r…

de0548f

…ealm-modules Indexer: pre-warm every realm .gts/.gjs module, not just per-row deps

habdelra merged commit 4bd2d4f into main May 19, 2026
74 of 75 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PagePool: materialize the reserved module/command tab; plumb priority through definition lookups#4863

PagePool: materialize the reserved module/command tab; plumb priority through definition lookups#4863
habdelra merged 18 commits into
mainfrom
prerender-deadlock-pagepool-priority

habdelra commented May 18, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026 •

edited

Loading

Uh oh!

backspace commented May 19, 2026

Uh oh!

habdelra commented May 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

habdelra commented May 18, 2026

Why

What this PR changes

1. PagePool.#selectEntryForAffinity: materialize the reserved module/command tab

2. CachingDefinitionLookup: plumb priority through definition sub-renders

What this does NOT do

Tests

tests/page-pool-standby-refill-test.ts

tests/definition-lookup-test.ts

Verification

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preview deployments

Host Test Results

Realm Server Test Results

Uh oh!

backspace commented May 19, 2026

Uh oh!

habdelra commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

1. `PagePool.#selectEntryForAffinity`: materialize the reserved module/command tab

2. `CachingDefinitionLookup`: plumb priority through definition sub-renders

`tests/page-pool-standby-refill-test.ts`

`tests/definition-lookup-test.ts`

github-actions Bot commented May 18, 2026 •

edited

Loading

habdelra commented May 19, 2026 •

edited

Loading