External opencode plugin that manages multiple ChatGPT Pro/ProLite/Plus/Team Codex OAuth accounts with quota-aware core/pool preference and priority-based 429 failover.
This plugin hijacks `provider: "openai"` via `auth.loader`. The built-in Codex plugin runs first (model filtering, cost zeroing), then this plugin supplies a dummy OAuth `apiKey` plus a replacement `fetch` via `mergeDeep`.
Core `auth.json` stores `type: "oauth"` for the primary account so that `isCodex = true` in opencode's `llm.ts:65`, preserving exact Codex behavior parity (`options.instructions`, system prompt, `maxOutputTokens`).
- JSON config lives at `~/.config/opencode/codex-pool.json`; the plugin auto-creates it with `{ "fast-mode": "auto", "fast-mode-bias": 0, "sticky-mode": "always", "sticky-strength": 1, "dormant-touch": "new-session-only" }` when missing, loads it during plugin initialization, and falls back to defaults with a warning toast if the file is invalid.
- SQLite (`~/.local/share/opencode/codex-pool.db`) is the sole runtime source of truth for account tokens, cooldown state, shared usage cache, dormant-window touch suppression, and the cross-process locks that coordinate refresh and usage revalidation.
- Core `auth.json` is a mirror of the primary account only, kept in sync for `isCodex` activation, while additional accounts stay in SQLite and are represented in auth state through the inert shadow provider.
- Auth methods expose primary login, pool-account addition, and a minimal `Edit pool accounts` manager that lists current non-primary rows and can delete a selected pool account after confirmation.
- The built-in Codex `chat.headers` hook is not duplicated; it runs as-is.
- 429 failover is strict priority-based (not round-robin) after request ordering has been decided.
- The loader also returns `OAUTH_DUMMY_KEY` so the overridden fetch path still satisfies provider auth requirements while keeping Codex OAuth behavior active.
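The config bootstrap above can be sketched roughly as follows. This is a minimal sketch: `PoolConfig`, `DEFAULTS`, and `parseConfig` are illustrative names rather than the plugin's actual identifiers, and merging partial files over the defaults is an assumption of the sketch.

```typescript
// Illustrative sketch of codex-pool.json parsing with fallback to the
// documented defaults; invalid JSON falls back wholesale (the real plugin
// additionally emits a warning toast in that case).
type PoolConfig = {
  "fast-mode": "auto" | "always" | "disabled"
  "fast-mode-bias": number
  "sticky-mode": "auto" | "always" | "disabled"
  "sticky-strength": number
  "dormant-touch": "always" | "new-session-only" | "disabled"
}

const DEFAULTS: PoolConfig = {
  "fast-mode": "auto",
  "fast-mode-bias": 0,
  "sticky-mode": "always",
  "sticky-strength": 1,
  "dormant-touch": "new-session-only",
}

const parseConfig = (raw: string): PoolConfig => {
  try {
    const data = JSON.parse(raw)
    if (typeof data !== "object" || data === null) return DEFAULTS
    return { ...DEFAULTS, ...data }
  } catch {
    return DEFAULTS
  }
}
```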
- `core` means the primary mirrored OpenAI OAuth account (`primary_account = 1`); `pool` means every non-primary account in SQLite.
- Routing computes a quota score for every currently available account, then reorders the whole candidate list by score. Higher score goes earlier; equal scores fall back to stored priority order.
- The quota signal comes from `https://chatgpt.com/backend-api/wham/usage`, using the account's bearer token and `ChatGPT-Account-Id` when that ID is known. If SQLite does not currently have a stored ChatGPT account ID for a row, the plugin still calls the usage endpoint without that header instead of skipping usage entirely. When the usage payload returns `account_id`, the plugin persists that value back to SQLite as the row's `chatgpt_account_id` for later requests and pool-account bookkeeping.
- Each account's burn score is weighted by plan type: Plus/Team/default = `1`, ProLite = `sqrt(5)` (~2.24), Pro = `sqrt(20)` (~4.47). Routing uses the official Plus-relative quota ratios and intentionally compresses them with a square root so larger plans remain favored without monopolizing selection.
- Per-window score is `((plan_weight * (1 - used_percent / 100) * capacity) / (pace * conservation)) * health_factor` once a window is active. `health_factor` is a bounded multiplier derived from `remaining_capacity - remaining_time`, giving healthy windows up to a small bonus and ahead windows up to a slightly larger penalty without overpowering plan weight or window capacity. When `limit_window_seconds` is available, `pace = max(reset_after_seconds / limit_window_seconds, 0.000001)` normalises for elapsed time within the window; otherwise `pace = reset_after_seconds` and conservation and capacity are skipped to avoid double-counting. Higher score means "this account has more weighted remaining capacity, so burn it first."
- The capacity factor accounts for the absolute size of a rate-limit window: `capacity = sqrt(limit_window_seconds / CAPACITY_REF)`. Larger windows (e.g. 7-day) represent more absolute token capacity than smaller windows (e.g. 5-hour), so the same `used_percent` on a larger window leaves more usable room. The sqrt scaling prevents extreme windows from dominating linearly. A 5-hour window (18,000s) gets `capacity ≈ 3.16`; a 7-day window (604,800s) gets `capacity ≈ 18.33` — a ~5.8× ratio rather than the raw ~33.6× time ratio.
- The conservation factor differentiates tactical (short-recovery) windows from strategic (long-recovery) windows: `conservation = max(1, min(CONSERVATION_CAP, 1 + ln(reset_after_seconds / CONSERVATION_REF)))`. Windows with recovery under 4 hours (`CONSERVATION_REF = 14_400`) receive no dampening. Longer recovery horizons are dampened logarithmically up to a cap derived from a 2-week ceiling (`CONSERVATION_HORIZON = 1_209_600`). Conservation and capacity work as opposing forces after activation: capacity boosts larger windows (more absolute room) while conservation dampens them (longer recovery if exhausted). The health factor then lightly prefers windows whose remaining capacity is ahead of their remaining time, reducing the chance that a mildly ahead large-plan account keeps winning forever on capacity alone. Once active, moderately used large windows score well (plenty of absolute capacity despite conservation), while heavily used large windows still lose to healthier short windows. Near-reset long windows (e.g. a 7-day window with 30 minutes left) receive no conservation dampening and full capacity boost, enabling aggressive use-it-or-lose-it burn.
- Dormant windows are no longer score-boosted. Instead,
`dormant-touch` accepts `"always" | "new-session-only" | "disabled"`. When it is `"always"`, any account with an untouched dormant `rate_limit` window (`used_percent = 0` and `reset_after_seconds === limit_window_seconds`) is temporarily promoted ahead of normal quota ranking for one successful request so the untouched window is started exactly once. When it is `"new-session-only"`, that same promotion is allowed only before the current request has active sticky affinity; once a session is sticky, dormant-touch no longer overrides the sticky account. When multiple accounts qualify for dormant-touch, accounts with an untouched `rate.secondary` window outrank accounts with only an untouched `rate.primary` window; ties then fall back to score and stored priority. Successful touches are written to SQLite with the window label and a fixed 30-minute expiry, so every opencode process suppresses the same dormant-window priority for that short cooldown even if the cached usage payload still looks dormant. When `dormant-touch = "disabled"`, this promotion path is skipped.
- If a rate limit reports `allowed = false` or `limit_reached = true`, its score is `0` (fully blocked).
- When `rate_limit` exposes multiple complete windows with a clear longest span, ranking treats the longest window as the main score and the shorter windows as guardrails instead of taking a hard minimum. The final routing score is `main_score * worst_guard_factor`, where each guard first computes `normalized_guard_score = ln(raw / balanced)` against the balanced same-window baseline, then applies a horizon weight `guard_weight = clamp((main_reset_after_seconds - guard_reset_after_seconds) / guard_limit_window_seconds, 0, 1)` to that pace pressure only. The applied pace factor is `exp(guard_weight * normalized_guard_score)`, clamped to at most `1`, while the low-cap floor remains separate as `left_floor = remaining_capacity / 0.03` only when the guard window has less than `3%` remaining capacity. The final applied guard factor is `min(left_floor, applied_pace_factor)`. This means shorter windows stop penalizing the score when they reset alongside the main window, while still forcing protection when their remaining capacity is critically low. In that multi-window case, Pro plan weighting effectively lands on the longest window because the guard factor is derived from normalized guard health rather than plan-weighted raw score. If the windows cannot be reduced cleanly (for example, no clear longest complete window), ranking falls back to the previous conservative raw-window comparison. `additional_rate_limits` and `code_review_rate_limit` are ignored for account selection.
- Raw usage payloads are cached in SQLite (
`quota_cache`) for 60 seconds so multiple opencode instances share the same warm cache.
- Recently active accounts are polled every 30 seconds. The poller only revalidates an account when its shared usage cache is older than 3 minutes, so ranking freshness stays at 60 seconds while background revalidation stays less chatty than per-request stale warming. This active polling no longer requires a stored `chatgpt_account_id`; it uses the same optional-header usage fetch path as foreground warming.
- Every usage refresh, including background polling, must acquire a per-account SQLite lock (`usage:<account-id>`) so simultaneous opencode processes collapse onto one actual `wham/usage` fetch per account.
- If the shared usage cache is stale but not too old (currently up to 1 hour), keep using the expired cached payloads for the current foreground decision and warm them in the background, and show the reused cache age in the selection toast as a compact tag like `(5m ago)`. A cached payload must still be treated as expired and synchronously refetched if any considered non-dormant `rate_limit` window's `updated_at + reset_after_seconds` has already passed, because that cached window can no longer describe the active quota state. Dormant first-use windows (`used_percent = 0` with `reset_after_seconds === limit_window_seconds`) do not force expiry. When a cached usage entry is reused for guard-based ranking or fast-mode guard pressure, treat the guard window as having aged by the cache elapsed time before computing the guard score or debt. If a cached usage entry exists but is older than that 1-hour fallback window, or its considered reset deadline has already elapsed, emit a `Quota cache expired, fetching usage before selection` toast, wait for the available accounts' usage fetches to complete, then run account selection and emit the normal selection toast. If the shared usage cache is cold (missing) for either side, keep the current priority order and warm in the background.
- Once quota scores are available for the full candidate set, reorder requests by score across the entire fleet rather than only at the `core`/`pool` boundary.
- Failed or non-OK usage fetches do not write a negative cache entry. They leave ordering unchanged and allow the next request to retry warming.
- Successful token refresh for an account must invalidate that account's shared usage cache before future ranking, because the old payload may have been computed from stale credentials.
- Cooldowns and disabled-account handling still apply before quota ranking. Only `store.available()` rows participate in this comparison.
- Accounts are only disabled for durable authorization failure: a request that still returns `401` after the plugin refreshes that account and retries once. Transient request, refresh, and usage-fetch errors must not disable the account.
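The scoring pipeline above can be condensed into a small sketch. The constants match the documented values, but the `Win` shape, the `health` bounds (0.7 to 1.15), and the `balanced` helper are simplifying assumptions of this sketch, not the plugin's exact code.

```typescript
// Sketch of the per-window routing score and the multi-window guard factor.
const CAPACITY_REF = 1_800
const CONSERVATION_REF = 14_400
const CONSERVATION_HORIZON = 1_209_600
const CONSERVATION_CAP = 1 + Math.log(CONSERVATION_HORIZON / CONSERVATION_REF)

type Win = { used_percent: number; reset_after_seconds: number; limit_window_seconds: number }

// sqrt-compressed absolute window size
const capacity = (w: Win) => Math.sqrt(w.limit_window_seconds / CAPACITY_REF)

// logarithmic dampening for long recovery horizons, capped at the 2-week ceiling
const conservation = (w: Win) =>
  Math.max(1, Math.min(CONSERVATION_CAP, 1 + Math.log(w.reset_after_seconds / CONSERVATION_REF)))

// bounded preference for windows whose remaining capacity leads remaining time
// (the 0.7..1.15 clamp is an assumed bound, not the plugin's exact constant)
const health = (w: Win) => {
  const left = 1 - w.used_percent / 100
  const time = w.reset_after_seconds / w.limit_window_seconds
  return Math.min(1.15, Math.max(0.7, 1 + 0.3 * (left - time)))
}

const score = (weight: number, w: Win) => {
  const pace = Math.max(w.reset_after_seconds / w.limit_window_seconds, 0.000001)
  return ((weight * (1 - w.used_percent / 100) * capacity(w)) / (pace * conservation(w))) * health(w)
}

// balanced same-window baseline: remaining capacity equals remaining time
const balanced = (w: Win): Win => ({
  ...w,
  used_percent: 100 * (1 - w.reset_after_seconds / w.limit_window_seconds),
})

// guard factor applied to the main window score in the multi-window case
const guardFactor = (main: Win, guard: Win, weight: number) => {
  const normalized = Math.log(score(weight, guard) / score(weight, balanced(guard)))
  const horizon = Math.min(
    1,
    Math.max(0, (main.reset_after_seconds - guard.reset_after_seconds) / guard.limit_window_seconds),
  )
  const paced = Math.min(1, Math.exp(horizon * normalized))
  const left = 1 - guard.used_percent / 100
  const floor = left < 0.03 ? left / 0.03 : 1
  return Math.min(floor, paced)
}
```

Note how the plan weight cancels inside `normalized` (it multiplies both the raw and the balanced score), which is why the guard factor tracks guard health rather than plan-weighted magnitude.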
- Fast-mode is implemented as post-ranking request decoration inside `src/fetch.ts`; it does not change account ordering or sticky affinity.
- Fast-mode policy comes from `~/.config/opencode/codex-pool.json` via `fast-mode: "auto" | "always" | "disabled"` plus `fast-mode-bias: number`. `auto` keeps the score-based behavior, `always` forces `service_tier: "priority"` whenever the plugin can decorate the request and the caller did not already set a tier, and `disabled` suppresses plugin-added fast-mode entirely. `fast-mode-bias` only affects `auto`: positive values make fast-mode more eager, negative values make it more conservative.
- The final outbound field is OpenAI's `service_tier`, even though upstream config and provider options may use `serviceTier`.
- Fast-mode uses the same shared SQLite raw usage cache that ranking uses for the current attempt. Fresh usage is authoritative for 60 seconds; a 30-second background poller revalidates recently active accounts once their cache age crosses 3 minutes; stale cached usage may still drive the current foreground decision while a background refresh starts, unless a considered non-dormant `rate_limit` reset deadline has already elapsed, in which case the cache must be synchronously refetched instead of reused. Only missing usage, or cache rejected by that elapsed-reset check, should force a synchronous warm-up before a single-account prompt attempt.
- The trigger is score-based. For every complete considered window, compute the existing selection `windowScore`, then normalize it against a balanced same-window baseline where `remaining_capacity == remaining_time` (left == time). The normalized fast-mode window score is `ln(windowScore(actual) / windowScore(balanced))`, so `0` means on-pace, positive means healthier than pace, and negative means ahead of pace. Among the current scored `rate_limit` windows (`rate.primary`/`rate.secondary`), the window with the largest `span` becomes the fast-mode `main` value. If spans tie, the earlier window wins, which currently leaves `primary` as `main`. Every other scored window becomes a guardrail. `additional_rate_limits` are ignored for fast-mode. Guard debt is weighted by remaining horizon with the same `guard_weight = clamp((main_reset_after_seconds - guard_reset_after_seconds) / guard_limit_window_seconds, 0, 1)` rule used by routing, so shorter windows only subtract debt when they reset meaningfully earlier than the main window. The final fast-mode score is `main_score - worst_guard_debt`, where `worst_guard_debt = max(guard_weight * max(0, -guard_score))`. The current attempt enables fast-mode at `final_score >= 0.05 - fast-mode-bias`; a sticky session that was already fast-enabled keeps it until the score falls below `-0.02 - fast-mode-bias`.
- Windows with less than `3%` remaining capacity force fast-mode off regardless of score. Missing main `rate_limit` data still yields `no data`. `code_review_rate_limit` remains ignored.
- If any considered limit is blocked (`allowed = false` or `limit_reached = true`) or the main `rate_limit` window data is incomplete for fast-mode math, fast-mode stays off.
- Caller-provided `service_tier` or `serviceTier` takes precedence and must not be overridden by the plugin.
- Request bodies remain immutably snapshotted before retries. Each 401 retry or 429 failover rebuilds an attempt-local JSON body so `service_tier: "priority"` never leaks across accounts or attempts.
- The account-selection toast must identify the selected account with a `>` prefix in the `Account:`/`Accounts:` list and keep the same `[plan] account:` first line. When a selection is using stale shared usage data, the account label must show the cache age as a compact tag like `(5m ago)`; if the score is also blocked, preserve the tag order as `(5m ago, blocked 5m)` when only minutes remain, `(5m ago, blocked 1h 5m)` when hours remain, and `(5m ago, blocked)` when no reset timer is available. When multiple blocked windows expose reset timers, the toast uses the most exhausted window's timer first (highest `used_percent`, then `primary`, then shorter reset). For single-window ranking, the second line still shows `[window] score ...` output. For reduced multi-window ranking, the second line must show `<score> (<base> * guard x<factor>)` so the chosen reduction is visible without a separate `final` label. The fast-mode section must be a single concise line such as `Fast: enabled +80.600`, `Fast: enabled +59.200 (+80.600 - guard 21.400 - gate 5.000)`, `Fast: disabled (cap<3%, rate.primary)`, or `Fast: disabled (no data)`. The toast must fire immediately before the outbound prompt request for that account. A sticky session must still emit a separate toast when fast-mode flips without an account switch, and that flip toast must use the same one-line fast-mode summary.
- Toast plan labels stay compact for alignment: `plus`, `team`, `pro5` (internal `prolite`), and `pro20` (internal `pro`).
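The fast-mode trigger and hysteresis above reduce to a few lines. This is a sketch under the assumption that window scores arrive already normalized on the ln scale described earlier; the function names are illustrative.

```typescript
// Sketch of fast-mode guard-debt weighting and enable/keep thresholds.
const guardWeight = (mainReset: number, guardReset: number, guardLimit: number) =>
  Math.min(1, Math.max(0, (mainReset - guardReset) / guardLimit))

// final score = main normalized score minus the worst weighted guard debt
const fastScore = (main: number, guards: { score: number; weight: number }[]) =>
  main - guards.reduce((worst, g) => Math.max(worst, g.weight * Math.max(0, -g.score)), 0)

// hysteresis: enable at 0.05 - bias; a sticky fast-enabled session holds until below -0.02 - bias
const fastOn = (final: number, bias: number, wasOn: boolean) =>
  wasOn ? final >= -0.02 - bias : final >= 0.05 - bias
```

The asymmetric thresholds keep `service_tier: "priority"` from flapping on and off within a sticky session when the score hovers near zero.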
- Different ChatGPT accounts are isolated cache scopes on the provider side. OpenAI's server-side prompt cache is not shared between organizations/accounts, so switching accounts mid-session forces a cache miss and increases latency.
- To preserve provider-side prompt cache warmth, the routing layer tracks which account last handled a successful response (`res.ok`) per session and prefers that account for subsequent requests within a 5-minute window (`AFFINITY_MS = 300_000`), aligned with OpenAI's in-memory prompt cache retention. `sticky-mode = "disabled"` turns this off, while `sticky-mode = "always"` forces the sticky account to hold for the affinity lifetime unless it becomes unavailable.
- The session identity is derived from the `prompt_cache_key` field in the JSON request body (set to `sessionID` by opencode). Requests without `prompt_cache_key` receive no affinity and always use standard score-based routing.
- Different sessions maintain independent affinity: Session A may be sticky to core while Session B is sticky to pool. This ensures that cross-session routing remains quota-aware and distributes load across accounts.
- The sticky account is only abandoned when: (a) the best currently ranked alternative's quota score exceeds the sticky account's score by more than the adaptive margin, (b) the sticky account's score is 0 (fully blocked), (c) the sticky account is in cooldown or disabled (already excluded by `store.available()`), or (d) the affinity window has expired.
- In `sticky-mode = "auto"`, the adaptive margin is `SWITCH_MARGIN * sticky_strength * (0.5 + 0.5 * min(a, b) / max(a, b))` where `SWITCH_MARGIN = 0.35` and the default `sticky-strength = 1`. When scores are close (both accounts similarly healthy), the margin approaches the full scaled 35%. When scores diverge (one account is conservation-dampened), the margin shrinks toward a scaled 17.5%, making the router more willing to switch away from a strategically constrained account while still favoring stronger session stickiness. `sticky-strength = 0` removes the extra sticky margin; values above `1` make sticky sessions harder to break.
- Affinity state lives inside the `createFetch` closure as a `Map<string, Affinity>` keyed by `prompt_cache_key`. Expired entries are pruned when the map exceeds 50 entries, and the entire map resets when the plugin loader re-creates the fetch function.
- When no affinity is active (first request in a session, after expiry, or no `prompt_cache_key`), the standard score ordering applies across every available account.
- Request bodies are snapshotted before retries so failover and refresh retries can safely replay the same payload.
- When the selected account changes, the pre-request selection toast includes a compact score summary for the accounts that participated in the selection decision, marks the chosen account with a `>` prefix in the `Account:`/`Accounts:` list, pads the leading `[plan]` column and trailing account column for readability, shows either raw single-window scores or the reduced `<score> (<base> * guard x<factor>)` multi-window summary without window-duration labels as appropriate, includes a short reason string (for example, higher score, quota cache warming, or failover after a `429`), and ends with the same one-line fast-mode summary used elsewhere.
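The adaptive margin formula can be sketched directly. `SWITCH_MARGIN` and the ratio blend follow the text above; treating the margin as a relative (percentage) threshold on the sticky score is an assumption of this sketch, and `abandon` is an illustrative name.

```typescript
// Sketch of the sticky-mode "auto" adaptive switch margin.
const SWITCH_MARGIN = 0.35

// close scores -> margin near 0.35 * strength; diverged scores -> near 0.175 * strength
const margin = (a: number, b: number, strength = 1) =>
  SWITCH_MARGIN * strength * (0.5 + (0.5 * Math.min(a, b)) / Math.max(a, b))

// abandonment condition (a) and (b) from the list above
const abandon = (sticky: number, best: number, strength = 1) =>
  sticky === 0 || best > sticky * (1 + margin(sticky, best, strength))
```

With `sticky-strength = 0` the margin vanishes and any strictly better alternative wins; with the default strength a roughly 30%+ advantage is needed when scores are close.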
- Built-in loader guard: `if (auth.type !== "oauth") return {}` — passes when core auth is OAuth.
- Built-in loader side effects (model filter + cost zero) are desirable and kept.
- This plugin's loader runs after (external > internal), so `apiKey` and `fetch` are merged on top of the built-in Codex loader output.
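The layering above can be illustrated with a toy deep merge. `merge()` is a hypothetical stand-in for the `mergeDeep` opencode applies between loader results, and the `internal`/`external` values are invented placeholders, not the real loader outputs.

```typescript
// Illustrative sketch: built-in Codex loader output computed first, this
// plugin's output deep-merged on top, so apiKey/fetch win while built-in
// keys (options, cost) survive.
type Out = Record<string, unknown>

const record = (v: unknown): v is Out =>
  typeof v === "object" && v !== null && !Array.isArray(v)

const merge = (base: Out, over: Out): Out => {
  const out: Out = { ...base }
  for (const key of Object.keys(over)) {
    const a = out[key]
    const b = over[key]
    out[key] = record(a) && record(b) ? merge(a, b) : b
  }
  return out
}

const internal = { options: { instructions: "codex-system-prompt" }, cost: 0 }
const external = { apiKey: "OAUTH_DUMMY_KEY", fetch: "pooled-fetch-stub" }
const merged = merge(internal, external)
```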
```
src/
  config.ts — Config file bootstrap/parser for ~/.config/opencode/codex-pool.json
  index.ts  — Plugin entry, auth hook, auth methods, loader
  store.ts  — SQLite account/cooldown/lock/shared-usage-cache CRUD (bun:sqlite, WAL)
  codex.ts  — Codex OAuth constants, PKCE, JWT parsing, token exchange
  oauth.ts  — Browser OAuth flow, headless device flow, token refresh
  sync.ts   — Bootstrap an existing primary OAuth auth record into SQLite
  fetch.ts  — Multi-account fetch with quota-aware ordering, active-account usage polling, sticky affinity, 429 failover, refresh locking, and request URL rewrite
  types.ts  — Shared types and constants
test/
  fetch.test.ts — Routing, failover, refresh, affinity, and quota-cache behavior
  store.test.ts — SQLite store, cooldown, lock, and shared-cache behavior
```
- When an agent changes the program's specification — including behavior, architecture, design decisions, routing logic, constants, file structure, or any other documented contract — the agent MUST update both this AGENTS.md and README.md to reflect the change before considering the task complete.
Follow the opencode repo style:
- Single-word variable names preferred
- `const` over `let`; ternaries or early returns over reassignment
- Avoid `else`; use early returns
- Avoid unnecessary destructuring; use dot notation
- No `as any`, `@ts-ignore`, or `@ts-expect-error`
- Minimal comments; code should be self-explanatory
- Bun APIs preferred (`bun:sqlite`, `Bun.serve`, `Bun.file`)
- Run tests: `bun test` from this directory
- Typecheck: `bun run typecheck`
- Tests use real SQLite (`:memory:` or temp files), not mocks
- Multi-instance tests use two separate `Database` connections to the same file
Build the plugin artifact with `bun run build`, then point opencode at the built entry:

```json
{
  "plugin": ["file:///path/to/codex-pool/dist/index.js"]
}
```

For source-based local development, pointing at `src/index.ts` still works because Bun can import TypeScript directly.
- `CODEX_OAUTH_PORT`: `1455`
- `CODEX_API_ENDPOINT`: `https://chatgpt.com/backend-api/codex/responses`
- `CODEX_ISSUER`: `https://auth.openai.com`
- Config default path: `~/.config/opencode/codex-pool.json`
- `SENTINEL_SHADOW_PROVIDER`: `openai-codex-pool-shadow` (inert auth.json record for additional accounts)
- `OAUTH_DUMMY_KEY`: `OAUTH_DUMMY_KEY` (dummy key returned by the loader alongside the custom fetch)
- `REFRESH_LEASE_MS`: `30_000` (SQLite refresh lock lease shared across processes)
- Usage polling interval: `30_000` ms
- Usage polling revalidation age: `180_000` ms (3 minutes)
- Usage fetch lease: `25_000` ms (SQLite usage-refresh lock lease shared across processes)
- `CONSERVATION_REF`: `14_400` (4 hours — tactical/strategic boundary)
- `CONSERVATION_HORIZON`: `1_209_600` (2 weeks — conservation cap ceiling)
- `CAPACITY_REF`: `1_800` (30 minutes — capacity normalization baseline)
- Stale quota fallback horizon: `3_600_000` ms (1 hour)
- DB default path: `~/.local/share/opencode/codex-pool.db`
Key files in the opencode repo that this plugin interacts with:
- `packages/opencode/src/plugin/codex.ts` — Built-in Codex plugin (OAuth flow, fetch, model shaping)
- `packages/opencode/src/provider/provider.ts:1001-1046` — Plugin loader execution loop
- `packages/opencode/src/session/llm.ts:65` — `isCodex` check (`provider.id === "openai" && auth?.type === "oauth"`)
- `packages/opencode/src/plugin/index.ts:48-103` — Plugin load order (internal first, external second)
- `packages/plugin/src/index.ts` — Plugin type definitions (`Hooks`, `AuthHook`, `Plugin`)
packages/opencode/src/plugin/codex.ts— Built-in Codex plugin (OAuth flow, fetch, model shaping)packages/opencode/src/provider/provider.ts:1001-1046— Plugin loader execution looppackages/opencode/src/session/llm.ts:65—isCodexcheck (provider.id === "openai" && auth?.type === "oauth")packages/opencode/src/plugin/index.ts:48-103— Plugin load order (internal first, external second)packages/plugin/src/index.ts— Plugin type definitions (Hooks, AuthHook, Plugin)