codex-pool

External opencode plugin that manages multiple ChatGPT Pro/ProLite/Plus/Team Codex OAuth accounts with quota-aware core/pool preference and priority-based 429 failover.

Architecture

This plugin hijacks provider: "openai" via auth.loader. The built-in Codex plugin runs first (model filtering, cost zeroing), then this plugin supplies a dummy OAuth apiKey plus a replacement fetch via mergeDeep.

Core auth.json stores type: "oauth" for the primary account so that isCodex = true in opencode's llm.ts:65, preserving exact Codex behavior parity (options.instructions, system prompt, maxOutputTokens).

Key design decisions

  • JSON config lives at ~/.config/opencode/codex-pool.json; the plugin auto-creates it with { "fast-mode": "auto", "fast-mode-bias": 0, "sticky-mode": "always", "sticky-strength": 1, "dormant-touch": "new-session-only" } when missing, loads it during plugin initialization, and falls back to defaults with a warning toast if the file is invalid.
  • SQLite (~/.local/share/opencode/codex-pool.db) is the sole runtime source of truth for account tokens, cooldown state, shared usage cache, dormant-window touch suppression, and the cross-process locks that coordinate refresh and usage revalidation.
  • Core auth.json is a mirror of the primary account only, kept in sync for isCodex activation, while additional accounts stay in SQLite and are represented in auth state through the inert shadow provider.
  • Auth methods expose primary login, pool-account addition, and a minimal Edit pool accounts manager that lists current non-primary rows and can delete a selected pool account after confirmation.
  • The built-in Codex chat.headers hook is not duplicated; it runs as-is.
  • 429 failover is strict priority-based (not round-robin) after request ordering has been decided.
  • The loader also returns OAUTH_DUMMY_KEY so the overridden fetch path still satisfies provider auth requirements while keeping Codex OAuth behavior active.
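For reference, the auto-created defaults listed above expand to the following ~/.config/opencode/codex-pool.json:

```json
{
  "fast-mode": "auto",
  "fast-mode-bias": 0,
  "sticky-mode": "always",
  "sticky-strength": 1,
  "dormant-touch": "new-session-only"
}
```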

Remaining-capacity routing

  • core means the primary mirrored OpenAI OAuth account (primary_account = 1); pool means every non-primary account in SQLite.
  • Routing computes a quota score for every currently available account, then reorders the whole candidate list by score. Higher score goes earlier; equal scores fall back to stored priority order.
  • The quota signal comes from https://chatgpt.com/backend-api/wham/usage, using the account's bearer token and ChatGPT-Account-Id when that ID is known. If SQLite does not currently have a stored ChatGPT account ID for a row, the plugin still calls the usage endpoint without that header instead of skipping usage entirely. When the usage payload returns account_id, the plugin persists that value back to SQLite as the row's chatgpt_account_id for later requests and pool-account bookkeeping.
  • Each account's burn score is weighted by plan type: Plus/Team/default = 1, ProLite = sqrt(5) (~2.24), Pro = sqrt(20) (~4.47). Routing uses the official Plus-relative quota ratios and intentionally compresses them with a square root so larger plans remain favored without monopolizing selection.
  • Per-window score is ((plan_weight * (1 - used_percent / 100) * capacity) / (pace * conservation)) * health_factor once a window is active. health_factor is a bounded multiplier derived from remaining_capacity - remaining_time, giving healthy windows up to a small bonus and windows that are ahead of pace up to a slightly larger penalty, without overpowering plan weight or window capacity. When limit_window_seconds is available, pace = max(reset_after_seconds / limit_window_seconds, 0.000001) normalizes for elapsed time within the window; otherwise pace = reset_after_seconds, and conservation and capacity are skipped to avoid double-counting. Higher score means "this account has more weighted remaining capacity, so burn it first."
  • The capacity factor accounts for the absolute size of a rate-limit window: capacity = sqrt(limit_window_seconds / CAPACITY_REF). Larger windows (e.g. 7-day) represent more absolute token capacity than smaller windows (e.g. 5-hour), so the same used_percent on a larger window leaves more usable room. The sqrt scaling prevents extreme windows from dominating linearly. A 5-hour window (18,000s) gets capacity ≈ 3.16; a 7-day window (604,800s) gets capacity ≈ 18.33 — a ~5.8× ratio rather than the raw ~33.6× time ratio.
  • The conservation factor differentiates tactical (short-recovery) windows from strategic (long-recovery) windows: conservation = max(1, min(CONSERVATION_CAP, 1 + ln(reset_after_seconds / CONSERVATION_REF))). Windows with recovery under 4 hours (CONSERVATION_REF = 14_400) receive no dampening. Longer recovery horizons are dampened logarithmically up to a cap derived from a 2-week ceiling (CONSERVATION_HORIZON = 1_209_600). Conservation and capacity work as opposing forces after activation: capacity boosts larger windows (more absolute room) while conservation dampens them (longer recovery if exhausted). The health factor then lightly prefers windows whose remaining capacity is ahead of their remaining time, reducing the chance that a mildly ahead large-plan account keeps winning forever on capacity alone. Once active, moderately-used large windows score well (plenty of absolute capacity despite conservation), while heavily-used large windows still lose to healthier short windows. Near-reset long windows (e.g. a 7-day window with 30 minutes left) receive no conservation dampening and full capacity boost, enabling aggressive use-it-or-lose-it burn.
  • Dormant windows are no longer score-boosted. Instead, dormant-touch accepts "always" | "new-session-only" | "disabled". When it is "always", any account with an untouched dormant rate_limit window (used_percent = 0 and reset_after_seconds === limit_window_seconds) is temporarily promoted ahead of normal quota ranking for one successful request so the untouched window is started exactly once. When it is "new-session-only", that same promotion is allowed only before the current request has active sticky affinity; once a session is sticky, dormant-touch no longer overrides the sticky account. When multiple accounts qualify for dormant-touch, accounts with an untouched rate.secondary window outrank accounts with only an untouched rate.primary window; ties then fall back to score and stored priority. Successful touches are written to SQLite with the window label and a fixed 30-minute expiry, so every opencode process suppresses the same dormant-window priority for that short cooldown even if the cached usage payload still looks dormant. When dormant-touch = "disabled", this promotion path is skipped.
  • If a rate limit reports allowed = false or limit_reached = true, its score is 0 (fully blocked).
  • When rate_limit exposes multiple complete windows with a clear longest span, ranking treats the longest window as the main score and the shorter windows as guardrails instead of taking a hard minimum. The final routing score is main_score * worst_guard_factor, where each guard first computes normalized_guard_score = ln(raw / balanced) against the balanced same-window baseline, then applies a horizon weight guard_weight = clamp((main_reset_after_seconds - guard_reset_after_seconds) / guard_limit_window_seconds, 0, 1) to that pace pressure only. The applied pace factor is exp(guard_weight * normalized_guard_score), clamped to at most 1, while the low-cap floor remains separate as left_floor = remaining_capacity / 0.03 only when the guard window has less than 3% remaining capacity. The final applied guard factor is min(left_floor, applied_pace_factor). This means shorter windows stop penalizing the score when they reset alongside the main window, while still forcing protection when their remaining capacity is critically low. In that multi-window case, Pro plan weighting effectively lands on the longest window because the guard factor is derived from normalized guard health rather than plan-weighted raw score. If the windows cannot be reduced cleanly (for example, no clear longest complete window), ranking falls back to the previous conservative raw-window comparison. additional_rate_limits and code_review_rate_limit are ignored for account selection.
  • Raw usage payloads are cached in SQLite (quota_cache) for 60 seconds so multiple opencode instances share the same warm cache.
  • Recently active accounts are polled every 30 seconds. The poller only revalidates an account when its shared usage cache is older than 3 minutes, so ranking freshness stays at 60 seconds while background revalidation stays less chatty than per-request stale warming. This active polling no longer requires a stored chatgpt_account_id; it uses the same optional-header usage fetch path as foreground warming.
  • Every usage refresh, including background polling, must acquire a per-account SQLite lock (usage:<account-id>) so simultaneous opencode processes collapse onto one actual wham/usage fetch per account.
  • If the shared usage cache is stale but not too old (currently up to 1 hour), keep using the expired cached payloads for the current foreground decision and warm them in the background, and show the reused cache age in the selection toast as a compact tag like (5m ago). A cached payload must still be treated as expired and synchronously refetched if any considered non-dormant rate_limit window's updated_at + reset_after_seconds has already passed, because that cached window can no longer describe the active quota state. Dormant first-use windows (used_percent = 0 with reset_after_seconds === limit_window_seconds) do not force expiry. When a cached usage entry is reused for guard-based ranking or fast-mode guard pressure, treat the guard window as having aged by the cache elapsed time before computing the guard score or debt. If a cached usage entry exists but is older than that 1-hour fallback window, or its considered reset deadline has already elapsed, emit a Quota cache expired, fetching usage before selection toast, wait for the available accounts' usage fetches to complete, then run account selection and emit the normal selection toast. If the shared usage cache is cold (missing) for either side, keep the current priority order and warm in the background.
  • Once quota scores are available for the full candidate set, reorder requests by score across the entire fleet rather than only at the core/pool boundary.
  • Failed or non-OK usage fetches do not write a negative cache entry. They leave ordering unchanged and allow the next request to retry warming.
  • Successful token refresh for an account must invalidate that account's shared usage cache before future ranking, because the old payload may have been computed from stale credentials.
  • Cooldowns and disabled-account handling still apply before quota ranking. Only store.available() rows participate in this comparison.
  • Accounts are only disabled for durable authorization failure: a request that still returns 401 after the plugin refreshes that account and retries once. Transient request, refresh, and usage-fetch errors must not disable the account.
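The per-window score above can be sketched as follows. This is a rough illustration, not the plugin's actual code: identifier names are assumptions, and the bounded health_factor multiplier is omitted for brevity.

```typescript
// Sketch of the per-window quota score described above (health_factor omitted).
const CAPACITY_REF = 1_800
const CONSERVATION_REF = 14_400
const CONSERVATION_HORIZON = 1_209_600
const CONSERVATION_CAP = 1 + Math.log(CONSERVATION_HORIZON / CONSERVATION_REF)

// Plus-relative quota ratios, compressed with a square root.
const planWeight = (plan: "plus" | "team" | "prolite" | "pro") =>
  plan === "pro" ? Math.sqrt(20) : plan === "prolite" ? Math.sqrt(5) : 1

function windowScore(
  plan: "plus" | "team" | "prolite" | "pro",
  usedPercent: number,
  resetAfterSeconds: number,
  limitWindowSeconds?: number,
): number {
  const left = planWeight(plan) * (1 - usedPercent / 100)
  // Without a known window span, capacity and conservation are skipped.
  if (!limitWindowSeconds) return left / resetAfterSeconds
  const pace = Math.max(resetAfterSeconds / limitWindowSeconds, 0.000001)
  const capacity = Math.sqrt(limitWindowSeconds / CAPACITY_REF)
  const conservation = Math.max(
    1,
    Math.min(CONSERVATION_CAP, 1 + Math.log(resetAfterSeconds / CONSERVATION_REF)),
  )
  return (left * capacity) / (pace * conservation)
}
```

A fully used window scores 0 by construction, and a blocked window (allowed = false or limit_reached = true) is forced to 0 before any of this math runs.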

Dynamic fast-mode

  • Fast-mode is implemented as post-ranking request decoration inside src/fetch.ts; it does not change account ordering or sticky affinity.
  • Fast-mode policy comes from ~/.config/opencode/codex-pool.json via fast-mode: "auto" | "always" | "disabled" plus fast-mode-bias: number. auto keeps the score-based behavior, always forces service_tier: "priority" whenever the plugin can decorate the request and the caller did not already set a tier, and disabled suppresses plugin-added fast-mode entirely. fast-mode-bias only affects auto: positive values make fast-mode more eager, negative values make it more conservative.
  • The final outbound field is OpenAI's service_tier, even though upstream config and provider options may use serviceTier.
  • Fast-mode uses the same shared SQLite raw usage cache that ranking uses for the current attempt. Fresh usage is authoritative for 60 seconds; a 30-second background poller revalidates recently active accounts once their cache age crosses 3 minutes; stale cached usage may still drive the current foreground decision while a background refresh starts unless a considered non-dormant rate_limit reset deadline has already elapsed, in which case the cache must be synchronously refetched instead of reused. Only missing usage, or cache rejected by that elapsed-reset check, should force a synchronous warm-up before a single-account prompt attempt.
  • The trigger is score-based. For every complete considered window, compute the existing selection windowScore, then normalize it against a balanced same-window baseline where remaining_capacity == remaining_time (left == time). The normalized fast-mode window score is ln(windowScore(actual) / windowScore(balanced)), so 0 means on-pace, positive means healthier than pace, and negative means ahead of pace. Among the current scored rate_limit windows (rate.primary / rate.secondary), the window with the largest span becomes the fast-mode main value. If spans tie, the earlier window wins, which currently leaves primary as main. Every other scored window becomes a guardrail. additional_rate_limits are ignored for fast-mode. Guard debt is weighted by remaining horizon with the same guard_weight = clamp((main_reset_after_seconds - guard_reset_after_seconds) / guard_limit_window_seconds, 0, 1) rule used by routing, so shorter windows only subtract debt when they reset meaningfully earlier than the main window. The final fast-mode score is main_score - worst_guard_debt, where worst_guard_debt = max(guard_weight * max(0, -guard_score)). The current attempt enables fast-mode at final_score >= 0.05 - fast-mode-bias; a sticky session that was already fast-enabled keeps it until the score falls below -0.02 - fast-mode-bias.
  • Windows with less than 3% remaining capacity force fast-mode off regardless of score. Missing main rate_limit data still yields no data. code_review_rate_limit remains ignored.
  • If any considered limit is blocked (allowed = false or limit_reached = true) or the main rate_limit window data is incomplete for fast-mode math, fast-mode stays off.
  • Caller-provided service_tier or serviceTier takes precedence and must not be overridden by the plugin.
  • Request bodies remain immutably snapshotted before retries. Each 401 retry or 429 failover rebuilds an attempt-local JSON body so service_tier: "priority" never leaks across accounts or attempts.
  • The account-selection toast must identify the selected account with a > prefix in the Account:/Accounts: list and keep the same [plan] account: first line. When a selection is using stale shared usage data, the account label must show the cache age as a compact tag like (5m ago); if the score is also blocked, preserve the tag order as (5m ago, blocked 5m) when only minutes remain, (5m ago, blocked 1h 5m) when hours remain, and (5m ago, blocked) when no reset timer is available. When multiple blocked windows expose reset timers, the toast uses the most exhausted window's timer first (highest used_percent, then primary, then shorter reset). For single-window ranking, the second line still shows [window] score ... output. For reduced multi-window ranking, the second line must show <score> (<base> * guard x<factor>) so the chosen reduction is visible without a separate final label. The fast-mode section must be a single concise line such as Fast: enabled +80.600, Fast: enabled +59.200 (+80.600 - guard 21.400 - gate 5.000), Fast: disabled (cap<3%, rate.primary), or Fast: disabled (no data). The toast must fire immediately before the outbound prompt request for that account. A sticky session must still emit a separate toast when fast-mode flips without an account switch, and that flip toast must use the same one-line fast-mode summary.
  • Toast plan labels stay compact for alignment: plus, team, pro5 (internal prolite), and pro20 (internal pro).
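A hedged sketch of the fast-mode math above (identifiers are assumptions): because every factor of the per-window score except remaining capacity is shared between the actual window and its balanced same-window baseline, ln(windowScore(actual) / windowScore(balanced)) reduces to a log ratio of capacity left versus time left.

```typescript
// Normalized fast-mode window score: 0 = on pace, >0 = healthier, <0 = ahead of pace.
function normalizedScore(
  usedPercent: number,
  resetAfterSeconds: number,
  limitWindowSeconds: number,
): number {
  const left = 1 - usedPercent / 100
  const time = resetAfterSeconds / limitWindowSeconds
  return Math.log(left / time)
}

const clamp = (x: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, x))

// Main window score minus the worst horizon-weighted guard debt.
function fastScore(
  main: { score: number; reset: number },
  guards: { score: number; reset: number; limit: number }[],
): number {
  const debts = guards.map(
    (g) => clamp((main.reset - g.reset) / g.limit, 0, 1) * Math.max(0, -g.score),
  )
  return main.score - Math.max(0, ...debts)
}
```

Under this sketch, the current attempt enables fast-mode at fastScore >= 0.05 - fast-mode-bias, and an already fast-enabled sticky session keeps it until the score drops below -0.02 - fast-mode-bias.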

Sticky affinity

  • Different ChatGPT accounts are isolated cache scopes on the provider side. OpenAI's server-side prompt cache is not shared between organizations/accounts, so switching accounts mid-session forces a cache miss and increases latency.
  • To preserve provider-side prompt cache warmth, the routing layer tracks which account last handled a successful response (res.ok) per session and prefers that account for subsequent requests within a 5-minute window (AFFINITY_MS = 300_000), aligned with OpenAI's in-memory prompt cache retention. sticky-mode = "auto" applies the adaptive switch margin described below, sticky-mode = "disabled" turns affinity off, and sticky-mode = "always" forces the sticky account to hold for the affinity lifetime unless it becomes unavailable.
  • The session identity is derived from the prompt_cache_key field in the JSON request body (set to sessionID by opencode). Requests without prompt_cache_key receive no affinity and always use standard score-based routing.
  • Different sessions maintain independent affinity: Session A may be sticky to core while Session B is sticky to pool. This ensures that cross-session routing remains quota-aware and distributes load across accounts.
  • The sticky account is only abandoned when: (a) the best currently ranked alternative's quota score exceeds the sticky account's score by more than the adaptive margin, (b) the sticky account's score is 0 (fully blocked), (c) the sticky account is in cooldown or disabled (already excluded by store.available()), or (d) the affinity window has expired.
  • In sticky-mode = "auto", the adaptive margin is SWITCH_MARGIN * sticky_strength * (0.5 + 0.5 * min(a, b) / max(a, b)) where SWITCH_MARGIN = 0.35 and the default sticky-strength = 1. When scores are close (both accounts similarly healthy), the margin approaches the full scaled 35%. When scores diverge (one account is conservation-dampened), the margin shrinks toward a scaled 17.5%, making the router more willing to switch away from a strategically constrained account while still favoring stronger session stickiness. sticky-strength = 0 removes the extra sticky margin; values above 1 make sticky sessions harder to break.
  • Affinity state lives inside the createFetch closure as a Map<string, Affinity> keyed by prompt_cache_key. Expired entries are pruned when the map exceeds 50 entries, and the entire map resets when the plugin loader re-creates the fetch function.
  • When no affinity is active (first request in a session, after expiry, or no prompt_cache_key), the standard score ordering applies across every available account.
  • Request bodies are snapshotted before retries so failover and refresh retries can safely replay the same payload.
  • When the selected account changes, the pre-request selection toast includes a compact score summary for the accounts that participated in the selection decision, marks the chosen account with a > prefix in the Account:/Accounts: list, pads the leading [plan] column and trailing account column for readability, shows either raw single-window scores or the reduced <score> (<base> * guard x<factor>) multi-window summary without window-duration labels as appropriate, includes a short reason string (for example, higher score, quota cache warming, or failover after a 429), and ends with the same one-line fast-mode summary used elsewhere.
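The adaptive margin can be sketched as below. Names are assumptions, and interpreting the margin as a relative threshold on the sticky account's score is itself an assumption; the plugin's actual comparison may differ.

```typescript
// Sketch of the adaptive sticky-switch margin described above.
const SWITCH_MARGIN = 0.35

function switchMargin(stickyScore: number, altScore: number, strength = 1): number {
  const lo = Math.min(stickyScore, altScore)
  const hi = Math.max(stickyScore, altScore)
  const closeness = hi === 0 ? 0 : lo / hi
  // Close scores => full scaled 35%; diverged scores => toward scaled 17.5%.
  return SWITCH_MARGIN * strength * (0.5 + 0.5 * closeness)
}

// Abandon the sticky account only when the alternative clears the margin
// (or the sticky account is fully blocked).
const shouldSwitch = (sticky: number, alt: number, strength = 1) =>
  sticky === 0 || alt > sticky * (1 + switchMargin(sticky, alt, strength))
```

With sticky-strength = 0 the extra margin disappears and any better-scored alternative wins; values above 1 make sticky sessions harder to break.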

Coexistence with built-in CodexAuthPlugin

  • Built-in loader guard: if (auth.type !== "oauth") return {} — passes when core auth is OAuth.
  • Built-in loader side effects (model filter + cost zero) are desirable and kept.
  • This plugin's loader runs after (external > internal), so apiKey and fetch are merged on top of the built-in Codex loader output.
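The merge order can be illustrated with a minimal sketch. This uses a shallow spread as a stand-in for mergeDeep, and the loader shape is an assumption for illustration only:

```typescript
// Illustrative merge of the external loader output on top of the built-in
// Codex loader output (shallow spread standing in for mergeDeep).
const OAUTH_DUMMY_KEY = "OAUTH_DUMMY_KEY"

function mergeLoaders(
  builtin: Record<string, unknown>,
  external: { apiKey: string; fetch: typeof fetch },
): Record<string, unknown> {
  // External > internal: later keys win; the built-in loader's side effects
  // (model filtering, cost zeroing) have already run by this point.
  return { ...builtin, ...external }
}

const merged = mergeLoaders({ models: {} }, { apiKey: OAUTH_DUMMY_KEY, fetch })
```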

File structure

src/
  config.ts  — Config file bootstrap/parser for `~/.config/opencode/codex-pool.json`
  index.ts   — Plugin entry, auth hook, auth methods, loader
  store.ts   — SQLite account/cooldown/lock/shared-usage-cache CRUD (bun:sqlite, WAL)
  codex.ts   — Codex OAuth constants, PKCE, JWT parsing, token exchange
  oauth.ts   — Browser OAuth flow, headless device flow, token refresh
  sync.ts    — Bootstrap an existing primary OAuth auth record into SQLite
  fetch.ts   — Multi-account fetch with quota-aware ordering, active-account usage polling, sticky affinity, 429 failover, refresh locking, and request URL rewrite
  types.ts   — Shared types and constants
test/
  fetch.test.ts — Routing, failover, refresh, affinity, and quota-cache behavior
  store.test.ts — SQLite store, cooldown, lock, and shared-cache behavior

Agent rules

  • When an agent changes the program's specification — including behavior, architecture, design decisions, routing logic, constants, file structure, or any other documented contract — the agent MUST update both this AGENTS.md and README.md to reflect the change before considering the task complete.

Style guide

Follow the opencode repo style:

  • Single-word variable names preferred
  • const over let; ternaries or early returns over reassignment
  • Avoid else; use early returns
  • Avoid unnecessary destructuring; use dot notation
  • No as any, @ts-ignore, or @ts-expect-error
  • Minimal comments; code should be self-explanatory
  • Bun APIs preferred (bun:sqlite, Bun.serve, Bun.file)

Testing

  • Run tests: bun test from this directory
  • Typecheck: bun run typecheck
  • Tests use real SQLite (:memory: or temp files), not mocks
  • Multi-instance tests use two separate Database connections to the same file

Development

Build the plugin artifact with bun run build, then point opencode at the built entry:

{
  "plugin": ["file:///path/to/codex-pool/dist/index.js"]
}

For source-based local development, pointing at src/index.ts still works because Bun can import TypeScript directly.

Constants

  • CODEX_OAUTH_PORT: 1455
  • CODEX_API_ENDPOINT: https://chatgpt.com/backend-api/codex/responses
  • CODEX_ISSUER: https://auth.openai.com
  • Config default path: ~/.config/opencode/codex-pool.json
  • SENTINEL_SHADOW_PROVIDER: openai-codex-pool-shadow (inert auth.json record for additional accounts)
  • OAUTH_DUMMY_KEY: OAUTH_DUMMY_KEY (dummy key returned by the loader alongside the custom fetch)
  • REFRESH_LEASE_MS: 30_000 (SQLite refresh lock lease shared across processes)
  • Usage polling interval: 30_000 ms
  • Usage polling revalidation age: 180_000 ms (3 minutes)
  • Usage fetch lease: 25_000 ms (SQLite usage-refresh lock lease shared across processes)
  • CONSERVATION_REF: 14_400 (4 hours — tactical/strategic boundary)
  • CONSERVATION_HORIZON: 1_209_600 (2 weeks — conservation cap ceiling)
  • CAPACITY_REF: 1_800 (30 minutes — capacity normalization baseline)
  • Stale quota fallback horizon: 3_600_000 ms (1 hour)
  • DB default path: ~/.local/share/opencode/codex-pool.db

Upstream references

Key files in the opencode repo that this plugin interacts with:

  • packages/opencode/src/plugin/codex.ts — Built-in Codex plugin (OAuth flow, fetch, model shaping)
  • packages/opencode/src/provider/provider.ts:1001-1046 — Plugin loader execution loop
  • packages/opencode/src/session/llm.ts:65 — isCodex check (provider.id === "openai" && auth?.type === "oauth")
  • packages/opencode/src/plugin/index.ts:48-103 — Plugin load order (internal first, external second)
  • packages/plugin/src/index.ts — Plugin type definitions (Hooks, AuthHook, Plugin)