align GPT-5 model routing with current OpenAI defaults #309

Merged
ndycode merged 4 commits into main from feat/openai-parity-pr1
Mar 23, 2026

Conversation

@ndycode
Owner

@ndycode ndycode commented Mar 22, 2026

Summary

  • add first-class GPT-5.4, GPT-5.4-pro, GPT-5.4-mini, and GPT-5.4-nano routing metadata
  • stop silently downgrading generic GPT-5 aliases to older GPT-5.1-era mappings
  • surface normalized model routing and capability info in CLI diagnostics

Stack

  • base: main

@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026

Walkthrough

The changes introduce a centralized model mapping and profiling system that replaces scattered model normalization logic with canonical model profiles, reasoning effort defaults, and tool capability metadata. New functions resolveNormalizedModel(), getModelProfile(), and getModelCapabilities() standardize model identification across the codebase, while updating default model routing from gpt-5.1 to gpt-5.4.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Model Profiling System: lib/request/helpers/model-map.ts | Major refactor introducing MODEL_PROFILES record mapping canonical model IDs to normalized names, prompt families, reasoning effort defaults/supported tiers, and tool capabilities. New exported types (PromptModelFamily, ModelReasoningEffort, ModelCapabilities, ModelProfile) and functions (resolveNormalizedModel(), getModelProfile(), getModelCapabilities()) provide structured access to model metadata. Updated getNormalizedModel() to strip provider prefixes and handle case-insensitive lookup; added new DEFAULT_MODEL constant set to gpt-5.4. |
| Model Normalization Integration: lib/capability-policy.ts, lib/request/request-transformer.ts | capability-policy.ts:line renames helper to resolveNormalizedModel() and removes fallback to withoutProvider. request-transformer.ts:line delegates normalization entirely to resolveNormalizedModel() and replaces multi-branch effort downgrade/upgrade logic with generic coerceReasoningEffort() that checks model profile support and applies predefined fallback ordering per effort tier. |
| Prompt Family Centralization: lib/prompts/codex.ts, lib/codex-manager.ts | codex.ts:line replaces local getModelFamily() implementation with lookup via getModelProfile(normalizedModel).promptFamily, changes ModelFamily type to imported PromptModelFamily, and updates default prewarm candidates from ["gpt-5-codex", "gpt-5.1"] to ["gpt-5-codex", "gpt-5.4"]. codex-manager.ts:line adds ModelInspection helpers to compute normalized names, detect remapping, derive prompt family, and capture capabilities, then surfaces modelSelection object in JSON reports with requested/normalized/remapped/promptFamily/capabilities breakdown. |
| Test Expectations: test/codex-manager-cli.test.ts, test/codex-prompts.test.ts, test/codex.test.ts, test/config.test.ts, test/model-map.test.ts, test/property/transformer.property.test.ts, test/request-transformer.test.ts | Updated test coverage across model normalization, family routing, and reasoning effort tiers. codex.test.ts:line consolidates assertions and adds gpt-5.4 general model routing to gpt-5.2 prompt family. config.test.ts:line reflects new effort defaults for lightweight models (gpt-5-mini now defaults to medium instead of minimal). model-map.test.ts:line adds resolveNormalizedModel() test block with new "unknown gpt-5-ish" fallback to gpt-5.4. All tests updated to reflect gpt-5.4 as default normalized model instead of gpt-5.1. |
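
To make the profile lookup described in the walkthrough concrete, here is a minimal sketch of how a profile record, provider-prefix stripping, case-insensitive lookup, and a DEFAULT_MODEL fallback could fit together. It is not the code from lib/request/helpers/model-map.ts: the field names, the supported tiers, and the rule that every unknown ID falls back to the default are assumptions made for illustration. Only the gpt-5.4 default model, its gpt-5.2 prompt family, and its none default effort come from the review text.

```typescript
// Illustrative sketch only: field names and most profile values are assumptions,
// not the contents of lib/request/helpers/model-map.ts.
type PromptModelFamily = "gpt-5-codex" | "gpt-5.2";
type ModelReasoningEffort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

interface ModelProfile {
  normalizedModel: string;
  promptFamily: PromptModelFamily;
  defaultReasoningEffort: ModelReasoningEffort;
  supportedReasoningEfforts: readonly ModelReasoningEffort[];
}

const DEFAULT_MODEL = "gpt-5.4";

const MODEL_PROFILES: Record<string, ModelProfile> = {
  "gpt-5.4": {
    normalizedModel: "gpt-5.4",
    promptFamily: "gpt-5.2",
    defaultReasoningEffort: "none",
    supportedReasoningEfforts: ["none", "low", "medium", "high", "xhigh"], // assumed tiers
  },
  "gpt-5-codex": {
    normalizedModel: "gpt-5-codex",
    promptFamily: "gpt-5-codex",
    defaultReasoningEffort: "medium", // assumed
    supportedReasoningEfforts: ["low", "medium", "high"], // assumed
  },
};

function resolveNormalizedModel(requested?: string): string {
  const lowered = (requested ?? "").trim().toLowerCase();
  // Strip a provider prefix such as "openai/"; lastIndexOf returns -1 when absent.
  const bare = lowered.slice(lowered.lastIndexOf("/") + 1);
  // Own-property check so names like "constructor" are not treated as known models.
  const known = Object.prototype.hasOwnProperty.call(MODEL_PROFILES, bare);
  return known ? bare : DEFAULT_MODEL; // assumed: every unknown ID falls back
}

function getModelProfile(requested?: string): ModelProfile {
  // The resolver only returns keys that exist in MODEL_PROFILES.
  return MODEL_PROFILES[resolveNormalizedModel(requested)]!;
}
```

With this sketch, resolveNormalizedModel("openai/GPT-5.4") returns "gpt-5.4", and an empty or unrecognized ID resolves to DEFAULT_MODEL. That silent fallback is the behavior the review asks to cover with explicit regression tests.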

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

key review vectors:

  • the model profile system in lib/request/helpers/model-map.ts:line introduces dense type definitions and interconnected lookup logic that cascades through multiple subsystems.
  • the reasoning effort coercion in request-transformer.ts:line replaces branching pattern-matching with profile-driven fallback logic; verify REASONING_FALLBACKS ordering matches intent for each effort tier.
  • missing regression coverage: no explicit tests for edge cases where MODEL_PROFILES lacks an entry and the DEFAULT_MODEL fallback is invoked; verify this doesn't mask provider-prefix stripping failures.
  • concurrency risk: __clearCacheForTesting() is called in tests, but there is no evidence that the caching mechanism itself is safe under concurrent access if async model loading is added later.
  • windows edge case: provider-prefix stripping at lib/request/helpers/model-map.ts:line uses a hardcoded /; verify this doesn't collide with path handling on windows if model IDs ever encode paths.
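
The profile-driven coercion that the review asks to verify can be sketched as follows. This is a hypothetical reconstruction: the REASONING_FALLBACKS orderings, the parameter list, and the final fallback to a profile default are assumptions, not the values in request-transformer.ts.

```typescript
// Hypothetical sketch of profile-driven effort coercion; the orderings are assumed.
type Effort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

// For each requested tier, the tiers to try (nearest first) when it is unsupported.
const REASONING_FALLBACKS: Record<Effort, readonly Effort[]> = {
  none: ["minimal", "low", "medium", "high", "xhigh"],
  minimal: ["low", "none", "medium", "high", "xhigh"],
  low: ["minimal", "medium", "none", "high", "xhigh"],
  medium: ["low", "high", "minimal", "none", "xhigh"],
  high: ["medium", "xhigh", "low", "minimal", "none"],
  xhigh: ["high", "medium", "low", "minimal", "none"],
};

function coerceReasoningEffort(
  requested: Effort,
  supported: readonly Effort[],
  profileDefault: Effort,
): Effort {
  // Keep the requested tier when the model profile supports it.
  if (supported.includes(requested)) return requested;
  // Otherwise walk the predefined ordering and take the first supported tier.
  for (const candidate of REASONING_FALLBACKS[requested]) {
    if (supported.includes(candidate)) return candidate;
  }
  // Nothing matched (for example an empty supported list): use the profile default.
  return profileDefault;
}
```

Under these assumed orderings, a request for xhigh on a model that supports only low, medium, and high is downgraded to high. A table like this keeps each tier's intent reviewable in one place, which is why its ordering deserves the explicit check requested above.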

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | the pull request description is minimal and generic. it lacks required sections including validation checklist completion, docs/governance checklist status, and risk assessment details. | complete all checkbox items (npm run lint/typecheck/test/build), mark docs checklist items as done or explicitly explain why they don't apply, and provide concrete rollback steps beyond the commit hash reference. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | title follows conventional commits format (lowercase imperative, 54 chars), directly summarizes the core change aligning gpt-5 routing. |


@ndycode ndycode added the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 22, 2026
coderabbitai bot previously requested changes Mar 22, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/codex-manager.ts`:
- Around line 1937-1938: Normalize the model alias once before any live-probe
paths and reuse that normalized value everywhere instead of passing raw
options.model into fetchCodexQuotaSnapshot; update the code that currently
creates probeModel/inspectRequestedModel at lines like const probeModel =
options.model?.trim() || "gpt-5-codex" and ensure that the same normalized
variable is used for all calls to fetchCodexQuotaSnapshot (symbols: probeModel,
modelInspection, fetchCodexQuotaSnapshot) across the other live paths
(previously at the spots calling fetchCodexQuotaSnapshot) so routing is
consistent; add a vitest that takes a single alias and invokes the CLI handlers
for check, report, forecast, best, and fix, asserting the same resolved backend
model is used for each.

In `@lib/prompts/codex.ts`:
- Around line 94-95: The prewarm path is currently firing per-model targets
(e.g., both "gpt-5.4" and "gpt-5.2") which map to the same prompt family via
getModelFamily; update the prewarm logic (the function that builds prewarm
targets around the code referenced at lines ~372-379) to deduplicate targets by
prompt family using getModelFamily so only one request per promptFamily is
issued, add retry/backoff handling for EBUSY/429 in that queue, and ensure no
sensitive tokens/emails are logged; then add a deterministic vitest that covers:
multiple models mapping to the same family produce a single prewarm request,
retries on simulated EBUSY/429, and that gpt-5.1 still gets prewarmed when
present to prevent coverage loss.

In `@lib/request/helpers/model-map.ts`:
- Around line 183-184: Replace the direct addAlias calls for the chat-latest
entries with addReasoningAliases so wildcard legacy variants (e.g.,
"gpt-5-chat-latest-high") map to the correct base model and reasoning defaults;
specifically update the lines that call addAlias("gpt-5.1-chat-latest",
"gpt-5.1") and addAlias("gpt-5-chat-latest", "gpt-5") to use addReasoningAliases
instead, and add a vitest regression that asserts variants like
"gpt-5-chat-latest-high" and "gpt-5.1-chat-latest-foo" resolve to the intended
model IDs and reasoning settings (matching the behavior of addReasoningAliases).

In `@test/codex-prompts.test.ts`:
- Around line 467-486: The direct-family tests for codex variants only assert
the mocked response body and miss verifying the raw GitHub fetch path; update
the two earlier tests that call getCodexInstructions for the direct families
(the tests around the 'codex' and 'codex-max' cases that currently only check
the mocked body) to inspect mockFetch.mock.calls, find the
raw.githubusercontent.com call (like in the gpt-5.4 test), and add an assertion
that the fetched URL contains the expected raw prompt filename (e.g.,
"gpt_5_codex_prompt.md" for codex and "gpt_5_codex_max_prompt.md" for codex-max)
so regressions that collapse families into the wrong prompt file will fail.
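
The prewarm finding for lib/prompts/codex.ts (one request per prompt family rather than per model) can be illustrated with a small deduplication helper. This is a sketch under assumptions: dedupeByPromptFamily and exampleFamilyOf are hypothetical names, the family lookup is a stand-in for getModelFamily, and the retry/backoff handling for EBUSY/429 that the comment also requests is left out.

```typescript
// Hypothetical helper: keep the first model seen for each prompt family.
function dedupeByPromptFamily(
  models: readonly string[],
  familyOf: (model: string) => string,
): string[] {
  const seenFamilies = new Set<string>();
  const targets: string[] = [];
  for (const model of models) {
    const family = familyOf(model);
    if (seenFamilies.has(family)) continue; // another model already covers this family
    seenFamilies.add(family);
    targets.push(model);
  }
  return targets;
}

// Stand-in for getModelFamily. The review states that gpt-5.4 and gpt-5.2 share
// a prompt family; treating any other ID as its own family is an assumption.
function exampleFamilyOf(model: string): string {
  return model === "gpt-5.4" || model === "gpt-5.2" ? "gpt-5.2" : model;
}

const prewarmTargets = dedupeByPromptFamily(
  ["gpt-5-codex", "gpt-5.4", "gpt-5.2"],
  exampleFamilyOf,
);
```

Here prewarmTargets contains "gpt-5-codex" and "gpt-5.4", so only one request is issued for the shared gpt-5.2 family. Whether gpt-5.1 keeps its own prewarm request depends on how the real lookup maps it, which is the coverage concern raised in the comment.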

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 183f50c9-5a91-43e8-91a0-f5088cfe0930

📥 Commits

Reviewing files that changed from the base of the PR and between 1be5e95 and 9e13a24.

📒 Files selected for processing (12)
  • lib/capability-policy.ts
  • lib/codex-manager.ts
  • lib/prompts/codex.ts
  • lib/request/helpers/model-map.ts
  • lib/request/request-transformer.ts
  • test/codex-manager-cli.test.ts
  • test/codex-prompts.test.ts
  • test/codex.test.ts
  • test/config.test.ts
  • test/model-map.test.ts
  • test/property/transformer.property.test.ts
  • test/request-transformer.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/capability-policy.ts
  • lib/request/request-transformer.ts
  • lib/codex-manager.ts
  • lib/prompts/codex.ts
  • lib/request/helpers/model-map.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/property/transformer.property.test.ts
  • test/config.test.ts
  • test/codex-prompts.test.ts
  • test/codex-manager-cli.test.ts
  • test/codex.test.ts
  • test/request-transformer.test.ts
  • test/model-map.test.ts
🔇 Additional comments (5)
test/property/transformer.property.test.ts (1)

48-56: good property coverage for the new defaults.

test/property/transformer.property.test.ts:48-56 and test/property/transformer.property.test.ts:206-217 pin the gpt-5.4 default and the xhigh downgrade path cleanly.

Also applies to: 206-217

test/codex-prompts.test.ts (1)

89-97: good family-routing regression coverage.

test/codex-prompts.test.ts:89-97 pins the gpt-5.4-era general models to the gpt-5.2 prompt family cleanly.

test/codex-manager-cli.test.ts (1)

5705-5776: good json report coverage for model diagnostics.

test/codex-manager-cli.test.ts:5705-5776 locks the new modelSelection payload for both the no-remap and remap cases.

test/config.test.ts (1)

93-118: good reasoning-tier regression coverage.

test/config.test.ts:93-118 and test/config.test.ts:191-196 cover the new lightweight defaults, the gpt-5.4 none default, and the gpt-5.4-pro clamp.

Also applies to: 175-196

test/codex.test.ts (1)

14-30: good prompt-family routing coverage.

test/codex.test.ts:14-30 pins the gpt-5.4-era routing and the default-family fallback in one place.
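
For orientation, the modelSelection payload that test/codex-manager-cli.test.ts locks down could be shaped roughly like this. The top-level keys follow the requested/normalized/remapped/promptFamily/capabilities breakdown named in the walkthrough. The capability key, the toy profile table, and the rule for detecting a remap are assumptions, and inspectRequestedModel here is a stand-in rather than the helper in lib/codex-manager.ts.

```typescript
// Hypothetical shape of the diagnostics payload; capability keys are invented.
interface ModelSelection {
  requested: string;
  normalized: string;
  remapped: boolean;
  promptFamily: string;
  capabilities: Record<string, boolean>;
}

// Toy profile table for illustration only.
const EXAMPLE_PROFILES: Record<string, { promptFamily: string; capabilities: Record<string, boolean> }> = {
  "gpt-5.4": { promptFamily: "gpt-5.2", capabilities: { exampleTool: true } },
  "gpt-5-codex": { promptFamily: "gpt-5-codex", capabilities: { exampleTool: false } },
};

function inspectRequestedModel(requested: string): ModelSelection {
  const trimmed = requested.trim();
  const lowered = trimmed.toLowerCase();
  const bare = lowered.slice(lowered.lastIndexOf("/") + 1);
  const known = Object.prototype.hasOwnProperty.call(EXAMPLE_PROFILES, bare);
  const normalized = known ? bare : "gpt-5.4"; // assumed default fallback
  const profile = EXAMPLE_PROFILES[normalized]!;
  return {
    requested: trimmed,
    normalized,
    // Assumed rule: any difference between input and resolved ID counts as a remap.
    remapped: normalized !== trimmed,
    promptFamily: profile.promptFamily,
    capabilities: profile.capabilities,
  };
}
```

In this sketch inspectRequestedModel("gpt-5.4") reports remapped as false, while inspectRequestedModel("openai/gpt-5.4") reports remapped as true with the same normalized ID, mirroring the no-remap and remap cases mentioned above.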

@ndycode
Owner Author

ndycode commented Mar 22, 2026

@coderabbitai review
@greptile @cubic review

@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026

@ndycode: on it — kicking off a full review of PR #309 now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode
Owner Author

ndycode commented Mar 22, 2026

@coderabbitai review

@ndycode
Owner Author

ndycode commented Mar 22, 2026

@greptile @cubic review

@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode
Owner Author

ndycode commented Mar 22, 2026

@coderabbitai review
@greptile @cubic review

@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026

@ndycode: on it — triggering a full review of PR #309 now. will cover the new model catalog in lib/request/helpers/model-map.ts, the updated routing and reasoning coercion in lib/request/request-transformer.ts, the capability-policy fallback change in lib/capability-policy.ts, the diagnostics additions in lib/codex-manager.ts, and all associated test coverage.

✅ Actions performed

Full review triggered.

@ndycode ndycode dismissed coderabbitai[bot]’s stale review March 22, 2026 15:43

All review threads are resolved and follow-up commits addressed this stale automated change request.

@ndycode ndycode removed the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 22, 2026
@ndycode ndycode merged commit e820c32 into main Mar 23, 2026
2 checks passed