fix(brand-claim): surface workos_misconfigured when verification_prefix missing#4520

Open: bokelley wants to merge 1 commit into main from bokelley/bridgefund-org-provisioning

@bokelley
Contributor

Summary

Found while triaging Mark Hoekx's (BridgeFund) escalation. WorkOS is returning DNS-strategy organization domains without a verification_prefix in our environment, but issueDomainChallenge was treating the field as guaranteed. Result: users got back instructions that read, literally, "Publish the DNS TXT record at verification_prefix.{domain}", with the placeholder unsubstituted, and had no way to publish a record. Worse, the existing tokenMissing branch deleted and recreated the WorkOS domain on every retry. Combined with the missing prefixes, this rotated tokens in a no-win loop, invalidating any DNS records the user had managed to publish through guesswork.
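To make the placeholder failure concrete, here is a minimal sketch of an instructions builder that cannot emit unsubstituted text. It is not the code in this PR: the `DomainChallenge` shape, the `buildDnsInstructions` name, and the `dns_record_value` field are assumptions for illustration; only `dns_record_name` and `instructions` are named in the description.

```typescript
// Hypothetical input shape; field names are assumptions, not WorkOS SDK types.
interface DomainChallenge {
  domain: string;
  verificationPrefix: string | null | undefined;
  verificationToken: string | null | undefined;
}

interface DnsInstructions {
  dns_record_name: string;
  dns_record_value: string; // assumed field, not mentioned in the PR
  instructions: string;
}

// Returns null when either value is missing, so the caller is forced to
// handle the gap instead of rendering a literal placeholder to the user.
function buildDnsInstructions(c: DomainChallenge): DnsInstructions | null {
  if (!c.verificationPrefix || !c.verificationToken) {
    return null;
  }
  const recordName = `${c.verificationPrefix}.${c.domain}`;
  return {
    dns_record_name: recordName,
    dns_record_value: c.verificationToken,
    instructions: `Publish a DNS TXT record at ${recordName} with value ${c.verificationToken}`,
  };
}
```

Returning `null` rather than a partially filled string pushes the missing-prefix decision up to the caller, which is where an error code can be chosen.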

This is the second issue I've seen in this area in 24h (the first was a manual-flip + missed-webhook recovery → admin endpoint in #4486-ish). Same surface, different failure mode.

Changes

  • server/src/services/brand-claim.ts

    • New workos_misconfigured result code in the IssueChallengeResult union for the case where WorkOS returns a token but no prefix (operator-side env-config gap, not a transient failure).
    • Split the broken-state detection: only delete-and-recreate on a missing verificationToken (the original #3953 case, "brand-claim: re-issuing challenge for existing domain with null verification token returns silent workos_error"). A missing prefix with a present token is now a hard stop: surface the error rather than churn WorkOS state that a recreate can't fix.
    • Same guard after a successful createOrganizationDomain — if the new record has no prefix, return workos_misconfigured instead of echoing nulls back.
  • server/src/routes/member-profiles.ts

    • /brand-claim/issue maps workos_misconfigured → 503 with an operator-action message.
    • Response now includes a dns_record_name field and the instructions string interpolates real values (prefix.domain + token) instead of literal placeholder text.
  • server/src/addie/mcp/member-tools.ts

    • request_brand_domain_challenge handles the new code with a stop-and-wait message instead of offering retries — the fix lives in the WorkOS dashboard, not in the user's flow.
    • Kept the defensive null check as a belt-and-suspenders fallback in case the service guard ever regresses.
  • Tests: 2 new unit tests covering the existing-domain-no-prefix and create-returns-no-prefix paths, both asserting we don't churn WorkOS via delete/recreate.
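The service-level split described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of brand-claim.ts: `OrgDomain`, `DomainClient`, `classifyExistingDomain`, and `issueChallengeSketch` are hypothetical names, the `ok` variant is invented, and mapping a post-create missing token to `workos_error` is a guess.

```typescript
// Only `workos_misconfigured` and `workos_error` are named in this PR;
// the `ok` variant and all field names below are assumptions.
type IssueChallengeResult =
  | { code: "ok"; prefix: string; token: string }
  | { code: "workos_error" }
  | { code: "workos_misconfigured" };

interface OrgDomain {
  verificationPrefix?: string | null;
  verificationToken?: string | null;
}

// Hypothetical stand-in for the WorkOS calls the service makes.
interface DomainClient {
  deleteDomain(): Promise<void>;
  createDomain(): Promise<OrgDomain>;
}

type ExistingDomainState = "usable" | "recreate" | "misconfigured";

// A missing token is the only state a recreate can repair.
// A present token with no prefix is an env-config gap, so stop.
function classifyExistingDomain(d: OrgDomain): ExistingDomainState {
  if (!d.verificationToken) return "recreate";
  if (!d.verificationPrefix) return "misconfigured";
  return "usable";
}

async function issueChallengeSketch(
  existing: OrgDomain | null,
  client: DomainClient,
): Promise<IssueChallengeResult> {
  let reusable: OrgDomain | null = existing;
  if (existing) {
    const state = classifyExistingDomain(existing);
    if (state === "misconfigured") return { code: "workos_misconfigured" };
    if (state === "recreate") {
      await client.deleteDomain();
      reusable = null;
    }
  }
  const domain = reusable ?? (await client.createDomain());
  const { verificationPrefix: prefix, verificationToken: token } = domain;
  // Same guard after create: never echo nulls back to the caller.
  if (!token) return { code: "workos_error" }; // assumed mapping
  if (!prefix) return { code: "workos_misconfigured" };
  return { code: "ok", prefix, token };
}
```

Keeping the classification in a pure function makes the "no delete/recreate on missing prefix" property easy to assert in a unit test without mocking the client.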

Operator note

Resolving the missing prefix at the source still requires configuring the WorkOS DNS verification template in the dashboard. This PR doesn't fix that; it just makes the failure mode legible and stops handing users broken instructions. While the WorkOS config is missing, the brand-claim flow fails fast with a 503; manually flipping a domain to verification_strategy: manual via the WorkOS dashboard (followed by /api/admin/.../brand-claim/verify to write through) remains the workaround.
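As a rough sketch of the fail-fast behavior, the mapping from result code to HTTP reply could look like the following. Only the `workos_misconfigured` to 503 mapping comes from this PR; the `toHttpReply` name, the `retryable` flag, the message wording, and the 200 and 502 statuses are assumptions.

```typescript
type IssueResultCode = "ok" | "workos_error" | "workos_misconfigured";

// Hypothetical reply shape, for illustration only.
interface HttpReply {
  status: number;
  retryable: boolean;
  message: string;
}

function toHttpReply(code: IssueResultCode): HttpReply {
  switch (code) {
    case "ok":
      return { status: 200, retryable: false, message: "Challenge issued." };
    case "workos_error":
      // Assumed status: an upstream failure that a retry might clear.
      return { status: 502, retryable: true, message: "WorkOS request failed. Try again shortly." };
    case "workos_misconfigured":
      // From the PR: 503 with an operator-action message. Retrying from the
      // user's side cannot help until the WorkOS config is fixed.
      return {
        status: 503,
        retryable: false,
        message: "Domain verification is unavailable until an operator completes the WorkOS DNS configuration.",
      };
  }
}
```

Marking the misconfigured case as non-retryable mirrors the stop-and-wait message in the Addie tool: the caller should wait for an operator rather than loop.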

Test plan

  • npx vitest run tests/unit/brand-claim-service.test.ts — 21/21 pass (2 new + 19 existing)
  • npx vitest run tests/integration/brand-claim-apply-verified.test.ts — 8/8 pass
  • Pre-commit hooks (full test:unit + dynamic-imports + typecheck) — pass
  • Smoke: manually call POST /api/me/member-profile/brand-claim/issue against a WorkOS env without DNS template configured, verify 503 + clear message

🤖 Generated with Claude Code

…ix missing

WorkOS DNS-strategy domains were being returned without a
`verification_prefix`, leaving users with no way to publish the TXT
record and the brand-claim flow stuck. The `tokenMissing` branch
deleted-and-recreated on every retry, churning tokens in a no-win loop.

- Split the broken-state detection: only delete-and-recreate when
  `verificationToken` is actually missing. Missing prefix with a present
  token is an env-config gap that recreate can't fix.
- New `workos_misconfigured` result code → 503 from the route, with a
  clear "operator action needed" message instead of half-broken
  instructions.
- Same guard after `createOrganizationDomain` succeeds — if the new
  record has no prefix, surface the error rather than echoing nulls.
- Route response now includes a `dns_record_name` field and the
  `instructions` string interpolates real values instead of literal
  `verification_prefix.{domain}` placeholders.
- Addie's `request_brand_domain_challenge` tool handles the new code
  with a stop-and-wait message rather than offering retries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>