fix(designer): Resolve RBAC propagation delay causing Foundry proxy failures#8950
fix(designer): Resolve RBAC propagation delay causing Foundry proxy failures#8950ccastrotrejo wants to merge 5 commits intomainfrom
Conversation
…ailures When selecting a Foundry agent connection for the first time, proxy API calls and RBAC role assignments fired simultaneously from independent useEffect hooks. The proxy calls failed immediately (401/403) because roles hadn't been assigned yet, and React Query's default 3 retries exhausted within ~10 seconds — far shorter than Azure's 30s–2min RBAC propagation window. Three layered fixes: 1. RBAC signal + query gating: Track RBAC assignment status (idle/assigning/assigned/not-needed/failed) and gate Foundry proxy queries behind completion. Proxy calls no longer fire before RBAC PUTs finish. 2. Extended retry for auth errors: Foundry proxy queries now retry up to 12 times on 401/403 with exponential backoff (3s, 6s, 12s, 24s, 30s cap) providing a ~3 minute window for Azure RBAC propagation. 3. Propagation UX: Shows a purposeful spinner with 'Setting up permissions for Foundry access...' instead of a cryptic error. Auto-transitions to normal UI once retries succeed. All changes mirrored in both designer and designer-v2. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 AI PR Validation ReportPR Review ResultsThank you for your submission! Here's detailed feedback on your PR title and body compliance:✅ PR Title
✅ Commit Type
✅ Risk Level
✅ What & Why
✅ Impact of Change
✅ Test Plan
✅ Contributors
|
| Section | Status | Recommendation |
|---|---|---|
| Title | ✅ | Keep as-is. |
| Commit Type | ✅ | Correct selection (fix). |
| Risk Level | ✅ | Matches diff; risk:medium appropriate. |
| What & Why | ✅ | Good explanation and HAR evidence included. |
| Impact of Change | ✅ | Clear; consider adding note about telemetry. |
| Test Plan | ✅ | Unit tests added; consider E2E/integration test. |
| Contributors | ✅ | OK; add more credits if applicable. |
| Screenshots/Videos | Add screenshot/GIF of new UI for reviewer ease. |
Final notes & recommended small improvements
- The PR body and title pass the template checks — nice work.
- A couple of suggested (non-blocking) items to consider addressing in follow-ups or before merge:
- Add a small integration/E2E test (if practical) that simulates RBAC propagation delays so the extended retry behavior and gating are exercised end-to-end. If E2E is not possible, add a short note in the Test Plan explaining why.
- Consider adding lightweight telemetry/metric events around: role-assignment attempts/success/failure, number of extended retries used, and when retries are exhausted. This aids operations if customers encounter propagation issues in the wild.
- Provide a screenshot or short GIF showing the "Assigning permissions..." spinner and the exhausted/retry UI state so product/design reviewers can validate the UX.
- (Optional) Add a short note in the PR or code comments clarifying that when RBAC assignment fails the gating will move forward (
failedstate treated as ready). This is okay, but clarifying intent helps future maintenance.
Please update/add the optional items above if you'd like, otherwise this PR is good to merge from a PR title/body compliance perspective. Thank you for the thorough description and tests!
Last updated: Fri, 20 Mar 2026 21:19:05 GMT
…foundry-rbac-propagation
There was a problem hiding this comment.
Pull request overview
Fixes a race between Foundry proxy queries and RBAC role assignment/propagation in the Designer parameters tab by gating proxy queries until RBAC assignment completes and extending retries for auth/permission errors that commonly occur during Azure RBAC propagation.
Changes:
- Add RBAC assignment status tracking in ParametersTab and gate Foundry proxy queries until RBAC is ready (plus propagation UI while waiting).
- Introduce
isFoundryAuthError()and extend React Query retry/backoff behavior for Foundry proxy calls on 401/403-like errors. - Add unit tests validating
isFoundryAuthError()against common/real-world error shapes.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| libs/designer/src/lib/ui/panel/nodeDetailsPanel/tabs/parametersTab/index.tsx | Adds RBAC state machine + query gating + propagation loading UI for Foundry agent connections. |
| libs/designer/src/lib/ui/panel/connectionsPanel/createConnection/custom/useCognitiveService.ts | Adds isFoundryAuthError() and RBAC-aware retry/backoff + rbacReady gate parameter for Foundry queries. |
| libs/designer/src/lib/ui/panel/connectionsPanel/createConnection/custom/test/foundryRetry.spec.ts | New Vitest suite covering isFoundryAuthError() behavior. |
| libs/designer-v2/src/lib/ui/panel/nodeDetailsPanel/tabs/parametersTab/index.tsx | Mirrors RBAC gating + propagation UI changes in Designer v2. |
| libs/designer-v2/src/lib/ui/panel/connectionsPanel/createConnection/custom/useCognitiveService.ts | Mirrors isFoundryAuthError() + retry/backoff + rbacReady gate in Designer v2. |
| .then((missingRoles) => { | ||
| if (missingRoles.length === 0) { | ||
| setFoundryRbacStatus('not-needed'); | ||
| return Promise.resolve(); | ||
| } | ||
| return Promise.all(missingRoles.map((role) => RoleService().addAppRoleAssignmentForResource(targetResourceId, role.id))).then( | ||
| () => { | ||
| setFoundryRbacStatus('assigned'); | ||
| } | ||
| ); |
There was a problem hiding this comment.
The RBAC assignment promise chain can update foundryRbacStatus for a stale foundryAccountResourceId if the connection/resource changes while the request is in flight. catch guards against this, but the .then path sets state unconditionally, which can incorrectly mark RBAC as assigned/not-needed and enable proxy queries for the wrong resource. Add a guard in the .then path (similar to the catch) to ensure rbacAssignedResourceRef.current === targetResourceId before calling setFoundryRbacStatus(...) (and consider a cleanup flag to avoid setState after unmount).
There was a problem hiding this comment.
Fixed in 94a8fa4 — added rbacAssignedResourceRef guard in both .then() paths to match the existing .catch() guard. Both v1 and v2 updated.
| size="tiny" | ||
| label={ | ||
| foundryRbacStatus === 'assigning' | ||
| ? 'Assigning permissions for Foundry access...' | ||
| : 'Setting up permissions for Foundry access. This may take up to a minute...' | ||
| } |
There was a problem hiding this comment.
These new spinner labels are user-facing strings, but this file otherwise uses react-intl for UI text. Please move these messages to intl.formatMessage(...) (and add loc IDs) so they get localized consistently.
There was a problem hiding this comment.
Fixed in 94a8fa4 — all user-facing strings now use intl.formatMessage() with loc IDs (fRbac1-fRbac5). Added useIntl() to ParameterSection.
| .then((missingRoles) => { | ||
| if (missingRoles.length === 0) { | ||
| setFoundryRbacStatus('not-needed'); | ||
| return Promise.resolve(); | ||
| } | ||
| return Promise.all(missingRoles.map((role) => RoleService().addAppRoleAssignmentForResource(targetResourceId, role.id))).then( | ||
| () => { | ||
| setFoundryRbacStatus('assigned'); | ||
| } | ||
| ); |
There was a problem hiding this comment.
The RBAC assignment promise chain can update foundryRbacStatus for a stale foundryAccountResourceId if the connection/resource changes while the request is in flight. catch guards against this, but the .then path sets state unconditionally, which can incorrectly mark RBAC as assigned/not-needed and enable proxy queries for the wrong resource. Add a guard in the .then path (similar to the catch) to ensure rbacAssignedResourceRef.current === targetResourceId before calling setFoundryRbacStatus(...) (and consider a cleanup flag to avoid setState after unmount).
There was a problem hiding this comment.
Fixed in 94a8fa4 — same stale-resource guard applied to v1 .then() path.
| size="tiny" | ||
| label={ | ||
| foundryRbacStatus === 'assigning' | ||
| ? 'Assigning permissions for Foundry access...' | ||
| : 'Setting up permissions for Foundry access. This may take up to a minute...' | ||
| } |
There was a problem hiding this comment.
These new spinner labels are user-facing strings, but this file otherwise uses react-intl for UI text. Please move these messages to intl.formatMessage(...) (and add loc IDs) so they get localized consistently.
There was a problem hiding this comment.
Fixed in 94a8fa4 — same i18n fix applied to v1. All spinner labels, button text, and aria-labels use intl.formatMessage().
When RBAC propagation retries exhaust (~3min), the UI now transitions from the spinner to a clear message with a 'Retry now' button instead of sitting on the spinner forever. Three states now distinguished: - assigning: 'Assigning permissions for Foundry access...' (spinner) - propagating: 'Setting up permissions... may take up to a minute' (spinner) - exhausted: 'Permissions are still propagating' + Retry button The retry button calls refetch() to restart the query with fresh retries. Uses role='alert' and aria-live='polite' for screen reader announcements. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📊 Coverage CheckThe following changed files need attention:
Please add tests for the uncovered files before merging. |
Address Copilot review feedback: 1. Add rbacAssignedResourceRef guard in .then() path to prevent setting foundryRbacStatus for a stale resource if the user switches Foundry connections while RBAC assignment is in flight. Previously only the .catch() path had this guard. 2. Replace hardcoded English spinner/button labels with intl.formatMessage() calls for localization consistency. Adds useIntl() to ParameterSection component (both v1 and v2). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes 8 audit findings from React performance, accessibility, and code
quality reviews:
P1 (Perf): Hoist EMPTY_PARAM_GROUPS to module constant — prevents
useSelector from returning a new {} reference on every render.
P2 (Perf): Add cancelled flag to RBAC useEffect cleanup — prevents
setState after component unmount.
P4 (Perf): Fix retry window comment (actual is ~5min, not ~3min).
A1 (A11y): Replace role='alert' + aria-live='polite' with role='status'
— fixes contradictory ARIA semantics.
A4 (A11y): Auto-focus retry button via ref when retries exhaust.
A5 (A11y): Use tokens.colorNeutralForeground2 instead of raw CSS var.
C5 (Quality): Log errors in .catch() instead of swallowing silently.
C9 (Quality): Move FoundryRbacStatus type to module scope.
All changes mirrored in both designer and designer-v2.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit Type
Risk Level
What & Why
When selecting a Foundry agent connection for the first time, proxy API calls (
/agents,/deployments) and RBAC role assignments fired simultaneously from independentuseEffecthooks. The proxy calls failed immediately (401/403) because roles hadn't been assigned yet, and React Query's default 3 retries exhausted within ~10 seconds — far shorter than Azure's 30s–2min RBAC propagation window.HAR evidence (timeline from real capture):
Fix: Three layered approach
1. RBAC signal + query gating
idle → assigning → assigned/not-needed/failed)enabled: rbacReadyso they don't fire before RBAC PUTs complete2. Extended retry for auth errors
isFoundryAuthError()helper that detects 401/403 from various error shapes (status codes,.code,.message)3. Propagation UX
Impact of Change
isFoundryAuthErrorexported for reuse. Foundry hooks now accept optionalrbacReadyparameter.foundryQueryOptsincludes RBAC-aware retry config.Test Plan
New Tests
foundryRetry.spec.ts— 12 tests forisFoundryAuthErrorcovering: null/undefined, status codes (401/403), error codes, error messages, real-world HAR error shapes, and negative casesExisting Tests (all passing)
foundryUpdates.spec.ts(v1 + v2) — 40 tests ✅foundryAgentService.spec.ts— 33 tests ✅foundryAgentDetails.spec.tsx— 19 tests ✅foundryAgentPicker.spec.tsx— 6 tests ✅role.spec.ts— 6 tests ✅Files Changed
useCognitiveService.ts(×2) —isFoundryAuthError, retry config,rbacReadyparam on hooksparametersTab/index.tsx(×2) — RBAC status tracking, query gating, propagation UIfoundryRetry.spec.ts(new) — Unit tests for auth error detectionContributors
@ccastrotrejo