Skip to content

feat(rbac): foundation — catalog, schema, sqlc, seed, backfill (PR 1 of 4)#283

Open
mcharles-square wants to merge 18 commits into
mainfrom
feat/rbac-pr1-foundation
Open

feat(rbac): foundation — catalog, schema, sqlc, seed, backfill (PR 1 of 4)#283
mcharles-square wants to merge 18 commits into
mainfrom
feat/rbac-pr1-foundation

Conversation

@mcharles-square
Copy link
Copy Markdown
Collaborator

@mcharles-square mcharles-square commented May 21, 2026

First PR of the four-part granular RBAC plan (#282 — the plan PR, updated alongside this branch as the design evolved).

This is the foundation cut: schema, permission catalog, sqlc bindings, seed migration with per-org built-in roles, startup reconciliation, the user_organization backfill, and the onboarding dual-write. The existing RequireAdmin middleware stays in place and continues to be the live authorization path. PR 2 will swap callers to a permission-based gate; until that lands, this PR is purely additive — the new tables are populated and the resolver inputs are ready, but no code path reads them yet.

What landed (foundation surface)

Schema — migration 000052 (server/migrations/000052_create_permission_tables.{up,down}.sql)

  • permission, role_permission, and user_organization_role tables.
  • ALTER TABLE role adds is_builtin, builtin_key, and organization_idevery role is now owned by an organization (built-in or custom). The legacy global uq_role_name UNIQUE(name) from migration 000002 is dropped.
  • Two partial unique indexes on role (live rows only):
    • uq_role_org_builtin_key — per-org built-in identity, keyed on (organization_id, builtin_key).
    • uq_role_org_custom_name — per-org custom uniqueness keyed on (organization_id, LOWER(BTRIM(name))). Case-insensitive and whitespace-trim-tolerant: "Admin", "admin", and " ADMIN " collapse to one row.
  • Two partial unique indexes on user_organization_role (NULL-safe, live-row scoped):
    • uq_user_org_role_org_scope for scope_type = 'org' (omits scope_id to dodge Postgres's NULL-distinct trap).
    • uq_user_org_role_site_scope for scope_type = 'site'.
    • Both predicated on deleted_at IS NULL so re-assignment after UnassignRole is allowed.
  • Composite FK from user_organization_role(role_id, organization_id)role(id, organization_id) (UNIQUE (id, organization_id) on role is the target). An assignment row can no longer reference a role from a different org — DB-enforced tenant isolation.
  • Composite FK to site(id, org_id) for site-scoped assignments, DEFERRABLE INITIALLY DEFERRED for transactional reassignments.
  • CHECK constraints:
    • chk_role_builtin_key_matches_flagis_builtin = TRUE ↔ builtin_key IS NOT NULL.
    • chk_role_custom_name_not_reserved — custom rows cannot be named SUPER_ADMIN, ADMIN, or FIELD_TECH (case-insensitive). Legacy global rows (organization_id IS NULL) are exempt so existing deployments don't fail the validation on upgrade.
    • chk_user_org_role_scope_id_matches_type — org-scope ⇒ scope_id IS NULL; site-scope ⇒ scope_id IS NOT NULL.

Permission catalog — server/internal/domain/authz/catalog.go + proto/authz/v1/authz.proto

29 permission keys across 10 resource groups (fleet, miner, rack, site, serverlog, curtailment, fleetnode, apikey, user, role). Source of truth for everything downstream. Tests assert catalog completeness, key shape (^[a-z]+:[a-z_]+$), and resource grouping.

sqlc queries (server/sqlc/queries/)

Org-aware throughout — every role-touching query takes organization_id. New / changed:

  • permission.sql — list/get/upsert catalog + role-permission insert/delete + PrunePermissionsOutsideKeys (used by SUPER_ADMIN full-reconcile only).
  • role.sqlListBuiltinRolesForOrg, GetBuiltinRoleForOrg, UpsertBuiltinRoleForOrg, ListActiveOrganizationIDs, ListCustomRolesForOrg, CreateCustomRole, UpsertCustomRoleForOrg, UpdateCustomRoleName, SoftDeleteCustomRole. The legacy UpsertRole(name, description) was retargeted to UpsertCustomRoleForOrg(name, description, organization_id) and its ON CONFLICT matches the new partial index expression.
  • user_organization_role.sqlAssignRole, UnassignRole, ListAssignmentsForUser, ListAssignmentsForRole, ListEffectivePermissionsForUser (single-query resolver source), and the last-org-scope-SUPER_ADMIN guards used by the future delete/deactivate paths.
  • All cross-table reads also assert r.organization_id = uor.organization_id as belt-and-suspenders defense in depth.

Seed + per-org reconciliation — migration 000053 + server/internal/domain/authz/{builtin,reconcile}.go

  • 000053 seeds the catalog and one set of three built-in role rows per active organization by cross-joining organization × built-in spec (including soft-deleted orgs, since membership doesn't cascade). Existing user_organization rows that pointed at the legacy global ADMIN/SUPER_ADMIN rows are repointed to each user's per-org replacement; the legacy globals are then soft-deleted.
  • Reconcile(ctx, db) runs at boot from cmd/fleetd/main.go inside a pg_advisory_xact_lock'd transaction. It iterates ListActiveOrganizationIDs and calls SeedOrgBuiltins(ctx, q, orgID) for each.
  • Per-builtin policy:
    • SUPER_ADMIN — fully reconciled to AllPermissions() per org. Tampering is repaired on every boot.
    • ADMIN / FIELD_TECHseeded once on creation; left alone after that. When a per-org row is first inserted, every permission in the seed formula is attached. Subsequent boots leave its role_permission set untouched. The operator owns the role after creation. Re-asserting the seed set every boot would have silently restored permissions the operator deliberately revoked (e.g., miner:update_pools, miner:firmware_update) — exactly the bug Codex caught in round 1.
  • SeedOrgBuiltins is also called from the onboarding flow (see below) so a brand-new org gets its built-ins in the same transaction as org creation.

Backfill — migration 000054 + onboarding dual-write

  • 000054 mirrors every active user_organization row into user_organization_role as an org-scope assignment, after 000053 has repointed legacy role_id references to per-org rows. Soft-deleted user_organization rows are skipped.
  • Onboarding dual-write: CreateAdminUserWithOrganization (founding user, in sqlstores/user.go) and CreateUserOrganizationRole (subsequent users) now write both user_organization AND user_organization_role inside the same transaction. Without this, fresh installs would have a founding SUPER_ADMIN with no row in the assignment table, and the resolver would deny everything once PR 2 starts consuming it.
  • auth.Service.CreateUser swapped from GetRoleByName("ADMIN") (which would return an arbitrary per-org ADMIN row across orgs) to GetBuiltinRoleForOrg(orgID, AdminRoleName).

Plan deviation (still in force)

The plan's U5 also renamed user_organization.role_id to role_id_deprecated_do_not_use and added a raising trigger on non-NULL writes. That part is moved to PR 2's opening migration so it lands together with the caller swap. Doing it in PR 1 would break the existing onboarding flow the moment this deploys. The plan's intent — loud failure during the soak window — is preserved; the soak boundary just shifts from "between PR 1 and U12" to "between PR 2 and U12". Noted inline in 000054_backfill_user_organization_to_assignments.up.sql.

What this PR does NOT do

  • Does not add the RequirePermission middleware or resolver — PR 2.
  • Does not swap any existing RequireAdmin callers — PR 2.
  • Does not rename user_organization.role_id or add the raising trigger — PR 2.
  • Does not add role/assignment management RPCs — PR 3.
  • Does not change UserInfo.permissions — PR 3.
  • Does not touch the React client — PR 3.
  • Does not drop the legacy user_organization.role_id column — PR 4.

Review history baked in

Four rounds of automated review (Codex Security + Codex P-tier + Copilot) drove substantial design changes captured above. Headline outcomes:

  • Cross-tenant authorization bleed closed. The original design had global built-in rows that a SUPER_ADMIN editing ADMIN would affect across every org. Switched to per-org built-ins.
  • NULL-distinct assignment-uniqueness loophole closed. Replaced a naive UNIQUE (… , scope_id) with two partial unique indexes; soft-delete no longer blocks re-assignment.
  • Cross-tenant role-assignment loophole closed. Composite FK from user_organization_role to role(id, organization_id) makes a cross-org pointer structurally impossible.
  • Editable-built-in footgun closed. Reconciliation moved from "re-assert seed set every boot" (which silently restored revoked permissions) to "seed on first creation, leave alone after."
  • Reserved-name collisions closed. chk_role_custom_name_not_reserved is case-insensitive and whitespace-trim-tolerant; the partial unique index for customs is case-folded so "Admin"/"admin" collapse to one row per org.
  • Rollback safety. 000054 down hard-deletes mirrored assignments (was soft-delete, which held the FK and blocked role deletes); 000052 down's custom-cleanup runs before any table or column drops; legacy global rows are preserved through the rollback so user_organization.role_id doesn't dangle.
  • Soft-deleted orgs get built-ins too so the 000053 repoint and 000054 backfill produce FK-valid rows even when a soft-deleted org has active memberships.

All review threads from the four rounds are resolved on the PR.

Test plan

  • go build ./... clean
  • go vet ./... clean
  • lefthook run pre-commit and lefthook run pre-push (golangci-lint + client-typecheck) both green
  • just gen clean — sqlc statically parses the migrations and produces compiling bindings; proto generation produces authz/v1/authz_pb.ts and authz/v1/authz.pb.go
  • Non-DB tests pass: go test ./internal/domain/authz/... -run 'TestAllPermissions|TestCatalog|TestLookup|TestResourceOrder'
  • DB-dependent tests need Dockerjust dev to bring up Postgres, then go test ./internal/domain/authz/.... Covers: fresh-install seeding (TestReconcile_FreshInstall_*), idempotency, per-org isolation (TestReconcile_PerOrgIsolation_*), seed-on-create semantics (TestReconcile_OperatorRevokedSensitivePermStaysRevokedOnRestart, TestReconcile_NewCatalogPermissionDoesNotPropagateToExistingBuiltins), SUPER_ADMIN tamper repair, NULL-safe assignment uniqueness (TestAssignment_DuplicateLiveOrgScopeRejected, TestAssignment_SoftDeletedRowDoesNotBlockReassign), backfill behavior, soft-deleted-org seeding (TestSeed_SoftDeletedOrgStillGetsBuiltins), onboarding dual-write (TestOnboarding_FoundingUserGetsOrgScopeSuperAdminAssignment), reserved built-in names case-insensitive (TestCustomRole_ReservedBuiltinNamesRejectedCaseInsensitively), case-insensitive per-org custom uniqueness (TestCustomRole_CaseInsensitiveUniquenessPerOrg), and same-name-across-orgs-allowed (TestCustomRole_SameNameAcrossOrgsAllowed).
  • Migration smokejust dev applies 000052–000054 cleanly; psql confirms per-org built-in role rows exist with the expected permission counts and all CHECK / FK constraints are in place.

Rollback

Each migration has a paired *.down.sql. The reconciler is invoked once at boot; reverting this PR removes that call. Down migrations were rewritten across the review rounds to be order-correct (custom cleanup before any DROPs, table-existence dependencies respected) and to preserve the legacy global ADMIN/SUPER_ADMIN rows that downstream rollback steps reference. 000054 hard-deletes the mirrored assignment rows it created instead of soft-deleting, so the role-table down-DELETE that follows succeeds against the ON DELETE RESTRICT composite FK.

Adds the technical design for moving Proto Fleet from the current
SUPER_ADMIN/ADMIN/RequireAdmin gate to a granular permission model with
scoped role assignments, editable built-in roles, and per-org custom
roles.

Key decisions baked in:
- Three seeded roles: SUPER_ADMIN (immutable), ADMIN and FIELD_TECH
  (editable by SUPER_ADMIN, not deletable)
- Permission catalog plus read-pairing rule (action perms require
  matching read perms; enforced at role save)
- Site-scope narrowing semantics (site-scoped assignment overrides
  org-scoped grant at that site; org grant still applies elsewhere)
- Filter-don't-reject list semantics for site-scoped readers
- Route-guard parity: every primary-nav entry has a permission
  predicate; nav hides AND route redirects when false
- Additive-only startup reconciliation for ADMIN/FIELD_TECH so
  operator edits survive upgrades; SUPER_ADMIN fully reconciled
- Two error codes: BUILTIN_ROLE_IMMUTABLE (SUPER_ADMIN only) and
  BUILTIN_ROLE_NON_DELETABLE (ADMIN/FIELD_TECH delete attempts)
Add the canonical permission catalog as a Go code constant in
server/internal/domain/authz/catalog.go and the matching Permission
wire shape in proto/authz/v1/authz.proto. The catalog is the single
source of truth for U2's seed migration, U4's startup reconciliation,
U6's resolver lookup, U8's role-save validation, and U10's client-side
gating.

29 permissions across 10 resource groups (fleet, miner, rack, site,
serverlog, curtailment, fleetnode, apikey, user, role). No apikey:read
— the API Keys page lives under route-guarded Settings, so a
viewer-only split has no reachable UI.

Tests cover catalog completeness (every Perm* constant is in the
catalog), key shape (^[a-z]+:[a-z_]+$), fresh-copy semantics, and
resource grouping.
Migration 000051 lays the schema foundation for the RBAC v2 work:

- ALTER role ADD is_builtin, builtin_key (stable identifier code uses)
- CREATE permission (catalog) — reconciled at startup from
  domain/authz/catalog.go, U4 will land the reconciliation
- CREATE role_permission (join, ON DELETE RESTRICT on permission_id
  so obsolete catalog rows cannot silently drop)
- CREATE user_organization_role (multi-assignment with org/site
  scope, composite FK to site(id, org_id) for DB-enforced tenant
  isolation, DEFERRABLE so transactional reassignments work)

Existing user_organization.role_id is preserved unchanged; U5
(migration 000053) will backfill assignments and neutralize the legacy
column with a raising trigger. Drop happens in U12 after a soak
window.
…s (U3)

Adds sqlc queries used by the resolver (U6), seed reconciliation (U4),
and role management RPCs (U8):

- permission.sql — list/get/upsert catalog rows, role-permission
  insert/delete, and the SUPER_ADMIN PrunePermissionsOutsideKeys
  used by full reconciliation
- user_organization_role.sql — AssignRole/UnassignRole, list helpers,
  ListEffectivePermissionsForUser (the single-query resolver source),
  and the last-org-scope-SUPER_ADMIN guards used by U8
- role.sql extensions — ListBuiltinRoles, UpsertBuiltinRole,
  CreateCustomRole, UpdateCustomRoleName, SoftDeleteCustomRole.
  Custom-mutation queries filter `builtin_key IS DISTINCT FROM
  'SUPER_ADMIN'`; SoftDeleteCustomRole is locked to is_builtin=FALSE
  (delete locked for every built-in)

`just gen` produces clean bindings; server still builds.
Migration 000052 seeds the 29-permission catalog and the three
built-in role rows (SUPER_ADMIN, ADMIN, FIELD_TECH) so a fresh
install has usable state before the server's Go-level reconciler
runs.

server/internal/domain/authz/builtin.go declares the in-code seed
formulas — SUPER_ADMIN gets AllPermissions() and is the only role
reconciled fully; ADMIN gets AllPermissions() minus {user:*,
role:manage} via a code formula; FIELD_TECH gets an explicit set
(fleet:read, miner:read, miner:blink_led, miner:download_logs,
rack:read, rack:manage). ADMIN and FIELD_TECH are reconciled
additive-only so operator edits via UpdateCustomRole (U8) persist.

reconcile.go runs inside a single transaction holding
pg_advisory_xact_lock keyed on hashtextextended('authz:builtin_reconcile')
so rolling deploys and autoscaler events serialize cleanly.
Convergence uses idempotent inserts plus a targeted prune for
SUPER_ADMIN only — no delete-then-insert window where SUPER_ADMIN
has zero rows.

main.go invokes authz.Reconcile after db.ConnectAndMigrate and before
any service is constructed, so no request can be served before
built-in roles are reconciled.

Tests cover: fresh install (all three roles, correct permission
counts), idempotency, operator-edit survival on ADMIN and
FIELD_TECH, SUPER_ADMIN tamper repair, obsolete-permission pruning
on SUPER_ADMIN, concurrent reconciles. DB-dependent tests need
Postgres running locally (just dev) — non-DB catalog tests pass
green.
…ion_role (U5)

Migration 000053 mirrors every active user_organization row into the
new user_organization_role table as an org-scope assignment, so users
keep their existing access once the resolver in PR 2 starts evaluating
assignments instead of the legacy role_id column. Soft-deleted
user_organization rows are skipped. The down migration soft-deletes
only the rows the up migration created — assignments granted via the
new RPCs post-backfill survive a rollback.

Plan deviation called out explicitly in the migration comment: the
plan's U5 also renamed user_organization.role_id to
role_id_deprecated_do_not_use and added a raising trigger on non-NULL
writes. That part is moved to PR 2's opening migration so it lands
together with the caller swap in U6/U7. Doing the rename here would
break the existing onboarding flow (CreateUserOrganization,
UpdateUserRole) the moment PR 1 deploys, which would leave PR 1
unshippable without PR 2 already in flight. The plan's intent — loud
failure during the soak window — still holds, just inside PR 2's
boundary.

Tests cover the backfill INSERT against a real Postgres
(testutil.GetTestDB), exercise the ON CONFLICT path with a re-run,
prove soft-deleted rows are not copied, and confirm idempotency.
DB-dependent; needs Docker + just dev to execute locally.
…int)

The intrange linter flagged the concurrent-reconcile test's classic
three-clause for. Cosmetic only.
Copilot AI review requested due to automatic review settings May 21, 2026 12:38
@mcharles-square mcharles-square requested a review from a team as a code owner May 21, 2026 12:38
@github-actions github-actions Bot added documentation Improvements or additions to documentation javascript Pull requests that update javascript code client server shared labels May 21, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f33566ac9e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/migrations/000051_create_permission_tables.up.sql Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (2bf18b57c9765ca25a3634b67b225ee1b4b9e91d...7e95091ad8e136daf69cda50179068c899d39c93, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: HIGH

Findings

[HIGH] Non-super-admin sessions can create a new ADMIN user

  • Category: Auth
  • Location: server/internal/domain/auth/service.go:613
  • Description: CreateUser resolves the current org’s built-in ADMIN role and then creates a user with that role, but this path still does not call checkCanManageUser or enforce the new user:manage permission before doing so. The endpoint is session-only, but session-only is not authorization. With the new RBAC foundation and dual-write, a logged-in non-super-admin role can bootstrap a fresh ADMIN account in both user_organization and user_organization_role.
  • Impact: Any authenticated lower-privilege user in an org can escalate to ADMIN by creating an account, receiving the temporary password, and logging in as that account. ADMIN still passes existing RequireAdmin checks across many handlers.
  • Recommendation: Enforce checkCanManageUser(ctx, orgID) or the new user:manage permission before generating the temp password or looking up the ADMIN role. Add a regression test using a VIEWER/FIELD_TECH session against CreateUser.

[MEDIUM] Last-SUPER_ADMIN guards count deactivated users

  • Category: Auth | Reliability
  • Location: server/sqlc/queries/user_organization_role.sql:98
  • Description: The new CountOrgScopeSuperAdminsExcludingAssignment and CountOrgScopeSuperAdminsExcludingUser queries count active assignment rows, but they do not join "user" or filter u.deleted_at IS NULL. Since deactivation is a soft delete and this PR starts dual-writing assignment rows, a deactivated SUPER_ADMIN can still be counted as preserving the org’s last super-admin.
  • Impact: Future unassign/deactivate flows using these guards can allow removal of the last active SUPER_ADMIN, leaving the organization without a usable recovery/admin account.
  • Recommendation: Join "user" in both count queries and require active users, and include r.deleted_at IS NULL. When deactivating a user, also soft-delete or otherwise exclude their user_organization_role rows in the same transaction.

Notes

No cryptostealing/pool hijack changes, command execution changes, plugin boundary changes, discovery changes, or infrastructure changes were present in the reviewed diff.


Generated by Codex Security Review |
Triggered by: @mcharles-square |
Review workflow run

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Lays the foundation for the upcoming granular RBAC rollout by introducing a permission catalog, the supporting database schema + sqlc bindings, seed/backfill migrations, and a startup reconciliation step to keep built-in roles and the catalog converged over time (while keeping RequireAdmin as the active authorization path for now).

Changes:

  • Added RBAC v2 schema: permission, role_permission, and user_organization_role (scoped assignments) plus built-in awareness on role.
  • Seeded the initial permission catalog + built-in roles via migration and added a boot-time reconciler to keep catalog/roles in sync.
  • Backfilled org-scope assignments from legacy user_organization to preserve existing access.

Reviewed changes

Copilot reviewed 18 out of 26 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/sqlc/queries/user_organization_role.sql New sqlc queries for scoped role assignments and effective-permission joins.
server/sqlc/queries/role.sql Adds built-in role lookups and custom-role CRUD queries.
server/sqlc/queries/permission.sql Adds catalog upsert/lookups and role-permission mutation helpers.
server/migrations/000051_create_permission_tables.up.sql Creates permission/role-permission/assignment tables; adds built-in columns to role.
server/migrations/000051_create_permission_tables.down.sql Drops RBAC v2 tables and removes built-in columns from role.
server/migrations/000052_seed_builtin_roles.up.sql Seeds permission rows and built-in roles (SUPER_ADMIN/ADMIN/FIELD_TECH) + role_permission seeds.
server/migrations/000052_seed_builtin_roles.down.sql Attempts to undo the seed state for roles/permissions.
server/migrations/000053_backfill_user_organization_to_assignments.up.sql Backfills user_organization_role from legacy user_organization.
server/migrations/000053_backfill_user_organization_to_assignments.down.sql Soft-deletes backfilled assignment rows on rollback.
server/internal/domain/authz/reconcile.go New boot-time reconciler for catalog + built-in roles/permissions.
server/internal/domain/authz/reconcile_test.go DB-dependent tests for reconcile correctness, idempotency, and concurrency.
server/internal/domain/authz/catalog.go Introduces the in-code permission catalog and related helper APIs.
server/internal/domain/authz/catalog_test.go Unit tests enforcing catalog key shape, uniqueness, and completeness.
server/internal/domain/authz/builtin.go Declares built-in role specs and seed permission formulas.
server/internal/domain/authz/backfill_test.go DB-dependent tests validating backfill behavior and idempotency.
server/cmd/fleetd/main.go Calls authz.Reconcile(...) after migrations and before serving.
proto/authz/v1/authz.proto New proto messages for permissions and permission groups.
server/generated/grpc/authz/v1/authz.pb.go Generated Go types for the new authz.v1 protos.
client/src/protoFleet/api/generated/authz/v1/authz_pb.ts Generated TS types for the new authz.v1 protos.
server/generated/sqlc/user_organization.sql.go Regenerated sqlc output reflecting new role columns.
server/generated/sqlc/user_organization_role.sql.go Generated sqlc bindings for user_organization_role queries.
server/generated/sqlc/role.sql.go Generated sqlc bindings for new role queries/mutations.
server/generated/sqlc/permission.sql.go Generated sqlc bindings for permission catalog and role-permission operations.
server/generated/sqlc/models.go Generated sqlc models for Permission/RolePermission/UserOrganizationRole + updated Role model.
server/generated/sqlc/db.go Generated sqlc statement preparation/close wiring for new queries.
docs/plans/2026-05-19-001-feat-granular-rbac-plan.md Adds/updates the accepted granular RBAC plan doc (referenced by this PR).

Comment thread server/migrations/000051_create_permission_tables.up.sql Outdated
Comment thread server/migrations/000053_seed_builtin_roles.down.sql
Comment thread server/sqlc/queries/user_organization_role.sql Outdated
Comment thread server/internal/domain/authz/catalog.go Outdated
… assignment uniqueness, downstream cleanups

Addresses the codex/copilot review threads on PR #283 plus the Codex
Security HIGH finding on cross-tenant role bleed.

**Schema (000051):**
- Replace uq_user_org_role_scope (broken on org-scope NULL scope_id)
  with two partial unique indexes scoped to live rows:
    uq_user_org_role_org_scope  WHERE scope_type = 'org'  AND deleted_at IS NULL
    uq_user_org_role_site_scope WHERE scope_type = 'site' AND deleted_at IS NULL
  This closes the NULL-distinct loophole AND unblocks re-assignment
  after UnassignRole (soft-delete no longer collides).
- Per-org role model: add role.organization_id with FK to
  organization(id) ON DELETE CASCADE. Drop the global uq_role_name
  from 000002 and replace with two partial unique indexes — built-ins
  unique per (org, builtin_key), customs unique per (org, name).
  Editing one org's ADMIN cannot leak into another org's ADMIN.
- Add CHECK chk_role_builtin_key_matches_flag enforcing the
  is_builtin ↔ builtin_key invariant at the DB layer.

**Seed (000052):**
- Per-org seed: cross-join active orgs with the built-in spec list
  so every org gets its own SUPER_ADMIN, ADMIN, FIELD_TECH rows.
- Repoint existing user_organization.role_id from the legacy global
  ADMIN/SUPER_ADMIN rows to the per-org replacements (match by name).
- Soft-delete the legacy global rows.
- Down migration deletes role_permission unconditionally before
  permission rows (thread 3 — RESTRICT FK fired the other way).

**Backfill (000053):** ON CONFLICT clause updated to target the
partial index (user_id, organization_id, role_id) WHERE scope_type
= 'org' AND deleted_at IS NULL.

**Reconciler:** Rewritten to list active orgs and call seedOrgBuiltins
for each. Public SeedOrgBuiltins helper exposed so the onboarding
flow can seed a brand-new org's built-ins inside its existing
creation transaction. Per-org policy matches the plan — SUPER_ADMIN
full reconcile, ADMIN/FIELD_TECH additive-only.

**Onboarding wiring:** CreateAdminUserWithOrganization stops calling
the global UpsertRole; it now calls authz.SeedOrgBuiltins and uses
the returned per-org SUPER_ADMIN id for the user's user_organization
row. roleName/roleDescription params kept on the interface for
backward compatibility but ignored (built-in definitions come from
authz/builtin.go).

**sqlc queries:** UpsertBuiltinRoleForOrg, GetBuiltinRoleForOrg,
ListBuiltinRolesForOrg, ListActiveOrganizationIDs, CreateCustomRole
(takes org_id), ListCustomRolesForOrg (filters by org_id),
UpdateCustomRoleName and SoftDeleteCustomRole gated on
is_builtin = FALSE.

**Catalog (thread 5):** CatalogByResource comment corrected — map
iteration is not stable. Added ResourceOrder() for callers that need
declaration-driven order.

**Tests:**
- New: per-org-isolation test proves editing org A's ADMIN does not
  affect org B's ADMIN (the whole point of Option C).
- New: TestAssignment_SoftDeletedRowDoesNotBlockReassign and
  TestAssignment_DuplicateLiveOrgScopeRejected lock in the fixed
  unique-index behavior.
- All existing reconcile/backfill tests rewritten to operate per-org
  (helpers take an orgID; tests insert an org before calling
  Reconcile).

Resolves codex/copilot review threads on PR #283.
@mcharles-square
Copy link
Copy Markdown
Collaborator Author

Codex Security Review — response

Thanks for the careful read. Both findings are addressed in commit 0e47b16.

HIGH — Custom roles global across organizations

CreateCustomRole inserts custom roles into the global role table with no organization_id … a user with role:manage in one organization can create or modify role definitions that may be assigned in other organizations.

This was a real cross-tenant bleed risk and went deeper than just custom roles — the built-in ADMIN and FIELD_TECH rows were also global and (per the plan) editable, so a SUPER_ADMIN in org A editing ADMIN would also affect every user holding ADMIN in org B.

Switched the design to per-org built-ins (every role is owned by an organization):

  • role now carries organization_id BIGINT NOT NULL (legacy global rows are soft-deleted by 000052 after their references are repointed; future migrations will tighten to NOT NULL).
  • Dropped uq_role_name (the global name unique from 000002). Replaced with two partial unique indexes on live rows:
    • uq_role_org_builtin_key — built-ins unique per (organization_id, builtin_key).
    • uq_role_org_custom_name — customs unique per (organization_id, name).
  • CreateCustomRole and ListCustomRolesForOrg now take organization_id. Custom roles can never be created without an owning org and can never be listed cross-tenant.
  • Built-in seeding moved into authz.SeedOrgBuiltins, called by both the boot reconciler (looping over every active organization) and CreateAdminUserWithOrganization (inside the existing org-creation transaction). A new org gets its own SUPER_ADMIN/ADMIN/FIELD_TECH rows before any user is assigned.
  • 000052 migrates existing prod state by cross-joining organization × built-in spec, creating per-org rows, repointing user_organization.role_id from the legacy global rows to the per-org replacements, and soft-deleting the legacy rows.

New regression test TestReconcile_PerOrgIsolation_EditingOneOrgDoesNotAffectAnother proves the property.

MEDIUM — Assignment uniqueness does not work for org-scope rows and conflicts with soft delete

uq_user_org_role_scope includes nullable scope_id. PostgreSQL treats NULL values as distinct in normal unique constraints … because deleted_at is not part of a partial uniqueness strategy, site-scope assignments cannot be re-created after soft delete.

Fixed both halves by replacing uq_user_org_role_scope with two partial unique indexes scoped to live rows:

  • uq_user_org_role_org_scope on (user_id, organization_id, role_id) WHERE scope_type = 'org' AND deleted_at IS NULL — omits scope_id entirely so the NULL-distinct trap can't bite.
  • uq_user_org_role_site_scope on (user_id, organization_id, role_id, scope_id) WHERE scope_type = 'site' AND deleted_at IS NULL — soft-deleted rows leave the index.

New tests:

  • TestAssignment_DuplicateLiveOrgScopeRejected — second live insert with the same (user, org, role, 'org', NULL) is rejected.
  • TestAssignment_SoftDeletedRowDoesNotBlockReassignAssignRole after UnassignRole of the same tuple succeeds.

The backfill ON CONFLICT was also retargeted to the new org partial index.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e47b16251

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/migrations/000053_seed_builtin_roles.up.sql
Per project feedback: code comments should document the why directly,
not point at plan/PR labels that can rot. Stripped "U4", "U6", "U8",
"U11", "PR 2", and similar references from catalog.go, reconcile.go,
the sqlc query files, the migrations, and the onboarding hook in
user.go; rewrote each affected comment so it stands on its own.

Behavior unchanged.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef9ba2cee8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/migrations/000052_create_permission_tables.up.sql
Comment thread server/migrations/000051_create_permission_tables.up.sql Outdated
… per-org admin lookup + dual-write

Addresses round-2 PR review:

**Reconciler (Codex Security HIGH #1):** previous implementation
re-asserted every seed permission on every boot for ADMIN and
FIELD_TECH, which silently restored operator-revoked rows the next
time fleetd started. The intent of "additive-only" was operator-edits-
survive — actual behavior was "seed-set-is-restored." Fixed by
splitting reconciliation: GetBuiltinRoleForOrg first; if the row does
not exist, INSERT and seed every permission in the spec; if it does
exist, SUPER_ADMIN gets full reconcile (assign + prune), ADMIN and
FIELD_TECH get nothing. Once a per-org built-in row exists the
operator owns it; new catalog permissions only show up on roles the
operator opts into editing. Two new regression tests:
TestReconcile_OperatorRevokedSensitivePermStaysRevokedOnRestart and
TestReconcile_NewCatalogPermissionDoesNotPropagateToExistingBuiltins.

**Composite role FK (Codex Security HIGH #2):** added UNIQUE(id,
organization_id) on role and replaced user_organization_role's
role_id FK with a composite FK on (role_id, organization_id) →
role(id, organization_id). An assignment can no longer reference a
role owned by a different organization — DB-enforced tenant
isolation. ListEffectivePermissionsForUser,
CountOrgScopeSuperAdminsExcludingAssignment, and
CountOrgScopeSuperAdminsExcludingUser now also assert
r.organization_id = uor.organization_id in their joins as defense in
depth.

**Per-org ADMIN lookup (Codex thread P1):** auth.Service.CreateUser
was calling GetRoleByName(AdminRoleName) which now returns an
arbitrary per-org ADMIN row across orgs. Added
UserManagementStore.GetBuiltinRoleForOrg(orgID, builtinKey) and the
SQL implementation; CreateUser uses it with the user's actual orgID
so the new user is bound to their own org's ADMIN row.

**Dual-write into user_organization_role (Codex Security MEDIUM):**
new users created post-migration only landed in legacy
user_organization. The resolver, once it goes live, reads from
user_organization_role, which means fresh installs would have a
founding SUPER_ADMIN with no effective permissions. Both
CreateAdminUserWithOrganization and CreateUserOrganizationRole now
dual-write: legacy user_organization plus an org-scope row in
user_organization_role. New test
TestOnboarding_FoundingUserGetsOrgScopeSuperAdminAssignment locks
this in.
@mcharles-square
Copy link
Copy Markdown
Collaborator Author

Codex Security Review (round 2) — response

All three findings addressed in dcf5224. Plan PR #282 updated in bac7657 to reflect the corrected semantics.

HIGH — Editable built-in roles regain revoked permissions on restart

reconcileBuiltinPermissions unconditionally reassigns every seed permission before only pruning SUPER_ADMIN. For ADMIN and FIELD_TECH, this means any operator-removed seed permission is silently restored on the next fleetd restart.

Spot-on. The original wording "additive-only" was misleading — the implementation re-asserted the seed set on every boot, which silently restored anything an operator deliberately revoked. Fixed by switching to seed-on-create: seedOrgBuiltins now calls GetBuiltinRoleForOrg first; if the row does not exist, INSERT and seed all permissions; if it exists, SUPER_ADMIN gets full reconcile (assign + prune) and ADMIN/FIELD_TECH get nothing. Once a per-org built-in row exists the operator owns it.

Two regression tests added:

  • TestReconcile_OperatorRevokedSensitivePermStaysRevokedOnRestart — operator revokes miner:update_pools, miner:firmware_update, and miner:reboot from ADMIN; multiple reconciles leave all three revoked.
  • TestReconcile_NewCatalogPermissionDoesNotPropagateToExistingBuiltins — inserts a new permission row directly to simulate catalog growth; reconcile leaves existing ADMIN and FIELD_TECH unchanged.

Trade-off: new catalog keys added in future releases no longer auto-propagate to existing orgs' ADMIN/FIELD_TECH. Operators add them via the role editor. That's the correct ownership boundary — "the operator owns this role" cannot also mean "the deploy can quietly widen it."

HIGH — Role assignments can reference roles from another organization

user_organization_role.role_id references only role(id), not (role_id, organization_id).

Fixed structurally:

  • New constraint uq_role_id_org_id UNIQUE (id, organization_id) on role.
  • user_organization_role's FK on role_id replaced with a composite FK (role_id, organization_id) → role(id, organization_id). An assignment can no longer point at a role owned by a different org.
  • ListEffectivePermissionsForUser, CountOrgScopeSuperAdminsExcludingAssignment, and CountOrgScopeSuperAdminsExcludingUser now also assert r.organization_id = uor.organization_id in their joins as defense in depth — the FK is the structural guarantee, the join filter is a belt-and-suspenders check that ensures the resolver never even considers a cross-tenant row even if the FK were dropped in a future migration.

MEDIUM — New users are not mirrored into the new assignment table

Onboarding and user creation still write only to legacy user_organization; they do not also insert the corresponding org-scope row into user_organization_role. Fresh installs are especially exposed: migrations run before onboarding, so the founding SUPER_ADMIN gets no row in user_organization_role.

Both CreateAdminUserWithOrganization (founding user) and CreateUserOrganizationRole (subsequent users) now dual-write: legacy user_organization row plus an org-scope row in user_organization_role. Both writes happen inside the same transaction as the user/org create so a failure cleanly rolls back the whole flow.

New test TestOnboarding_FoundingUserGetsOrgScopeSuperAdminAssignment:

  1. Calls CreateAdminUserWithOrganization against a fresh DB.
  2. Asserts exactly one assignment row exists for the user/org.
  3. Asserts the assignment is scope_type='org', scope_id IS NULL.
  4. Asserts the assigned role_id matches the org's per-org SUPER_ADMIN row (not a legacy global).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dcf5224bf1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/migrations/000052_seed_builtin_roles.up.sql Outdated
Migration 000051 drops uq_role_name, so the old UpsertRole's
ON CONFLICT (name) was broken at runtime — any caller would fail with
"no unique or exclusion constraint matching the ON CONFLICT
specification." Codex P2 caught it.

Replaced UpsertRole with UpsertCustomRoleForOrg(name, description,
organization_id). ON CONFLICT now targets the partial unique index
uq_role_org_custom_name WHERE is_builtin = FALSE AND
deleted_at IS NULL — the correct conflict surface for per-org custom
roles. Built-ins still go through UpsertBuiltinRoleForOrg.

Updated callers:

- server/internal/testutil/database_queries.go: CreateSuperAdminUser
  and CreateSuperAdminUser2 now call authz.SeedOrgBuiltins to get the
  per-org SUPER_ADMIN row (mirroring the real onboarding flow), and
  dual-write into user_organization_role so test users land in both
  tables.
- server/internal/handlers/apikey/handler_test.go: the VIEWER custom
  role created for the non-admin auth test uses UpsertCustomRoleForOrg
  with the test's existing adminUser.OrganizationID; the test user is
  also dual-written into user_organization_role.

Build + lint clean.
Codex caught: SoftDeleteOrganization only marks
organization.deleted_at — it doesn't cascade to user_organization. A
soft-deleted org can still have active user_organization rows. If
000052 filtered those orgs out of the per-org built-in seed:

1. Step 3 (repoint user_organization.role_id to per-org rows) finds
   no per-org built-in for the soft-deleted org and leaves the row
   pointing at the legacy global ADMIN/SUPER_ADMIN.
2. 000053 backfills that row into user_organization_role with
   role_id = legacy global (organization_id NULL) and
   organization_id = the soft-deleted org's id.
3. The composite FK fk_user_org_role_role (role_id, organization_id)
   → role(id, organization_id) rejects the row because the legacy
   global has organization_id NULL.

Deploy fails on any environment with soft-deleted org memberships.

Fixed by dropping the org.deleted_at IS NULL filter in the per-org
seed; soft-deleted orgs now get their three built-in rows, the
repoint succeeds, and the backfill produces FK-valid rows. Three rows
per soft-deleted org is bounded and lets undelete "just work."
ListActiveOrganizationIDs (used by the boot reconciler) still excludes
deleted orgs, so we don't reconcile their permissions on every boot.

New test TestSeed_SoftDeletedOrgStillGetsBuiltins inserts a
soft-deleted org and asserts SeedOrgBuiltins produces all three
per-org rows.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54336d8a35

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/migrations/000052_create_permission_tables.down.sql
…ering

Round-3 PR review fixes:

**000053 down — hard-delete mirrored assignments (Codex Security HIGH):**
Soft-deleted user_organization_role rows still hold the composite FK
(ON DELETE RESTRICT) to per-org built-in roles, so the next step in
the rollback chain — 000052 down's DELETE on those roles — failed
mid-rollback. Switched to hard delete. Removed the redundant
deleted_at filter so the down also clears rows produced by dual-write
paths during the soak window.

**000051 up — reserve built-in names from custom rows (Codex Security MEDIUM):**
Existing authorization gates still compare session.Info.Role by
string. A custom row named 'ADMIN' or 'SUPER_ADMIN' could be mistaken
for the built-in by name-based code paths. Added CHECK
chk_role_custom_name_not_reserved rejecting customs with names
'SUPER_ADMIN', 'ADMIN', or 'FIELD_TECH'.

**000051 down — handle duplicate custom names (Codex P1 thread):**
The partial unique index uq_role_org_custom_name allows two orgs to
have customs with the same name. The down's restoration of
UNIQUE(name) would then fail with a duplicate-key error and abort
mid-rollback. Hard-delete every custom row before restoring the
global constraint. PR 1 ships no RPC to create customs so this is a
no-op in production today; it makes the down safe forward.

**backfill_test.go — insert org before Reconcile (Codex Security MEDIUM):**
Tests called authz.Reconcile then insertTestOrganization, but
Reconcile only iterates orgs that exist at the time it runs.
Reordered so the org is in place before the boot reconcile runs and
the per-org built-ins are created. All five affected tests fixed.
@mcharles-square
Copy link
Copy Markdown
Collaborator Author

Codex Security Review (round 3) — response

All three findings addressed in 6105c60.

HIGH — RBAC rollback is blocked by retained assignment FK references

000053 down only soft-deletes user_organization_role rows, but those rows still retain FK references to per-org built-in role rows. The next rollback step, 000052 down, then tries to DELETE those built-in roles, which will fail under fk_user_org_role_role.

Fixed: 000053 down now hard-deletes the mirrored assignment rows (using DELETE ... USING user_organization instead of UPDATE ... SET deleted_at). Also dropped the redundant uo.deleted_at IS NULL filter so the down also clears rows produced by the dual-write paths during the soak window. After 000053 down runs, no live or soft-deleted rows hold the FK, and 000052 down's DELETE on per-org built-in roles succeeds.

MEDIUM — Custom roles can use built-in role names while authorization still trusts role names

A custom role named ADMIN or SUPER_ADMIN could be mistaken for the built-in role by current name-based gates, creating a privilege-escalation path.

Fixed at the DB layer in 000051 up:

ALTER TABLE role
    ADD CONSTRAINT chk_role_custom_name_not_reserved CHECK (
        is_builtin = TRUE
        OR name NOT IN ('SUPER_ADMIN', 'ADMIN', 'FIELD_TECH')
    );

Application code that creates custom roles will surface this as a clear constraint violation. The structural block holds even if a future migration or a direct DB insert tries to create a colliding name.

MEDIUM — New backfill tests create orgs after reconciliation

Insert the test organization before authz.Reconcile, or call authz.SeedOrgBuiltins after direct org inserts.

Fixed: all five affected tests in backfill_test.go now insertTestOrganization(t, db) and insertTestUser(t, db) before authz.Reconcile(ctx, db), so the org exists when Reconcile iterates ListActiveOrganizationIDs and the per-org built-ins are created. reconcile_test.go already used the correct ordering — backfill was the outlier.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6105c60b75

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/migrations/000052_create_permission_tables.up.sql
…tolerant

The round-3 chk_role_custom_name_not_reserved CHECK and the
uq_role_org_custom_name partial unique index were both case-sensitive.
That left two real gaps:

1. A custom role named "admin" / "Admin" / " ADMIN " passed the CHECK
   and didn't collide with the built-in "ADMIN" in the unique index.
   The existing name-based gates (`info.Role != "ADMIN"`) wouldn't be
   fooled — they're also case-sensitive uppercase — but the role list
   would show near-duplicates that read identically, and any future
   code change to case-insensitive comparison would reopen the gate
   bypass.
2. Within a single org, an operator could create "admin" and "Admin"
   as two distinct custom roles. Confusing UX and a soft attack
   surface.

Both closed:

- CHECK rewritten as `LOWER(BTRIM(name)) NOT IN
  ('super_admin', 'admin', 'field_tech')` so case variants and
  whitespace-padded forms are rejected at insert time.
- Partial unique index `uq_role_org_custom_name` keyed on
  `(organization_id, LOWER(BTRIM(name)))` so case variants collapse
  to one row per org. Display capitalization is preserved — the
  `name` column stores the original; only the index key is
  case-folded.
- `UpsertCustomRoleForOrg`'s ON CONFLICT target updated to match the
  new index expression so upserts hit the existing row instead of
  failing.

Three new regression tests:
- TestCustomRole_ReservedBuiltinNamesRejectedCaseInsensitively walks
  every reasonable case/whitespace variant of the three built-in
  names and asserts each is rejected.
- TestCustomRole_CaseInsensitiveUniquenessPerOrg upserts "Floor
  Manager" then "FLOOR MANAGER" against the same org and asserts
  both calls return the same role id (and the original "Floor
  Manager" casing is preserved).
- TestCustomRole_SameNameAcrossOrgsAllowed proves per-org isolation:
  two different orgs can each have a "Floor Manager" custom role.

Homoglyph attacks (Cyrillic А for Latin A, etc.) remain out of
scope — a unicode-normalization pass at the API boundary is the
right place for that if it ever becomes a real concern. Documented
in the migration comment.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81e7ee38cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/migrations/000051_create_permission_tables.down.sql Outdated
…r 000051 down

Round-4 PR review caught two related bugs in my last commit:

**000051 up — CHECK rejected legacy rows on upgrade (Codex HIGH / P0 thread):**
Migration 000002 seeded a global ADMIN row, and onboarding may have
created a SUPER_ADMIN row, both with is_builtin=FALSE and
organization_id NULL. Adding chk_role_custom_name_not_reserved was
making Postgres validate against those rows immediately and aborting
the migration before 000052 could repoint and soft-delete them. New
deployments would also fail since 000002 always seeds ADMIN. Exempt
legacy rows (organization_id IS NULL) from the CHECK — the exemption
is safe because CreateCustomRole always sets organization_id, so an
application caller cannot exploit it.

**000051 down — order error + over-broad role delete (Codex MEDIUM / P1 thread):**
The previous version of the down file dropped role_permission at the
top, then later ran DELETE FROM role_permission — relation does not
exist. It also DELETE'd every is_builtin=FALSE role, which on a real
rollback would include the legacy ADMIN/SUPER_ADMIN rows that 000052's
own down just restored, leaving user_organization.role_id dangling.

Rewrote with explicit steps:

  Step 1: custom cleanup (only org-scoped customs; legacy globals
          preserved so they can satisfy 000052's restored
          user_organization references).
  Step 2: drop new tables and assignment indexes.
  Step 3: drop role-table columns/constraints/indexes.
  Step 4: restore uq_role_name UNIQUE(name).

Behavior unchanged for fresh installs (no customs exist); now safe
under rollback after PR 3 ships CreateCustomRole.
@mcharles-square
Copy link
Copy Markdown
Collaborator Author

Codex Security Review (round 4) — response

Both findings addressed in 14cde48. They cover the same two bugs the inline threads flagged.

HIGH — RBAC migration rejects existing ADMIN/SUPER_ADMIN rows

000051 adds chk_role_custom_name_not_reserved while existing legacy rows still have is_builtin = FALSE and names like ADMIN … PostgreSQL validates the new CHECK against existing rows immediately, so the migration fails before 000052 can repoint and soft-delete those legacy rows.

Fixed: the CHECK now exempts organization_id IS NULL. Legacy global rows pass; CreateCustomRole (which always sets organization_id) cannot exploit the exemption.

MEDIUM — Down migration cannot roll back cleanly

The down migration drops role_permission at line 6, then later tries to DELETE FROM role_permission, which will fail with "relation does not exist." It also deletes every is_builtin = FALSE role, which includes the legacy ADMIN/SUPER_ADMIN rows that 000052.down just restores

Fixed: rewrote the down migration into four explicit steps — custom cleanup first (while the table and the is_builtin/organization_id columns still exist), then table drops, then role-column drops, then uq_role_name restored. The custom-role delete is restricted to organization_id IS NOT NULL so the legacy globals 000052 restored stay in place and user_organization.role_id does not dangle.

Build + lint clean.

main shipped a new 000051_bound_curtailment_operational_controls
migration. Bumping the RBAC migrations to avoid a number collision:

  000051_create_permission_tables          → 000052
  000052_seed_builtin_roles                → 000053
  000053_backfill_user_organization...     → 000054

Inline references in the migration files and in the authz Go package
comments updated to match. No behavior change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client documentation Improvements or additions to documentation javascript Pull requests that update javascript code server shared

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants