Stage 7: shared test broker (test-broker.litentry.org) for CI + dev — parallel infra, isolated AWS resources #66

@hanwencheng


Motivation

Today, exercising the full Stage 7 OIDC path (broker → JWT → STS AssumeRoleWithWebIdentity → S3 PrincipalTag-scoped read) requires each developer to:

  1. Stand up their own EC2 broker host (scripts/setup-broker-host.sh)
  2. Have AWS admin to register the OIDC provider, create the role, attach the bucket policy (docs/cloud-setup.md §4)
  3. Wire DNS + cert + nginx + systemd

That's a multi-hour bring-up and gates contributors who don't have AWS admin. CI today can't run the end-to-end §4.5 proof at all — there's no broker reachable from CI runners.

This issue proposes a shared, long-lived test broker at test-broker.litentry.org that CI and dev both point at, exercising the exact same code paths as prod against fully isolated AWS resources.
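For orientation, the STS leg of that path is a plain unsigned HTTPS call, so a CI runner needs no AWS credentials of its own, only the broker-issued JWT. A minimal sketch of the request body, assuming a placeholder account ID and a hypothetical session name (neither comes from this repo):

```python
from urllib.parse import urlencode

# Global STS endpoint; a regional endpoint would work the same way.
STS_ENDPOINT = "https://sts.amazonaws.com/"


def build_sts_request_body(role_arn: str, session_name: str, jwt: str,
                           duration_seconds: int = 900) -> str:
    """Form-encode an AssumeRoleWithWebIdentity request.

    The call is not SigV4-signed: the JWT itself is the credential.
    900 seconds is the minimum session duration STS accepts.
    """
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": jwt,
        "DurationSeconds": str(duration_seconds),
    }
    return urlencode(params)


# Hypothetical values for illustration only.
example_body = build_sts_request_body(
    role_arn="arn:aws:iam::123456789012:role/agentkeys-data-role-test",
    session_name="ci-run-1",
    jwt="header.payload.signature",
)
```

POSTing `example_body` to `STS_ENDPOINT` with `Content-Type: application/x-www-form-urlencoded` would return temporary credentials if the trust policy accepts the token.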

Non-goal: don't share the prod role

The naive shape — "same role, second broker" — defeats §4.4. A buggy CI test, a leaked CI bearer, or a misconfigured dev script could read or write real user data. Shared OIDC providers + shared roles = shared blast radius.

The right shape is parallel infrastructure:

| Concern | Prod | Test |
|---|---|---|
| Hostname | `broker.litentry.org` | `test-broker.litentry.org` |
| OIDC provider | `oidc-provider/broker.litentry.org` | `oidc-provider/test-broker.litentry.org` |
| IAM role | `agentkeys-data-role` | `agentkeys-data-role-test` |
| S3 bucket | `agentkeys-mail-${ACCT}` | `agentkeys-mail-test-${ACCT}` |
| Backend | real Heima chain (or v0.1 mock) | mock-server (`auth_token: federation-proof`) |
| Lifetime | long-lived | long-lived (NOT ephemeral per CI run) |

Same code, same trust-policy shape, same PrincipalTag enforcement — just isolated identifiers and data.
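One way to keep the two trust policies from drifting is to generate both from a single template. The sketch below is an assumption about the policy shape, not a copy of what docs/cloud-setup.md §4 creates: the `sts.amazonaws.com` audience and the account ID are placeholders, and `sts:TagSession` is included on the assumption that the PrincipalTag values arrive as session tags in the JWT.

```python
import json


def trust_policy(account_id: str, issuer_host: str, audience: str) -> dict:
    """Build a role trust policy for one OIDC provider (hypothetical shape)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer_host}"
                },
                # TagSession is needed if the token carries session tags.
                "Action": ["sts:AssumeRoleWithWebIdentity", "sts:TagSession"],
                "Condition": {"StringEquals": {f"{issuer_host}:aud": audience}},
            }
        ],
    }


# Placeholder account ID and audience, for illustration only.
prod_policy = trust_policy("123456789012", "broker.litentry.org", "sts.amazonaws.com")
test_policy = trust_policy("123456789012", "test-broker.litentry.org", "sts.amazonaws.com")

print(json.dumps(test_policy, indent=2))
```

Because each policy names only its own provider ARN, a JWT minted by the test broker cannot satisfy the prod role's trust policy, and vice versa.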

Why long-lived (not ephemeral per CI run)

AWS validates the OIDC issuer URL against the JWT `iss` claim byte-for-byte, and the issuer URL must be reachable over public TLS for AWS to fetch the JWKS. That means:

  • Stable DNS A record
  • Real Let's Encrypt cert (HTTP-01 challenge needs port 80 from anywhere)
  • The OIDC provider has to be pre-registered in IAM with the matching URL

None of those work for a CI-spawned ephemeral host. A long-lived EC2 (~$5–10/mo with t3.micro + EIP) is the right shape.
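The exact-match requirement is easy to trip over (a trailing slash is enough), so a pre-flight check in CI may be worth having. A minimal sketch, assuming the broker issues a standard three-segment JWT and that the registered issuer is the `https://` URL of the host; it inspects the claim only and does not verify the signature:

```python
import base64
import json


def encode_segment(obj: dict) -> str:
    """Base64url-encode a JSON object without padding, as JWT segments are."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # Restore the padding that JWT encoding strips.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def issuer_matches(token: str, registered_issuer: str) -> bool:
    """True only when the iss claim equals the registered URL exactly."""
    return jwt_claims(token).get("iss") == registered_issuer


# Unsigned token built for illustration only.
demo_token = ".".join([
    encode_segment({"alg": "none"}),
    encode_segment({"iss": "https://test-broker.litentry.org"}),
    "signature-placeholder",
])
```

With `demo_token`, `issuer_matches(demo_token, "https://test-broker.litentry.org")` holds, while the same URL with a trailing slash does not.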

Proposed scope

  • Provision `test-broker.litentry.org` host (mirrors §5 of cloud-setup.md)
  • Add a parallel `agentkeys-data-role-test` + `agentkeys-mail-test-${ACCT}` bucket (mirrors §3, §4)
  • Document the bring-up in `docs/cloud-setup.md` as §4-test (or split into a new `docs/test-environment.md`)
  • Add CI workflow that exercises §4.5's end-to-end JWT → STS → S3 proof against test-broker
  • CI auth: bearer for the test mock-server stored as a GitHub Actions secret, scoped read-only to the test bucket
  • Per-PR or per-run namespacing inside the test bucket (e.g. `pr-${PR_NUMBER}/run-${RUN_ID}/...`) so concurrent CI runs don't step on each other's writes
  • Cleanup job: nightly `s3 rm` of test prefixes older than N days
  • Runbook: who owns it, how to rotate the mock-server bearer, what to do when the cert renewal fails
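The namespacing and cleanup items can be sketched as two small pure functions. This is an illustration under assumptions: the function names are hypothetical, and the `(key, last_modified)` pairs stand in for whatever an S3 listing returns.

```python
from datetime import datetime, timedelta, timezone


def run_prefix(pr_number: int, run_id: int) -> str:
    """Key prefix that isolates one CI run inside the shared test bucket."""
    return f"pr-{pr_number}/run-{run_id}/"


def stale_keys(objects, now: datetime, max_age_days: int) -> list:
    """Return keys whose last-modified time is older than the cutoff.

    `objects` is an iterable of (key, last_modified) pairs with
    timezone-aware datetimes.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [key for key, last_modified in objects if last_modified < cutoff]


# Fabricated listing for illustration only.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
listing = [
    (run_prefix(12, 1001) + "mail/a.eml", datetime(2025, 1, 1, tzinfo=timezone.utc)),
    (run_prefix(15, 1040) + "mail/b.eml", datetime(2025, 1, 30, tzinfo=timezone.utc)),
]
to_delete = stale_keys(listing, now, max_age_days=7)
```

An S3 lifecycle expiration rule on the bucket could replace the nightly job entirely; the function form is only useful if cleanup needs logic a lifecycle rule cannot express.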

Open design questions

  1. Single AWS account or separate test account? Same-account is cheaper and simpler, but a separate account gives a hard blast-radius boundary. For an early-stage project, same-account with strict naming + bucket policies is probably fine. Worth a short ADR.
  2. Test broker auth model — does CI mint sessions via the existing mock-server's `federation-proof` path, or does test-broker accept a special CI bearer? The former reuses dev tooling; the latter is more honest (CI is not a dev).
  3. Should dev usage of test-broker need a personal credential, or is it open-mint? Open-mint is simpler but means anyone who knows the URL can mint test JWTs (which only grant read on the test bucket — bounded blast radius, but still).
  4. Cert renewal monitoring: Let's Encrypt certs expire after 90 days, so renewal has to succeed well before then. If renewal fails silently, test-broker goes dark and CI breaks. Need an alarm.
  5. Conflict with #62 (Stage 7: complete AWS OIDC federation deployment, deferred from PR #61)? That issue tracks completing the prod deployment. This issue is parallel infrastructure for test, not a duplicate, but bring-up sequence matters: prod (#62) first, then the test infra patterns drop out as a copy.
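For question 4, the alarm could be as simple as a scheduled job that reads the served certificate's expiry and fails when too few days remain. A minimal sketch using only the Python standard library; the 21-day threshold is an arbitrary assumption, and the function names are hypothetical:

```python
import socket
import ssl


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter string of the certificate the host serves."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now_epoch: float) -> float:
    """Days between now and the certificate's expiry."""
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / 86400


def needs_alarm(not_after: str, now_epoch: float, threshold_days: float = 21) -> bool:
    """True when the certificate expires sooner than the threshold."""
    return days_remaining(not_after, now_epoch) < threshold_days
```

A scheduled workflow could call `needs_alarm(fetch_not_after("test-broker.litentry.org"), time.time())` and exit non-zero when it returns true. An already-expired certificate fails the TLS handshake inside `fetch_not_after`, which also surfaces as a failed job.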
