From 0e0575da54a10e61dcc046cc4da17116d015d587 Mon Sep 17 00:00:00 2001
From: Oleksiy Pylypenko
Date: Mon, 13 Apr 2026 11:02:24 +0200
Subject: [PATCH 1/5] 017, 018: AWS KMS credentials config restructure & EKS auth

017 - restructure AWS KMS config to group credential providers under a `credentials` node, with backward compat for existing flat fields

018 - add IRSA (AssumeRoleWithWebIdentity) and EKS Pod Identity credential providers, with shared async refresh base class

References: kroxylicious/kroxylicious#1295, kroxylicious/kroxylicious#3684

Co-Authored-By: Claude Opus 4.6 (1M context)
Signed-off-by: Oleksiy Pylypenko
---
 ...-aws-kms-credentials-config-restructure.md | 162 +++++++++++++++++
 proposals/018-aws-kms-eks-authentication.md | 168 ++++++++++++++++++
 2 files changed, 330 insertions(+)
 create mode 100644 proposals/017-aws-kms-credentials-config-restructure.md
 create mode 100644 proposals/018-aws-kms-eks-authentication.md

diff --git a/proposals/017-aws-kms-credentials-config-restructure.md b/proposals/017-aws-kms-credentials-config-restructure.md
new file mode 100644
index 0000000..bf08465
--- /dev/null
+++ b/proposals/017-aws-kms-credentials-config-restructure.md
@@ -0,0 +1,162 @@
+# 017 - AWS KMS: credentials configuration restructure
+
+* [017 - AWS KMS: credentials configuration restructure](#017---aws-kms-credentials-configuration-restructure)
+  * [Current situation](#current-situation)
+  * [Motivation](#motivation)
+  * [Proposal](#proposal)
+    * [New `credentials` configuration node](#new-credentials-configuration-node)
+    * [Backward compatibility for existing providers](#backward-compatibility-for-existing-providers)
+    * [Implementation](#implementation)
+  * [Affected/not affected projects](#affectednot-affected-projects)
+  * [Compatibility](#compatibility)
+  * [Rejected alternatives](#rejected-alternatives)
+
+## Current situation
+
+The AWS KMS provider for the Record Encryption filter authenticates to AWS using one of two mechanisms, each configured as a separate top-level field on the `Config` record:
+
+```yaml
+kmsConfig:
+  endpointUrl: https://kms.us-east-1.amazonaws.com
+  longTermCredentials: # option A
+    accessKeyId:
+      password: AKIA...
+    secretAccessKey:
+      password: wJalr...
+  region: us-east-1
+```
+
+```yaml
+kmsConfig:
+  endpointUrl: https://kms.us-east-1.amazonaws.com
+  ec2MetadataCredentials: # option B
+    iamRole: KroxyliciousInstance
+  region: us-east-1
+```
+
+Both fields are nullable on the `Config` record. `CredentialsProviderFactory` enforces at runtime that exactly one is non-null, rejecting configurations that specify zero or both.
+
+This structure works for two providers, but additional provider types are planned (see proposal 018). Each new type would add another nullable top-level field and another branch in the mutual-exclusivity check.
+
+## Motivation
+
+1. **Scalability** — with four or more providers the flat-field approach results in a growing number of nullable fields on `Config` and an increasingly complex exclusivity check in the factory.
+2. **Clarity** — a single `credentials` key in the YAML makes it self-evident that exactly one authentication mechanism is configured, rather than requiring the user to know that `longTermCredentials` and `ec2MetadataCredentials` are mutually exclusive siblings.
+3. **Extensibility** — future credential providers (IRSA, Pod Identity, SSO, STS AssumeRole) can be added by inserting one field into a dedicated record, with no changes to the top-level `Config` shape.
+
+## Proposal
+
+### New `credentials` configuration node
+
+All credential providers are grouped under a single `credentials` key:
+
+```yaml
+kmsConfig:
+  endpointUrl: https://kms.us-east-1.amazonaws.com
+  credentials:
+    longTerm:
+      accessKeyId:
+        password: AKIA...
+      secretAccessKey:
+        password: wJalr...
+  region: us-east-1
+```
+
+```yaml
+kmsConfig:
+  endpointUrl: https://kms.us-east-1.amazonaws.com
+  credentials:
+    ec2Metadata:
+      iamRole: KroxyliciousInstance
+  region: us-east-1
+```
+
+The `credentials` object is a new record with one nullable field per provider. The `CredentialsProviderFactory` enforces that exactly one field is non-null:
+
+```java
+public record CredentialsConfig(
+        @JsonProperty("longTerm") @Nullable LongTermCredentialsProviderConfig longTerm,
+        @JsonProperty("ec2Metadata") @Nullable Ec2MetadataCredentialsProviderConfig ec2Metadata) {}
+```
+
+Future providers (proposal 018) will add fields to this record.
+
+### Backward compatibility for existing providers
+
+The existing flat-field configuration keys (`longTermCredentials`, `ec2MetadataCredentials`) are preserved as deprecated fields on `Config`. The record's compact constructor transparently migrates them into the `credentials` node:
+
+```yaml
+# Old style — still works, deprecated
+kmsConfig:
+  endpointUrl: https://kms.us-east-1.amazonaws.com
+  longTermCredentials:
+    accessKeyId:
+      password: AKIA...
+    secretAccessKey:
+      password: wJalr...
+  region: us-east-1
+```
+
+Specifying both a deprecated flat field **and** the `credentials` node simultaneously is a configuration error.
+
+### Implementation
+
+The `Config` record gains a `credentials` field and retains the two existing flat fields for backward compatibility:
+
+```java
+public record Config(
+        @JsonProperty(value = "endpointUrl", required = true) URI endpointUrl,
+        @JsonProperty(value = "longTermCredentials") @Nullable LongTermCredentialsProviderConfig longTermCredentialsProviderConfig,
+        @JsonProperty(value = "ec2MetadataCredentials") @Nullable Ec2MetadataCredentialsProviderConfig ec2MetadataCredentialsProviderConfig,
+        @JsonProperty(value = "credentials") @Nullable CredentialsConfig credentials,
+        @JsonProperty(value = "region", required = true) String region,
+        @JsonProperty(value = "tls") @Nullable Tls tls) {
+
+    public Config {
+        Objects.requireNonNull(endpointUrl);
+        Objects.requireNonNull(region);
+        // Migrate deprecated flat fields into the credentials node.
+        // A deprecation warning for the flat-field style would be logged here (see Compatibility).
+        // IllegalArgumentException is illustrative; the exact exception type is an implementation detail.
+        var migrated = credentials;
+        if (longTermCredentialsProviderConfig != null) {
+            if (migrated != null) {
+                throw new IllegalArgumentException("'longTermCredentials' cannot be combined with 'credentials'");
+            }
+            migrated = new CredentialsConfig(longTermCredentialsProviderConfig, null);
+        }
+        if (ec2MetadataCredentialsProviderConfig != null) {
+            if (migrated != null) {
+                throw new IllegalArgumentException(
+                        "'ec2MetadataCredentials' cannot be combined with 'credentials' or 'longTermCredentials'");
+            }
+            migrated = new CredentialsConfig(null, ec2MetadataCredentialsProviderConfig);
+        }
+        credentials = migrated;
+    }
+}
+```
+
+The `CredentialsProviderFactory` is simplified to dispatch solely from `config.credentials()`.
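A minimal sketch of how `CredentialsProviderFactory` might validate and dispatch on `config.credentials()`. The provider config records are simplified, hypothetical stand-ins (plain strings rather than the real password-holding types). `CredentialsProvider`, `LongTermProvider`, `Ec2MetadataProvider` and `CredentialsProviderFactorySketch` are illustrative names, and `IllegalArgumentException` is an assumed exception type. Only the `longTerm` / `ec2Metadata` field names come from the proposal.

```java
// Hypothetical, simplified stand-ins for the real provider config records.
record LongTermCredentialsProviderConfig(String accessKeyId, String secretAccessKey) {}

record Ec2MetadataCredentialsProviderConfig(String iamRole) {}

// Same shape as the proposal's CredentialsConfig: one nullable field per provider.
record CredentialsConfig(LongTermCredentialsProviderConfig longTerm,
                         Ec2MetadataCredentialsProviderConfig ec2Metadata) {}

// Hypothetical provider abstraction used only for this sketch.
interface CredentialsProvider {
    String description();
}

record LongTermProvider(LongTermCredentialsProviderConfig config) implements CredentialsProvider {
    @Override
    public String description() {
        return "long-term key " + config.accessKeyId();
    }
}

record Ec2MetadataProvider(Ec2MetadataCredentialsProviderConfig config) implements CredentialsProvider {
    @Override
    public String description() {
        return "EC2 IMDS role " + config.iamRole();
    }
}

final class CredentialsProviderFactorySketch {
    private CredentialsProviderFactorySketch() {}

    static CredentialsProvider create(CredentialsConfig credentials) {
        if (credentials == null) {
            throw new IllegalArgumentException("'credentials' must be configured");
        }
        // Count configured providers; each new provider adds one term here and one branch below.
        int configured = (credentials.longTerm() != null ? 1 : 0)
                + (credentials.ec2Metadata() != null ? 1 : 0);
        if (configured != 1) {
            throw new IllegalArgumentException(
                    "exactly one credentials provider must be configured, found " + configured);
        }
        if (credentials.longTerm() != null) {
            return new LongTermProvider(credentials.longTerm());
        }
        return new Ec2MetadataProvider(credentials.ec2Metadata());
    }
}
```

Under this shape, a provider from proposal 018 would add one nullable field to `CredentialsConfig`, one term to the count, and one branch to the dispatch.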
+
+## Affected/not affected projects
+
+**Affected:**
+- `kroxylicious-kms-provider-aws-kms` — new `CredentialsConfig` record, modified `Config`, modified `CredentialsProviderFactory`
+- `kroxylicious-kms-provider-aws-kms-test-support` — `Config` constructor update
+- `kroxylicious-systemtests` — `Config` constructor update
+- `kroxylicious-docs` — updated configuration snippets, deprecation notes
+
+**Not affected:**
+- Other KMS providers (HashiCorp Vault, Azure Key Vault, Fortanix DSM, in-memory)
+- The Kroxylicious operator
+- Proxy runtime core
+
+## Compatibility
+
+- **Backward compatible:** existing YAML with `longTermCredentials` or `ec2MetadataCredentials` at the top level continues to work unchanged.
+- **Deprecation:** the flat-field style will emit a log warning encouraging migration to the `credentials` node. Removal of the deprecated fields is deferred to a future major version.
+- **Forward compatible:** adding a new credential provider requires only a new field on `CredentialsConfig` — no changes to the top-level `Config` shape.
+
+## Rejected alternatives
+
+1. **Jackson polymorphic deserialization (`@JsonTypeInfo`)** — using a type discriminator inside `credentials` (e.g. `type: webIdentity`) would enforce single-provider semantics at the deserialization layer. Rejected because it requires a discriminator property that differs from the key-per-provider style used elsewhere in Kroxylicious (Azure KMS `entraIdentity`, Fortanix `apiKeySession`) and makes the YAML less intuitive.
+
+2. **Break existing config without backward compat** — since the project is pre-1.0, breaking changes are nominally acceptable. Rejected because existing users have working configurations and migration should be painless. The cost of the backward-compat shim (a few lines in the compact constructor) is negligible compared to the user friction of a forced migration.
+
+3. **Keep flat fields, just add more** — rejected because it scales poorly. With four providers, the `Config` record would have four nullable credential fields interspersed with `endpointUrl`, `region`, and `tls`, making the YAML hard to read and the factory logic fragile.
diff --git a/proposals/018-aws-kms-eks-authentication.md b/proposals/018-aws-kms-eks-authentication.md
new file mode 100644
index 0000000..cbe871d
--- /dev/null
+++ b/proposals/018-aws-kms-eks-authentication.md
@@ -0,0 +1,168 @@
+# 018 - AWS KMS: EKS workload authentication (IRSA & Pod Identity)
+
+* [018 - AWS KMS: EKS workload authentication (IRSA & Pod Identity)](#018---aws-kms-eks-workload-authentication-irsa--pod-identity)
+  * [Current situation](#current-situation)
+  * [Motivation](#motivation)
+  * [Proposal](#proposal)
+    * [IRSA (AssumeRoleWithWebIdentity)](#irsa-assumerolewithwebidentity)
+      * [Request flow](#request-flow)
+      * [Configuration](#configuration)
+    * [EKS Pod Identity](#eks-pod-identity)
+      * [Request flow](#request-flow-1)
+      * [Configuration](#configuration-1)
+    * [Shared refresh infrastructure](#shared-refresh-infrastructure)
+    * [Environment variable defaults](#environment-variable-defaults)
+  * [Affected/not affected projects](#affectednot-affected-projects)
+  * [Compatibility](#compatibility)
+  * [Rejected alternatives](#rejected-alternatives)
+
+## Current situation
+
+The AWS KMS provider for the Record Encryption filter supports two authentication mechanisms: long-term IAM credentials and EC2 instance metadata. Per proposal 017, these are being restructured under a `credentials` configuration node.
+
+The existing `Credentials` interface already supports an optional `securityToken()` for temporary credentials, and `AwsV4SigningHttpRequestBuilder` adds the `X-Amz-Security-Token` header when present — so the signing layer is already ready for temporary credentials from STS.
+
+The implementation is AWS-SDK-free: all HTTP calls use `java.net.http.HttpClient` with a custom SigV4 implementation.
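To make the temporary-credentials point concrete, here is a hedged sketch of an optional session token driving the `X-Amz-Security-Token` header. `Credentials`, `LongTermCredentials`, `TemporaryCredentials` and `SecurityTokenHeader` are hypothetical stand-ins modelled on the description above, not the provider's actual types, and the sketch deliberately leaves out the SigV4 signature calculation.

```java
import java.net.http.HttpRequest;
import java.time.Instant;
import java.util.Optional;

// Hypothetical stand-in for the provider's Credentials interface.
interface Credentials {
    String accessKeyId();

    String secretAccessKey();

    // Present only for temporary credentials (for example, issued by STS).
    Optional<String> securityToken();
}

// Long-term IAM user keys carry no session token.
record LongTermCredentials(String accessKeyId, String secretAccessKey) implements Credentials {
    @Override
    public Optional<String> securityToken() {
        return Optional.empty();
    }
}

// Temporary credentials: the access key / secret key / session token triple plus an expiry.
record TemporaryCredentials(String accessKeyId, String secretAccessKey, String sessionToken, Instant expiration)
        implements Credentials {
    @Override
    public Optional<String> securityToken() {
        return Optional.of(sessionToken);
    }
}

final class SecurityTokenHeader {
    private SecurityTokenHeader() {}

    // Adds X-Amz-Security-Token only when the credentials carry a session token.
    // Computing the SigV4 signature is out of scope for this sketch.
    static HttpRequest.Builder apply(HttpRequest.Builder builder, Credentials credentials) {
        credentials.securityToken().ifPresent(token -> builder.header("X-Amz-Security-Token", token));
        return builder;
    }
}
```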
+
+Neither existing mechanism works for pods on **Amazon EKS**: long-term keys are a security anti-pattern, and EC2 IMDS is only reachable from EC2 instances.
+
+## Motivation
+
+Amazon EKS is the primary deployment target for Kroxylicious on AWS. AWS supports two pod-level authentication mechanisms:
+
+1. **IAM Roles for Service Accounts (IRSA)** — a projected OIDC token is exchanged for temporary credentials via STS `AssumeRoleWithWebIdentity`. This requires an OIDC trust-policy on the IAM role and an annotated Kubernetes service account.
+
+2. **EKS Pod Identity** — the AWS-recommended successor to IRSA. An in-cluster Pod Identity Agent (link-local HTTP endpoint) exchanges a projected service-account token for temporary credentials. No OIDC trust-policy boilerplate is required — the binding is established via an EKS pod-identity association.
+
+Both return the same credential triple (access key, secret key, session token) that the existing `Credentials` interface supports.
+
+This proposal also extracts the async credential refresh machinery from `Ec2MetadataCredentialsProvider` into a shared base class, so the three refreshing providers (EC2, IRSA, Pod Identity) share the same battle-tested code.
+
+## Proposal
+
+### IRSA (AssumeRoleWithWebIdentity)
+
+#### Request flow
+
+1. Read the projected JWT from `webIdentityTokenFile` — **fresh on every refresh**, since kubelet rotates this file roughly hourly.
+2. Build an **unsigned** `POST` to the regional STS endpoint:
+   ```
+   Action=AssumeRoleWithWebIdentity&Version=2011-06-15
+   &RoleArn=&RoleSessionName=
+   &WebIdentityToken=
+   ```
+   `AssumeRoleWithWebIdentity` is unsigned by design — the JWT is the credential.
+3. Request `Accept: application/json` so STS replies with JSON (no XML parser needed).
+4. Parse `AssumeRoleWithWebIdentityResponse.AssumeRoleWithWebIdentityResult.Credentials` into a record implementing `Credentials`.
+5. On HTTP 4xx, surface the STS error code (`InvalidIdentityToken`, `AccessDenied`, `ExpiredTokenException`) in the exception message.
+
+#### Configuration
+
+Registered under `credentials.webIdentity` (per proposal 017):
+
+```yaml
+credentials:
+  webIdentity:
+    roleArn: arn:aws:iam::123456789012:role/KroxyliciousIRSA
+    webIdentityTokenFile: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
+    roleSessionName: my-session # optional
+    stsEndpointUrl: https://sts.us-east-1.amazonaws.com # optional, derived from region
+    stsRegion: us-east-1 # optional, falls back to Config.region
+    durationSeconds: 3600 # optional
+    credentialLifetimeFactor: 0.8 # optional
+```
+
+On a properly-annotated EKS pod the webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`, so the minimal configuration is:
+
+```yaml
+credentials:
+  webIdentity: {}
+```
+
+### EKS Pod Identity
+
+#### Request flow
+
+1. Read the projected token from `authorizationTokenFile` — **fresh on every refresh**.
+2. `GET credentialsFullUri` with header `Authorization: ` (no `Bearer` prefix — this is the documented Pod Identity Agent contract).
+3. Parse the JSON response (`AccessKeyId`, `SecretAccessKey`, `Token`, `Expiration`) into a record implementing `Credentials`.
+4. Validate the URI scheme (`http`/`https` only) at construction time.
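The two request flows can be sketched with `java.net.http` alone, in keeping with the SDK-free approach. This is an illustrative sketch, not the proposed implementation: `EksCredentialRequests` and its methods are hypothetical names, response parsing and error handling are omitted, and stripping surrounding whitespace from the token is an assumption. The token files are read on every call, matching the "fresh on every refresh" requirement in both flows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class EksCredentialRequests {
    private EksCredentialRequests() {}

    // Reads a projected token; call on every refresh because kubelet rotates the file.
    static String readToken(Path tokenFile) {
        try {
            return Files.readString(tokenFile, StandardCharsets.UTF_8).strip();
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read token file " + tokenFile, e);
        }
    }

    // URL-encoded form body for AssumeRoleWithWebIdentity.
    static String stsFormBody(String roleArn, String roleSessionName, String webIdentityToken) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Action", "AssumeRoleWithWebIdentity");
        params.put("Version", "2011-06-15");
        params.put("RoleArn", roleArn);
        params.put("RoleSessionName", roleSessionName);
        params.put("WebIdentityToken", webIdentityToken);
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // IRSA: unsigned POST; the JWT in the body is the credential.
    static HttpRequest assumeRoleWithWebIdentity(URI stsEndpoint, String roleArn, String roleSessionName,
                                                 Path webIdentityTokenFile) {
        String body = stsFormBody(roleArn, roleSessionName, readToken(webIdentityTokenFile));
        return HttpRequest.newBuilder(stsEndpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Pod Identity scheme check; the proposal runs this once at construction time.
    static URI validateAgentUri(URI credentialsFullUri) {
        String scheme = credentialsFullUri.getScheme();
        if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException("credentialsFullUri must use http or https: " + credentialsFullUri);
        }
        return credentialsFullUri;
    }

    // Pod Identity: GET with the raw token contents as the Authorization header (no "Bearer " prefix).
    static HttpRequest podIdentityCredentials(URI credentialsFullUri, Path authorizationTokenFile) {
        return HttpRequest.newBuilder(validateAgentUri(credentialsFullUri))
                .header("Authorization", readToken(authorizationTokenFile))
                .GET()
                .build();
    }
}
```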
+
+#### Configuration
+
+Registered under `credentials.podIdentity` (per proposal 017):
+
+```yaml
+credentials:
+  podIdentity:
+    credentialsFullUri: http://169.254.170.23/v1/credentials
+    authorizationTokenFile: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
+    credentialLifetimeFactor: 0.8 # optional
+```
+
+On a properly-associated EKS pod the agent injects `AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE`, so the minimal configuration is:
+
+```yaml
+credentials:
+  podIdentity: {}
+```
+
+### Shared refresh infrastructure
+
+The async refresh / expiry / exponential-backoff state machine embedded in `Ec2MetadataCredentialsProvider` is extracted into a package-private `AbstractRefreshingCredentialsProvider`.
+
+The base class owns:
+- `AtomicReference>` credential state machine
+- Single-thread `ScheduledExecutorService` for background refresh
+- `ExponentialBackoff(500ms, factor 2, cap 60s, jitter)` for retries
+- Preemptive refresh at configurable `lifetimeFactor` (default 0.8)
+- `close()` to shut down the executor
+
+Subclasses implement:
+- `fetchCredentials()` — provider-specific HTTP call
+- `expirationOf(C)` — extract expiration instant
+- `onRefreshFailure(Throwable)` / `onRefreshSuccess(C)` — optional log hooks
+
+`Ec2MetadataCredentialsProvider` becomes a subclass with identical external behaviour.
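The timing rules owned by the base class can be sketched as two pure functions: the preemptive refresh instant derived from the lifetime factor, and the capped exponential backoff between failed attempts (500 ms, factor 2, 60 s cap). `RefreshTiming` is a hypothetical name, and the jitter strategy (scaling each delay into [delay/2, delay] from a supplied random value) is an assumption, since the proposal does not specify which jitter is used.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.DoubleSupplier;

final class RefreshTiming {
    static final Duration INITIAL_BACKOFF = Duration.ofMillis(500);
    static final Duration MAX_BACKOFF = Duration.ofSeconds(60);
    static final long BACKOFF_FACTOR = 2;

    private RefreshTiming() {}

    // Background refresh instant: fetchedAt + lifetimeFactor * (expiration - fetchedAt).
    static Instant refreshAt(Instant fetchedAt, Instant expiration, double lifetimeFactor) {
        if (!(lifetimeFactor > 0.0 && lifetimeFactor < 1.0)) {
            throw new IllegalArgumentException("lifetimeFactor must be in (0, 1): " + lifetimeFactor);
        }
        long lifetimeMillis = Duration.between(fetchedAt, expiration).toMillis();
        return fetchedAt.plusMillis((long) (lifetimeMillis * lifetimeFactor));
    }

    // Delay before retry `attempt` (0-based): 500 ms * 2^attempt, capped at 60 s, then
    // scaled into [delay/2, delay) by a jitter source that returns values in [0, 1).
    static Duration backoff(int attempt, DoubleSupplier jitter) {
        long cap = MAX_BACKOFF.toMillis();
        long delay = INITIAL_BACKOFF.toMillis();
        // Stop doubling once the cap is reached, which also avoids overflow for large attempts.
        for (int i = 0; i < attempt && delay < cap; i++) {
            delay *= BACKOFF_FACTOR;
        }
        delay = Math.min(delay, cap);
        long half = delay / 2;
        return Duration.ofMillis(half + (long) (jitter.getAsDouble() * (delay - half)));
    }
}
```

With a one-hour credential and the default factor of 0.8, `refreshAt` schedules the background refresh 48 minutes after the credential was fetched.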
+
+### Environment variable defaults
+
+| Provider | Config field | Env var fallback |
+|----------|-------------|-----------------|
+| IRSA | `roleArn` | `AWS_ROLE_ARN` |
+| IRSA | `webIdentityTokenFile` | `AWS_WEB_IDENTITY_TOKEN_FILE` |
+| IRSA | `roleSessionName` | `AWS_ROLE_SESSION_NAME` |
+| IRSA | `stsRegion` | `AWS_REGION` |
+| Pod Identity | `credentialsFullUri` | `AWS_CONTAINER_CREDENTIALS_FULL_URI` |
+| Pod Identity | `authorizationTokenFile` | `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` |
+
+The env-var lookup uses a `Function` seam in the provider constructor so tests can inject a fake environment.
+
+## Affected/not affected projects
+
+**Affected:**
+- `kroxylicious-kms-provider-aws-kms` — new provider classes, refactored `Ec2MetadataCredentialsProvider`, new `AbstractRefreshingCredentialsProvider`, updated `CredentialsConfig` (per 017)
+- `kroxylicious-docs` — new AsciiDoc procedures and configuration snippets
+
+**Not affected:**
+- Other KMS providers
+- The Kroxylicious operator
+- Proxy runtime core
+
+## Compatibility
+
+- **Additive:** both new providers are new configuration options under the `credentials` node (proposal 017). No existing configuration changes.
+- **No breaking changes:** the refactoring of `Ec2MetadataCredentialsProvider` into a subclass of `AbstractRefreshingCredentialsProvider` preserves its external behaviour, including log messages.
+- **Testing:** no real EKS cluster is available in CI. The providers are unit-tested against WireMock-stubbed STS / Pod Identity Agent endpoints. End-to-end validation on a real EKS cluster is a manual pre-merge step.
+
+## Rejected alternatives
+
+1. **Use the AWS SDK** — The SDK brings transitive dependencies on Netty, Reactive Streams, and its own shaded copy of Jackson. The existing provider is deliberately SDK-free. Both STS `AssumeRoleWithWebIdentity` (one unsigned POST) and the Pod Identity Agent protocol (one authenticated GET) are simple enough to implement directly.
+
+2. **Default credential chain** — The AWS SDK implements an implicit chain that tries sources in order. Rejected in favour of explicit configuration: users choose exactly one provider in YAML. An implicit chain makes misconfiguration harder to diagnose and contradicts the explicit-config pattern used by other KMS providers (Azure `entraIdentity`, Fortanix `apiKeySession`).
+
+3. **Support only IRSA, defer Pod Identity** — Pod Identity is the AWS-recommended successor to IRSA. The shared base class makes the Pod Identity implementation small (~150 lines of provider-specific code), so deferring adds process overhead without meaningful risk reduction.
+
+4. **Duplicate the refresh state machine** — copying the async refresh / backoff code into each new provider avoids touching the existing `Ec2MetadataCredentialsProvider`. Rejected because the duplication (~200 lines per provider) increases maintenance burden and divergence risk. The extracted base class is tested via the existing `Ec2MetadataCredentialsProviderTest`, which must continue to pass unchanged.
From d711ee554cb601b72678366f19e3c265d5062c49 Mon Sep 17 00:00:00 2001
From: Oleksiy Pylypenko
Date: Tue, 14 Apr 2026 10:24:39 +0200
Subject: [PATCH 2/5] 018: address review feedback — config tables, descriptions, env var justification
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add comprehensive config options tables for IRSA and Pod Identity with field descriptions, defaults, and env var fallbacks
- Add fail-fast documentation (KmsException at construction time)
- Add paragraph justifying env var defaults (platform-injected values, separation of concerns, AWS SDK convention)
- Document durationSeconds range (900-43200), roleSessionName default and regex, stsEndpointUrl derivation, credentialLifetimeFactor behavior

Co-Authored-By: Claude Opus 4.6 (1M context)
Signed-off-by: Oleksiy Pylypenko
---
 proposals/018-aws-kms-eks-authentication.md | 35 +++++++++++++++------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/proposals/018-aws-kms-eks-authentication.md b/proposals/018-aws-kms-eks-authentication.md
index cbe871d..b7fd38a 100644
--- a/proposals/018-aws-kms-eks-authentication.md
+++ b/proposals/018-aws-kms-eks-authentication.md
@@ -74,6 +74,18 @@ credentials:
     credentialLifetimeFactor: 0.8 # optional
 ```
 
+| Field | Description | Required? | Default | Env var fallback |
+|-------|-------------|-----------|---------|-----------------|
+| `roleArn` | ARN of the IAM role to assume via STS. | No | — | `AWS_ROLE_ARN` |
+| `webIdentityTokenFile` | Path to the projected service-account OIDC token file. Read fresh on every credential refresh (kubelet rotates it roughly hourly). | No | — | `AWS_WEB_IDENTITY_TOKEN_FILE` |
+| `roleSessionName` | Identifier for the assumed-role session, visible in AWS CloudTrail. Must match `[\w+=,.@-]{2,64}`. | No | `kroxylicious-` (generated at construction time) | `AWS_ROLE_SESSION_NAME` |
+| `stsEndpointUrl` | STS endpoint URL for the `AssumeRoleWithWebIdentity` call. Override for non-standard partitions (GovCloud, China). | No | `https://sts..amazonaws.com` | — |
+| `stsRegion` | AWS region used to derive the STS endpoint URL. | No | Value of `Config.region` | `AWS_REGION` |
+| `durationSeconds` | Requested duration of the assumed-role session, in seconds. Valid range: [900, 43200](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html). When absent the field is omitted from the STS request and STS applies the role's configured maximum session duration. | No | Omitted (STS default) | — |
+| `credentialLifetimeFactor` | Controls preemptive refresh: the credential is refreshed in the background once it reaches this fraction of its total lifetime. For example, 0.8 means the credential is refreshed at 80% of its lifetime. This behaviour is shared with the existing EC2 metadata provider via the `AbstractRefreshingCredentialsProvider` base class. | No | `0.8` | — |
+
+The provider **fails fast** at construction time with a `KmsException` if `roleArn` and `webIdentityTokenFile` cannot be resolved from either YAML configuration or their respective environment variables.
+
 On a properly-annotated EKS pod the webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`, so the minimal configuration is:
 
 ```yaml
@@ -102,6 +114,14 @@ credentials:
     credentialLifetimeFactor: 0.8 # optional
 ```
 
+| Field | Description | Required? | Default | Env var fallback |
+|-------|-------------|-----------|---------|-----------------|
+| `credentialsFullUri` | URL of the Pod Identity Agent credentials endpoint. Must use `http` or `https` scheme (validated at construction time). | No | — | `AWS_CONTAINER_CREDENTIALS_FULL_URI` |
+| `authorizationTokenFile` | Path to the projected service-account token file used as the `Authorization` header when calling the agent. Read fresh on every credential refresh. | No | — | `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` |
+| `credentialLifetimeFactor` | Controls preemptive refresh: the credential is refreshed in the background once it reaches this fraction of its total lifetime. For example, 0.8 means the credential is refreshed at 80% of its lifetime. This behaviour is shared with the existing EC2 metadata provider via the `AbstractRefreshingCredentialsProvider` base class. | No | `0.8` | — |
+
+The provider **fails fast** at construction time with a `KmsException` if `credentialsFullUri` and `authorizationTokenFile` cannot be resolved from either YAML configuration or their respective environment variables.
+
 On a properly-associated EKS pod the agent injects `AWS_CONTAINER_CREDENTIALS_FULL_URI` and `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE`, so the minimal configuration is:
 
 ```yaml
@@ -129,16 +149,13 @@ Subclasses implement:
 
 ### Environment variable defaults
 
-| Provider | Config field | Env var fallback |
-|----------|-------------|-----------------|
-| IRSA | `roleArn` | `AWS_ROLE_ARN` |
-| IRSA | `webIdentityTokenFile` | `AWS_WEB_IDENTITY_TOKEN_FILE` |
-| IRSA | `roleSessionName` | `AWS_ROLE_SESSION_NAME` |
-| IRSA | `stsRegion` | `AWS_REGION` |
-| Pod Identity | `credentialsFullUri` | `AWS_CONTAINER_CREDENTIALS_FULL_URI` |
-| Pod Identity | `authorizationTokenFile` | `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` |
+Several configuration fields fall back to standard AWS environment variables when not set in YAML. The Kroxylicious proxy generally prefers explicit YAML configuration where all settings are visible in a single file. However, environment variable defaults are justified here for the following reasons:
+
+1. **Platform-injected values** — the EKS IRSA webhook and the Pod Identity agent webhook automatically inject these environment variables into every pod that uses an annotated service account. The file paths and URIs they inject are not published at well-known stable addresses (unlike the EC2 metadata endpoint at `169.254.169.254`), so the environment is the canonical way the platform communicates them.
+2. **Separation of concerns** — requiring these values to be duplicated in the Kroxylicious YAML leaks AWS infrastructure configuration (IAM role ARNs, projected token paths) into Kubernetes manifests. This breaks the clean separation between infrastructure-level setup (IAM roles, EKS webhook annotations, pod-identity associations — managed by platform admins) and application-level configuration (the Kroxylicious YAML — managed by application teams).
+3. **AWS SDK convention** — the [AWS SDK](https://docs.aws.amazon.com/sdkref/latest/guide/feature-container-credentials.html) itself discovers these values from the environment rather than expecting application-level configuration, so using env-var defaults follows the established AWS convention that EKS users are already familiar with.
 
-The env-var lookup uses a `Function` seam in the provider constructor so tests can inject a fake environment.
+All env-var-backed fields can still be overridden explicitly in YAML when the operator wants full control.
 
 ## Affected/not affected projects
 
From 0558bd8d1c7d5bdfdb69c0f743183fc1426a2a39 Mon Sep 17 00:00:00 2001
From: Oleksiy Pylypenko
Date: Wed, 15 Apr 2026 11:36:27 +0200
Subject: [PATCH 3/5] 018: drop stsRegion field, derive STS endpoint from Config.region

Since Config.region is already required in YAML, there is no need for a separate stsRegion field or AWS_REGION env var fallback. The STS endpoint is derived as https://sts..amazonaws.com by default.

stsEndpointUrl remains as an optional override for non-standard partitions (China: amazonaws.com.cn, ISO: c2s.ic.gov).

Co-Authored-By: Claude Opus 4.6 (1M context)
Signed-off-by: Oleksiy Pylypenko
---
 proposals/018-aws-kms-eks-authentication.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/proposals/018-aws-kms-eks-authentication.md b/proposals/018-aws-kms-eks-authentication.md
index b7fd38a..46c91ac 100644
--- a/proposals/018-aws-kms-eks-authentication.md
+++ b/proposals/018-aws-kms-eks-authentication.md
@@ -68,8 +68,7 @@ credentials:
     roleArn: arn:aws:iam::123456789012:role/KroxyliciousIRSA
     webIdentityTokenFile: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
     roleSessionName: my-session # optional
-    stsEndpointUrl: https://sts.us-east-1.amazonaws.com # optional, derived from region
-    stsRegion: us-east-1 # optional, falls back to Config.region
+    stsEndpointUrl: https://sts.us-east-1.amazonaws.com # optional, derived from Config.region
     durationSeconds: 3600 # optional
     credentialLifetimeFactor: 0.8 # optional
 ```
@@ -79,8 +78,7 @@ credentials:
 | `roleArn` | ARN of the IAM role to assume via STS. | No | — | `AWS_ROLE_ARN` |
 | `webIdentityTokenFile` | Path to the projected service-account OIDC token file. Read fresh on every credential refresh (kubelet rotates it roughly hourly). | No | — | `AWS_WEB_IDENTITY_TOKEN_FILE` |
 | `roleSessionName` | Identifier for the assumed-role session, visible in AWS CloudTrail. Must match `[\w+=,.@-]{2,64}`. | No | `kroxylicious-` (generated at construction time) | `AWS_ROLE_SESSION_NAME` |
-| `stsEndpointUrl` | STS endpoint URL for the `AssumeRoleWithWebIdentity` call. Override for non-standard partitions (GovCloud, China). | No | `https://sts..amazonaws.com` | — |
-| `stsRegion` | AWS region used to derive the STS endpoint URL. | No | Value of `Config.region` | `AWS_REGION` |
+| `stsEndpointUrl` | STS endpoint URL for the `AssumeRoleWithWebIdentity` call. Override for non-standard partitions where the endpoint pattern differs (e.g. China: `sts..amazonaws.com.cn`, ISO: `sts..c2s.ic.gov`). | No | `https://sts..amazonaws.com` | — |
 | `durationSeconds` | Requested duration of the assumed-role session, in seconds. Valid range: [900, 43200](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html). When absent the field is omitted from the STS request and STS applies the role's configured maximum session duration. | No | Omitted (STS default) | — |
 | `credentialLifetimeFactor` | Controls preemptive refresh: the credential is refreshed in the background once it reaches this fraction of its total lifetime. For example, 0.8 means the credential is refreshed at 80% of its lifetime. This behaviour is shared with the existing EC2 metadata provider via the `AbstractRefreshingCredentialsProvider` base class. | No | `0.8` | — |
 
From c6c5fad5044905ce3fa71e860953feb20f20e950 Mon Sep 17 00:00:00 2001
From: Oleksiy Pylypenko
Date: Fri, 17 Apr 2026 12:38:00 +0200
Subject: [PATCH 4/5] 017: fix rejected alternative — reference deprecation policy
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per k-wall's review, the project has a deprecation policy even pre-1.0, so "nominally acceptable breaking changes" was incorrect framing.

Co-Authored-By: Claude Opus 4.6 (1M context)
Signed-off-by: Oleksiy Pylypenko
---
 proposals/017-aws-kms-credentials-config-restructure.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/017-aws-kms-credentials-config-restructure.md b/proposals/017-aws-kms-credentials-config-restructure.md
index bf08465..9c80e52 100644
--- a/proposals/017-aws-kms-credentials-config-restructure.md
+++ b/proposals/017-aws-kms-credentials-config-restructure.md
@@ -157,6 +157,6 @@ The `CredentialsProviderFactory` is simplified to dispatch solely from `config.c
 
 1. **Jackson polymorphic deserialization (`@JsonTypeInfo`)** — using a type discriminator inside `credentials` (e.g. `type: webIdentity`) would enforce single-provider semantics at the deserialization layer. Rejected because it requires a discriminator property that differs from the key-per-provider style used elsewhere in Kroxylicious (Azure KMS `entraIdentity`, Fortanix `apiKeySession`) and makes the YAML less intuitive.
 
-2. **Break existing config without backward compat** — since the project is pre-1.0, breaking changes are nominally acceptable. Rejected because existing users have working configurations and migration should be painless. The cost of the backward-compat shim (a few lines in the compact constructor) is negligible compared to the user friction of a forced migration.
+2. **Break existing config without backward compat** — rejected because the project has a [deprecation policy](https://github.com/kroxylicious/kroxylicious/blob/main/DEV_GUIDE.md#deprecation-policy) that requires maintaining backward compatibility even pre-1.0. The cost of the backward-compat shim (a few lines in the compact constructor) is negligible compared to the user friction of a forced migration.
 
 3. **Keep flat fields, just add more** — rejected because it scales poorly. With four providers, the `Config` record would have four nullable credential fields interspersed with `endpointUrl`, `region`, and `tls`, making the YAML hard to read and the factory logic fragile.
From 83ee98d5335fda63f7d282fa38db4246a2f8f598 Mon Sep 17 00:00:00 2001 From: Oleksiy Pylypenko Date: Mon, 20 Apr 2026 09:30:10 +0200 Subject: [PATCH 5/5] 018: address tombentley review feedback - Acknowledge Accept: application/json is undocumented in STS API - Add IAM permissions note for assumed role - Add durationSeconds effective maximum (role MaxSessionDuration) - Add credentialLifetimeFactor valid range (0, 1) - Extend fail-fast to validate roleSessionName regex and durationSeconds - Add Pod Identity trust boundary note (link-local HTTP, same as IMDS) Co-Authored-By: Claude Opus 4.6 (1M context) Signed-off-by: Oleksiy Pylypenko --- proposals/018-aws-kms-eks-authentication.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/proposals/018-aws-kms-eks-authentication.md b/proposals/018-aws-kms-eks-authentication.md index 46c91ac..b4d3df3 100644 --- a/proposals/018-aws-kms-eks-authentication.md +++ b/proposals/018-aws-kms-eks-authentication.md @@ -32,7 +32,7 @@ Neither existing mechanism works for pods on **Amazon EKS**: long-term keys are Amazon EKS is the primary deployment target for Kroxylicious on AWS. AWS supports two pod-level authentication mechanisms: -1. **IAM Roles for Service Accounts (IRSA)** — a projected OIDC token is exchanged for temporary credentials via STS `AssumeRoleWithWebIdentity`. This requires an OIDC trust-policy on the IAM role and an annotated Kubernetes service account. +1. **IAM Roles for Service Accounts (IRSA)** — a projected OIDC token is exchanged for temporary credentials via STS `AssumeRoleWithWebIdentity`. This requires an OIDC trust-policy on the IAM role and an annotated Kubernetes service account. The assumed role must have permissions to perform KMS operations on the KEKs (at minimum `kms:Encrypt`, `kms:Decrypt`, `kms:GenerateDataKey*`, `kms:DescribeKey`) as described in the alias-based policy setup. 2. **EKS Pod Identity** — the AWS-recommended successor to IRSA. 
An in-cluster Pod Identity Agent (link-local HTTP endpoint) exchanges a projected service-account token for temporary credentials. No OIDC trust-policy boilerplate is required — the binding is established via an EKS pod-identity association. @@ -54,7 +54,7 @@ This proposal also extracts the async credential refresh machinery from `Ec2Meta &WebIdentityToken= ``` `AssumeRoleWithWebIdentity` is unsigned by design — the JWT is the credential. -3. Request `Accept: application/json` so STS replies with JSON (no XML parser needed). +3. Request `Accept: application/json` so STS replies with JSON (no XML parser needed). This header is not documented in the STS API reference but is used by the AWS SDK v2 and works in practice. 4. Parse `AssumeRoleWithWebIdentityResponse.AssumeRoleWithWebIdentityResult.Credentials` into a record implementing `Credentials`. 5. On HTTP 4xx, surface the STS error code (`InvalidIdentityToken`, `AccessDenied`, `ExpiredTokenException`) in the exception message. @@ -79,10 +79,10 @@ credentials: | `webIdentityTokenFile` | Path to the projected service-account OIDC token file. Read fresh on every credential refresh (kubelet rotates it roughly hourly). | No | — | `AWS_WEB_IDENTITY_TOKEN_FILE` | | `roleSessionName` | Identifier for the assumed-role session, visible in AWS CloudTrail. Must match `[\w+=,.@-]{2,64}`. | No | `kroxylicious-` (generated at construction time) | `AWS_ROLE_SESSION_NAME` | | `stsEndpointUrl` | STS endpoint URL for the `AssumeRoleWithWebIdentity` call. Override for non-standard partitions where the endpoint pattern differs (e.g. China: `sts..amazonaws.com.cn`, ISO: `sts..c2s.ic.gov`). | No | `https://sts..amazonaws.com` | — | -| `durationSeconds` | Requested duration of the assumed-role session, in seconds. Valid range: [900, 43200](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html). 
When absent the field is omitted from the STS request and STS applies the role's configured maximum session duration. | No | Omitted (STS default) | — | -| `credentialLifetimeFactor` | Controls preemptive refresh: the credential is refreshed in the background once it reaches this fraction of its total lifetime. For example, 0.8 means the credential is refreshed at 80% of its lifetime. This behaviour is shared with the existing EC2 metadata provider via the `AbstractRefreshingCredentialsProvider` base class. | No | `0.8` | — | +| `durationSeconds` | Requested duration of the assumed-role session, in seconds. Valid range: [900, 43200](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html). The effective maximum is further capped by the IAM role's `MaxSessionDuration` setting (default 3600 seconds). Validated at construction time. When absent the field is omitted from the STS request and STS applies its default session duration of 3600 seconds. | No | Omitted (STS default) | — | +| `credentialLifetimeFactor` | Controls preemptive refresh: the credential is refreshed in the background once it reaches this fraction of its total lifetime. Must be in the range (0, 1). For example, 0.8 means the credential is refreshed at 80% of its lifetime. This behaviour is shared with the existing EC2 metadata provider via the `AbstractRefreshingCredentialsProvider` base class. | No | `0.8` | — | -The provider **fails fast** at construction time with a `KmsException` if `roleArn` and `webIdentityTokenFile` cannot be resolved from either YAML configuration or their respective environment variables. +The provider **fails fast** at construction time with a `KmsException` if either `roleArn` or `webIdentityTokenFile` cannot be resolved from the YAML configuration or its corresponding environment variable. `roleSessionName` is validated against the `[\w+=,.@-]{2,64}` regex and `durationSeconds` against [900, 43200] at construction time for immediate feedback.
On a properly-annotated EKS pod the webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`, so the minimal configuration is: @@ -116,7 +116,9 @@ credentials: |-------|-------------|-----------|---------|-----------------| | `credentialsFullUri` | URL of the Pod Identity Agent credentials endpoint. Must use `http` or `https` scheme (validated at construction time). | No | — | `AWS_CONTAINER_CREDENTIALS_FULL_URI` | | `authorizationTokenFile` | Path to the projected service-account token file used as the `Authorization` header when calling the agent. Read fresh on every credential refresh. | No | — | `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` | -| `credentialLifetimeFactor` | Controls preemptive refresh: the credential is refreshed in the background once it reaches this fraction of its total lifetime. For example, 0.8 means the credential is refreshed at 80% of its lifetime. This behaviour is shared with the existing EC2 metadata provider via the `AbstractRefreshingCredentialsProvider` base class. | No | `0.8` | — | +| `credentialLifetimeFactor` | Controls preemptive refresh: the credential is refreshed in the background once it reaches this fraction of its total lifetime. Must be in the range (0, 1). For example, 0.8 means the credential is refreshed at 80% of its lifetime. This behaviour is shared with the existing EC2 metadata provider via the `AbstractRefreshingCredentialsProvider` base class. | No | `0.8` | — | + +The Pod Identity Agent listens on a link-local address (`169.254.170.23`) assigned by AWS, reachable only from the same node. Communication uses plain HTTP by design (same trust boundary as EC2 IMDS at `169.254.169.254`). The projected service-account token authenticates each pod to the agent. The provider **fails fast** at construction time with a `KmsException` if `credentialsFullUri` and `authorizationTokenFile` cannot be resolved from either YAML configuration or their respective environment variables.
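The construction-time checks and the preemptive refresh timing described above can be sketched as follows. This is an illustration, not the proposed implementation. `WebIdentityValidation`, `validate` and `refreshAt` are hypothetical names. `IllegalArgumentException` stands in for the `KmsException` the proposal specifies, which keeps the block self-contained. The sketch also assumes the credential's lifetime is measured from the moment it was fetched. Java's `\w` matches only `[a-zA-Z_0-9]` by default, so the documented `roleSessionName` pattern can be used verbatim.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating fail-fast validation and preemptive refresh timing. */
final class WebIdentityValidation {
    // Same pattern as documented for roleSessionName (ASCII word characters plus +=,.@-).
    private static final Pattern ROLE_SESSION_NAME = Pattern.compile("[\\w+=,.@-]{2,64}");

    private WebIdentityValidation() {}

    /** Throws if any explicitly configured value is out of range; null means "not configured". */
    static void validate(String roleSessionName, Integer durationSeconds, double credentialLifetimeFactor) {
        if (roleSessionName != null && !ROLE_SESSION_NAME.matcher(roleSessionName).matches()) {
            throw new IllegalArgumentException("roleSessionName must match [\\w+=,.@-]{2,64}: " + roleSessionName);
        }
        if (durationSeconds != null && (durationSeconds < 900 || durationSeconds > 43200)) {
            throw new IllegalArgumentException("durationSeconds must be in [900, 43200]: " + durationSeconds);
        }
        // Written as a negated range check so that NaN is rejected too.
        if (!(credentialLifetimeFactor > 0.0 && credentialLifetimeFactor < 1.0)) {
            throw new IllegalArgumentException("credentialLifetimeFactor must be in (0, 1): " + credentialLifetimeFactor);
        }
    }

    /** Instant at which a credential fetched at {@code fetchedAt} should be refreshed in the background. */
    static Instant refreshAt(Instant fetchedAt, Instant expiration, double credentialLifetimeFactor) {
        long lifetimeMillis = Duration.between(fetchedAt, expiration).toMillis();
        return fetchedAt.plusMillis((long) (lifetimeMillis * credentialLifetimeFactor));
    }
}
```

With the default factor of 0.8, a credential that is valid for one hour is refreshed after 48 minutes. That leaves a 12-minute margin for a slow or failed STS or agent call before the cached credential expires.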