Add spire-agent workload API rate limiting by pod UID with OS UID fal…#6724

Open
terahertz5k wants to merge 3 commits into spiffe:main from terahertz5k:spire-agent-rate-limiting

Conversation

@terahertz5k

…lback

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality
SPIRE agent Workload API — adds optional per-caller rate limiting for all 5 Workload API methods.

Description of change
Adds an optional ratelimit configuration block to the SPIRE agent that enforces per-caller token-bucket rate limits on Workload API methods (FetchX509SVID, FetchX509Bundles, FetchJWTSVID, FetchJWTBundles, ValidateJWTSVID). When a caller exhausts its burst, the agent returns ResourceExhausted immediately. All limits default to 0 (disabled), so existing deployments are unaffected.

Key design decisions:

  • Pod UID key with OS UID fallback — On Linux with Kubernetes, the rate limit key is the pod UID from the cgroup path, preventing all uid=0 pods from sharing one bucket. Falls back to OS UID on non-Kubernetes or non-Linux systems.
  • Non-blocking rejection (AllowN) — Unlike the server-side perIPLimiter which uses blocking WaitN, the Workload API fails fast with ResourceExhausted to avoid silently queuing goroutines during reconnection storms.
  • Two-generation GC — Same pattern as perIPLimiter: O(active-callers) memory, no background goroutine, automatic eviction after ~2 minutes of inactivity.
  • Metric-only observability — Rejections emit a workload_api.rate_limit_exceeded counter with method and key_type labels. Per-rejection logging was intentionally omitted to avoid logrus.Entry allocation pressure under heavy load.
  • Middleware integration — Implemented as middleware.Middleware, inserted last in the chain (after verifySecurityHeader). No changes to existing handler code.
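The fail-fast rejection described above can be sketched with a minimal stdlib token bucket. This is illustrative only: the actual code uses golang.org/x/time/rate, and the names below are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is a minimal token bucket: capacity `burst`, refilled at `rps`
// tokens per second. Allow is non-blocking: it either takes a token
// immediately or reports failure, mirroring the AllowN fail-fast design
// (a rejection would map to a gRPC ResourceExhausted status).
type bucket struct {
	mu     sync.Mutex
	tokens float64
	burst  float64
	rps    float64
	last   time.Time
}

func newBucket(rps, burst float64) *bucket {
	return &bucket{tokens: burst, burst: burst, rps: rps, last: time.Now()}
}

func (b *bucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	// Refill tokens based on elapsed time, capped at the burst size.
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rps
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false // caller is rejected immediately, nothing queues
	}
	b.tokens--
	return true
}

func main() {
	b := newBucket(10, 2) // 10 RPS, burst of 2
	// The first two calls consume the burst; the third is rejected.
	fmt.Println(b.Allow(), b.Allow(), b.Allow())
}
```

Contrast this with a blocking `WaitN`, which would park a goroutine per queued request: under a reconnection storm, those parked goroutines are exactly the memory growth this PR is trying to avoid.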

Configuration example:

agent {
  ratelimit {
    fetch_jwt_svid    = 20
    fetch_x509_svid   = 20
  }
}

Testing:

  • Unit tests: token bucket mechanics, GC, pod UID resolution, OS UID fallback, metrics labels, concurrency with race detector
  • Integration test through full gRPC stack
  • Load tested in a local Kubernetes cluster with 800 concurrent workers — agent stays stable with no OOM

Which issue this PR fixes
fixes #2010

…lback

Signed-off-by: Kevin Lui <kevin.lui@thetradedesk.com>
@sorindumitru
Collaborator

Hi @terahertz5k, thanks for opening a PR for this! It's something missing from the agent that we'd definitely want improved.

We were wondering if you can give some details about the particular scenario that you ran into that this PR helped you.

Comment thread pkg/agent/endpoints/ratelimit.go Outdated
Comment on lines +39 to +55
// perCallerRateLimiter maintains per-caller rate limiters using a two-generation GC pattern.
// The key is a string that may represent a pod UID or OS UID (with a prefix to avoid collisions).
type perCallerRateLimiter struct {
	limit int

	mtx sync.RWMutex

	// previous holds all the limiters that were current at the GC.
	previous map[string]*rate.Limiter

	// current holds all the limiters that have been created or moved
	// from the previous limiters since the last GC.
	current map[string]*rate.Limiter

	// lastGC is the time of the last GC.
	lastGC time.Time
}
Collaborator


There's a lot in common between this structure and the perIPLimiter in the server. Is it possible to factor out some of the common code into pkg/common/ratelimit so that the agent and the server can share it?

Author


Good call.

I can extract a shared PerKeyLimiter into pkg/common/ratelimit with a Limiter interface that covers both AllowN and WaitN. The two-generation GC map logic moves there, and both the agent's perCallerRateLimiter and server's perIPLimiter become thin wrappers that just handle key extraction.

Today, the agent stores *rate.Limiter directly while the server stores its own rawRateLimiter interface (so tests can inject fakes). One shared Limiter interface replaces rawRateLimiter and works for both cases: the agent calls AllowN on it, the server calls WaitN, and the server's test fakes just need a no-op AllowN stub added.

Comment thread pkg/agent/endpoints/ratelimit.go Outdated
// OS UID prefixed with "uid:".
func (m workloadRateLimitMiddleware) resolveRateLimitKey(caller peertracker.CallerInfo) string {
	if m.resolver != nil {
		if podUID := m.resolver.GetPodUID(caller.PID); podUID != "" {
Collaborator


This operation can be expensive and will add some RPC time even in cases where SPIRE is not running on Kubernetes but has rate limits configured. We would need to find a way to avoid doing unnecessary work on nodes that don't need it.

It would be good to understand the scenario you ran into that this change helped with. We can see how we can change this to handle that while also not affecting non-k8s users.

Author


Agreed, the procfs reads on every RPC are unnecessary on non-k8s nodes.

I'm thinking I can add two things:

  1. Check KUBERNETES_SERVICE_HOST env var at resolver construction time. If unset, return a nil resolver so there's zero overhead on non-k8s nodes.
  2. On k8s nodes, we could wrap the resolver with a per-PID cache (short TTL ~10s) so we only hit procfs once per PID per interval, instead of on every RPC. Not sure if this is needed or not though.

I replied separately, but we were seeing agent memory increase due to badly behaving applications hammering the socket for certain operations. This would cause k8s to kill the agent due to OOM for the entire k8s node.

@terahertz5k
Author

hi @sorindumitru ,

We were seeing badly behaving applications hammer the socket and cause memory usage on the agent to increase high enough for k8s to kill the spire-agent pod due to OOM.

We will be using this for x509 and jwt fetch.

I'll have a look at your other two comments and see what I can do there.

@sorindumitru
Collaborator

We were seeing badly behaving applications hammer the socket and cause memory usage on the agent to increase high enough for k8s to kill the spire-agent pod due to OOM.

How were these applications misbehaving?

@terahertz5k
Author

We were seeing badly behaving applications hammer the socket and cause memory usage on the agent to increase high enough for k8s to kill the spire-agent pod due to OOM.

How were these applications misbehaving?

We have an application that spins up multiple short-lived pods simultaneously, with processes forking within containers and creating a new connection to the agent socket on every retry rather than reusing existing ones. We worked with the team to fix their retry logic and raised agent memory limits, which stabilized things.

We didn't know it was this application at first, but our graphs showed something making lots of JWT fetch requests at certain times of day, correlating with spikes in agent pod restarts.

That said, we want to add this rate limiting as a safety net. It's valid for applications to have multiple identities and make fetch requests for each, but a single caller shouldn't be able to overwhelm the agent to the point of OOM. We'd rather shed excess load gracefully than let one misbehaving workload take down the agent for everyone on the node.

…ution

Extract the two-generation GC per-key rate limiter into
pkg/common/ratelimit so both the agent and server share the same
implementation. Add Kubernetes auto-detection to skip procfs reads on
non-k8s nodes and cache pod UID lookups per PID to reduce overhead.

Signed-off-by: Kevin Lui <kevin.lui@thetradedesk.com>
@sorindumitru
Collaborator

Thanks @terahertz5k for the answers above. One issue we have with the current approach is that it's very specific to k8s (and a bit to unix). We're wondering if applying the ratelimit per workload SPIFFE ID might be a better approach. This has the benefit of being useful in non-k8s cases but also comes with some disadvantages:

  • The agent is spending a bit more time attesting the workloads, so you might not get as much benefit out of it.
  • The ratelimit could potentially apply to multiple workload replicas, which may be a good or a bad thing, but can also be tuned through the limits.

Do you think you might be able to apply the ratelimit after the attestation and see if it still works ok for the cases you ran into?

@Pittu-Sharma

Hello, I would like to work on this issue.

@terahertz5k
Author

@sorindumitru , thanks for the feedback.

The issue we hit was specifically that FetchJWTSVID is unary and every call triggers a fresh attestation. Moving the rate limit to after attestation would most likely undermine the primary goal of this change, which is to protect the agent from memory exhaustion. The concurrent attestation work is what drives the OOM, I think.

The pod UID resolution is the only k8s specific piece and it's only active on Linux nodes running k8s. On all other platforms, it falls back to OS UID. It's really an enhancement for cases (that are more likely in K8s) where container images may use the same OS UID.

To validate this, I'll move the rate limit after attestation and run my load tests. I'll report back with the results.

@terahertz5k
Author

terahertz5k commented Mar 24, 2026

Based on my tests at 128MB and a 10 RPS rate limit, pre-attestation code survived 5 minutes of 4 pods with 100 forked workers each hammering FetchJWTSVID on a tight loop, but post-attestation code OOMKilled instantly.

Doing more tests at 256MB but with 4x300 workers, pre-attestation code peaked at 168 MB, but post-attestation code was OOMKilled. This shows a large delta between pre- and post-attestation rate limiting as load increases. Granted, this load is pretty extreme for a single agent.

I increased memory to 512MB just to see what the actual memory peak was at 4x300 workers with 10 RPS rate limit.
Pre-Attestation: 138.4 MiB
Post-Attestation: 307.5 MiB
No rate limiting: 341.0 MiB

@sorindumitru
Collaborator

Sorry for the long delay here @terahertz5k, I was out for a few weeks. Would it be possible to get some heap profiling snapshots of these? I'm curious where the extra memory is going. I've done this locally using the unix workload attestor and the main memory increase seems to come from gRPC message buffers/parsing:

[heap profile screenshot]

You can configure the agent with something like the following:

agent {
    ...
    profiling_enabled = true
    profiling_freq = 10
    profiling_port = 9999
    profiling_names = [ "heap" ]
}

This will tell the agent to periodically write heap profiles to disk, in a .profiles folder. You can also point go tool pprof at port 9999 and it will read the profiling information from there. I'm curious if there's anything else taking up memory that we could improve.

@sorindumitru
Collaborator

Or you could try the following patch, which reduces the read buffer size from 32KB to 1KB:

diff --git a/pkg/agent/endpoints/endpoints.go b/pkg/agent/endpoints/endpoints.go
index 2cd3cedb9..5537d0497 100644
--- a/pkg/agent/endpoints/endpoints.go
+++ b/pkg/agent/endpoints/endpoints.go
@@ -20,6 +20,10 @@ import (
        "github.com/spiffe/spire/pkg/common/telemetry"
 )

+const (
+       readBufferSize = 1024
+)
+
 type Server interface {
        ListenAndServe(ctx context.Context) error
        WaitForListening(listening chan struct{})
@@ -107,6 +111,7 @@ func (e *Endpoints) ListenAndServe(ctx context.Context) error {
                grpc.Creds(peertracker.NewCredentials()),
                grpc.UnaryInterceptor(unaryInterceptor),
                grpc.StreamInterceptor(streamInterceptor),
+               grpc.ReadBufferSize(readBufferSize),
        )

        workload_pb.RegisterSpiffeWorkloadAPIServer(server, e.workloadAPIServer)

I don't know if that's the right thing to do or if this is the right size to pick, but we do usually have smaller incoming payloads, so this may be a good improvement to have.

@terahertz5k
Author

After investigating further, I found that my refactor in 4f8c01a introduced a memory regression.

This time around I switched from unlimited RPS to a 50 RPS/pod cap, since pre-attestation rejection is nearly instantaneous while post-attestation has to complete a full attestation first; with unlimited RPS the two modes see completely different request volumes and aren't really comparable.

The issue is the sync.Map cache that 4f8c01a added, keyed by PID, to avoid repeated procfs reads. That works fine when callers are long-lived processes, but spire-agent api fetch jwt spawns a fresh subprocess per call, so every request gets a new PID, the cache never hits, and the map grows without bound.

The fix is just removing the cache. The rest of the refactor (extracting PerKeyLimiter into the shared package and the cleaner agent/server code) is fine and worth keeping.

Numbers (120s sustained, 50 RPS/pod across 4 pods, 10 RPS rate limit, 512 MiB):

No RL: 74 MiB (baseline)
Pre-RL 01b606f (original): 47 MiB (-37% vs no RL)
Pre-RL 4f8c01a (as committed): 264 MiB (+257% vs no RL)
Pre-RL 4f8c01a (cache removed): 46 MiB (-38% vs no RL)
Post-RL: 51 MiB (-31% vs no RL)

Pre-attestation RL with the cache removed and post-attestation RL land at nearly the same place. Happy to go either direction.

@terahertz5k
Author

After the cache removal fix for pre-attestation RL, I ran comparison tests on your 1k read buffer patch. Numbers are averaged across the 120s window rather than peak:

No RL: 58 MiB without, 55 MiB with (-5%)
Pre-RL (cache removed): 47 MiB without, 43 MiB with (-8%)
Post-RL: 56 MiB without, 50 MiB with (-10%)

There's a small but consistent reduction in sustained memory across all configs, most visible on post-RL at -10%.

@sorindumitru
Collaborator

Thanks @terahertz5k. Doing the rate limit post attestation with the key being the SPIFFE ID sounds like the best way forward. We get most of the benefits and we also end up not being k8s specific.

Out of curiosity, I see that the memory numbers you mentioned now are smaller. Was this a run with fewer workloads?

@terahertz5k
Author

The number of workloads was the same, but the old tests were at unlimited RPS. Pre-attestation rejection is nearly instantaneous compared to a full attestation, so pre-RL ends up seeing far more total requests than the other configs, making it an unequal comparison. Capping at 50 RPS/pod keeps the request volume equal across all three configs. It also removes some of the variance from my CPU being totally maxed out and probably thermal throttling.

I'll work on the post-attestation code and commit that soon.

Did you want me to include the read buffer patch? Or is that something you're working on elsewhere?

Signed-off-by: Kevin Lui <kevin.lui@thetradedesk.com>


Development

Successfully merging this pull request may close these issues.

Hardening the Agent - Rate Limiting UDS
