Improve entry lookup with events based cache by sorindumitru · Pull Request #6645 · spiffe/spire

sorindumitru · 2026-02-10T20:39:17Z

The performance of looking up registration entries with the events based cache seems to be significantly lower than with the full-sync one. There are a few things we can do to improve it:

When looking up specific entries, we can stop searching once all requested entries have been found. This improves the BenchmarkEntryLookup benchmark by 4-5x.
The events based cache uses btrees to store all information, but we don't really need that for all the data. Entry and agent data could be stored in maps, and access to them would be much faster. We don't have benchmarks for updating the cache after the initial rebuild, but those operations should also be faster. It uses a bit more memory, but the performance improvements are likely worth it.
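
To make the map-based layout concrete, here is a minimal sketch, not the actual SPIRE code. The `Entry` struct, the `index` type, and its `add` method are hypothetical stand-ins; only the idea of keying entries by ID and by parent comes from the description above.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified stand-in for a registration entry.
type Entry struct {
	ID         string
	ParentPath string
	SPIFFEPath string
}

// index is an assumed layout that uses plain maps instead of btrees.
type index struct {
	byID     map[string]*Entry            // entry ID -> entry
	byParent map[string]map[string]*Entry // parent path -> entry ID -> entry
}

func newIndex() *index {
	return &index{
		byID:     make(map[string]*Entry),
		byParent: make(map[string]map[string]*Entry),
	}
}

// add stores the entry under both keys, creating the per-parent map on demand.
func (ix *index) add(e *Entry) {
	ix.byID[e.ID] = e
	children, ok := ix.byParent[e.ParentPath]
	if !ok {
		children = make(map[string]*Entry)
		ix.byParent[e.ParentPath] = children
	}
	children[e.ID] = e
}

func main() {
	ix := newIndex()
	ix.add(&Entry{ID: "e1", ParentPath: "/agent", SPIFFEPath: "/web"})
	ix.add(&Entry{ID: "e2", ParentPath: "/agent", SPIFFEPath: "/db"})
	fmt.Println(len(ix.byID), len(ix.byParent["/agent"]))
}
```

Both lookups are single hash accesses here, at the cost of one extra small map per parent, which is consistent with the slightly higher B/op reported for the build benchmark.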

Before:

BenchmarkBuildInMemory-16                             19          61686511 ns/op        27459340 B/op     110808 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16           15518             76428 ns/op           68281 B/op         13 allocs/op
BenchmarkEntryLookup-16                               42          26478334 ns/op           92193 B/op       1180 allocs/op

After:

BenchmarkBuildInMemory-16                             32          37114788 ns/op        31055134 B/op     103866 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16           55364             21229 ns/op            9464 B/op         12 allocs/op
BenchmarkEntryLookup-16                              260           4638031 ns/op           91262 B/op       1180 allocs/op

This brings it on par with or better than the full-sync cache:

BenchmarkBuildInMemory-16                             39          28392282 ns/op         9912597 B/op     100307 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16           47872             25150 ns/op            9366 B/op         11 allocs/op
BenchmarkBuildSQL-16                                   8         136010010 ns/op        28472828 B/op     569549 allocs/op
BenchmarkEntryLookup-16                              134           8829545 ns/op          132226 B/op       1024 allocs/op

Some benchmarks were slightly broken so those also needed some fixing.

sorindumitru · 2026-02-10T20:45:28Z

-		EntriesByEntryID:  c.entriesByEntryID.Len(),
-		EntriesByParentID: c.entriesByParentID.Len(),
+		EntriesByEntryID:  len(c.entriesByEntryID),
+		EntriesByParentID: entryByParentIDCount,


It might be best to just remove this stat. It's arguably not that important, and it should always be equal to EntriesByEntryID.
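
A small sketch of why the two numbers should match, assuming every entry sits under exactly one parent in a two-level map. `Entry` and `countByParent` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified registration entry.
type Entry struct {
	ID string
}

// countByParent sums the sizes of the per-parent maps. If each entry is
// stored under exactly one parent, the result equals the entry count.
func countByParent(byParent map[string]map[string]*Entry) int {
	total := 0
	for _, children := range byParent {
		total += len(children)
	}
	return total
}

func main() {
	byParent := map[string]map[string]*Entry{
		"/agent-a": {"e1": &Entry{ID: "e1"}, "e2": &Entry{ID: "e2"}},
		"/agent-b": {"e3": &Entry{ID: "e3"}},
	}
	fmt.Println(countByParent(byParent))
}
```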

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

amartinezfayo

Thank you @sorindumitru for this, this is a great improvement!
I have some comments / suggestions.

amartinezfayo · 2026-03-12T11:53:32Z

-	c.entriesByParentID.ReplaceOrInsert(er)
-	c.entriesByEntryID.ReplaceOrInsert(er)
+
+	c.entriesByEntryID[entry.Id] = entry


I think there's a duplicate assignment here. c.entriesByEntryID[entry.Id] = entry appears once before the entriesByParentID lookup and then again right before parentEntries[entry.Id] = entry.
The second one seems to have no effect since the value is the same. It looks like a copy-paste artifact?
Would it make sense to remove the first occurrence and keep only the one just before parentEntries[entry.Id] = entry?
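
For illustration, an update path with a single assignment per index might look like the sketch below. This is an assumption about the general shape, not the code in the PR; `Entry`, `cache`, `newCache`, and `update` are hypothetical, and the handling of a changed parent is added only so the example is complete.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified registration entry.
type Entry struct {
	ID         string
	ParentPath string
}

type cache struct {
	entriesByEntryID  map[string]*Entry
	entriesByParentID map[string]map[string]*Entry
}

func newCache() *cache {
	return &cache{
		entriesByEntryID:  make(map[string]*Entry),
		entriesByParentID: make(map[string]map[string]*Entry),
	}
}

// update inserts or replaces an entry, writing each index exactly once.
func (c *cache) update(entry *Entry) {
	// If the entry moved to a different parent, unlink it from the old one.
	if old, ok := c.entriesByEntryID[entry.ID]; ok && old.ParentPath != entry.ParentPath {
		siblings := c.entriesByParentID[old.ParentPath]
		delete(siblings, entry.ID) // deleting from a nil map is a no-op
		if len(siblings) == 0 {
			delete(c.entriesByParentID, old.ParentPath)
		}
	}

	parentEntries, ok := c.entriesByParentID[entry.ParentPath]
	if !ok {
		parentEntries = make(map[string]*Entry)
		c.entriesByParentID[entry.ParentPath] = parentEntries
	}

	// One assignment per index, placed together.
	c.entriesByEntryID[entry.ID] = entry
	parentEntries[entry.ID] = entry
}

func main() {
	c := newCache()
	c.update(&Entry{ID: "e1", ParentPath: "/a"})
	c.update(&Entry{ID: "e1", ParentPath: "/b"})
	fmt.Println(len(c.entriesByEntryID), len(c.entriesByParentID))
}
```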

amartinezfayo · 2026-03-12T11:56:15Z

+			if len(parentEntries) == 0 {
+				delete(c.entriesByParentID, entry.ParentId.Path)
+			}
 		}


Should we consider adding an early return here after deleting a normal workload entry, similar to what the old code did? Since a workload entry is never stored in aliasesByEntryID, it seems like the alias search loop that follows will always find nothing for workload entries. I think the original code had a return after if len(entryRecordsToDelete) > 0 for this reason, and it seems worth preserving that short-circuit on the common path.
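
The short-circuit could be sketched roughly as follows. The types, the `remove` method, and the `aliasScans` counter are hypothetical; the counter exists only to make the skipped alias search observable in this example.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified registration entry.
type Entry struct {
	ID         string
	ParentPath string
}

type cache struct {
	entriesByEntryID  map[string]*Entry
	entriesByParentID map[string]map[string]*Entry
	aliasesByEntryID  map[string]string // assumed shape: alias entry ID -> alias path
	aliasScans        int               // instrumentation for this example only
}

func newCache() *cache {
	return &cache{
		entriesByEntryID:  make(map[string]*Entry),
		entriesByParentID: make(map[string]map[string]*Entry),
		aliasesByEntryID:  make(map[string]string),
	}
}

// remove deletes an entry by ID. A workload entry is never stored as an
// alias, so the function returns before the alias search in that case.
func (c *cache) remove(entryID string) {
	if entry, ok := c.entriesByEntryID[entryID]; ok {
		delete(c.entriesByEntryID, entryID)
		parentEntries := c.entriesByParentID[entry.ParentPath]
		delete(parentEntries, entryID)
		if len(parentEntries) == 0 {
			delete(c.entriesByParentID, entry.ParentPath)
		}
		return // common path: skip the alias search entirely
	}

	c.aliasScans++
	delete(c.aliasesByEntryID, entryID)
}

func main() {
	c := newCache()
	e := &Entry{ID: "w1", ParentPath: "/agent"}
	c.entriesByEntryID[e.ID] = e
	c.entriesByParentID[e.ParentPath] = map[string]*Entry{e.ID: e}
	c.remove("w1")
	fmt.Println(c.aliasScans, len(c.entriesByParentID))
}
```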

amartinezfayo · 2026-03-12T11:58:48Z

 }

 func (c *Cache) UpdateEntry(entry *types.Entry) {
+	if entry.ParentId.TrustDomain != c.trustDomain {


It might be worth adding a brief comment here explaining that this guard is what makes the path-only keying scheme in entriesByParentID and entriesByEntryID correct.
As I understand it, since all stored entries are guaranteed to belong to the same trust domain, using bare paths as map keys is unambiguous. Without that context, a future reader might wonder why the full SPIFFE ID isn't used.
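
A sketch of the guard and the kind of comment being asked for. `SPIFFEID`, `Entry`, `cache`, and the boolean return value are hypothetical, and skipping the entry on a mismatch is an assumption, since the diff above does not show the body of the `if`.

```go
package main

import "fmt"

// SPIFFEID and Entry are hypothetical, simplified stand-ins.
type SPIFFEID struct {
	TrustDomain string
	Path        string
}

type Entry struct {
	ID       string
	ParentID SPIFFEID
}

type cache struct {
	trustDomain       string
	entriesByParentID map[string]map[string]*Entry // keyed by parent path only
}

func newCache(trustDomain string) *cache {
	return &cache{
		trustDomain:       trustDomain,
		entriesByParentID: make(map[string]map[string]*Entry),
	}
}

// update stores the entry and reports whether it was accepted.
func (c *cache) update(entry *Entry) bool {
	// Only entries from the cache's own trust domain are stored. This is
	// what makes path-only map keys unambiguous: two parents with the same
	// path but different trust domains can never both be present.
	// (Assumption: mismatching entries are simply skipped.)
	if entry.ParentID.TrustDomain != c.trustDomain {
		return false
	}
	children, ok := c.entriesByParentID[entry.ParentID.Path]
	if !ok {
		children = make(map[string]*Entry)
		c.entriesByParentID[entry.ParentID.Path] = children
	}
	children[entry.ID] = entry
	return true
}

func main() {
	c := newCache("example.org")
	fmt.Println(c.update(&Entry{ID: "e1", ParentID: SPIFFEID{"example.org", "/agent"}}))
	fmt.Println(c.update(&Entry{ID: "e2", ParentID: SPIFFEID{"other.test", "/agent"}}))
}
```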

amartinezfayo · 2026-03-12T12:01:33Z

-	for _, entry := range records[lenBefore:] {
-		records = c.appendDescendents(records, entry.SPIFFEID, parentSeen)
+	parentEntries := c.entriesByParentID[parentID]
+	for _, entry := range parentEntries {


I'm wondering whether the non-deterministic iteration order of map[string]*types.Entry could cause issues. The old btree iteration had a stable order (by parentID then entryID), while map iteration in Go is randomized. I don't see any sort-before-assert in the test changes, so I wanted to flag this in case any callers or tests implicitly depend on a stable ordering.
It might not cause failures right now but could make tests flaky.
Same thing in addDescendants.

sorindumitru · 2026-03-15

I had a look through this and I think this should be ok. For LookupAuthorizedEntries we return a map anyway (and its callers don't seem to depend on the order either). For GetAuthorizedEntries we have 2 users:

SyncAuthorizedEntries: this requires sorting by EntryID, which the previous implementation didn't provide either, so it sorts them.

GetAuthorizedEntries: Doesn't do any sorting, but I think the agent doesn't depend on the order either. It ends up adding them into a map anyway.
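
For callers that do need a stable order, sorting after collecting from the map is enough. The sketch below is only an illustration of that pattern; `Entry` and `sortedByID` are hypothetical names, not functions from SPIRE.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical, simplified registration entry.
type Entry struct {
	ID string
}

// sortedByID copies the map values into a slice and sorts them by entry ID,
// which gives a deterministic order regardless of Go's map iteration order.
func sortedByID(entries map[string]*Entry) []*Entry {
	out := make([]*Entry, 0, len(entries))
	for _, e := range entries {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	entries := map[string]*Entry{
		"c": {ID: "c"},
		"a": {ID: "a"},
		"b": {ID: "b"},
	}
	for _, e := range sortedByID(entries) {
		fmt.Println(e.ID)
	}
}
```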

amartinezfayo · 2026-03-12T12:03:14Z

-	c.entriesByParentID.AscendGreaterOrEqual(pivot, func(record entryRecord) bool {
-		if record.ParentID != parentID {
-			return false
+	parentEntries := c.entriesByParentID[parentID]


nit:
I think we could squeeze a bit more out of the early-exit optimization by checking len(foundEntries) == len(requestedEntries) once at the top of the function body, before entering the loop. Right now, if the direct-parent traversal already satisfies all requests, each subsequent alias call to addDescendants still enters the loop and processes one sibling before detecting the exit condition. Adding the check before the for would let alias-level calls bail out immediately. Something like:

parentEntries := c.entriesByParentID[parentID]
if len(foundEntries) == len(requestedEntries) {
	return
}
for _, entry := range parentEntries {

Seems to be really a minor thing given the benchmark numbers.
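
A self-contained sketch of the early-exit traversal with the check placed before the loop. The `lookup` type, its `visited` counter, and the `seen` set are hypothetical and exist only so the effect can be observed; the real function's signature may differ.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified registration entry.
type Entry struct {
	ID         string
	ParentPath string
	SPIFFEPath string
}

type lookup struct {
	entriesByParentID map[string]map[string]*Entry
	visited           int // instrumentation for this example only
}

// addDescendants walks the entries below parentPath and collects the
// requested ones, stopping as soon as every requested entry is found.
func (l *lookup) addDescendants(found map[string]*Entry, parentPath string,
	requested, seen map[string]struct{}) {
	// Check before the loop so a later call (for example, for an alias)
	// returns without touching any entry once everything has been found.
	if len(found) == len(requested) {
		return
	}
	if _, ok := seen[parentPath]; ok {
		return
	}
	seen[parentPath] = struct{}{}

	for _, entry := range l.entriesByParentID[parentPath] {
		l.visited++
		if _, ok := requested[entry.ID]; ok {
			found[entry.ID] = entry
		}
		l.addDescendants(found, entry.SPIFFEPath, requested, seen)
		if len(found) == len(requested) {
			return
		}
	}
}

func main() {
	l := &lookup{entriesByParentID: map[string]map[string]*Entry{
		"/agent": {"e1": &Entry{ID: "e1", ParentPath: "/agent", SPIFFEPath: "/node"}},
		"/node":  {"e2": &Entry{ID: "e2", ParentPath: "/node", SPIFFEPath: "/web"}},
		"/alias": {"e3": &Entry{ID: "e3", ParentPath: "/alias", SPIFFEPath: "/other"}},
	}}
	requested := map[string]struct{}{"e2": {}}
	found := map[string]*Entry{}
	seen := map[string]struct{}{}
	l.addDescendants(found, "/agent", requested, seen)
	l.addDescendants(found, "/alias", requested, seen) // returns immediately
	fmt.Println(len(found), l.visited)
}
```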

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

amartinezfayo

Thank you @sorindumitru!

sorindumitru requested review from MarcosDY, amartinezfayo, evan2645 and rturner3 as code owners February 10, 2026 20:39

sorindumitru force-pushed the events-lookup-improvements branch from d75eef2 to 7720fde Compare February 10, 2026 20:43

sorindumitru force-pushed the events-lookup-improvements branch 3 times, most recently from dac7e8c to 9979bb2 Compare February 14, 2026 10:16

MarcosDY assigned amartinezfayo Feb 19, 2026

sorindumitru marked this pull request as draft February 26, 2026 19:53

sorindumitru marked this pull request as ready for review March 3, 2026 18:14

sorindumitru force-pushed the events-lookup-improvements branch 3 times, most recently from 4f5b225 to e51145d Compare March 3, 2026 19:00

sorindumitru added 8 commits March 3, 2026 19:01

Fix the benchmark to actually do something

a1cbc8c

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

Exit early from the lookup if we found all the requested entries

c5350e2

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

Remove some more allocations from the events based cache lookup

fb426d7

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

Use maps instead of btrees in some places

014a6f8

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

Add one more benchmark

86f15ad

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

Also verify that all entries are in the expected trust domain

48bd5a5

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

Approximate value of EntriesByParentID

e51145d

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

Merge branch 'main' into events-lookup-improvements

aaab740

amartinezfayo reviewed Mar 12, 2026

Review comments

918e306

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>

MarcosDY added this to the 1.15.0 milestone Mar 17, 2026

amartinezfayo approved these changes Mar 21, 2026

sorindumitru added this pull request to the merge queue Mar 21, 2026

Merged via the queue into spiffe:main with commit 35c72d2 Mar 21, 2026
50 checks passed

sorindumitru merged 9 commits into spiffe:main from sorindumitru:events-lookup-improvements
