Improve entry lookup with events based cache #6645

Merged
sorindumitru merged 9 commits into spiffe:main from sorindumitru:events-lookup-improvements on Mar 21, 2026

Collaborator

@sorindumitru sorindumitru commented Feb 10, 2026

Looking up registration entries with the events-based cache seems to be significantly slower than with the full-sync cache. There are a few things we can do to improve this:

  • When looking up specific entries, we can stop searching once we have found all of the requested entries. This improves the BenchmarkEntryLookup test 4-5x.
  • The events-based cache uses btrees to store all information, but we don't really need that for all of the data. Entry and agent data can be stored in maps, which makes access to them much faster. We don't have benchmarks for updating the cache after the initial rebuild, but those updates should also be faster. This uses a bit more memory, but the performance improvements are likely worth it.
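
The two ideas above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `entry` and `cache` types, the `lookup` helper, and the exact signature of `addDescendants` are simplified assumptions made for the example.

```go
package main

// entry is a hypothetical, simplified stand-in for a registration entry.
type entry struct {
	ID       string // entry ID
	SPIFFEID string // path of the identity this entry issues
}

// cache is a hypothetical map-backed index: parent path -> entry ID -> entry.
type cache struct {
	entriesByParentID map[string]map[string]*entry
}

// lookup returns the requested entries that are reachable from parentID.
func (c *cache) lookup(parentID string, requested map[string]struct{}) map[string]*entry {
	found := make(map[string]*entry, len(requested))
	c.addDescendants(found, parentID, requested, map[string]struct{}{})
	return found
}

// addDescendants walks the entries below parentID depth-first and stops
// as soon as every requested entry has been found.
func (c *cache) addDescendants(found map[string]*entry, parentID string,
	requested, seen map[string]struct{}) {
	if _, ok := seen[parentID]; ok {
		return // already visited; protects against cycles
	}
	seen[parentID] = struct{}{}
	for _, e := range c.entriesByParentID[parentID] {
		if len(found) == len(requested) {
			return // early exit: nothing left to find
		}
		if _, ok := requested[e.ID]; ok {
			found[e.ID] = e
		}
		c.addDescendants(found, e.SPIFFEID, requested, seen)
	}
}
```

Each map access is a constant-time hash lookup on average, whereas a btree lookup costs a logarithmic number of comparisons, which is the trade-off the second bullet describes.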

Before:

BenchmarkBuildInMemory-16                             19          61686511 ns/op        27459340 B/op     110808 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16           15518             76428 ns/op           68281 B/op         13 allocs/op
BenchmarkEntryLookup-16                               42          26478334 ns/op           92193 B/op       1180 allocs/op

After:

BenchmarkBuildInMemory-16                             32          37114788 ns/op        31055134 B/op     103866 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16           55364             21229 ns/op            9464 B/op         12 allocs/op
BenchmarkEntryLookup-16                              260           4638031 ns/op           91262 B/op       1180 allocs/op

This brings it on par with, or better than, the full-sync cache:

BenchmarkBuildInMemory-16                             39          28392282 ns/op         9912597 B/op     100307 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16           47872             25150 ns/op            9366 B/op         11 allocs/op
BenchmarkBuildSQL-16                                   8         136010010 ns/op        28472828 B/op     569549 allocs/op
BenchmarkEntryLookup-16                              134           8829545 ns/op          132226 B/op       1024 allocs/op

Some benchmarks were slightly broken, so those needed fixing as well.

Comment thread pkg/server/authorizedentries/cache.go Outdated
- EntriesByEntryID: c.entriesByEntryID.Len(),
- EntriesByParentID: c.entriesByParentID.Len(),
+ EntriesByEntryID: len(c.entriesByEntryID),
+ EntriesByParentID: entryByParentIDCount,
Collaborator Author

It might be best to just remove this stat. It's arguably not that important, and it should always be equal to EntriesByEntryID.

@sorindumitru sorindumitru force-pushed the events-lookup-improvements branch 3 times, most recently from dac7e8c to 9979bb2 Compare February 14, 2026 10:16
@sorindumitru sorindumitru marked this pull request as draft February 26, 2026 19:53
@sorindumitru sorindumitru marked this pull request as ready for review March 3, 2026 18:14
@sorindumitru sorindumitru force-pushed the events-lookup-improvements branch 3 times, most recently from 4f5b225 to e51145d Compare March 3, 2026 19:00
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Member

@amartinezfayo amartinezfayo left a comment

Thank you @sorindumitru for this, this is a great improvement!
I have some comments / suggestions.

Comment thread pkg/server/authorizedentries/cache.go Outdated
- c.entriesByParentID.ReplaceOrInsert(er)
- c.entriesByEntryID.ReplaceOrInsert(er)

+ c.entriesByEntryID[entry.Id] = entry
Member

I think there's a duplicate assignment here. c.entriesByEntryID[entry.Id] = entry appears once before the entriesByParentID lookup and then again right before parentEntries[entry.Id] = entry.
The second one seems to have no effect since the value is the same. It looks like a copy-paste artifact?
Would it make sense to remove the first occurrence and keep only the one just before parentEntries[entry.Id] = entry?

if len(parentEntries) == 0 {
delete(c.entriesByParentID, entry.ParentId.Path)
}
}
Member

Should we consider adding an early return here after deleting a normal workload entry, similar to what the old code did? Since a workload entry is never stored in aliasesByEntryID, it seems like the alias search loop that follows will always find nothing for workload entries. I think the original code had a return after if len(entryRecordsToDelete) > 0 for this reason, and it seems worth preserving that short-circuit on the common path.
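
A rough sketch of what such a short-circuit could look like with the map-based layout. The types and the `removeEntry` helper are hypothetical simplifications for illustration and are not taken from the PR; the boolean return value exists only to make the fast path visible.

```go
package main

// entry is a hypothetical, simplified registration entry.
type entry struct {
	ID         string
	ParentPath string
}

// cache is a hypothetical map-backed cache layout.
type cache struct {
	entriesByEntryID  map[string]*entry
	entriesByParentID map[string]map[string]*entry
	aliasesByEntryID  map[string][]string // simplified stand-in for alias records
}

// removeEntry deletes the entry with the given ID. It reports true when
// the workload-entry fast path was taken and the alias search was skipped.
func (c *cache) removeEntry(entryID string) bool {
	if e, ok := c.entriesByEntryID[entryID]; ok {
		delete(c.entriesByEntryID, entryID)
		parentEntries := c.entriesByParentID[e.ParentPath]
		delete(parentEntries, entryID) // deleting from a nil map is a no-op
		if len(parentEntries) == 0 {
			delete(c.entriesByParentID, e.ParentPath)
		}
		// A workload entry is never stored as an alias, so return early.
		return true
	}
	// Only entries that were not workload entries reach the alias cleanup.
	delete(c.aliasesByEntryID, entryID)
	return false
}
```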

}

func (c *Cache) UpdateEntry(entry *types.Entry) {
if entry.ParentId.TrustDomain != c.trustDomain {
Member

It might be worth adding a brief comment here explaining that this guard is what makes the path-only keying scheme in entriesByParentID and entriesByEntryID correct.
As I understand it, since all stored entries are guaranteed to belong to the same trust domain, using bare paths as map keys is unambiguous. Without that context, a future reader might wonder why the full SPIFFE ID isn't used.
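
To illustrate the point, here is a minimal sketch of how the guard and the path-only keys fit together. The `spiffeID`, `entry`, and `cache` types and the boolean return value are assumptions made for this example, not the actual SPIRE types.

```go
package main

// spiffeID is a hypothetical, simplified SPIFFE ID split into its parts.
type spiffeID struct {
	TrustDomain string
	Path        string
}

// entry is a hypothetical, simplified registration entry.
type entry struct {
	ID       string
	ParentID spiffeID
}

// cache stores entries keyed by the bare path of their parent ID.
type cache struct {
	trustDomain       string
	entriesByParentID map[string]map[string]*entry
}

// updateEntry stores the entry and reports whether it was accepted.
func (c *cache) updateEntry(e *entry) bool {
	// Guard: only entries whose parent is in the cache's own trust domain
	// are stored. This is what makes the bare path an unambiguous map key;
	// without it, "/agent" in two trust domains would collide.
	if e.ParentID.TrustDomain != c.trustDomain {
		return false
	}
	parentEntries, ok := c.entriesByParentID[e.ParentID.Path]
	if !ok {
		parentEntries = make(map[string]*entry)
		c.entriesByParentID[e.ParentID.Path] = parentEntries
	}
	parentEntries[e.ID] = e
	return true
}
```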

- for _, entry := range records[lenBefore:] {
- records = c.appendDescendents(records, entry.SPIFFEID, parentSeen)
+ parentEntries := c.entriesByParentID[parentID]
+ for _, entry := range parentEntries {
Member

I'm wondering whether the non-deterministic iteration order of map[string]*types.Entry could cause issues. The old btree iteration had a stable order (by parentID then entryID), while map iteration in Go is randomized. I don't see any sort-before-assert in the test changes, so I wanted to flag this in case any callers or tests implicitly depend on a stable ordering.
It might not cause failures right now but could make tests flaky.
Same thing in addDescendants.

Collaborator Author

I had a look through this and I think this should be ok. For LookupAuthorizedEntries we return a map anyway (and the callers of it seem to not depend on the order either). For GetAuthorizedEntries we have 2 users:

  • SyncAuthorizedEntries: this requires sorting by EntryID, which the previous implementation didn't provide either, so it sorts them.
  • GetAuthorizedEntries: Doesn't do any sorting, but I think the agent doesn't depend on the order either. It ends up adding them into a map anyway.
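
For a caller or test that does need a stable order, sorting the map values by entry ID is enough. The helper below is a hypothetical sketch of that pattern with a simplified `entry` type; it is not code from SPIRE.

```go
package main

import "sort"

// entry is a hypothetical, simplified registration entry.
type entry struct {
	ID string
}

// sortedEntries copies the map values into a slice and orders them by
// entry ID, so the result no longer depends on Go's randomized map
// iteration order.
func sortedEntries(m map[string]*entry) []*entry {
	out := make([]*entry, 0, len(m))
	for _, e := range m {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}
```

A test can call a helper like this on both the expected and the actual entries before comparing them.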

- c.entriesByParentID.AscendGreaterOrEqual(pivot, func(record entryRecord) bool {
- if record.ParentID != parentID {
- return false
+ parentEntries := c.entriesByParentID[parentID]
Member

nit:
I think we could squeeze a bit more out of the early-exit optimization by checking len(foundEntries) == len(requestedEntries) once at the top of the function body, before entering the loop. Right now, if the direct-parent traversal already satisfies all requests, each subsequent alias call to addDescendants still enters the loop and processes one sibling before detecting the exit condition. Adding the check before the for would let alias-level calls bail out immediately. Something like:

parentEntries := c.entriesByParentID[parentID]
if len(foundEntries) == len(requestedEntries) {
    return
}
for _, entry := range parentEntries {

This seems to be a really minor thing given the benchmark numbers, though.

Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
@MarcosDY MarcosDY added this to the 1.15.0 milestone Mar 17, 2026
Member

@amartinezfayo amartinezfayo left a comment

Thank you @sorindumitru!

@sorindumitru sorindumitru added this pull request to the merge queue Mar 21, 2026
Merged via the queue into spiffe:main with commit 35c72d2 Mar 21, 2026
50 checks passed

3 participants