Improve entry lookup with events based cache#6645
Conversation
d75eef2 to
7720fde
Compare
| EntriesByEntryID: c.entriesByEntryID.Len(), | ||
| EntriesByParentID: c.entriesByParentID.Len(), | ||
| EntriesByEntryID: len(c.entriesByEntryID), | ||
| EntriesByParentID: entryByParentIDCount, |
There was a problem hiding this comment.
It might be best to just remove this stat. It's arguably not that important. It should always be equal to EntriesByEntryID
dac7e8c to
9979bb2
Compare
4f5b225 to
e51145d
Compare
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
amartinezfayo
left a comment
There was a problem hiding this comment.
Thank you @sorindumitru for this, this is a great improvement!
I have some comments / suggestions.
| c.entriesByParentID.ReplaceOrInsert(er) | ||
| c.entriesByEntryID.ReplaceOrInsert(er) | ||
|
|
||
| c.entriesByEntryID[entry.Id] = entry |
There was a problem hiding this comment.
I think there's a duplicate assignment here. c.entriesByEntryID[entry.Id] = entry appears once before the entriesByParentID lookup and then again right before parentEntries[entry.Id] = entry.
The second one seems to has no effect since the value is the same. It looks like a copy-paste artifact?
Would it make sense to remove the first occurrence and keep only the one just before parentEntries[entry.Id] = entry?
| if len(parentEntries) == 0 { | ||
| delete(c.entriesByParentID, entry.ParentId.Path) | ||
| } | ||
| } |
There was a problem hiding this comment.
Should we consider adding an early return here after deleting a normal workload entry, similar to what the old code did? Since a workload entry is never stored in aliasesByEntryID, it seems like the alias search loop that follows will always find nothing for workload entries. I think the original code had a return after if len(entryRecordsToDelete) > 0 for this reason, and it seems worth preserving that short-circuit on the common path.
| } | ||
|
|
||
| func (c *Cache) UpdateEntry(entry *types.Entry) { | ||
| if entry.ParentId.TrustDomain != c.trustDomain { |
There was a problem hiding this comment.
It might be worth adding a brief comment here explaining that this guard is what makes the path-only keying scheme in entriesByParentID and entriesByEntryID correct.
As I understand, since all stored entries are guaranteed to belong to the same trust domain, using bare paths as map keys is unambiguous. Without that context, a future reader might wonder why the full SPIFFE ID isn't used.
| for _, entry := range records[lenBefore:] { | ||
| records = c.appendDescendents(records, entry.SPIFFEID, parentSeen) | ||
| parentEntries := c.entriesByParentID[parentID] | ||
| for _, entry := range parentEntries { |
There was a problem hiding this comment.
I'm wondering whether the non-deterministic iteration order of map[string]*types.Entry could cause issues. The old btree iteration had a stable order (by parentID then entryID), while map iteration in Go is randomized. I don't see any sort-before-assert in the test changes, so I wanted to flag this in case any callers or tests implicitly depend on a stable ordering.
It might not cause failures right now but could make tests flaky.
Same thing in addDescendants.
There was a problem hiding this comment.
I had a look through this and I think this should be ok. For LookupAuthorizedEntries we return a map anyway (and the callers of it seem to not depend on the order either). For GetAuthorizedEntries we have 2 users:
| c.entriesByParentID.AscendGreaterOrEqual(pivot, func(record entryRecord) bool { | ||
| if record.ParentID != parentID { | ||
| return false | ||
| parentEntries := c.entriesByParentID[parentID] |
There was a problem hiding this comment.
nit:
I think we could squeeze a bit more out of the early-exit optimization by checking len(foundEntries) == len(requestedEntries) once at the top of the function body, before entering the loop. Right now, if the direct-parent traversal already satisfies all requests, each subsequent alias call to addDescendants still enters the loop and processes one sibling before detecting the exit condition. Adding the check before the for would let alias-level calls bail out immediately. Something like:
parentEntries := c.entriesByParentID[parentID]
if len(foundEntries) == len(requestedEntries) {
return
}
for _, entry := range parentEntries {
Seems to be really a minor thing given the benchmark numbers.
Signed-off-by: Sorin Dumitru <sorin@returnze.ro>
amartinezfayo
left a comment
There was a problem hiding this comment.
Thank you @sorindumitru!
The performance of looking up registration entries with the events based cache seem to be significantly lower than the full-sync one. There are some things that we can do to improve things:
Before:
After:
This brings it on par or better with the full-sync cache:
Some benchmarks were slightly broken so those also needed some fixing.