feat(iavl): add KV data reader & writer, and mmap wrapper #25645
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #25645 +/- ##
==========================================
+ Coverage 70.40% 70.48% +0.08%
==========================================
Files 830 834 +4
Lines 54050 54380 +330
==========================================
+ Hits 38052 38332 +280
- Misses 15998 16048 +50
	if unsafe.Sizeof(ChangesetInfo{}) != sizeChangesetInfo {
		panic(fmt.Sprintf("invalid ChangesetInfo size: got %d, want %d", unsafe.Sizeof(ChangesetInfo{}), sizeChangesetInfo))
	}
}
This was missing in the previous PR
	// ValueOffset is the offset of the value data for this node in the key value data file.
	// The same size considerations apply here as for KeyOffset.
	ValueOffset uint32
In order to efficiently cache keys, we need to allow key and value bytes to be non-contiguous in the data file. Adding a separate value offset allows us to put key and value data wherever we want to. Hopefully, the additional 4 bytes per leaf node is offset by more key caching in the kv data file.
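To illustrate why separate offsets enable key caching, here is a rough in-memory sketch. Everything in it is an assumption made for illustration: the `kvBuffer` and `leafRef` types, the uvarint length prefix, and the map-based key cache are not taken from the PR, whose actual file format may differ. It also ignores the 4 GB limit that 32-bit offsets impose.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// leafRef is a hypothetical leaf record holding independent key and value offsets.
type leafRef struct {
	KeyOffset   uint32
	ValueOffset uint32
}

// kvBuffer is a hypothetical in-memory stand-in for the KV data file.
type kvBuffer struct {
	data     []byte
	keyCache map[string]uint32 // key bytes -> offset of an already written copy
}

func newKVBuffer() *kvBuffer {
	return &kvBuffer{keyCache: make(map[string]uint32)}
}

// appendBlob writes a uvarint length prefix followed by p and returns the start offset.
// The length-prefix encoding is an assumption, not the PR's format.
func (b *kvBuffer) appendBlob(p []byte) uint32 {
	off := uint32(len(b.data))
	b.data = binary.AppendUvarint(b.data, uint64(len(p)))
	b.data = append(b.data, p...)
	return off
}

// put reuses a cached key offset when the key was written before, so only
// the value is appended. This works only because the value has its own offset.
func (b *kvBuffer) put(key, value []byte) leafRef {
	keyOff, ok := b.keyCache[string(key)]
	if !ok {
		keyOff = b.appendBlob(key)
		b.keyCache[string(key)] = keyOff
	}
	return leafRef{KeyOffset: keyOff, ValueOffset: b.appendBlob(value)}
}

// read returns the blob stored at off.
func (b *kvBuffer) read(off uint32) []byte {
	n, w := binary.Uvarint(b.data[off:])
	start := int(off) + w
	return b.data[start : start+int(n)]
}

func main() {
	b := newKVBuffer()
	r1 := b.put([]byte("a"), []byte("1"))
	r2 := b.put([]byte("a"), []byte("2"))
	fmt.Println(r1.KeyOffset == r2.KeyOffset, string(b.read(r2.ValueOffset)))
}
```

In this sketch the second write of key "a" costs only the value bytes plus 4 bytes for the extra offset in the leaf, which is the trade-off described in the comment above.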
iavl/internal/mmap.go
Outdated
	"io"
	"os"
)
import "github.com/edsrzf/mmap-go"
For now I am using this off-the-shelf mmap wrapper which has the highest number of known importers on pkg.go.dev: https://pkg.go.dev/github.com/edsrzf/mmap-go?tab=importedby
In the future, it may be worth considering creating our own mmap wrapper. On Linux, it may be possible to apply an optimization where we can resize the mmap without unmapping memory: https://stackoverflow.com/questions/74243583/memory-map-file-with-growing-size
@aaronc your pull request is missing a changelog!
@aaronc a few more linter complaints
Description
This PR specifies the IAVLX KV data file format for storing the WAL as well as other key-value data (branch node keys and compacted changeset KV data), and implements the KVDataReader, KVDataWriter and WALReader types. It also adds the convenience FileWriter and Mmap wrapper types.

One design question for reviewers is whether we should proactively limit key and value size. I would suggest a key limit of 2^16-1 bytes (64 KB) and a value limit of 2^24-1 bytes (16 MB). Currently, this KV data file uses 32-bit offsets, which limits its size to 4 GB before we have to roll over. When initially writing changesets, we should probably roll over at around 1 or 2 GB and then compact up to 4 GB. If, however, we ran out of offset space while writing a version, the node would crash non-deterministically. This is unlikely to happen if we roll over at 1 or 2 GB unless someone introduces some really large unexpected KV data. Setting a limit on key and value size would be consensus breaking (though unlikely to ever be triggered in practice), but it would make such pathological scenarios cause nodes to fail more deterministically, based on validation rather than on running out of space. We could also explore larger offsets of 40 to 64 bits, but the larger the kv.dat file is, the more extra disk space we need when doing compaction. Really large key/value data should probably be considered pathological anyway. Any thoughts on all of this?
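As a discussion aid, the proposed limits could be enforced with something like the sketch below. The limit values are the ones suggested above, and the 1 GiB rollover threshold is one of the two candidates mentioned. The function names, error values, and the choice to validate before writing are hypothetical and not part of this PR.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxKeySize   = 1<<16 - 1 // proposed key limit in bytes
	maxValueSize = 1<<24 - 1 // proposed value limit in bytes

	// maxOffset is the largest position addressable with a 32-bit offset.
	maxOffset uint64 = 1<<32 - 1
	// rolloverSize is an assumed threshold (1 GiB) for starting a new file.
	rolloverSize uint64 = 1 << 30
)

var (
	errKeyTooLarge   = errors.New("key exceeds maximum size")
	errValueTooLarge = errors.New("value exceeds maximum size")
)

// validateKV rejects oversized entries deterministically, before any bytes
// are written, so every node fails the same way on the same input.
func validateKV(key, value []byte) error {
	if len(key) > maxKeySize {
		return errKeyTooLarge
	}
	if len(value) > maxValueSize {
		return errValueTooLarge
	}
	return nil
}

// shouldRollOver reports whether the current file has reached the threshold.
func shouldRollOver(currentSize uint64) bool {
	return currentSize >= rolloverSize
}

func main() {
	// Headroom left for a single maximum-size entry written right at the threshold.
	worst := rolloverSize + maxKeySize + maxValueSize
	fmt.Println("single max entry fits after rollover point:", worst <= maxOffset)
	fmt.Println(validateKV(make([]byte, maxKeySize+1), nil))
}
```

With per-entry limits in place, one maximum-size entry written at the rollover point still ends far below the 32-bit offset ceiling. Whether a whole version's worth of writes fits depends on how often the rollover check runs, which this sketch does not model.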