feat: new ResidentBytes trait for types which can approximate their resident memory size (stack+heap) #7049
Conversation
Pull request overview
Adds a new ResidentBytes trait (stack + heap “resident size” estimate) and implements it across Clarity contract/context-related types to enable future memory-bounded caching.
Changes:
- Introduces `clarity-types::resident_bytes::ResidentBytes` with heuristic implementations for common container types and Clarity core types.
- Implements `ResidentBytes` for key VM structures (`Contract`, `ContractContext`, function signatures/callables, token/map/var metadata).
- Adds targeted unit tests validating that resident byte accounting covers all `ContractContext` fields and grows with richer contracts.
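To make the shape of such a trait concrete, here is a minimal sketch. Only the `ResidentBytes` name comes from the PR; the method names and these impls are illustrative assumptions, not the PR's actual API:

```rust
use std::mem::size_of;

// Minimal sketch of a resident-size trait. Only the trait name comes from
// the PR; method names and impls below are illustrative assumptions.
trait ResidentBytes {
    /// Heap bytes owned by the value (capacity-based, so it tracks the
    /// actual backing allocations rather than the logical length).
    fn heap_bytes(&self) -> usize;

    /// Stack footprint plus heap footprint ("resident size").
    fn resident_bytes(&self) -> usize
    where
        Self: Sized,
    {
        size_of::<Self>() + self.heap_bytes()
    }
}

impl ResidentBytes for u64 {
    fn heap_bytes(&self) -> usize {
        0 // plain integers own no heap memory
    }
}

impl<T: ResidentBytes> ResidentBytes for Vec<T> {
    fn heap_bytes(&self) -> usize {
        // Backing buffer spans `capacity` slots, plus each element's own heap.
        self.capacity() * size_of::<T>()
            + self.iter().map(|e| e.heap_bytes()).sum::<usize>()
    }
}
```

Using capacity rather than length is what makes the estimate "resident": a `Vec` with spare capacity still occupies that memory.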
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `stacks-common/src/util/macros.rs` | Extends `guarded_string!` types with `heap_capacity()` to support resident-size estimation. |
| `clarity/src/vm/types/signatures.rs` | Implements `ResidentBytes` for `FunctionSignature`. |
| `clarity/src/vm/database/structures.rs` | Implements `ResidentBytes` for contract metadata structs (FT/NFT/map/var metadata). |
| `clarity/src/vm/contracts.rs` | Implements `ResidentBytes` for `Contract` and adds contract-level resident-bytes tests. |
| `clarity/src/vm/contexts.rs` | Implements `ResidentBytes` for `ContractContext`. |
| `clarity/src/vm/callables.rs` | Implements `ResidentBytes` for `DefinedFunction`. |
| `clarity-types/src/types/mod.rs` | Adds `heap_capacity()` to `FunctionIdentifier` for resident-size estimation. |
| `clarity-types/src/resident_bytes.rs` | New module defining `ResidentBytes`, container heuristics, Clarity type impls, and extensive tests. |
| `clarity-types/src/lib.rs` | Exports the new `resident_bytes` module. |
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Coverage Report for CI Build 24522626084: coverage increased (+0.02%) to 85.736%.

Coverage regressions: 3041 previously-covered lines in 82 files lost coverage.

💛 - Coveralls
federico-stacks left a comment
Implementation looks good. I've only added a few minor comments with possible improvements.
Also, regarding the Coveralls report: we could improve coverage for the `structures.rs` and `signatures.rs` modules.
```rust
// Vec<Vec<u8>>: outer vec backing + each inner vec's backing
let outer = self.data.capacity() * size_of::<Vec<u8>>();
let inner: usize = self.data.iter().map(|v| v.capacity()).sum();
outer + inner
```
nit: could this reuse the `heap_bytes()` impl for `Vec<T>`?
```diff
-// Vec<Vec<u8>>: outer vec backing + each inner vec's backing
-let outer = self.data.capacity() * size_of::<Vec<u8>>();
-let inner: usize = self.data.iter().map(|v| v.capacity()).sum();
-outer + inner
+self.data.heap_bytes()
```
Yup, that was a miss when I changed the other ones. Updated in 68cb070
```rust
fn heap_bytes(&self) -> usize {
    // Counts the Arc allocation (header + pointee). Shared backing may be overcounted if
    // multiple Arc handles to the same allocation are reachable in one measured graph.
    ARC_OVERHEAD + size_of::<T>() + (**self).heap_bytes()
```
We might be able to avoid overcounting Arc by distributing the cost across handles using Arc::strong_count().
For example:
```diff
-ARC_OVERHEAD + size_of::<T>() + (**self).heap_bytes()
+let alloc = ARC_OVERHEAD + size_of::<T>() + (**self).heap_bytes();
+alloc / Arc::strong_count(self)
```
What do you think?
`Arc::strong_count()` would risk overcounting in the other direction, since it counts process-wide clones rather than only the handles inside the measured graph. The truly correct way would be a context-aware `heap_bytes()` which keeps count of the number of visits per `Arc::as_ptr()` within the measured object graph, but that felt somewhat overkill for the current use-case, since serde_json will deserialize these types and, even if they are equal by value, place each of them in its own `Arc`.

So, to optimize this completely, we'd really need:

1. Deserialization which can handle aliasing,
2. A context-aware `heap_bytes()` with visited-pointer tracking.

I'd be fine implementing pt 2 so that this implementation is "textbook correct" and future-proof. Pt 1 as well, but I'd probably do that in a separate PR since it changes the current functionality (though it would potentially materially reduce the resident size where many equal values are placed in their own `Arc`s, and it would require pt 2 to avoid the overcounting).
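A minimal sketch of what such a context-aware, visited-pointer-tracking `heap_bytes()` could look like. All names here are illustrative assumptions, not the PR's code; `ARC_OVERHEAD` stands in for the refcount header:

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::sync::Arc;

// Assumed stand-in for the Arc refcount header (strong + weak counts).
const ARC_OVERHEAD: usize = 2 * size_of::<usize>();

// Hypothetical context-aware variant: each Arc allocation is counted once
// per measured object graph, keyed by its data pointer.
trait ResidentBytesCtx {
    fn heap_bytes_ctx(&self, visited: &mut HashSet<*const ()>) -> usize;
}

impl ResidentBytesCtx for String {
    fn heap_bytes_ctx(&self, _visited: &mut HashSet<*const ()>) -> usize {
        self.capacity()
    }
}

impl<T: ResidentBytesCtx> ResidentBytesCtx for Arc<T> {
    fn heap_bytes_ctx(&self, visited: &mut HashSet<*const ()>) -> usize {
        // Only the first handle seen in this traversal pays for the
        // allocation; later aliases of the same pointer contribute nothing.
        if visited.insert(Arc::as_ptr(self) as *const ()) {
            ARC_OVERHEAD + size_of::<T>() + (**self).heap_bytes_ctx(visited)
        } else {
            0
        }
    }
}
```

With this shape, two clones of the same `Arc<String>` measured in one traversal charge the backing allocation exactly once, while distinct allocations holding equal values are still each counted.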
```rust
    return 0;
}

let buckets = (cap * hashmap::LOAD_FACTOR_INV_NUM).div_ceil(hashmap::LOAD_FACTOR_INV_DEN);
```
sanity-check: If I recall correctly, hashbrown allocates buckets in powers of two.
If we want to conservatively overestimate, should we round the bucket count up to the next power of two?
Something like:
```diff
-let buckets = (cap * hashmap::LOAD_FACTOR_INV_NUM).div_ceil(hashmap::LOAD_FACTOR_INV_DEN);
+let buckets = (cap * hashmap::LOAD_FACTOR_INV_NUM).div_ceil(hashmap::LOAD_FACTOR_INV_DEN).next_power_of_two();
```
Yeah, it does 👍 But the math here already ensures that we end up on the next pow2 (HashMap::capacity() returns the actual usable capacity derived from the underlying pow2 bucket count) -- will add a test that proves it.
Reworked it to be testable and added tests in cc5c175 showing pow2 sizing
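For reference, the arithmetic can be checked standalone. The sketch below restates the constants as assumptions (an `8/7` ratio encoding hashbrown's 7/8 load factor): when `cap` is the usable capacity exposed for a power-of-two bucket count (`buckets - buckets/8`), inverting the load factor with `div_ceil` lands back on that power of two exactly, so no explicit `next_power_of_two()` is needed:

```rust
// Assumed constants encoding hashbrown's 7/8 load factor as a ratio.
const LOAD_FACTOR_INV_NUM: usize = 8;
const LOAD_FACTOR_INV_DEN: usize = 7;

fn estimated_buckets(cap: usize) -> usize {
    if cap == 0 {
        return 0;
    }
    (cap * LOAD_FACTOR_INV_NUM).div_ceil(LOAD_FACTOR_INV_DEN)
}

fn main() {
    for k in 3..16u32 {
        let buckets = 1usize << k;
        // Usable capacity exposed for a pow2 bucket count: buckets - buckets/8.
        let usable = buckets - buckets / 8;
        // Inverting the load factor recovers the pow2 bucket count exactly.
        assert_eq!(estimated_buckets(usable), buckets);
    }
    println!("ok");
}
```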
Description
Introduces a new `ResidentBytes` trait which is implemented for all types used within `Contract`/`ContractContext` and yields either exact or close approximations of their stack [+ heap] size ("resident size in bytes"). It is used by #7082 as part of a size-informed cache admittance/eviction policy (limiting the cache's in-memory size).

This is a precursor to Clarity contract AST (`Contract`) caching, where it's important that we are able to restrict the cache's overall actual memory usage. This PR provides the trait and necessary implementations, but does not wire it up.

Includes heuristics for `BTreeMap`/`BTreeSet`/`HashMap`/`HashSet`, based on current `std` impls, to conservatively approximate their overhead.

Applicable issues
Checklist