feat: new ResidentBytes trait for types which can approximate their resident memory size (stack+heap)#7049

Open
cylewitruk-stacks wants to merge 18 commits into stacks-network:develop from cylewitruk-stacks:perf/clarity-resident-bytes
Conversation

Contributor

@cylewitruk-stacks cylewitruk-stacks commented Mar 27, 2026

Description

Introduces a new ResidentBytes trait, implemented for all types used within Contract/ContractContext, which yields either the exact value or a close approximation of their stack [+heap] size ("resident size in bytes"). #7082 uses it as part of a size-informed cache admission/eviction policy that limits the cache's in-memory size.

This is a precursor to Clarity contract AST (Contract) caching, where it's important that we are able to restrict the cache's overall actual memory usage. This PR provides the trait and necessary implementations, but does not wire it up.

Includes heuristics for BTreeMap/BTreeSet/HashMap/HashSet, based on current std impls, to conservatively approximate their overhead.
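
For readers skimming the thread, here is a minimal sketch of what a trait of this kind could look like. Only the names ResidentBytes and heap_bytes() appear in this PR's discussion; the resident_bytes() default method, the impls shown, and the TokenMeta struct are assumptions made for illustration and may differ from the actual code.

```rust
use std::mem::size_of;

/// Hypothetical shape of the trait; the PR's real definition may differ.
pub trait ResidentBytes {
    /// Bytes this value owns on the heap (its inline size is not included).
    fn heap_bytes(&self) -> usize;

    /// Inline (stack) size plus owned heap bytes.
    fn resident_bytes(&self) -> usize
    where
        Self: Sized,
    {
        size_of::<Self>() + self.heap_bytes()
    }
}

impl ResidentBytes for u64 {
    fn heap_bytes(&self) -> usize {
        0 // plain integers own no heap memory
    }
}

impl ResidentBytes for String {
    fn heap_bytes(&self) -> usize {
        self.capacity() // count the allocated buffer, not just the used length
    }
}

/// Illustrative struct, not a type from the PR.
pub struct TokenMeta {
    pub name: String,
    pub supply: u64,
}

impl ResidentBytes for TokenMeta {
    fn heap_bytes(&self) -> usize {
        // Sum the heap usage of every field; inline sizes are already
        // covered by size_of::<TokenMeta>() in resident_bytes().
        self.name.heap_bytes() + self.supply.heap_bytes()
    }
}
```

Splitting the inline size from the owned heap bytes lets container impls count `capacity * size_of::<T>()` once and then add only each element's heap usage.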

Applicable issues

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated

Copilot AI left a comment

Pull request overview

Adds a new ResidentBytes trait (stack + heap “resident size” estimate) and implements it across Clarity contract/context-related types to enable future memory-bounded caching.

Changes:

  • Introduces clarity-types::resident_bytes::ResidentBytes with heuristic implementations for common container types and Clarity core types.
  • Implements ResidentBytes for key VM structures (Contract, ContractContext, function signatures/callables, token/map/var metadata).
  • Adds targeted unit tests validating that resident byte accounting covers all ContractContext fields and grows with richer contracts.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| stacks-common/src/util/macros.rs | Extends guarded_string! types with heap_capacity() to support resident-size estimation. |
| clarity/src/vm/types/signatures.rs | Implements ResidentBytes for FunctionSignature. |
| clarity/src/vm/database/structures.rs | Implements ResidentBytes for contract metadata structs (FT/NFT/map/var metadata). |
| clarity/src/vm/contracts.rs | Implements ResidentBytes for Contract and adds contract-level resident-bytes tests. |
| clarity/src/vm/contexts.rs | Implements ResidentBytes for ContractContext. |
| clarity/src/vm/callables.rs | Implements ResidentBytes for DefinedFunction. |
| clarity-types/src/types/mod.rs | Adds heap_capacity() to FunctionIdentifier for resident-size estimation. |
| clarity-types/src/resident_bytes.rs | New module defining ResidentBytes, container heuristics, Clarity type impls, and extensive tests. |
| clarity-types/src/lib.rs | Exports the new resident_bytes module. |

Comment thread clarity-types/src/resident_bytes.rs Outdated
Comment thread clarity-types/src/resident_bytes.rs Outdated
Comment thread clarity-types/src/resident_bytes.rs Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.


Comment thread clarity/src/vm/contracts.rs Outdated
coveralls commented Mar 30, 2026

Coverage Report for CI Build 24522626084

Coverage increased (+0.02%) to 85.736%

Details

  • Coverage increased (+0.02%) from the base build.
  • Patch coverage: 14 uncovered changes across 2 files (847 of 861 lines covered, 98.37%).
  • 3041 coverage regressions across 82 files.

Uncovered Changes

| File | Changed | Covered | % |
| --- | --- | --- | --- |
| clarity-types/src/resident_bytes.rs | 636 | 623 | 97.96% |
| clarity/src/vm/contracts.rs | 174 | 173 | 99.43% |

Coverage Regressions

3041 previously-covered lines in 82 files lost coverage.

| Top 10 Files by Coverage Loss | Lines Losing Coverage | Coverage |
| --- | --- | --- |
| stackslib/src/net/inv/epoch2x.rs | 228 | 79.09% |
| stackslib/src/net/chat.rs | 202 | 92.95% |
| stackslib/src/chainstate/stacks/miner.rs | 196 | 83.28% |
| stacks-node/src/nakamoto_node/miner.rs | 149 | 87.34% |
| stackslib/src/chainstate/stacks/db/mod.rs | 135 | 86.21% |
| stackslib/src/net/api/postblock_proposal.rs | 126 | 80.0% |
| clarity/src/vm/costs/mod.rs | 125 | 83.57% |
| stacks-node/src/nakamoto_node/relayer.rs | 105 | 86.77% |
| stackslib/src/config/mod.rs | 101 | 68.84% |
| stackslib/src/clarity_vm/database/marf.rs | 99 | 60.67% |

Coverage Stats

Coverage Status
Relevant Lines: 219112
Covered Lines: 187858
Line Coverage: 85.74%
Coverage Strength: 17247489.06 hits per line

💛 - Coveralls

Contributor

@federico-stacks federico-stacks left a comment

Implementation looks good. I’ve only added a few minor comments with possible improvements.

Also, regarding the Coveralls report: we could improve coverage for the structures.rs and signatures.rs modules.

Comment thread clarity-types/src/resident_bytes.rs Outdated
Comment on lines +341 to +344
// Vec<Vec<u8>>: outer vec backing + each inner vec's backing
let outer = self.data.capacity() * size_of::<Vec<u8>>();
let inner: usize = self.data.iter().map(|v| v.capacity()).sum();
outer + inner
Contributor

nit: could this reuse the heap_bytes() implementation for Vec<T>?

Suggested change
// Vec<Vec<u8>>: outer vec backing + each inner vec's backing
let outer = self.data.capacity() * size_of::<Vec<u8>>();
let inner: usize = self.data.iter().map(|v| v.capacity()).sum();
outer + inner
self.data.heap_bytes()

Contributor Author

Yup, that was a miss when I changed the other ones. Updated in 68cb070
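
To make the equivalence behind this suggestion concrete, here is a small sketch. It assumes a generic Vec<T> impl that counts the backing buffer plus each element's own heap usage; the trait below is a simplified stand-in with only heap_bytes(), and manual_heap_bytes is a hypothetical helper that reproduces the hand-written arithmetic from the original lines.

```rust
use std::mem::size_of;

/// Simplified stand-in for the PR's trait (only the heap part).
trait ResidentBytes {
    fn heap_bytes(&self) -> usize;
}

impl ResidentBytes for u8 {
    fn heap_bytes(&self) -> usize {
        0
    }
}

/// Assumed generic impl: backing buffer plus whatever each element owns.
impl<T: ResidentBytes> ResidentBytes for Vec<T> {
    fn heap_bytes(&self) -> usize {
        let backing = self.capacity() * size_of::<T>();
        let owned: usize = self.iter().map(|item| item.heap_bytes()).sum();
        backing + owned
    }
}

/// The hand-written computation from the original lines, for comparison.
fn manual_heap_bytes(data: &Vec<Vec<u8>>) -> usize {
    let outer = data.capacity() * size_of::<Vec<u8>>();
    let inner: usize = data.iter().map(|v| v.capacity()).sum();
    outer + inner
}
```

With such an impl, calling heap_bytes() on a Vec<Vec<u8>> recurses once: each inner Vec<u8> contributes `capacity * 1 + 0`, which matches the manual sum.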

fn heap_bytes(&self) -> usize {
// Counts the Arc allocation (header + pointee). Shared backing may be overcounted if
// multiple Arc handles to the same allocation are reachable in one measured graph.
ARC_OVERHEAD + size_of::<T>() + (**self).heap_bytes()
Contributor

We might be able to avoid overcounting Arc by distributing the cost across handles using Arc::strong_count().

For example:

Suggested change
ARC_OVERHEAD + size_of::<T>() + (**self).heap_bytes()
let alloc = ARC_OVERHEAD + size_of::<T>() + (**self).heap_bytes();
alloc / Arc::strong_count(self)

What do you think?

Contributor Author

@cylewitruk-stacks cylewitruk-stacks Apr 16, 2026

Arc::strong_count() would risk miscounting in another way, since it includes process-wide clones, not only the handles reachable from the measured object graph. The truly correct way would be to add a context-aware heap_bytes() which tracks visits per Arc::as_ptr() within the measured object graph, but that felt like overkill for the current use case: serde_json will deserialize these types and, even if they are equal by value, place each of them in its own Arc.

So, to optimize this completely, we'd really need:

  1. Deserialization which can handle aliasing,
  2. Context-aware heap_bytes() with visited-ptr tracking

I'd be fine implementing point 2 so that this implementation is "textbook correct" and future-proof. Point 1 as well, but I'd probably do that in a separate PR since it changes the current functionality (it could materially reduce the resident size where many equal values are placed in their own Arcs, and it would require point 2 to reduce the overcounting).
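
A rough sketch of what point 2 might look like, for illustration only. MeasureCtx, arc_heap_bytes, and the two summing helpers are hypothetical names, the example is restricted to Arc<String> to stay short, and ARC_OVERHEAD is assumed to be two usize counters (the PR's constant may be defined differently).

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::sync::Arc;

// Assumption: an Arc allocation stores a strong and a weak counter next to the value.
const ARC_OVERHEAD: usize = 2 * size_of::<usize>();

/// Hypothetical measuring context: remembers which Arc allocations were already counted.
#[derive(Default)]
struct MeasureCtx {
    visited: HashSet<*const String>,
}

impl MeasureCtx {
    /// Counts an allocation the first time its pointer is seen, and 0 afterwards.
    fn arc_heap_bytes(&mut self, arc: &Arc<String>) -> usize {
        if !self.visited.insert(Arc::as_ptr(arc)) {
            return 0; // already counted within this measured graph
        }
        ARC_OVERHEAD + size_of::<String>() + arc.capacity()
    }
}

/// Context-free sum: every handle pays for the full allocation.
fn naive_heap_bytes(handles: &[Arc<String>]) -> usize {
    handles
        .iter()
        .map(|a| ARC_OVERHEAD + size_of::<String>() + a.capacity())
        .sum()
}

/// Context-aware sum: shared allocations are counted once per measured graph.
fn deduplicated_heap_bytes(handles: &[Arc<String>]) -> usize {
    let mut ctx = MeasureCtx::default();
    handles.iter().map(|a| ctx.arc_heap_bytes(a)).sum()
}
```

Tracking pointers per measurement avoids the Arc::strong_count() problem: a clone held elsewhere in the process raises the strong count but never enters the visited set.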

Comment thread clarity-types/src/resident_bytes.rs Outdated
return 0;
}

let buckets = (cap * hashmap::LOAD_FACTOR_INV_NUM).div_ceil(hashmap::LOAD_FACTOR_INV_DEN);
Contributor

sanity-check: If I recall correctly, hashbrown allocates buckets in powers of two.

If we want to conservatively overestimate, should we round the bucket count up to the next power of two?

Something like:

Suggested change
let buckets = (cap * hashmap::LOAD_FACTOR_INV_NUM).div_ceil(hashmap::LOAD_FACTOR_INV_DEN);
let buckets = (cap * hashmap::LOAD_FACTOR_INV_NUM).div_ceil(hashmap::LOAD_FACTOR_INV_DEN).next_power_of_two();

Contributor Author

Yeah, it does 👍 But the math here already ensures that we end up on the next pow2 (HashMap::capacity() returns the actual usable capacity derived from the underlying pow2 bucket count) -- will add a test that proves it.

Contributor Author

Reworked it to be testable and added tests in cc5c175 showing pow2 sizing
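
The arithmetic in this thread can be sketched as follows. This is not the code from cc5c175: the constant values 8 and 7 are an assumption based on hashbrown keeping 7/8 of its buckets usable, and estimated_buckets and buckets_for are hypothetical helper names.

```rust
use std::collections::HashMap;

// Assumed inverse load factor of 8/7; the PR's constants may be defined differently.
const LOAD_FACTOR_INV_NUM: usize = 8;
const LOAD_FACTOR_INV_DEN: usize = 7;

/// Estimates the bucket count behind a reported usable capacity.
fn estimated_buckets(cap: usize) -> usize {
    if cap == 0 {
        return 0; // an unallocated map has no buckets
    }
    (cap * LOAD_FACTOR_INV_NUM).div_ceil(LOAD_FACTOR_INV_DEN)
}

/// Applies the estimate to a real map, using the capacity the map reports.
fn buckets_for<K, V>(map: &HashMap<K, V>) -> usize {
    estimated_buckets(map.capacity())
}
```

Because capacity() is itself derived from a power-of-two bucket count, feeding it back through the inverse load factor recovers that count: capacities 3, 7, 14, and 28 map to 4, 8, 16, and 32. An extra next_power_of_two() would therefore be a no-op for capacities the map actually reports, though this relies on current std behavior.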

4 participants