Skip to content

Clarify intent of Accountable#ramBytesUsed#16000

Open
javanna wants to merge 3 commits into
apache:mainfrom
javanna:accountable_heap
Open

Clarify intent of Accountable#ramBytesUsed#16000
javanna wants to merge 3 commits into
apache:mainfrom
javanna:accountable_heap

Conversation

@javanna
Copy link
Copy Markdown
Contributor

@javanna javanna commented Apr 30, 2026

The memory tracking is meant for heap memory and not off-heap.

The memory tracking is meant for heap memory and not off-heap.
@javanna javanna added this to the 10.5.0 milestone Apr 30, 2026
@javanna
Copy link
Copy Markdown
Contributor Author

javanna commented Apr 30, 2026

If we agree on this, a possible follow-up or alternative change is to rename the method, but such change has a wide blast radius as opposed to just adjusting docs. On the other hand this class is marked internal.

Copy link
Copy Markdown
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct. It maps well to the memory estimator logic (which is all about estimating heap memory). So, for Lucene, it is indeed focused on heap memory.

That said, there is nothing preventing somebody from using this interface in their own way and making it refer to off heap as well.

@javanna
Copy link
Copy Markdown
Contributor Author

javanna commented Apr 30, 2026

That said, there is nothing preventing somebody from using this interface in their own way and making it refer to off heap as well.

Agreeed.

The reason why I thought it's good to clarify is that there's data structures that may use heap and off-heap memory, in which case it is not always clear what ramBytesUsed should report on. This can lead to inconsistencies and difficulties interpreting sizing.

A concrete example: completion FSTs. They moved a few versions ago from on-heap to off-heap. Their accounting reflects that. Some folks may wonder if this is expected, or whether the reporting is inaccurate.

@mikemccand
Copy link
Copy Markdown
Member

Do we have other examples where something is moved off-heap? Does off-heap always mean "accessed via IO" (the FST example), or do we ever do something like malloc(...) as "off-heap"?

Copy link
Copy Markdown
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. While the name ramBytesUsed is misleading, but JVM heap memory is what the API reports as of today!

@javanna
Copy link
Copy Markdown
Contributor Author

javanna commented May 1, 2026

Do we have other examples where something is moved off-heap?

One that comes to mind is KnnVectorsReader , which does not implement Accountable but has a specific method to retrieve the off-heap size (getOffHeapByteSize). That seems like a conscious choice.

There may be more, I was hoping that other devs would bring those up.

@javanna
Copy link
Copy Markdown
Contributor Author

javanna commented May 6, 2026

Thanks for the feedback so far! I plan on merging this soon if there's no objections.

@javanna
Copy link
Copy Markdown
Contributor Author

javanna commented May 8, 2026

I did some more research and found maybe some discrepancy in ByteBuffersDataInput and ByteBuffersDataOutput (and indirectly PrefixCodedTerm as well as NRTCachingDirectory):
these are allocator-agnostic and would count off-heap if constructed with a direct allocator.

These few above more like edge cases / exceptions though, as the vast majority of the Accountable implementations either only use heap memory, or consciously leave out the off-heap memory they use in their ramBytesUsed implementation (e.g. OffHeapFSTStore , DirectPackedReader, LegacyDirectMonotonicReader). VectorsReader impls all consistently report on heap only by design, with a separate getOffHeapByteSize as needed.

KnnVectorDict.ramBytesUsed() unambiguously includes off-heap (mmap) bytes today, but it does not implement Accountable. That's perhaps a good candidate for renaming to avoid confusion?

I would love to get some feedback on this analysis and opinions on the overall direction. Should we consider renaming the ramBytesUsed method as a follow-up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants