-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add options to skip decoding Statistics and SizeStatistics in Parquet metadata
#9008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Sorry I didn't see this one before. I'll try and review it shortly |
|
run benchmark encoding metadata |
|
🤖 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me -- thank you @etseidl
I launched a few more benchmarks off just to be sure this doesn't have some weird impact but I don't expect it to
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
I am working to prepare the arrow 57.2.0 release -- do we want to merge this one, or shall we wait for arrow 58? |
I'll defer to your judgement @alamb. I don't think there's any pressing need at the moment, but it also doesn't need any further changes AFAIK, and it's not a breaking change. |
|
Thanks @etseidl -- let's wait then -- we already have quite a lot of good stuff queued up |
|
This PR appears to have picked up some conflicts -- likely with |
|
main is now open for the next release so I think once this the conflicts are resolved we can merge this one in too |
Nice! That was me too this week (and there was quite a bit of backlog to get through!) I spent some non trivial amount of time over the break studying and profiling the parquet reader so I expect to be doing a bunch of micro optimizations there (aka reducing allocations / reallocations). Should be a fun January! |
Which issue does this PR close?
Rationale for this change
Add ability to skip the decoding of more types of statistics contained in the Parquet column metadata. While this currently doesn't have a huge impact on decode time, it can reduce the amount of memory used by the
ParquetMetaData.What changes are included in this PR?
Adds more options and tests for those options. Also adds size statistics to the metadata bench.
Are these changes tested?
Yes
Are there any user-facing changes?
Only adds new options, no breaking changes.