Describe the usage question you have. Please include as many useful details as possible.
Hi team,
I’m currently configuring parquet-go (v18) for high-performance data ingestion and I have a question regarding the utility of the Page Index (Column Index / Offset Index).
In my current setup, I see that PageIndexEnabled can be toggled in WriterProperties. However, after digging into the arrow-go reader and scanner implementations, I couldn't find clear evidence that the Page Index is being used to perform page-level skipping during queries.
Questions:
- Read-side support: Does the current arrow-go Parquet reader or the higher-level Scanner API actually implement page-level pruning using the Page Index? Or is filtering still limited to Row Group boundaries?
- Writing strategy: If the Go reader doesn't support it yet, is there any reason to enable it during the write phase other than compatibility with external engines (like Spark or Trino)?
- Overhead: Are there any significant performance penalties when writing files with Page Index enabled in a Go-centric environment, given the extra per-page statistics the writer has to collect and serialize?
I want to avoid including "dead weight" metadata in my files if it doesn't provide any performance benefits within the Go ecosystem.
Looking forward to your clarification.
Component(s)
Parquet