Describe the usage question you have. Please include as many useful details as possible.
I have objects of roughly 20 KB each that I need to write to Parquet efficiently from Java.
In C++, C#, and Python there is a direct bulk Arrow-to-Parquet write (e.g. WriteTable / write_table) that avoids row-by-row iteration, but in Java I only see row-by-row paths via RecordConsumer or internal/unstable column writers. A sketch of what I mean by the row-by-row path is below.
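For reference, this is roughly the row-by-row path I'm referring to, via parquet-avro (assuming parquet-avro and a Hadoop client on the classpath; the schema and payload here are placeholders of my own, not anything prescribed):

```java
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class RowByRowWrite {
  public static void main(String[] args) throws Exception {
    // Placeholder schema: an id plus the ~20 KB payload blob.
    Schema schema = SchemaBuilder.record("Obj").fields()
        .requiredLong("id")
        .requiredBytes("payload")
        .endRecord();

    try (ParquetWriter<GenericRecord> writer =
             AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/out.parquet"))
                 .withSchema(schema)
                 .build()) {
      for (long i = 0; i < 1_000; i++) {
        // One record assembled and written per row -- this per-row cost
        // is exactly the overhead I'm asking about.
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("id", i);
        rec.put("payload", ByteBuffer.wrap(new byte[20 * 1024])); // placeholder payload
        writer.write(rec);
      }
    }
  }
}
```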
Questions:
1. Is there a supported bulk/columnar Arrow-to-Parquet write API in Java (e.g., VectorSchemaRoot → Parquet) that avoids row-by-row calls?
2. If not, why is Java limited to row-by-row writes today? Is there a roadmap for feature parity with C++/Python/C#?
3. For now, what is the recommended optimization path for writing ~20 KB objects at high throughput from Java without JNI, or is the JNI/Dataset route the recommended one? (See the Dataset sketch below.)
4. Are there best practices (batch sizing, encodings, writer settings) to mitigate the row-by-row overhead? (See the tuning sketch below.)
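To make (3) concrete, here is my understanding of the JNI-backed Dataset route — a sketch only, assuming arrow-dataset is on the classpath, and I may have the exact DatasetFileWriter signature wrong. The round-trip through an in-memory IPC stream is just to obtain an ArrowReader over a VectorSchemaRoot; I don't know whether there's a more direct way:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;

public class DatasetWrite {
  public static void main(String[] args) throws Exception {
    try (BufferAllocator allocator = new RootAllocator()) {
      // Build a tiny batch column-side (no per-row record assembly).
      BigIntVector id = new BigIntVector("id", allocator);
      id.allocateNew(3);
      for (int i = 0; i < 3; i++) id.set(i, i);
      id.setValueCount(3);
      try (VectorSchemaRoot root = VectorSchemaRoot.of(id)) {
        // Serialize the root to an in-memory IPC stream...
        ByteArrayOutputStream ipc = new ByteArrayOutputStream();
        try (ArrowStreamWriter w = new ArrowStreamWriter(root, null, Channels.newChannel(ipc))) {
          w.start();
          w.writeBatch();
          w.end();
        }
        // ...and read it back to get an ArrowReader the Dataset writer accepts.
        try (ArrowStreamReader reader = new ArrowStreamReader(
                 new ByteArrayInputStream(ipc.toByteArray()), allocator)) {
          DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET, "file:///tmp/out-dataset");
        }
      }
    }
  }
}
```

And for (4), the kind of writer tuning I have in mind, again via parquet-avro (assuming a reasonably recent parquet-mr); the values below are guesses on my part, not settings I know to be right for ~20 KB values:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class TunedWrite {
  static ParquetWriter<GenericRecord> openTunedWriter(Schema schema) throws java.io.IOException {
    return AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/out-tuned.parquet"))
        .withSchema(schema)
        .withCompressionCodec(CompressionCodecName.ZSTD) // does codec choice matter much here?
        .withRowGroupSize(128L * 1024 * 1024)            // fewer, larger row groups?
        .withPageSize(1024 * 1024)                       // bigger pages for ~20 KB values?
        .withDictionaryEncoding(false)                   // dictionary unlikely to pay off on large blobs
        .build();
  }
}
```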