feat(rust/sedona-geoparquet): Add read support for Geometry/Geography Parquet types #561
Pull request overview
This PR adds support for reading Parquet files with native Geometry and Geography logical types, enabling SedonaDB to read geospatial data that uses the newer Parquet format's built-in spatial types instead of (or in addition to) the GeoParquet metadata specification.
Changes:
- Extended metadata parsing to extract spatial information from Parquet Geometry/Geography logical types
- Added spatial pruning based on native Parquet geospatial statistics
- Implemented CRS translation from Parquet logical types to GeoParquet format
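The spatial pruning item above can be illustrated with a minimal sketch. It assumes each row group exposes an optional bounding box taken from its geospatial statistics. The `BBox` type and the `row_groups_to_read` function are hypothetical names for illustration and are not taken from `sedona-geoparquet`. The sketch also ignores boxes that wrap across the antimeridian (where `xmin > xmax`), which a real implementation would likely need to handle.

```rust
/// Axis-aligned bounding box, as might be read from per-row-group
/// geospatial statistics. Hypothetical type, not the crate's own.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
}

impl BBox {
    /// True when the two boxes share at least one point.
    /// Assumes xmin <= xmax and ymin <= ymax for both boxes.
    fn intersects(&self, other: &BBox) -> bool {
        self.xmin <= other.xmax
            && other.xmin <= self.xmax
            && self.ymin <= other.ymax
            && other.ymin <= self.ymax
    }
}

/// Returns the indices of the row groups that still have to be read.
/// A row group without statistics (`None`) is always kept, because
/// skipping is only safe when the bounds prove there is no overlap.
fn row_groups_to_read(stats: &[Option<BBox>], query: &BBox) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, bbox)| bbox.as_ref().map_or(true, |b| b.intersects(query)))
        .map(|(i, _)| i)
        .collect()
}
```

With three row groups whose statistics are a box overlapping the query, a box far away from it, and no box at all, only the second one would be skipped. Keeping row groups that lack statistics is the conservative choice: pruning can drop work but should never drop rows.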
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| rust/sedona-geoparquet/src/metadata.rs | Added parsing of Geometry/Geography types from Parquet schema and conversion to GeoParquet metadata |
| rust/sedona-geoparquet/src/format.rs | Updated metadata extraction to use new try_from_parquet_metadata method |
| rust/sedona-geoparquet/src/file_opener.rs | Added spatial pruning based on native Parquet geospatial statistics |
| rust/sedona-geoparquet/Cargo.toml | Added "sql" feature to datafusion dependency |
| python/sedonadb/tests/io/test_parquet.py | Added tests for reading Parquet files with native Geometry types |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR adds support for Parquet files with (the new) Geometry or Geography types.
In DataFusion 52 (one version after the one we currently use), the Arrow version in use will do the GeoArrow type conversion automatically (when the parquet crate is built with the geospatial feature, which we can enable); however, all the pruning code is still relevant, and the tests are much improved by this PR.
One caveat is that until we switch to DataFusion 52, nested geometry columns in Parquet files won't be recognized (they will need an explicit ST_GeomFromWKB). It's possible to work around this, but doing so requires a more verbose approach, and given that it will be supported soon without us doing anything, I think it's worth leaving as is.
Closes #133.
Also works with pruning: