@paleolimbot paleolimbot commented Jan 29, 2026

This PR adds support for Parquet files with (the new) Geometry or Geography types.

In DataFusion 52 (one version after the one we currently use), the bundled Arrow version performs the GeoArrow type conversion automatically (when the parquet crate is built with the geospatial feature, which we can enable). However, all of the pruning code is still relevant, and the tests are much improved by this PR.

One caveat is that until we switch to DataFusion 52, nested geometry columns in Parquet files won't be recognized and will need an explicit ST_GeomFromWKB. It's possible to work around this, but doing so requires a more verbose approach; given that it will be supported soon without any work on our part, I think it's worth leaving as is.

Closes #133.

import sedona.db

sd = sedona.db.connect()

df = sd.read_parquet("https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_elevation.parquet")
df.head().show()
#> ┌───────────┬──────────────────────────┬─────────┬─────────────────────────────────────────────────┐
#> │ FEAT_CODE ┆         FEAT_DESC        ┆  ZVALUE ┆                     geometry                    │
#> │    utf8   ┆           utf8           ┆ float64 ┆                     geometry                    │
#> ╞═══════════╪══════════════════════════╪═════════╪═════════════════════════════════════════════════╡
#> │ LFTM60    ┆ DTM SPOT ELEVATION point ┆   195.9 ┆ POINT Z(388939.92339999974 4966886.8774 195.89… │
#> ├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ LFTM60    ┆ DTM SPOT ELEVATION point ┆   196.9 ┆ POINT Z(388878.42339999974 4966916.977399999 1… │
#> ├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ LFTM60    ┆ DTM SPOT ELEVATION point ┆   200.0 ┆ POINT Z(388854.92339999974 4966890.477399999 2… │
#> ├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ LFTM60    ┆ DTM SPOT ELEVATION point ┆   199.0 ┆ POINT Z(388851.0234000003 4966841.477399999 19… │
#> ├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ LFTM60    ┆ DTM SPOT ELEVATION point ┆   195.9 ┆ POINT Z(388938.3234000001 4966857.8774 195.899… │
#> └───────────┴──────────────────────────┴─────────┴─────────────────────────────────────────────────┘

Also works with pruning:

import sedona.db

sd = sedona.db.connect()

sd.read_parquet(
    "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_elevation.parquet"
).to_view("elevation", overwrite=True)
crs = sd.view("elevation").schema.field("geometry").type.crs
sd.sql(
    f"SELECT * FROM elevation WHERE ST_DWithin(geometry, ST_Point(497344, 5020934, '{crs.to_json()}'), 100)"
).explain("analyze").show()
# ...
# row_groups_spatial_matched=1428, row_groups_spatial_pruned=1332
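
The counters above reflect row groups being skipped based on their spatial statistics. As a rough illustration of the idea only (not the code in file_opener.rs), here is a minimal pure-Python sketch that assumes each row group exposes an x/y bounding box. `BBox` and `prune_row_groups` are hypothetical names and the row-group coordinates are made up. Expanding the query point by the distance gives a conservative test: a row group is skipped only when nothing inside its box could be within range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box, as might be stored in row-group statistics."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def expand(self, distance: float) -> "BBox":
        # Grow the box by `distance` on every side.
        return BBox(
            self.xmin - distance,
            self.ymin - distance,
            self.xmax + distance,
            self.ymax + distance,
        )

    def intersects(self, other: "BBox") -> bool:
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


def prune_row_groups(row_group_bboxes, x, y, distance):
    """Return (kept, pruned) row-group indices for a DWithin(point, distance) filter.

    Row groups without statistics (None) are always kept, because they
    cannot be ruled out.
    """
    query = BBox(x, y, x, y).expand(distance)
    kept, pruned = [], []
    for i, bbox in enumerate(row_group_bboxes):
        if bbox is None or bbox.intersects(query):
            kept.append(i)
        else:
            pruned.append(i)
    return kept, pruned


# Hypothetical statistics for three row groups (made-up coordinates).
stats = [
    BBox(497000, 5020000, 498000, 5021000),  # overlaps the query area
    BBox(300000, 4900000, 310000, 4910000),  # far from the query point
    None,  # statistics missing
]
kept, pruned = prune_row_groups(stats, 497344, 5020934, 100)
print(kept, pruned)
```

Kept row groups still need the exact ST_DWithin check on every row; the bounding-box test only decides which row groups can be skipped without reading them.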

@paleolimbot paleolimbot requested a review from Copilot January 30, 2026 18:46
Copilot AI left a comment

Pull request overview

This PR adds support for reading Parquet files with native Geometry and Geography logical types, enabling SedonaDB to read geospatial data that uses the newer Parquet format's built-in spatial types instead of (or in addition to) the GeoParquet metadata specification.

Changes:

  • Extended metadata parsing to extract spatial information from Parquet Geometry/Geography logical types
  • Added spatial pruning based on native Parquet geospatial statistics
  • Implemented CRS translation from Parquet logical types to GeoParquet format
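
To make the CRS translation bullet more concrete, here is a hedged sketch of what such a mapping could look like. It assumes the logical type's CRS arrives as an optional string in one of the forms `srid:<n>`, `projjson:<key>` (a key into the file's key/value metadata), inline JSON, or an arbitrary identifier, and that an unset CRS means OGC:CRS84. `parquet_crs_to_geoparquet` is a hypothetical helper, and none of these rules are confirmed to match what metadata.rs does.

```python
import json


def parquet_crs_to_geoparquet(crs, key_value_metadata=None):
    """Return a value for a GeoParquet-style "crs" field, or None to omit it.

    Hypothetical helper: the rules below are assumptions for illustration,
    not SedonaDB's actual implementation.
    """
    key_value_metadata = key_value_metadata or {}

    # An unset CRS is assumed to mean OGC:CRS84, which is also the default
    # when GeoParquet metadata omits "crs", so the field can be left out.
    if not crs or crs.upper() == "OGC:CRS84":
        return None

    # "projjson:<key>" is assumed to point at a PROJJSON string stored in
    # the file-level key/value metadata.
    if crs.startswith("projjson:"):
        key = crs[len("projjson:"):]
        return json.loads(key_value_metadata[key])

    # "srid:<n>" carries only a number; reading it as an EPSG code is an
    # assumption. A strict writer would resolve it to full PROJJSON.
    if crs.startswith("srid:"):
        return f"EPSG:{int(crs[len('srid:'):])}"

    # Inline JSON objects are parsed; anything else is passed through as-is.
    try:
        parsed = json.loads(crs)
    except json.JSONDecodeError:
        return crs
    return parsed if isinstance(parsed, dict) else crs


print(parquet_crs_to_geoparquet("srid:4326"))
```

Returning None for the default case keeps the translated metadata minimal, since a reader that follows the GeoParquet convention would fill in OGC:CRS84 on its own.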

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rust/sedona-geoparquet/src/metadata.rs | Added parsing of Geometry/Geography types from Parquet schema and conversion to GeoParquet metadata |
| rust/sedona-geoparquet/src/format.rs | Updated metadata extraction to use new try_from_parquet_metadata method |
| rust/sedona-geoparquet/src/file_opener.rs | Added spatial pruning based on native Parquet geospatial statistics |
| rust/sedona-geoparquet/Cargo.toml | Added "sql" feature to datafusion dependency |
| python/sedonadb/tests/io/test_parquet.py | Added tests for reading Parquet files with native Geometry types |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@paleolimbot paleolimbot marked this pull request as ready for review January 30, 2026 19:07
@paleolimbot paleolimbot changed the title from "feat(rust/sedona-geoparquet): Add support for Geometry/Geography Parquet types" to "feat(rust/sedona-geoparquet): Add read support for Geometry/Geography Parquet types" Jan 30, 2026
Development

Successfully merging this pull request may close these issues.

latest parquet spec with geometry don't seems to be suported