feat: propagate GEOMETRY CRS as PostGIS SRID via EWKB#418
feat: propagate GEOMETRY CRS as PostGIS SRID via EWKB#418jatorre wants to merge 1 commit intoduckdb:mainfrom
Conversation
3468deb to
e3d8ea5
Compare
DuckDB 1.5 introduced CRS metadata on the GEOMETRY type (e.g.,
GEOMETRY('EPSG:4326')). When writing to PostGIS via binary COPY,
the SRID was lost because the writer sent plain WKB.
This change detects CRS on the GEOMETRY LogicalType and writes
EWKB (WKB with SRID header) instead:
WKB: [byte_order:1] [type:4] [payload...]
EWKB: [byte_order:1] [type|0x20000000:4] [srid:4] [payload...]
The SRID is extracted from the CRS identifier (e.g., "EPSG:4326" → 4326).
If no CRS is set, plain WKB is sent (preserving current behavior).
Before: DuckDB GEOMETRY('EPSG:4326') → PostGIS geometry(SRID=0)
After: DuckDB GEOMETRY('EPSG:4326') → PostGIS geometry(SRID=4326)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e3d8ea5 to
774ea01
Compare
|
Hey @Maxxen — this PR adds EWKB writing to postgres_scanner so that A couple of things I'd appreciate your input on:
Also — maybe you're already looking at completing this work somewhere else, or have a different approach in mind. Let me know if for topics like this you'd prefer an issue first rather than a PR. I thought this was small enough that a PR would be more useful, but happy to adjust to whatever workflow you prefer. |
|
Unfortunately this isnt safe to do, the postgres srid is not guaranteed to be the same as the numeric part of the auth code. Ig also doesnt work for non-integer auth codes, like OGC:CRS84. We need to do a lookup in the postgres databases SPATIAL_REF_SYS table and find the best match (which I think we need PROJ for, which only spatial bundles) to get the SRID. This is not really easy to do in the type conversion code and we dont want to add proj as a dependency here right now. |
|
I have some thought on how to eventually solve this in the future, that would involve breaking PROJ out from spatial into its own separate extension, however we dont want to do that yet until we have a stable extension c-api, which we are working on. |
|
Thanks for the context @Maxxen — that makes total sense about the SRID ≠ EPSG code edge case and That said, I think there's a pragmatic middle ground here: only handle the One of the key values of DuckDB in the geospatial world is being the intermediary between data warehouses — BigQuery, Snowflake, Databricks, Redshift — and PostGIS. Those cloud warehouses don't have a Covering that 99% case with a few lines of code would already be extremely valuable — right now every For the remaining cases ( Would you be open to that scoped approach? The contract would be:
Happy to adjust the code or close this if you'd rather wait for the PROJ-based solution — just wanted to make the case that the simple version already covers almost all real-world usage. |
|
Alright, maybe we can do it for EPSG only, I just worry that the SRIDs in PostGIS spatial_ref_sys table don't actually match with the EPSG codes, even for EPSG defined projections, not sure how easy that would be to verify... Either way we still lose the CRS on import - but I guess we can deal with that separately. Let me get back to this later :) |
|
Good news — I checked a default PostGIS SELECT COUNT(*) as total,
SUM(CASE WHEN srid = auth_srid::INT THEN 1 ELSE 0 END) as matching,
SUM(CASE WHEN srid != auth_srid::INT THEN 1 ELSE 0 END) as mismatched
FROM spatial_ref_sys WHERE auth_name = 'EPSG';
-- total: 6184, matching: 6184, mismatched: 0So for the default table, EPSG code = PostGIS SRID is a safe assumption. A user could theoretically insert custom rows that break this, but that's an edge case they'd need to handle themselves anyway. Happy to wait until you have time to look at it properly — no rush. |
Summary
DuckDB 1.5 introduced CRS metadata on the GEOMETRY type (e.g.,
GEOMETRY('EPSG:4326')). When writing to PostGIS via binary COPY, the SRID is currently lost because the binary writer sends plain WKB. PostGIS accepts the geometry but sets SRID=0.This PR detects CRS on the GEOMETRY column's LogicalType and writes EWKB (WKB with SRID header) instead, so PostGIS receives the correct SRID.
The Fix
In
postgres_binary_writer.hpp, the GEOMETRY case now:LogicalType::GetCRS(type)EPSG:4326→4326)0x20000000) on the WKB type field and inserting the 4-byte SRIDIf no CRS is set on the GEOMETRY type, plain WKB is sent (preserving current behavior).
Before:
After:
Context
This was the main remaining gap for transparent geospatial data transfer between DuckDB and PostGIS. With DuckDB 1.5's
GEOMETRY('EPSG:4326')type and this fix, the full pipeline works without any workarounds:Previously users had to run
UPDATE table SET geom = ST_SetSRID(geom, 4326)after every import.Related: duckdb/duckdb-spatial#587 (EWKB support discussion)