[GH-2230] Implement GeoSeries.minimum_clearance #2772

Open
piyushka-ally wants to merge 1 commit into apache:master from piyushka-ally:gh-2230-is-ccw-minimum-clearance

Conversation

@piyushka-ally piyushka-ally commented Mar 21, 2026

[GH-2230] Implement GeoSeries.minimum_clearance

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Implements GeoSeries.minimum_clearance() from the GH-2230 EPIC to expand Sedona's geopandas-compatible API:

GeoSeries.minimum_clearance() (method)

  • Returns the minimum clearance distance of each geometry as a float64 Series — the smallest distance by which any vertex could be moved to produce an invalid geometry.
  • Backed by ST_MinimumClearance.
  • Handles the JTS/GEOS representation difference: JTS returns Double.MAX_VALUE for degenerate geometries (e.g. Point, empty); this is converted to float('inf') to match geopandas/shapely behaviour.
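
The definition above can be illustrated with a brute-force sketch in plain Python. It assumes the usual characterisation of minimum clearance as the smallest distance between two distinct vertices, or between a vertex and a segment that does not have that vertex as an endpoint. The names `brute_force_min_clearance` and `ring_segments` are hypothetical helpers for this illustration only; this is not how Sedona, JTS, or GEOS compute the value.

```python
import math
from itertools import combinations


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all (x, y) tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.dist(p, a)
    # Project p onto the segment and clamp to its endpoints.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def ring_segments(vertices):
    """Segments of a ring given without a repeated closing vertex."""
    n = len(vertices)
    return [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]


def brute_force_min_clearance(vertices, segments=()):
    """Illustrative O(n^2) minimum clearance; inf when nothing constrains it."""
    best = math.inf
    distinct = set(vertices)
    # Smallest distance between two distinct vertices.
    for p, q in combinations(distinct, 2):
        best = min(best, math.dist(p, q))
    # Smallest distance between a vertex and a segment it is not part of.
    for p in distinct:
        for a, b in segments:
            if p == a or p == b:
                continue
            best = min(best, _point_segment_distance(p, a, b))
    return best


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
half_square = [(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]
duplicate_points = [(1, 1), (1, 1)]  # MultiPoint with identical points

print(brute_force_min_clearance(unit_square, ring_segments(unit_square)))
print(brute_force_min_clearance(half_square, ring_segments(half_square)))
print(brute_force_min_clearance(duplicate_points))
```

Under these assumptions the three inputs mirror the cases exercised by the tests in this PR: a unit square, a half-unit square, and a degenerate MultiPoint with no pair of distinct vertices.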

The method is implemented in geoseries.py (Spark SQL logic) and base.py (GeoDataFrame delegation + docstring with examples).

How was this patch tested?

  • tests/geopandas/test_geoseries.py — hardcoded expected-value tests including GeoDataFrame delegation:
    • test_minimum_clearance: unit square → 1.0, half-unit square → 0.5, degenerate MultiPoint → inf
  • tests/geopandas/test_match_geopandas_series.py — comparison against geopandas:
    • test_minimum_clearance: iterates all geometry fixture types (self.geoms) and compares against geopandas results

All tests pass locally.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API documentation — it implements an existing stub method that was already part of the planned API surface (tracked in [EPIC] Implement More Geopandas Functions #2230). A docstring with examples is included inline.

@piyushka-ally piyushka-ally requested a review from jiayuasu as a code owner March 21, 2026 06:53
@jbampton jbampton added the sedona-geopandas and python labels Mar 21, 2026
Member

@jbampton jbampton left a comment

It seems pre-commit is failing, and it reformatted python/sedona/spark/geopandas/geoseries.py.

We are not using Python 3.14

https://github.com/apache/sedona/actions/runs/23374281851/job/68015517846?pr=2772

@piyushka-ally piyushka-ally force-pushed the gh-2230-is-ccw-minimum-clearance branch from 171e20c to 5caea51 March 21, 2026 13:40
Contributor

@petern48 petern48 left a comment

Thank you for working on this! Overall, the changes look good 👏, except that I would prefer to skip the implementation of is_ccw() for now (see my comments).

Comment on lines +499 to +504
Notes
-----
Unlike geopandas, which also supports LinearRing inputs, this
implementation uses Sedona's ``ST_IsPolygonCCW`` which only recognises
Polygon and MultiPolygon geometries. All other geometry types return
``False``.
Contributor

We actually had a previous contribution attempt for is_ccw(), and encountered this problem, and we agreed not to implement it for now (#2386 (comment)), given it's not easy to replicate the full desired behavior. We're not just missing proper behavior for LinearRings (which are rare), but we would have the incorrect behavior for LineStrings, which are very common. As you can see in the code snippet below, LineString can evaluate to True here too, but our existing ST_IsPolygonCCW() isn't capable of determining that at the moment.

from shapely import LineString
from geopandas import GeoSeries

# geopandas evaluates is_ccw on LineStrings too, not only on rings.
ls = LineString([[0, 0], [1, 0], [1, 1], [2, 2]])
gs = GeoSeries(ls)
print(gs.is_ccw)

In order to implement proper support for is_ccw(), I think we would need to investigate doing some more changes in the actual Java code and potentially look into the upstream JTS library. Given that complexity, I would rather skip implementing it entirely for now (or at least in a separate PR if you really want to investigate it, though it would be harder). Could we remove it for now?
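
To make the orientation question concrete, here is a minimal plain-Python sketch that classifies a coordinate sequence by the sign of its shoelace (signed) area after implicitly closing it. This is only an illustration of what "counterclockwise" means for a line's coordinates; it is an assumption of this sketch, not the algorithm GEOS or JTS use, and it may disagree with geopandas for unclosed or self-intersecting lines. The names `signed_area` and `looks_ccw` are hypothetical.

```python
def signed_area(coords):
    """Shoelace signed area of the ring formed by implicitly closing coords."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wraps around to close the ring
        total += x1 * y2 - x2 * y1
    return total / 2.0


def looks_ccw(coords):
    """True when the implicitly closed sequence has positive signed area."""
    return signed_area(coords) > 0


ccw_ring = [(0, 0), (1, 0), (1, 1), (0, 1)]
cw_ring = list(reversed(ccw_ring))
review_line = [(0, 0), (1, 0), (1, 1), (2, 2)]  # coordinates from the snippet above

print(looks_ccw(ccw_ring), looks_ccw(cw_ring), looks_ccw(review_line))
```

The point of the sketch is that a plain coordinate sequence carries enough information to have an orientation, which is why a polygon-only check such as ST_IsPolygonCCW cannot reproduce the LineString behaviour.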

Author

Yes please.

No worries, I will try to pick some more functions I can implement. Thanks for reviewing my PR.

Author

@piyushka-ally piyushka-ally left a comment

Please approve. Thanks.

@petern48
Contributor

Please approve. Thanks.

You haven't addressed the review feedback that I left (I don't see any new commits pushed). Do you plan to fulfill my request? I don't think this PR is ready to merge without it.

@jiayuasu
Member

Note that all Python CIs are currently failing due to #2774

We cannot merge this PR without all green python CIs. Meanwhile, please fix what Peter suggested.

@piyushka-ally
Author

Hey sorry, I thought this was my other PR. I am working on this one, will push shortly.

@piyushka-ally piyushka-ally force-pushed the gh-2230-is-ccw-minimum-clearance branch from 5caea51 to b6140e2 March 23, 2026 05:11
@piyushka-ally
Author

Pre-commit hooks pass locally, and I have removed the is_ccw() implementation. Thanks.

Contributor

@petern48 petern48 left a comment

A few suggestions. But we should be ready after this.

Could you also remove is_ccw() from the PR title and description? The title ends up in our commits when the PR is merged, and we don't want to mislead others when they look back.

return _delegate_to_geometry_column("is_ring", self)

# @property
# def is_ccw(self):
Contributor

Suggested change
- # def is_ccw(self):
+ # @property
+ # def is_ccw(self):

Let's undo this deletion too, since we're doing one more iteration anyways.

Comment on lines +1082 to +1087
spark_col = stf.ST_MinimumClearance(self.spark.column)
# JTS returns Double.MAX_VALUE for degenerate geometries (e.g. Point, empty);
# convert to float('inf') to match geopandas/shapely behaviour.
spark_expr = F.when(
    spark_col >= sys.float_info.max, F.lit(float("inf"))
).otherwise(spark_col)
Contributor

I checked out this code and confirmed this extra conditional logic is necessary to pass tests 👍

(Just a note, no action needed by you)
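
For readers without a Spark session at hand, the conditional above can be mirrored per value in plain Python. This is a minimal sketch under the assumption that `sys.float_info.max` equals Java's `Double.MAX_VALUE` (true for IEEE 754 doubles); `normalize_clearance` and `JTS_NO_CLEARANCE` are hypothetical names used only for illustration and are not part of Sedona.

```python
import math
import sys

# Assumption: Python floats are IEEE 754 doubles, so this matches the
# Double.MAX_VALUE sentinel that JTS returns for degenerate geometries.
JTS_NO_CLEARANCE = sys.float_info.max


def normalize_clearance(value):
    """Map the JTS sentinel to inf; pass other values (and None) through."""
    if value is None:
        # A null input stays null, as it would in the Spark expression.
        return None
    return math.inf if value >= JTS_NO_CLEARANCE else value


raw = [1.0, 0.5, sys.float_info.max, None]
print([normalize_clearance(v) for v in raw])
```

The `>=` comparison follows the reviewed code: it also leaves an existing `inf` unchanged, since `inf >= sys.float_info.max` holds.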

Comment on lines +1132 to +1136
"""Return the minimum clearance of each geometry.

The minimum clearance is the smallest distance by which a vertex of
a geometry could be moved to produce an invalid geometry. A larger
value indicates a more robust geometry.
Contributor

Could you replace this entire docstring with a copy-paste from the original geopandas one here? This is what we generally prefer to do. For this case, I like that their docstring mentions the infinity edge case behavior, and the examples they give are more helpful than the current ones that only show examples for polygons.

Comment on lines +1620 to +1623
Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]),
]
)
expected = pd.Series([1.0, 0.5])
Contributor

Suggested change
- Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]),
- ]
- )
- expected = pd.Series([1.0, 0.5])
+ Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]),
+ MultiPoint([(1, 1), (1, 1)]),
+ ]
+ )
+ expected = pd.Series([1.0, 0.5, float("inf")])

I'd like to add this edge case, since it's not covered here or in test_match_geopandas_series.py. A MultiPoint with identical points should return inf, as a Point would. I tested this locally and confirmed that this test should pass.

@piyushka-ally
Author

Sure let me work on it

@piyushka-ally piyushka-ally force-pushed the gh-2230-is-ccw-minimum-clearance branch from b6140e2 to 43243a1 March 24, 2026 05:39
@piyushka-ally piyushka-ally changed the title from "[GH-2230] Implement GeoSeries.is_ccw and GeoSeries.minimum_clearance" to "[GH-2230] Implement GeoSeries.minimum_clearance" Mar 24, 2026
@piyushka-ally
Author

Done. Please recheck and let me know if any further changes are necessary. Thanks.

Labels

python (Pull requests that update Python code), sedona-geopandas

4 participants