[GH-2230] Implement GeoSeries.minimum_clearance#2772
[GH-2230] Implement GeoSeries.minimum_clearance#2772piyushka-ally wants to merge 1 commit intoapache:masterfrom
Conversation
jbampton
left a comment
There was a problem hiding this comment.
Seem pre-commit is failing and it reformatted python/sedona/spark/geopandas/geoseries.py
We are not using Python 3.14
https://github.com/apache/sedona/actions/runs/23374281851/job/68015517846?pr=2772
171e20c to
5caea51
Compare
petern48
left a comment
There was a problem hiding this comment.
Thank you for working on this! Overall, the changes look good 👏 . Except that I would prefer skipping the implementation of is_ccw() for now (see my comments)
| Notes | ||
| ----- | ||
| Unlike geopandas, which also supports LinearRing inputs, this | ||
| implementation uses Sedona's ``ST_IsPolygonCCW`` which only recognises | ||
| Polygon and MultiPolygon geometries. All other geometry types return | ||
| ``False``. |
There was a problem hiding this comment.
We actually had a previous contribution attempt for is_ccw(), and encountered this problem, and we agreed not to implement it for now (#2386 (comment)), given it's not easy to replicate the full desired behavior. We're not just missing proper behavior for LinearRings (which are rare), but we would have the incorrect behavior for LineStrings, which are very common. As you can see in the code snippet below, LineString can evaluate to True here too, but our existing ST_IsPolygonCCW() isn't capable of determining that at the moment.
import shapely
from shapely import LineString
from geopandas import GeoSeries
ls = LineString([[0, 0], [1, 0], [1, 1], [2, 2]])
gs = GeoSeries(ls)
print(gs.is_ccw)In order to implement proper support for is_ccw(), I think we would need to investigate doing some more changes in the actual Java code and potentially look into the upstream JTS library. Given that complexity, I would rather skip implementing it entirely for now (or at least in a separate PR if you really want to investigate it, though it would be harder). Could we remove it for now?
There was a problem hiding this comment.
Yes please.
No worries, I will try to pick some more functions I can implement. Thanks for reviewing my PR.
piyushka-ally
left a comment
There was a problem hiding this comment.
Please approve. Thanks.
You haven't addressed the review feedback that I left (I don't see any new commits pushed). Do you plan to fulfill my request? I don't think this PR is ready to merge without it. |
|
Note that all Python CIs are currently failing due to #2774 We cannot merge this PR without all green python CIs. Meanwhile, please fix what Peter suggested. |
|
Hey sorry, I thought this was my other PR. I am working on this one, will push shortly. |
5caea51 to
b6140e2
Compare
|
Pre-commit hooks pass locally, and I have removed the is_ccw() implementation. Thanks. |
petern48
left a comment
There was a problem hiding this comment.
A few suggestions. But we should be ready after this.
Could you also remove is_ccw() from the PR title and description? The title ends up in our commits when the PR is merged, and we don't want to mislead others when they look back.
| return _delegate_to_geometry_column("is_ring", self) | ||
|
|
||
| # @property | ||
| # def is_ccw(self): |
There was a problem hiding this comment.
| # def is_ccw(self): | |
| # @property | |
| # def is_ccw(self): |
Let's undo this deletion too, since we're doing one more iteration anyways.
| spark_col = stf.ST_MinimumClearance(self.spark.column) | ||
| # JTS returns Double.MAX_VALUE for degenerate geometries (e.g. Point, empty); | ||
| # convert to float('inf') to match geopandas/shapely behaviour. | ||
| spark_expr = F.when( | ||
| spark_col >= sys.float_info.max, F.lit(float("inf")) | ||
| ).otherwise(spark_col) |
There was a problem hiding this comment.
I checked out this code and confirmed this extra conditional logic is necessary to pass tests 👍
(Just a note, no action needed by you)
| """Return the minimum clearance of each geometry. | ||
|
|
||
| The minimum clearance is the smallest distance by which a vertex of | ||
| a geometry could be moved to produce an invalid geometry. A larger | ||
| value indicates a more robust geometry. |
There was a problem hiding this comment.
Could you replace this entire docstring with a copy-paste from the original geopandas one here? This is what we generally prefer to do. For this case, I like that their docstring mentions the infinity edge case behavior, and the examples they give are more helpful than the current ones that only show examples for polygons.
| Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]), | ||
| ] | ||
| ) | ||
| expected = pd.Series([1.0, 0.5]) |
There was a problem hiding this comment.
| Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]), | |
| ] | |
| ) | |
| expected = pd.Series([1.0, 0.5]) | |
| Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]), | |
| MultiPoint([(1, 1), (1, 1)]), | |
| ] | |
| ) | |
| expected = pd.Series([1.0, 0.5, float("inf")]) |
I'd like to add this edge case, since it's not covered here or in test_match_geopandas_series.py. MultiPoint with identical points should return an inf, as Point would. Tested this locally, and already confirmed this test should pass.
|
Sure let me work on it |
Co-authored-by: Isaac
b6140e2 to
43243a1
Compare
|
Done please recheck and let me know if any further changes are necessary. Thanks. |
[GH-2230] Implement GeoSeries.minimum_clearance
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Part of [EPIC] Implement More Geopandas Functions #2230What changes were proposed in this PR?
Implements
GeoSeries.minimum_clearance()from the GH-2230 EPIC to expand Sedona's geopandas-compatible API:GeoSeries.minimum_clearance()(method)float64Series — the smallest distance by which any vertex could be moved to produce an invalid geometry.ST_MinimumClearance.Double.MAX_VALUEfor degenerate geometries (e.g.Point, empty); this is converted tofloat('inf')to match geopandas/shapely behaviour.The method is implemented in
geoseries.py(Spark SQL logic) andbase.py(GeoDataFrame delegation + docstring with examples).How was this patch tested?
tests/geopandas/test_geoseries.py— hardcoded expected-value tests including GeoDataFrame delegation:test_minimum_clearance: unit square →1.0, half-unit square →0.5, degenerateMultiPoint→inftests/geopandas/test_match_geopandas_series.py— comparison against geopandas:test_minimum_clearance: iterates all geometry fixture types (self.geoms) and compares against geopandas resultsAll tests pass locally.
Did this PR include necessary documentation updates?