Support modifying/filtering labels elements via join operations #946
for more information, see https://pre-commit.ci
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #946      +/-   ##
==========================================
+ Coverage   92.03%   92.04%   +0.01%
==========================================
  Files          51       51
  Lines        7761     7785      +24
==========================================
+ Hits         7143     7166      +23
- Misses        618      619       +1
…om/selmanozleyen/spatialdata into feature/filter_operations_on_label
timtreis left a comment:
Can this PR make use of … and …?

Review comment on the following diff context:
from spatialdata.datasets import blobs_annotating_element
def test_filter_labels2dmodel_by_instance_ids():
Parametrise, don't loop over inputs
The loop here is over the multiple scales. Or did you mean something else?
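The exchange above distinguishes two kinds of iteration: test inputs, which the reviewer asks to parametrise, and the scales of a single multiscale element, which the author loops over. A minimal sketch of how both can coexist is shown below. It assumes `pytest` and `numpy` are available; `zero_removed`, `make_labels`, and the test name are hypothetical stand-ins for illustration, and a plain dict of arrays stands in for a multiscale element. None of this is code from the PR.

```python
import numpy as np
import pytest


def zero_removed(labels, keep_ids):
    """Hypothetical helper: set every pixel whose id is not in keep_ids to 0."""
    out = labels.copy()
    out[~np.isin(labels, keep_ids)] = 0
    return out


def make_labels(multiscale):
    """Hypothetical fixture: a dict of arrays standing in for one or two scales."""
    base = np.arange(16).reshape(4, 4)
    if multiscale:
        return {"scale0": base, "scale1": base[::2, ::2]}
    return {"scale0": base}


# Test inputs are parametrised, so each combination is reported separately.
@pytest.mark.parametrize("multiscale", [False, True])
@pytest.mark.parametrize("keep_ids", [[1, 3, 4], [2], []])
def test_filter_out_instances_sketch(multiscale, keep_ids):
    scales = make_labels(multiscale)
    # The loop stays: it walks the scales of one element, not separate inputs.
    for name, arr in scales.items():
        remaining = set(np.unique(zero_removed(arr, keep_ids)).tolist()) - {0}
        assert remaining <= set(keep_ids), name
```

With this layout a failure names the exact input combination, while the assertion message still identifies the scale that failed.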
@timtreis About using …
@timtreis now I make use of …
Review comment on the following diff context:
def _get_scale_factors(labels_element: DataTree) -> list[tuple[float, float]]:
Partially redundant with _get_scale() and _set_transformation(), defined in spatialdata/transformations/_utils.py. I will remove it from here, since I have now removed _filter_by_instance_ids (replaced by filter_by_table_query()).
… to use existing API
- Remove _get_scale_factors (duplicated logic already in transformations/_utils.py)
- Remove _filter_by_instance_ids and subset_sdata_by_table_mask (superseded by match_sdata_to_table / filter_by_table_query)
- Parametrize test_subset_sdata_by_table_mask over both API functions
- Replace test_filter_2d_labels_by_instance_ids with test_filter_out_instances, parametrized over both API functions and element types (2D / multiscale labels)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ble_query
Threads a filter_label_pixels: bool = False parameter through the full join stack (filter_by_table_query → match_sdata_to_table → join_spatialelement_table → _call_join → _right/_inner_join_spatialelement_table). When True, label pixels for removed instances are zeroed via a new _filter_labels_element helper (handles both DataArray and multiscale DataTree). When False (default), the existing warning is preserved but now also hints at the new flag. Tests no longer need manual _set_instance_ids_in_labels_to_zero calls or warnings suppression.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- None (default): warn that label pixels are not filtered, hint at the flag
- True: filter label pixels (set removed instance pixels to zero)
- False: skip silently, no warning
Updated docstrings in join_spatialelement_table, match_sdata_to_table, and filter_by_table_query to document all three states.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_match_sdata_to_table
- Replace test_match_sdata_to_table_match_labels_error with test_filter_out_instances: parametrized over both API functions and element types; tests all three filter_label_pixels states (None→warn, False→nullcontext noop, True→pixels filtered)
- Add test_subset_sdata_by_table_mask for mixed-element subsetting
- Delete test_relational_query_subset_sdata_by_table_mask.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ate in tests
- Raise NotImplementedError in _filter_labels_element when element is Labels3DModel
- Add test_filter_out_instances_3d_labels_not_supported parametrized over both API functions
- Use an.col().is_in() instead of == [list] in 3D test (narwhals does not support nested literals)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
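The commit messages above describe a three-state filter_label_pixels flag, zeroing of pixels for removed instances, support for multiscale elements, and a NotImplementedError for 3D labels. The following is a minimal sketch of that behaviour on plain NumPy arrays, not the actual _filter_labels_element implementation. The names `filter_labels_sketch` and `_zero_removed` are hypothetical, a dict of arrays stands in for a multiscale DataTree, and the warning text is an assumption.

```python
from __future__ import annotations

import warnings

import numpy as np


def _zero_removed(labels: np.ndarray, keep_ids) -> np.ndarray:
    """Hypothetical helper: zero every pixel whose instance id is not kept."""
    if labels.ndim != 2:
        # Mirrors the described behaviour for 3D labels in spirit only.
        raise NotImplementedError("only 2D labels are handled in this sketch")
    out = labels.copy()
    out[~np.isin(labels, list(keep_ids))] = 0
    return out


def filter_labels_sketch(labels, keep_ids, filter_label_pixels: bool | None = None):
    """Illustrative stand-in for the three flag states; not spatialdata code."""
    if filter_label_pixels is None:
        # Default: leave pixels untouched, but tell the user about the flag.
        warnings.warn(
            "label pixels are not filtered; pass filter_label_pixels=True to zero removed instances",
            UserWarning,
            stacklevel=2,
        )
        return labels
    if not filter_label_pixels:
        # False: skip silently.
        return labels
    if isinstance(labels, dict):
        # Multiscale stand-in: apply the same rule to every scale.
        return {scale: _zero_removed(arr, keep_ids) for scale, arr in labels.items()}
    return _zero_removed(labels, keep_ids)
```

Using None as the default lets the function tell "the caller did not choose" apart from "the caller explicitly opted out", which is why only the None state warns.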
I picked this up today. Before today's rework, the PR introduced a series of functions, some of them new and others redundant with other parts of the codebase. I will comment on them one by one.
The initial purpose of the PR is preserved (you can see the tests). squidpy, which was blocked until this work is merged, can therefore build on it. The new syntax is:

filter_by_table_query(
    sdata, "table", obs_expr=an.col("instance_id").is_in([1, 3, 4]), filter_label_pixels=True
)

Below is a self-contained example (also showing how to use …):

# /// script
# requires-python = ">=3.11"
# dependencies = [
# "spatialdata @ git+https://github.com/scverse/spatialdata.git@8d6552ecf24a5b4e46b0c81267b18ab95c7e5660",
# "annsel",
# ]
# ///
from __future__ import annotations

import annsel as an
import numpy as np
from spatialdata import match_sdata_to_table
from spatialdata._core.query.relational_query import filter_by_table_query
from spatialdata.datasets import blobs_annotating_element

sdata = blobs_annotating_element("blobs_labels")
keep_ids = [1, 3, 4]

# match_sdata_to_table: pre-filter the table to the desired instances
table = sdata.tables["table"]
subset_table = table[table.obs["instance_id"].isin(keep_ids)]
result_match = match_sdata_to_table(sdata, "table", table=subset_table, filter_label_pixels=True)

# filter_by_table_query: express the same filter as an annsel predicate
result_query = filter_by_table_query(
    sdata, "table", obs_expr=an.col("instance_id").is_in(keep_ids), filter_label_pixels=True
)

for name, result in [("match_sdata_to_table", result_match), ("filter_by_table_query", result_query)]:
    remaining = set(np.unique(result["blobs_labels"].data.compute())) - {0}
    print(f"{name}: remaining instance ids = {sorted(remaining)}")
Hi @timtreis @ilan-gold,
I am also working on this PR and thought it might be smart to have the code I use there supported here, since it seems useful at first glance. So I wrote and tested subset_sdata_by_table_mask in this PR.
@LucaMarconato I can't request reviews from anyone other than @ilan-gold; do you know why?
Some notes
- For PointsModel (GeoDataFrame), it assumes the index is always instance_id
- For Labels2DModel, the image is assumed to store the instance_ids directly as pixel values
- For Labels2DModel, when the element is an xr.DataTree, it assumes the keys are the different scales
- … scanpy.pp.filter_cells …
Code excerpt from tests to demonstrate the usage: