Support modifying/filtering labels elements via join operations#946

Merged
LucaMarconato merged 23 commits into
scverse:mainfrom
selmanozleyen:feature/filter_operations_on_label
May 11, 2026

@selmanozleyen selmanozleyen commented Jun 30, 2025

Hi @timtreis @ilan-gold,

I am also working on this PR and thought it would be smart to have the code I use there supported here, because it seems useful at first glance. So I wrote and tested subset_sdata_by_table_mask in this PR.

@LucaMarconato I can't request reviews from anyone other than @ilan-gold; do you know why?


Some notes

  • For PointsModel (GeoDataFrame), it assumes the index is always the instance_id.
  • For Labels2DModel, the image values are assumed to be the instance_ids themselves.
  • For Labels2DModel, when the element is an xr.DataTree, it assumes the keys are the different scales.
  • I didn't use the relational operations to get the shapes and points, because they assume the merge is on the element indices themselves.
  • I didn't add support for AnnData input, because we would also have to document the return types for that case, which doesn't seem reasonable to me if it would end up 1:1 the same as scanpy.pp.filter_cells.
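
The labels assumption in the notes above (pixel values are the instance ids, with 0 as background) can be illustrated with a minimal NumPy sketch. The function name `keep_instances` and the toy array are hypothetical and are not part of this PR; the actual code works on dask-backed xarray objects rather than plain arrays.

```python
import numpy as np


def keep_instances(labels: np.ndarray, keep_ids: list[int]) -> np.ndarray:
    """Return a copy of `labels` where pixels of instances not in `keep_ids` are 0."""
    # True where the pixel belongs to an instance we want to keep.
    mask = np.isin(labels, keep_ids)
    # Background (0) stays 0; every removed instance becomes background.
    return np.where(mask, labels, 0)


# Toy label image (hypothetical data): each value is an instance id.
labels = np.array(
    [
        [0, 1, 1, 0],
        [2, 2, 3, 3],
        [0, 4, 4, 3],
    ]
)
filtered = keep_instances(labels, keep_ids=[3])
remaining = set(np.unique(filtered).tolist()) - {0}
print(remaining)
```

The output array has the same shape as the input, so the element's transformations would not need to change under this approach.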

Code excerpt from tests to demonstrate the usage:

sdata = concatenate(
    {
        "labels": blobs_annotating_element("blobs_labels"),
        "shapes": blobs_annotating_element("blobs_circles"),
        "points": blobs_annotating_element("blobs_points"),
        "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
    },
    concatenate_tables=True,
)
third_elems = sdata.tables["table"].obs["instance_id"] == 3
subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)

labels_remaining_ids = set(np.unique(subset_sdata.labels["blobs_labels-labels"].data.compute())) - {0}
assert labels_remaining_ids == {3}

for scale in subset_sdata.labels["blobs_multiscale_labels-multiscale_labels"]:
    ms_labels_remaining_ids = set(
        np.unique(subset_sdata.labels["blobs_multiscale_labels-multiscale_labels"][scale].image.compute())
    ) - {0}
    assert ms_labels_remaining_ids == {3}

points_remaining_ids = set(np.unique(subset_sdata.points["blobs_points-points"]["instance_id"].compute())) - {0}
assert points_remaining_ids == {3}

shapes_remaining_ids = set(np.unique(subset_sdata.shapes["blobs_circles-shapes"].index)) - {0}
assert shapes_remaining_ids == {3}

codecov Bot commented Jun 30, 2025

Codecov Report

❌ Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.04%. Comparing base (5c241f9) to head (8d6552e).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spatialdata/_core/query/relational_query.py | 96.55% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #946      +/-   ##
==========================================
+ Coverage   92.03%   92.04%   +0.01%     
==========================================
  Files          51       51              
  Lines        7761     7785      +24     
==========================================
+ Hits         7143     7166      +23     
- Misses        618      619       +1     

| Files with missing lines | Coverage Δ |
|---|---|
| src/spatialdata/_core/query/relational_query.py | 91.83% <96.55%> (+0.20%) ⬆️ |

Comment thread src/spatialdata/_core/query/masking.py Outdated

@timtreis timtreis left a comment

Can this PR make use of

and ?

Comment thread src/spatialdata/_core/query/relational_query.py Outdated
Comment thread src/spatialdata/_core/query/relational_query.py Outdated
from spatialdata.datasets import blobs_annotating_element


def test_filter_labels2dmodel_by_instance_ids():
Member

Parametrise, don't loop over inputs

Member Author

The loop here is over the multiple scales. Or did you mean something else?
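
One way to reconcile both points is to parametrize over the inputs that define separate test cases (here, the element type) and keep the loop only for the scales inside a single multiscale element. The sketch below uses plain NumPy arrays and hypothetical names (`make_labels`, `test_only_kept_ids_remain`); it is not the test from this PR.

```python
import numpy as np
import pytest


def make_labels(kind: str) -> dict[str, np.ndarray]:
    """Build toy labels: one scale for 'single', two scales for 'multiscale'."""
    base = np.array([[0, 1, 1, 3], [2, 2, 3, 3], [0, 3, 3, 3], [0, 0, 3, 3]])
    if kind == "single":
        return {"scale0": base}
    # Crude downsampling: keep every second pixel along each axis.
    return {"scale0": base, "scale1": base[::2, ::2]}


@pytest.mark.parametrize("kind", ["single", "multiscale"])
def test_only_kept_ids_remain(kind: str) -> None:
    scales = make_labels(kind)
    # The loop is over the scales of one element, not over independent inputs.
    for image in scales.values():
        filtered = np.where(image == 3, image, 0)
        assert set(np.unique(filtered).tolist()) - {0} == {3}
```

With this layout, pytest reports one result per element type, while a failure at any scale still fails the corresponding case.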

Comment thread tests/core/query/test_relational_query_subset_sdata_by_table_mask.py Outdated
Comment thread src/spatialdata/_core/query/relational_query.py Outdated

selmanozleyen commented Jul 10, 2025

@timtreis About using match_sdata_to_table and the other function: I noticed that for shapes the index is assumed to be the instance_id, but this doesn't match how the blobs are filled, so it would fail in the tests. It was documented that way, so I assumed this was intentional, but match_sdata_to_table wouldn't (and didn't) pass the current tests because it assumes the element index is the instance_id.
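
The kind of mismatch described here can be reproduced with plain pandas. In this sketch (toy data, hypothetical variable names), the element index is not the instance id, so matching on the index selects the wrong row, while matching on an explicit instance_id column selects the intended one.

```python
import pandas as pd

# Toy "shapes" element: the index is positional, the ids live in a column.
shapes = pd.DataFrame(
    {"instance_id": [2, 3, 1], "radius": [0.5, 1.0, 1.5]},
    index=[0, 1, 2],
)

# Instance ids that the (filtered) table still annotates.
table_ids = pd.Index([1])

# Matching on the element index picks the row at position 1 (instance_id 3).
by_index = shapes[shapes.index.isin(table_ids)]

# Matching on the instance_id column picks the intended row (instance_id 1).
by_column = shapes[shapes["instance_id"].isin(table_ids)]

print(by_index["instance_id"].tolist(), by_column["instance_id"].tolist())
```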

@selmanozleyen
Member Author

@timtreis I now make use of match_element_to_table. Other than one comment, everything seems resolved.

)


def _get_scale_factors(labels_element: DataTree) -> list[tuple[float, float]]:
Member

Partially redundant with _get_scale() and _set_transformation(), defined in spatialdata/transformations/_utils.py. I will remove it from here, since I have now removed _filter_by_instance_ids (replaced by filter_by_table_query()).

… to use existing API

- Remove _get_scale_factors (duplicated logic already in transformations/_utils.py)
- Remove _filter_by_instance_ids and subset_sdata_by_table_mask (superseded by match_sdata_to_table / filter_by_table_query)
- Parametrize test_subset_sdata_by_table_mask over both API functions
- Replace test_filter_2d_labels_by_instance_ids with test_filter_out_instances, parametrized over both API functions and element types (2D / multiscale labels)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LucaMarconato and others added 4 commits May 11, 2026 16:58
…ble_query

Threads a filter_label_pixels: bool = False parameter through the full join
stack (filter_by_table_query → match_sdata_to_table → join_spatialelement_table
→ _call_join → _right/_inner_join_spatialelement_table).

When True, label pixels for removed instances are zeroed via a new
_filter_labels_element helper (handles both DataArray and multiscale DataTree).
When False (default), the existing warning is preserved but now also hints at
the new flag.

Tests no longer need manual _set_instance_ids_in_labels_to_zero calls or
warnings suppression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- None (default): warn that label pixels are not filtered, hint at the flag
- True: filter label pixels (set removed instance pixels to zero)
- False: skip silently, no warning

Updated docstrings in join_spatialelement_table, match_sdata_to_table,
and filter_by_table_query to document all three states.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
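
The three states listed in this commit message can be sketched as follows. This is an illustration only: `apply_label_filter` is a hypothetical stand-in, not the library's `_filter_labels_element`, and the warning text is made up.

```python
from __future__ import annotations

import warnings

import numpy as np


def apply_label_filter(
    labels: np.ndarray, keep_ids: list[int], filter_label_pixels: bool | None = None
) -> np.ndarray:
    """Tri-state handling: None warns, True filters, False skips silently."""
    if filter_label_pixels is None:
        # Default: leave pixels untouched but point the user to the flag.
        warnings.warn(
            "Label pixels are not filtered; pass filter_label_pixels=True to zero them.",
            UserWarning,
            stacklevel=2,
        )
        return labels
    if filter_label_pixels:
        # Zero every pixel whose instance id is not kept.
        return np.where(np.isin(labels, keep_ids), labels, 0)
    # False: explicit opt-out, no warning.
    return labels


labels = np.array([[0, 1, 2], [3, 3, 4]])
filtered = apply_label_filter(labels, keep_ids=[1, 3, 4], filter_label_pixels=True)
untouched = apply_label_filter(labels, keep_ids=[1, 3, 4], filter_label_pixels=False)
```

Using `None` as the default distinguishes "the caller did not decide" from an explicit `False`, which is what allows the warning to be silenced without changing behaviour.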
…_match_sdata_to_table

- Replace test_match_sdata_to_table_match_labels_error with
  test_filter_out_instances: parametrized over both API functions and
  element types; tests all three filter_label_pixels states (None→warn,
  False→nullcontext noop, True→pixels filtered)
- Add test_subset_sdata_by_table_mask for mixed-element subsetting
- Delete test_relational_query_subset_sdata_by_table_mask.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ate in tests

- Raise NotImplementedError in _filter_labels_element when element is Labels3DModel
- Add test_filter_out_instances_3d_labels_not_supported parametrized over both API functions
- Use an.col().is_in() instead of == [list] in 3D test (narwhals does not support nested literals)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@LucaMarconato
Member

I picked this up today.

Before today's rework, the PR introduced a series of functions, some of them new, others redundant with other parts of the codebase. I will comment on them one by one.

  • _mask_block(): new, I kept it.
  • _set_instance_ids_in_labels_to_zero(): new. I kept it and added a check to detect the 3D case and throw a NotImplementedError.
  • _get_scale_factors(): redundant, removed. See https://github.com/scverse/spatialdata/pull/946/changes/BASE..631fe2ad10bd2f5fca78b7beccbe7ed272376406#r3219789350
  • _filter_by_instance_ids(): already available for points and shapes, new for labels. I removed the dispatch mechanism and, after the rework, named this function _filter_labels_element().
  • subset_sdata_by_table_mask(): the only public function of the PR. I removed it because it is redundant with match_sdata_to_table and filter_by_table_query, provided that right joins are supported for labels. I have now implemented right joins for labels (calling _set_instance_ids_in_labels_to_zero() internally).
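
A rough sketch of two behaviours mentioned in the list above: handling single-scale and multiscale labels with one helper, and rejecting 3D labels. It uses NumPy arrays and a plain dict of scales instead of `DataArray` / `DataTree`; `filter_labels` and `_zero_removed` are hypothetical names, not the helpers from this PR.

```python
from __future__ import annotations

import numpy as np


def _zero_removed(image: np.ndarray, keep_ids: list[int]) -> np.ndarray:
    if image.ndim != 2:
        # Assumption for this sketch: anything that is not 2D is rejected.
        raise NotImplementedError("Only 2D labels are supported in this sketch.")
    return np.where(np.isin(image, keep_ids), image, 0)


def filter_labels(
    element: np.ndarray | dict[str, np.ndarray], keep_ids: list[int]
) -> np.ndarray | dict[str, np.ndarray]:
    """Filter a single image, or every scale of a multiscale element."""
    if isinstance(element, dict):
        # Multiscale: apply the same filter independently to each scale.
        return {scale: _zero_removed(image, keep_ids) for scale, image in element.items()}
    return _zero_removed(element, keep_ids)


full = np.array([[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]])
multiscale = {"scale0": full, "scale1": full[::2, ::2]}
result = filter_labels(multiscale, keep_ids=[1, 4])
```

Filtering each scale separately keeps the scales consistent with each other as long as every scale stores the same instance ids.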

The initial purpose of the PR is preserved (as the tests show). Therefore squidpy, which was blocked pending the merge of this work, can now build on it. The new syntax is:

filter_by_table_query(
    sdata, "table", obs_expr=an.col("instance_id").is_in([1, 3, 4]), filter_label_pixels=True
)

Below is a self-contained example (also showing how to use match_sdata_to_table()).

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "spatialdata @ git+https://github.com/scverse/spatialdata.git@8d6552ecf24a5b4e46b0c81267b18ab95c7e5660",
#   "annsel",
# ]
# ///
from __future__ import annotations

import annsel as an
import numpy as np

from spatialdata import match_sdata_to_table
from spatialdata._core.query.relational_query import filter_by_table_query
from spatialdata.datasets import blobs_annotating_element

sdata = blobs_annotating_element("blobs_labels")
keep_ids = [1, 3, 4]

# match_sdata_to_table: pre-filter the table to the desired instances
table = sdata.tables["table"]
subset_table = table[table.obs["instance_id"].isin(keep_ids)]
result_match = match_sdata_to_table(sdata, "table", table=subset_table, filter_label_pixels=True)

# filter_by_table_query: express the same filter as an annsel predicate
result_query = filter_by_table_query(
    sdata, "table", obs_expr=an.col("instance_id").is_in(keep_ids), filter_label_pixels=True
)

for name, result in [("match_sdata_to_table", result_match), ("filter_by_table_query", result_query)]:
    remaining = set(np.unique(result["blobs_labels"].data.compute())) - {0}
    print(f"{name}: remaining instance ids = {sorted(remaining)}")

@LucaMarconato LucaMarconato changed the title Filter Operations on Label2DModel and Shape Add support for modifying/filtering labels elements with joins May 11, 2026
@LucaMarconato LucaMarconato changed the title Add support for modifying/filtering labels elements with joins Support modifying/filtering labels elements with joins May 11, 2026
@LucaMarconato LucaMarconato changed the title Support modifying/filtering labels elements with joins Support modifying/filtering labels elements via join operations May 11, 2026
@LucaMarconato LucaMarconato enabled auto-merge (squash) May 11, 2026 15:41
@LucaMarconato LucaMarconato disabled auto-merge May 11, 2026 15:46
@LucaMarconato LucaMarconato merged commit 835a035 into scverse:main May 11, 2026
8 of 9 checks passed

4 participants