Skip to content

Plotting categorical obs onto Xenium cell boundaries #532

@krademaker

Description

@krademaker

Hi Spatialdata team!

I'm really struggling to get categorical cell information plotted for Xenium data. I'm working with a basic Conda environment:

conda create -n spatialdata python=3.10
Retrieving notices: done
Channels:
 - pytorch
 - nvidia
 - nodefaults
 - conda-forge
 - bioconda
 pip install "spatialdata[extra]"
  • spatialdata (v.0.5.0)
  • spatialdata_plot (v.0.2.12)

Reproducing blobs

@LucaMarconato Suggested to first check the blobs() dataset:

%%time
import spatialdata as sd
from spatialdata.datasets import blobs
import spatialdata_plot
from napari_spatialdata import Interactive
import pandas as pd
import numpy as np

# Load blobs dataset
sdata = blobs()

# Plot blobs 
sdata.pl.render_labels('blobs_labels').pl.show(title='blobs_labels')

# Randomly generate cell types
rng = np.random.default_rng(0)  # seed for reproducibility (change/remove if you want)
cell_types = ["T-cell", "B-cell", "Myeloid", "Endothelial", "Fibroblast"]
sdata.tables["table"].obs["simulated_cell_type"] = pd.Categorical(
    rng.choice(cell_types, size=sdata.tables['table'].n_obs, replace=True),
    categories=cell_types,
)
# Render transcript points
sdata["blobs_points"].compute()

# Plot transcripts and blobs
sdata.pl.render_labels('blobs_labels', color='simulated_cell_type', outline_alpha=1).pl.render_points(color='genes', size=5, groups='gene_a', palette='black').pl.show(title='simulated_cell_types + gene_A')

Giving me some expected simulated cell types + a gene of interest:

Image

Working with actual Xenium data

# Path to Xenium Ranger output directory
xenium_path = '/.../output-XETG...'

# Load as SpatialData, using cell_boundaries for simplicity
sdata = xenium(
    path=str(xenium_path),
    n_jobs=8,
    cell_boundaries=True,
    nucleus_boundaries=False,
    morphology_focus=True,
    cells_as_circles=False,
)

# Query a smaller region of interest
# Bounding box to region of interest
bb_xmin = 22000
bb_ymin = 5000
bb_w = 1000
bb_h = 1000
bb_xmax = bb_xmin + bb_w
bb_ymax = bb_ymin + bb_h

# Highlight region of interest
f, ax = plt.subplots(figsize=(7, 77))
sdata.pl.render_shapes().pl.show(ax=ax)
rect = patches.Rectangle((bb_xmin, bb_ymin), bb_w, bb_h, linewidth=1, edgecolor="red", facecolor="none")
ax.add_patch(rect)

To give you an idea:
Image

Description of SpatialData object

SpatialData object
├── Images
│     └── 'morphology_focus': DataTree[cyx] (4, 17110, 48309), (4, 8555, 24154), (4, 4277, 12077), (4, 2138, 6038), (4, 1069, 3019)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (17110, 48309), (8555, 24154), (4277, 12077), (2138, 6038), (1069, 3019)
│     └── 'nucleus_labels': DataTree[yx] (17110, 48309), (8555, 24154), (4277, 12077), (2138, 6038), (1069, 3019)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│     └── 'cell_boundaries': GeoDataFrame shape: (51326, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (51326, 474)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes)

Subsetting to region of interest

sdata_xenium_subset = sdata.query.bounding_box(
    axes=["x", "y"],
    min_coordinate=[bb_xmin, bb_ymin],
    max_coordinate=[bb_xmax, bb_ymax],
    target_coordinate_system="global",
)

And I did make a copy of tables to have the categorical data to plot (does this make sense?):

subset_cell_ids = sdata_xenium_subset.shapes["cell_boundaries"].index.astype(str)
table = sdata.tables["table"]
mask = table.obs["cell_id"].astype(str).isin(subset_cell_ids).to_numpy()
table_subset = table[mask].copy()
sdata_xenium_subset.tables["table"] = table_subset

to yield:

SpatialData object
├── Images
│     └── 'morphology_focus': DataTree[cyx] (4, 1000, 1000), (4, 500, 500), (4, 250, 250), (4, 125, 125), (4, 63, 62)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (1000, 1000), (500, 500), (250, 250), (125, 125), (63, 62)
│     └── 'nucleus_labels': DataTree[yx] (1000, 1000), (500, 500), (250, 250), (125, 125), (63, 62)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│     └── 'cell_boundaries': GeoDataFrame shape: (400, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (400, 474)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes)

and:

<head></head>
  | cell_id | transcript_counts | control_probe_counts | genomic_control_counts | control_codeword_counts | unassigned_codeword_counts | deprecated_codeword_counts | total_counts | cell_area | nucleus_area | nucleus_count | segmentation_method | region | z_level
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
adeaegji-1 | 528 | 0 | 0 | 0 | 0 | 0 | 528 | 38.834376 | 33.415626 | NaN | Segmented by boundary stain (ATP1A1+CD45+E-Cad... | cell_labels | NaN
adedfahn-1 | 874 | 0 | 0 | 0 | 0 | 0 | 874 | 73.017659 | 51.387814 | NaN | Segmented by boundary stain (ATP1A1+CD45+E-Cad... | cell_labels | NaN
adefbodo-1 | 792 | 0 | 0 | 0 | 0 | 0 | 792 | 63.399377 | 30.661095 | NaN | Segmented by boundary stain (ATP1A1+CD45+E-Cad... | cell_labels | NaN
cdpamedj-1 | 181 | 0 | 0 | 0 | 0 | 0 | 181 | 26.010001 | 9.076407 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
cdpcemeb-1 | 415 | 0 | 0 | 0 | 0 | 0 | 415 | 38.337658 | 22.081407 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
danegmoa-1 | 2158 | 0 | 0 | 0 | 0 | 0 | 2158 | 186.224382 | 87.332191 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
danoeini-1 | 1428 | 0 | 0 | 0 | 0 | 0 | 1428 | 160.349850 | 45.698127 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
ochhbmid-1 | 171 | 1 | 0 | 0 | 0 | 0 | 172 | 56.942033 | 51.703908 | NaN | Segmented by nucleus expansion of 5.0µm | cell_labels | NaN
ochkepmc-1 | 185 | 0 | 0 | 0 | 0 | 0 | 185 | 49.400939 | 49.400939 | NaN | Segmented by nucleus expansion of 5.0µm | cell_labels | NaN
ochllnmj-1 | 89 | 1 | 0 | 0 | 0 | 0 | 90 | 45.336877 | 40.460001 | NaN | Segmented by nucleus expansion of 5.0µm | cell_labels | NaN

Plotting basic cells works:
sdata_xenium_subset.pl.render_labels('cell_labels').pl.show(title='cell_labels')
Image

Simulated categorical cell labels

Before doing the real thing (we have particular spatial niche annotations per cell ID that we ultimately want to plot), I try to simulate cell types and plot these:

import pandas as pd
import numpy as np
# Simulate cell types
rng = np.random.default_rng(0)  # seed for reproducibility (change/remove if you want)
cell_types = ["T-cell", "B-cell", "Myeloid", "Endothelial", "Fibroblast"]
sdata_xenium_subset.tables["table"].obs["simulated_cell_type"] = pd.Categorical(
    rng.choice(cell_types, size=sdata_xenium_subset.tables['table'].n_obs, replace=True),
    categories=cell_types,
)

This create the following error:

sdata_xenium_subset.pl.render_labels('cell_labels', color='simulated_cell_type', outline_alpha=1).pl.show(title='simulated_cell_type')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[161], [line 1](vscode-notebook-cell:?execution_count=161&line=1)
----> [1](vscode-notebook-cell:?execution_count=161&line=1) sdata_xenium_subset.pl.render_labels('cell_labels', color='simulated_cell_type', outline_alpha=1).pl.show(title='simulated_cell_type')

File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/basic.py:985, in PlotAccessor.show(self, coordinate_systems, legend_fontsize, legend_fontweight, legend_loc, legend_fontoutline, na_in_legend, colorbar, wspace, hspace, ncols, frameon, figsize, dpi, fig, title, share_extent, pad_extent, ax, return_ax, save)
    973                 _maybe_set_colors(
    974                     source=sdata[table],
    975                     target=sdata[table],
    976                     key=params_copy.color,
    977                     palette=params_copy.palette,
    978                 )
    980         rasterize = (params_copy.scale is None) or (
    981             isinstance(params_copy.scale, str)
    982             and params_copy.scale != "full"
    983             and (dpi is not None or figsize is not None)
    984         )
--> [985](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/basic.py:985)         _render_labels(
    986             sdata=sdata,
    987             render_params=params_copy,
    988             coordinate_system=cs,
    989             ax=ax,
    990             fig_params=fig_params,
    991             scalebar_params=scalebar_params,
    992             legend_params=legend_params,
    993             rasterize=rasterize,
    994         )
    996 if title is None:
    997     t = cs

File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/render.py:1062, in _render_labels(sdata, render_params, coordinate_system, ax, fig_params, scalebar_params, legend_params, rasterize)
   1058     instance_id = np.unique(table.obs[instance_key].values)
   1060 _, trans_data = _prepare_transformation(label, coordinate_system, ax)
-> [1062](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/render.py:1062) color_source_vector, color_vector, categorical = _set_color_source_vec(
   1063     sdata=sdata_filt,
   1064     element=label,
   1065     element_name=element,
   1066     value_to_plot=color,
   1067     groups=groups,
   1068     palette=palette,
   1069     na_color=render_params.cmap_params.na_color,
   1070     cmap_params=render_params.cmap_params,
   1071     table_name=table_name,
   1072     table_layer=table_layer,
   1073 )
   1075 # rasterize could have removed labels from label
   1076 # only problematic if color is specified
   1077 if rasterize and color is not None:

File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/utils.py:757, in _set_color_source_vec(sdata, element, value_to_plot, na_color, element_name, groups, palette, cmap_params, alpha, table_name, table_layer, render_type)
    752     raise ValueError(
    753         f"Color key '{value_to_plot}' for element '{element_name}' been found in multiple locations: {origins}."
    754     )
    756 if len(origins) == 1:
--> [757](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/utils.py:757)     color_source_vector = get_values(
    758         value_key=value_to_plot,
    759         sdata=sdata,
    760         element_name=element_name,
    761         table_name=table_name,
    762         table_layer=table_layer,
    763     )[value_to_plot]
    765     # numerical case, return early
    766     # TODO temporary split until refactor is complete
    767     if color_source_vector is not None and not isinstance(color_source_vector.dtype, pd.CategoricalDtype):

File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata/_core/query/relational_query.py:978, in get_values(value_key, element, sdata, element_name, table_name, table_layer, return_obsm_as_is)
    976     instance_key = matched_table.uns[TableModel.ATTRS_KEY][TableModel.INSTANCE_KEY]
    977     obs = matched_table.obs
--> [978](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata/_core/query/relational_query.py:978)     assert obs[region_key].nunique() == 1
    979     assert obs[instance_key].nunique() == len(matched_table)
    980 else:

AssertionError:

Which unfortuantely is not really helping to pinpoint the cause of the issue. Do you have a clear precedent for how to query such Xenium data and either simulate or load categorical variables into tables['table'].obs so that they can be plotted on the cell boundaries? Thanks so much in advance! I've been in touch with @LucaMarconato already, happy to (privately) share more details on the dataset.

Best,
Koen

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions