-
Notifications
You must be signed in to change notification settings - Fork 18
Description
Hi Spatialdata team!
I'm really struggling to get categorical cell information plotted for Xenium data. I'm working with a basic Conda environment:
conda create -n spatialdata python=3.10
Retrieving notices: done
Channels:
- pytorch
- nvidia
- nodefaults
- conda-forge
- bioconda
pip install "spatialdata[extra]"
- spatialdata (v.0.5.0)
- spatialdata_plot (v.0.2.12)
Reproducing blobs
@LucaMarconato Suggested to first check the blobs() dataset:
%%time
import spatialdata as sd
from spatialdata.datasets import blobs
import spatialdata_plot
from napari_spatialdata import Interactive
import pandas as pd
import numpy as np
# Load blobs dataset
sdata = blobs()
# Plot blobs
sdata.pl.render_labels('blobs_labels').pl.show(title='blobs_labels')
# Randomly generate cell types
rng = np.random.default_rng(0) # seed for reproducibility (change/remove if you want)
cell_types = ["T-cell", "B-cell", "Myeloid", "Endothelial", "Fibroblast"]
sdata.tables["table"].obs["simulated_cell_type"] = pd.Categorical(
rng.choice(cell_types, size=sdata.tables['table'].n_obs, replace=True),
categories=cell_types,
)
# Render transcript points
sdata["blobs_points"].compute()
# Plot transcripts and blobs
sdata.pl.render_labels('blobs_labels', color='simulated_cell_type', outline_alpha=1).pl.render_points(color='genes', size=5, groups='gene_a', palette='black').pl.show(title='simulated_cell_types + gene_A')
Giving me some expected simulated cell types + a gene of interest:
Working with actual Xenium data
# Path to Xenium Ranger output directory
xenium_path = '/.../output-XETG...'
# Load as SpatialData, using cell_boundaries for simplicity
sdata = xenium(
path=str(xenium_path),
n_jobs=8,
cell_boundaries=True,
nucleus_boundaries=False,
morphology_focus=True,
cells_as_circles=False,
)
# Query a smaller region of interest
# Bounding box to region of interest
bb_xmin = 22000
bb_ymin = 5000
bb_w = 1000
bb_h = 1000
bb_xmax = bb_xmin + bb_w
bb_ymax = bb_ymin + bb_h
# Highlight region of interest
f, ax = plt.subplots(figsize=(7, 77))
sdata.pl.render_shapes().pl.show(ax=ax)
rect = patches.Rectangle((bb_xmin, bb_ymin), bb_w, bb_h, linewidth=1, edgecolor="red", facecolor="none")
ax.add_patch(rect)
Description of SpatialData object
SpatialData object
├── Images
│ └── 'morphology_focus': DataTree[cyx] (4, 17110, 48309), (4, 8555, 24154), (4, 4277, 12077), (4, 2138, 6038), (4, 1069, 3019)
├── Labels
│ ├── 'cell_labels': DataTree[yx] (17110, 48309), (8555, 24154), (4277, 12077), (2138, 6038), (1069, 3019)
│ └── 'nucleus_labels': DataTree[yx] (17110, 48309), (8555, 24154), (4277, 12077), (2138, 6038), (1069, 3019)
├── Points
│ └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│ └── 'cell_boundaries': GeoDataFrame shape: (51326, 1) (2D shapes)
└── Tables
└── 'table': AnnData (51326, 474)
with coordinate systems:
▸ 'global', with elements:
morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes)
Subsetting to region of interest
sdata_xenium_subset = sdata.query.bounding_box(
axes=["x", "y"],
min_coordinate=[bb_xmin, bb_ymin],
max_coordinate=[bb_xmax, bb_ymax],
target_coordinate_system="global",
)
And I did make a copy of tables to have the categorical data to plot (does this make sense?):
subset_cell_ids = sdata_xenium_subset.shapes["cell_boundaries"].index.astype(str)
table = sdata.tables["table"]
mask = table.obs["cell_id"].astype(str).isin(subset_cell_ids).to_numpy()
table_subset = table[mask].copy()
sdata_xenium_subset.tables["table"] = table_subset
to yield:
SpatialData object
├── Images
│ └── 'morphology_focus': DataTree[cyx] (4, 1000, 1000), (4, 500, 500), (4, 250, 250), (4, 125, 125), (4, 63, 62)
├── Labels
│ ├── 'cell_labels': DataTree[yx] (1000, 1000), (500, 500), (250, 250), (125, 125), (63, 62)
│ └── 'nucleus_labels': DataTree[yx] (1000, 1000), (500, 500), (250, 250), (125, 125), (63, 62)
├── Points
│ └── 'transcripts': DataFrame with shape: (<Delayed>, 13) (3D points)
├── Shapes
│ └── 'cell_boundaries': GeoDataFrame shape: (400, 1) (2D shapes)
└── Tables
└── 'table': AnnData (400, 474)
with coordinate systems:
▸ 'global', with elements:
morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes)
and:
<head></head>
| cell_id | transcript_counts | control_probe_counts | genomic_control_counts | control_codeword_counts | unassigned_codeword_counts | deprecated_codeword_counts | total_counts | cell_area | nucleus_area | nucleus_count | segmentation_method | region | z_level
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
adeaegji-1 | 528 | 0 | 0 | 0 | 0 | 0 | 528 | 38.834376 | 33.415626 | NaN | Segmented by boundary stain (ATP1A1+CD45+E-Cad... | cell_labels | NaN
adedfahn-1 | 874 | 0 | 0 | 0 | 0 | 0 | 874 | 73.017659 | 51.387814 | NaN | Segmented by boundary stain (ATP1A1+CD45+E-Cad... | cell_labels | NaN
adefbodo-1 | 792 | 0 | 0 | 0 | 0 | 0 | 792 | 63.399377 | 30.661095 | NaN | Segmented by boundary stain (ATP1A1+CD45+E-Cad... | cell_labels | NaN
cdpamedj-1 | 181 | 0 | 0 | 0 | 0 | 0 | 181 | 26.010001 | 9.076407 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
cdpcemeb-1 | 415 | 0 | 0 | 0 | 0 | 0 | 415 | 38.337658 | 22.081407 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
danegmoa-1 | 2158 | 0 | 0 | 0 | 0 | 0 | 2158 | 186.224382 | 87.332191 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
danoeini-1 | 1428 | 0 | 0 | 0 | 0 | 0 | 1428 | 160.349850 | 45.698127 | NaN | Segmented by interior stain (18S) | cell_labels | NaN
ochhbmid-1 | 171 | 1 | 0 | 0 | 0 | 0 | 172 | 56.942033 | 51.703908 | NaN | Segmented by nucleus expansion of 5.0µm | cell_labels | NaN
ochkepmc-1 | 185 | 0 | 0 | 0 | 0 | 0 | 185 | 49.400939 | 49.400939 | NaN | Segmented by nucleus expansion of 5.0µm | cell_labels | NaN
ochllnmj-1 | 89 | 1 | 0 | 0 | 0 | 0 | 90 | 45.336877 | 40.460001 | NaN | Segmented by nucleus expansion of 5.0µm | cell_labels | NaN
Plotting basic cells works:
sdata_xenium_subset.pl.render_labels('cell_labels').pl.show(title='cell_labels')

Simulated categorical cell labels
Before doing the real thing (we have particular spatial niche annotations per cell ID that we ultimately want to plot), I try to simulate cell types and plot these:
import pandas as pd
import numpy as np
# Simulate cell types
rng = np.random.default_rng(0) # seed for reproducibility (change/remove if you want)
cell_types = ["T-cell", "B-cell", "Myeloid", "Endothelial", "Fibroblast"]
sdata_xenium_subset.tables["table"].obs["simulated_cell_type"] = pd.Categorical(
rng.choice(cell_types, size=sdata_xenium_subset.tables['table'].n_obs, replace=True),
categories=cell_types,
)
This create the following error:
sdata_xenium_subset.pl.render_labels('cell_labels', color='simulated_cell_type', outline_alpha=1).pl.show(title='simulated_cell_type')
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[161], [line 1](vscode-notebook-cell:?execution_count=161&line=1)
----> [1](vscode-notebook-cell:?execution_count=161&line=1) sdata_xenium_subset.pl.render_labels('cell_labels', color='simulated_cell_type', outline_alpha=1).pl.show(title='simulated_cell_type')
File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/basic.py:985, in PlotAccessor.show(self, coordinate_systems, legend_fontsize, legend_fontweight, legend_loc, legend_fontoutline, na_in_legend, colorbar, wspace, hspace, ncols, frameon, figsize, dpi, fig, title, share_extent, pad_extent, ax, return_ax, save)
973 _maybe_set_colors(
974 source=sdata[table],
975 target=sdata[table],
976 key=params_copy.color,
977 palette=params_copy.palette,
978 )
980 rasterize = (params_copy.scale is None) or (
981 isinstance(params_copy.scale, str)
982 and params_copy.scale != "full"
983 and (dpi is not None or figsize is not None)
984 )
--> [985](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/basic.py:985) _render_labels(
986 sdata=sdata,
987 render_params=params_copy,
988 coordinate_system=cs,
989 ax=ax,
990 fig_params=fig_params,
991 scalebar_params=scalebar_params,
992 legend_params=legend_params,
993 rasterize=rasterize,
994 )
996 if title is None:
997 t = cs
File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/render.py:1062, in _render_labels(sdata, render_params, coordinate_system, ax, fig_params, scalebar_params, legend_params, rasterize)
1058 instance_id = np.unique(table.obs[instance_key].values)
1060 _, trans_data = _prepare_transformation(label, coordinate_system, ax)
-> [1062](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/render.py:1062) color_source_vector, color_vector, categorical = _set_color_source_vec(
1063 sdata=sdata_filt,
1064 element=label,
1065 element_name=element,
1066 value_to_plot=color,
1067 groups=groups,
1068 palette=palette,
1069 na_color=render_params.cmap_params.na_color,
1070 cmap_params=render_params.cmap_params,
1071 table_name=table_name,
1072 table_layer=table_layer,
1073 )
1075 # rasterize could have removed labels from label
1076 # only problematic if color is specified
1077 if rasterize and color is not None:
File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/utils.py:757, in _set_color_source_vec(sdata, element, value_to_plot, na_color, element_name, groups, palette, cmap_params, alpha, table_name, table_layer, render_type)
752 raise ValueError(
753 f"Color key '{value_to_plot}' for element '{element_name}' been found in multiple locations: {origins}."
754 )
756 if len(origins) == 1:
--> [757](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata_plot/pl/utils.py:757) color_source_vector = get_values(
758 value_key=value_to_plot,
759 sdata=sdata,
760 element_name=element_name,
761 table_name=table_name,
762 table_layer=table_layer,
763 )[value_to_plot]
765 # numerical case, return early
766 # TODO temporary split until refactor is complete
767 if color_source_vector is not None and not isinstance(color_source_vector.dtype, pd.CategoricalDtype):
File /software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata/_core/query/relational_query.py:978, in get_values(value_key, element, sdata, element_name, table_name, table_layer, return_obsm_as_is)
976 instance_key = matched_table.uns[TableModel.ATTRS_KEY][TableModel.INSTANCE_KEY]
977 obs = matched_table.obs
--> [978](https://vscode-remote+ssh-002dremote-002bfarm22.vscode-resource.vscode-cdn.net/software/conda/users/kr23/spatialdata/lib/python3.10/site-packages/spatialdata/_core/query/relational_query.py:978) assert obs[region_key].nunique() == 1
979 assert obs[instance_key].nunique() == len(matched_table)
980 else:
AssertionError:
Which unfortuantely is not really helping to pinpoint the cause of the issue. Do you have a clear precedent for how to query such Xenium data and either simulate or load categorical variables into tables['table'].obs so that they can be plotted on the cell boundaries? Thanks so much in advance! I've been in touch with @LucaMarconato already, happy to (privately) share more details on the dataset.
Best,
Koen
