Speed up datashader rendering of points #557

Open
timtreis wants to merge 1 commit into main from fix/issue-379-datashader-points-perf

timtreis (Member) commented Mar 24, 2026

Summary

Closes #379

Datashader was consistently slower than matplotlib for rendering points due to five performance bottlenecks in the datashader code path. This PR fixes all five:

  1. Dask DataFrame passed to cvs.points(): PointsModel.parse() returns a dask DataFrame, but datashader's dask path has ~137x scheduler overhead on already-computed data. Fix: call .compute() before aggregation.
  2. Double extent computation: the extent was computed via get_extent() on the dask DataFrame (triggering a compute), then .compute() was called again. Fix: a new _datashader_canvas_from_dataframe() that reads min/max directly from the materialized pandas DataFrame.
  3. Per-point _hex_no_alpha() calls: an O(n) list comprehension ran even when no alpha stripping was needed. Fix: deduplicate, and skip entirely when all colors are already in #RRGGBB format.
  4. _build_datashader_color_key iterated over all points: it used expensive per-element pandas Index lookups and couldn't exit early because sentinel categories inflated the target count. Fix: use numpy arrays and count only the categories that are present.
  5. _want_decorations O(n) set creation: set(cv.tolist()) was built from 500K+ hex strings. Fix: numpy (cv == cv.flat[0]).all().
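
Fixes 3 and 5 both replace per-element Python work with vectorized checks. The PR's actual code is not reproduced on this page, so the following is only a rough sketch of the idea. The helper names strip_alpha_unique and is_single_color are hypothetical, and the sketch assumes a 1-D sequence of well-formed hex strings of the form #RRGGBB or #RRGGBBAA.

```python
import numpy as np


def strip_alpha_unique(colors):
    """Drop the alpha channel from hex colors, touching each distinct value once.

    Hypothetical helper; assumes well-formed '#RRGGBB' or '#RRGGBBAA' strings.
    """
    arr = np.asarray(colors, dtype=str)
    # Fast path: if every string is 7 characters long, nothing has an alpha suffix.
    if (np.char.str_len(arr) == 7).all():
        return arr
    # Slow path: strip alpha on the unique colors only, then map back to all points.
    uniq, inverse = np.unique(arr, return_inverse=True)
    stripped = np.array([c[:7] for c in uniq])
    return stripped[inverse]


def is_single_color(cv):
    """Return True if every entry equals the first one, without building a set."""
    cv = np.asarray(cv)
    return cv.size > 0 and bool((cv == cv.flat[0]).all())
```

With a few distinct colors spread over hundreds of thousands of points, the Python-level loop in the slow path runs once per distinct color rather than once per point.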

Performance results (after JIT warmup)

No coloring:

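| n    | matplotlib | datashader (before) | datashader (after) | speedup vs. matplotlib |
|------|------------|---------------------|--------------------|------------------------|
| 10K  | 0.065s     | 0.997s (0.07x)      | 0.051s             | 1.22x                  |
| 100K | 0.145s     | 0.267s (0.58x)      | 0.108s             | 1.32x                  |
| 500K | 0.540s     | 0.763s (0.75x)      | 0.383s             | 1.37x                  |
| 1M   | 1.040s     | 0.861s (1.21x)      | 0.724s             | 1.43x                  |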

Categorical coloring:

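| n    | matplotlib | datashader (after) | speedup vs. matplotlib |
|------|------------|--------------------|------------------------|
| 100K | 0.168s     | 0.154s             | 1.09x                  |
| 500K | 0.605s     | 0.381s             | 1.59x                  |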
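
The benchmark script behind these numbers is not included in the PR description. As an illustration only, timings "after JIT warmup" could be collected along the following lines, with one untimed call to trigger compilation before measuring. The function names and the best-of-N policy are assumptions, not the author's method. The speedup helper reproduces the categorical rows as the matplotlib time divided by the datashader time.

```python
import time


def time_after_warmup(fn, repeats=3):
    """Call fn once untimed (warm-up), then return the best of `repeats` timed runs.

    Hypothetical harness; the PR does not state how its timings were taken.
    """
    fn()  # warm-up call, e.g. to trigger JIT compilation
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_s, candidate_s):
    """Ratio of baseline time to candidate time, rounded to two decimals."""
    return round(baseline_s / candidate_s, 2)


# Categorical coloring rows from the table above: matplotlib vs. datashader (after).
print(speedup(0.168, 0.154))  # 100K points
print(speedup(0.605, 0.381))  # 500K points
```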

codecov-commenter commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.84%. Comparing base (edca5a5) to head (e1f1ad4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #557      +/-   ##
==========================================
- Coverage   73.89%   73.84%   -0.06%     
==========================================
  Files          10       10              
  Lines        2777     2783       +6     
  Branches      645      644       -1     
==========================================
+ Hits         2052     2055       +3     
- Misses        451      454       +3     
  Partials      274      274              
| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| src/spatialdata_plot/pl/_datashader.py | 89.05% <100.00%> (-2.87%) ⬇️ |
| src/spatialdata_plot/pl/render.py | 82.10% <100.00%> (-0.04%) ⬇️ |
| src/spatialdata_plot/pl/utils.py | 65.67% <100.00%> (+0.19%) ⬆️ |

timtreis force-pushed the fix/issue-379-datashader-points-perf branch from b56c92d to 240a5d2 on March 24, 2026 14:41
Datashader was consistently slower than matplotlib for points due to
five performance bottlenecks:

1. Dask DataFrame passed to cvs.points() instead of pandas (~137x
   scheduler overhead on already-computed data)
2. Double extent computation (get_extent on dask, then .compute again)
3. Per-point _hex_no_alpha() calls in O(n) list comprehension
4. _build_datashader_color_key iterated all points instead of
   early-exiting after finding all categories
5. _want_decorations created O(n) Python set from color vector

After fixes, datashader is 1.2-1.4x faster than matplotlib for plain
points and up to 1.6x faster for categorical coloring at 500K+ points.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
timtreis force-pushed the fix/issue-379-datashader-points-perf branch from 240a5d2 to e1f1ad4 on March 24, 2026 14:58

Development

Successfully merging this pull request may close these issues.

Datashader doesn't speed up rendering of points
