Skip to content

cost_distance: add CuPy and Dask+CuPy GPU backends #860

@brendancol

Description

@brendancol

Context

PR #859 added cost_distance() with NumPy and Dask+NumPy backends. GPU backends (CuPy, Dask+CuPy) were intentionally deferred because the algorithm requires a fundamentally different approach on GPU.

Problem

The current implementation uses multi-source Dijkstra with a binary min-heap, which is inherently sequential — it processes pixels in priority-queue order. This maps poorly to GPU parallelism where thousands of threads need independent work.

Possible GPU algorithms

  • Parallel label-correcting (Bellman-Ford variant): Wavefront propagation where all pixels at the current frontier are relaxed in parallel. Simple to implement on GPU but may require many iterations to converge.
  • Delta-stepping: Bucketed relaxation that groups edges by cost into "delta" buckets. Pixels within each bucket can be relaxed in parallel. Better work-efficiency than naive Bellman-Ford but more complex to implement.
  • Parallel Dijkstra with work-efficient heap: Use a GPU-friendly priority queue (e.g., bucket queue backed by atomic operations). Harder to implement but closest to the CPU algorithm's efficiency.

References

  • Meyer & Sanders, "Delta-stepping: a parallelizable shortest path algorithm" (2003)
  • Ortega-Arranz et al., "A new GPU-based approach to the Shortest Path problem" (2013)
  • Davidson et al., "Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths" (2014)

Scope

  • CuPy backend for cost_distance()
  • Dask+CuPy backend for cost_distance()
  • Update README feature matrix GPU columns
  • Add GPU test parametrization to test_cost_distance.py

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions