Add Dask+CuPy backends for hydrology core functions #952

@brendancol

Description

Reason or Problem

Six hydrology functions fall back to CPU when given Dask+CuPy DataArrays: flow_accumulation, watershed, basins, stream_order, stream_link, and snap_pour_point. They all have native CuPy (single-GPU) backends already, but the Dask+CuPy path silently drops to CPU. For large DEMs (continental-scale, high-res LiDAR), this CPU fallback becomes the bottleneck in an otherwise GPU-resident pipeline.

Proposal

Design:

Add Dask+CuPy backends for each function, following the pattern from flow_direction which already supports all four backends:

  • flow_accumulation: accumulate within each Dask chunk on GPU, propagate cross-chunk flows via iterative boundary exchange (like the Dask+NumPy approach but with CuPy kernels)
  • watershed / basins: label within each chunk on GPU using the existing CuPy kernel, merge labels across chunk boundaries
  • stream_order / stream_link: depend on flow_accumulation and flow_direction results; once those are GPU-resident, stream operations run per-chunk on GPU with boundary reconciliation
  • snap_pour_point: needs a native CuPy single-GPU backend first (currently fallback), then extend to Dask+CuPy
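The iterative boundary-exchange idea for flow_accumulation can be illustrated with a toy 1D model (pure Python stand-in; in practice the inner per-chunk pass would be a CuPy kernel and the outer loop would exchange halo values between Dask chunks). Here every cell drains to its right neighbour, so accumulation is a prefix sum:

```python
def accumulate_1d(chunks):
    """Toy 1D flow accumulation over chunked data.

    Each cell drains to its right neighbour. The inner per-chunk pass is
    the part that would run as a CuPy kernel; the outer loop is the
    cross-chunk boundary exchange, repeated until inflows stabilize.
    """
    inflow = [0] * len(chunks)              # accumulation entering each chunk
    while True:
        accs, outflow = [], []
        for boundary_in, chunk in zip(inflow, chunks):
            total, acc = boundary_in, []
            for cell in chunk:              # per-chunk accumulation pass
                total += cell
                acc.append(total)
            accs.append(acc)
            outflow.append(total)
        new_inflow = [0] + outflow[:-1]     # shift outflows downstream
        if new_inflow == inflow:            # converged: no change in inflows
            return accs
        inflow = new_inflow
```

Note that each sweep propagates flow one chunk downstream, so convergence takes on the order of one iteration per chunk along the longest flow path; this is the parallelism limit mentioned under Drawbacks.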

Usage:

No API changes. Functions use GPU automatically when inputs are Dask+CuPy DataArrays instead of falling back to CPU.

Value:

This lets full watershed analysis pipelines run on GPU-equipped Dask clusters. It matters most for large rasters (e.g. national 1 m LiDAR DEMs), where the CPU fallback dominates runtime.

Stakeholders and Impacts

Hydrologists, water resource engineers, and environmental modelers working with large DEMs. The work builds on the existing CuPy backends and requires no API changes.

Drawbacks

  • Graph-traversal algorithms (watershed, basins, stream_order) are harder to parallelize on GPU than embarrassingly-parallel operations. Cross-chunk label reconciliation adds real complexity.
  • flow_accumulation requires iterative convergence across chunks, which can limit Dask parallelism.
  • Large scope: six functions across multiple files.
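The cross-chunk label reconciliation mentioned above is essentially a connected-components merge. A minimal sketch of the graph-based option, using union-find over per-chunk label ids (pure Python illustration; the per-chunk labeling itself would come from the existing CuPy kernels, and only boundary label pairs would move between workers):

```python
def merge_chunk_labels(n_labels, boundary_pairs):
    """Union-find merge of per-chunk labels into global basin labels.

    ``boundary_pairs`` lists pairs of labels that touch across a chunk
    boundary and therefore name the same basin. Returns a list mapping
    each label id to its canonical (smallest) label id.
    """
    parent = list(range(n_labels))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in boundary_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    return [find(x) for x in range(n_labels)]
```

The resulting mapping can then be applied to every chunk in parallel as a cheap GPU relabeling pass.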

Alternatives

  • WhiteboxTools: fast CPU-based hydrology, but it requires Rust/C dependencies and has no xarray integration.
  • Call .compute() and use the single-GPU CuPy backends: this loses Dask's out-of-core capability.

Unresolved Questions

  • Best chunk-boundary reconciliation strategy (iterative sweep vs. graph-based merge).
  • Whether flow_path should get Dask+CuPy support here or in a follow-up.
  • Whether to do all six functions in one PR or split into batches.

Metadata


    Labels

    backend-coverage — Adding missing dask/cupy/dask+cupy backend support
    enhancement — New feature or request
    gpu — CuPy / CUDA GPU support
