Description
Reason or Problem
Six hydrology functions fall back to CPU when given Dask+CuPy DataArrays: flow_accumulation, watershed, basins, stream_order, stream_link, and snap_pour_point. They all have native CuPy (single-GPU) backends already, but the Dask+CuPy path silently drops to CPU. For large DEMs (continental-scale, high-res LiDAR), this CPU fallback becomes the bottleneck in an otherwise GPU-resident pipeline.
Proposal
Design:
Add Dask+CuPy backends for each function, following the pattern from flow_direction which already supports all four backends:
- flow_accumulation: accumulate within each Dask chunk on GPU, propagate cross-chunk flows via iterative boundary exchange (like the Dask+NumPy approach, but with CuPy kernels)
- watershed / basins: label within each chunk on GPU using the existing CuPy kernel, then merge labels across chunk boundaries
- stream_order / stream_link: depend on flow_accumulation and flow_direction results; once those are GPU-resident, stream operations run per-chunk on GPU with boundary reconciliation
- snap_pour_point: needs a native CuPy single-GPU backend first (it currently falls back), then extend to Dask+CuPy
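To make the flow_accumulation item concrete, here is a minimal sketch of the iterative boundary-exchange idea, using NumPy as a stand-in for the per-chunk CuPy kernels and simplified to 1-D, all-cells-flow-right flow (real D8 flow is 2-D and 8-way). The function names are illustrative, not the library's actual internals.

```python
import numpy as np

def local_accumulate(chunk_len: int, inflow: float) -> np.ndarray:
    # Per-chunk kernel (CuPy in the real design): each cell receives
    # everything upstream plus its own unit of flow, so with strictly
    # left-to-right flow this is a running count offset by the inflow.
    return inflow + np.arange(1, chunk_len + 1, dtype=float)

def accumulate_with_boundary_exchange(n: int, n_chunks: int) -> np.ndarray:
    chunk_lens = [len(c) for c in np.array_split(np.arange(n), n_chunks)]
    inflows = [0.0] * n_chunks            # cross-chunk boundary state
    results = [None] * n_chunks
    # Recompute chunks and exchange edge flows until inflows converge.
    for _ in range(n_chunks):
        changed = False
        for i, m in enumerate(chunk_lens):
            results[i] = local_accumulate(m, inflows[i])
            if i + 1 < n_chunks:
                outflow = results[i][-1]  # flow leaving the right edge
                if outflow != inflows[i + 1]:
                    inflows[i + 1] = outflow
                    changed = True
        if not changed:
            break
    return np.concatenate(results)

print(accumulate_with_boundary_exchange(10, 3))  # -> [1. 2. ... 10.]
```

The sequential sweep here converges quickly; in a real Dask graph the chunks recompute in parallel, so several exchange rounds may be needed, which is the parallelism limit noted under Drawbacks.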
Usage:
No API changes. Functions use GPU automatically when inputs are Dask+CuPy DataArrays instead of falling back to CPU.
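A hypothetical sketch of how such automatic backend selection typically works: dispatch keys off the underlying array type of the DataArray (for a Dask array, the chunk type is exposed via its _meta attribute; CuPy arrays live in the cupy module). The function and return strings below are illustrative, not xarray-spatial's actual internals.

```python
import numpy as np

def select_backend(arr) -> str:
    # Dask arrays carry a zero-size `_meta` array describing chunk type.
    meta = getattr(arr, "_meta", None)
    is_dask = meta is not None
    inner = meta if is_dask else arr
    # CuPy arrays report a module name starting with "cupy".
    on_gpu = type(inner).__module__.startswith("cupy")
    if is_dask and on_gpu:
        return "dask+cupy"
    if is_dask:
        return "dask+numpy"
    if on_gpu:
        return "cupy"
    return "numpy"

print(select_backend(np.arange(4)))  # -> numpy
```

With this shape of dispatch, adding a dask+cupy branch per function is what routes Dask+CuPy inputs to GPU code instead of the CPU fallback.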
Value:
This lets full watershed-analysis pipelines run on GPU-equipped Dask clusters. It matters most for large rasters (e.g. national 1 m LiDAR DEMs), where the CPU fallback is the slow part.
Stakeholders and Impacts
Hydrologists, water resource engineers, and environmental modelers working with large DEMs. Builds on the existing CuPy backends; no API changes.
Drawbacks
- Graph-traversal algorithms (watershed, basins, stream_order) are harder to parallelize on GPU than embarrassingly-parallel operations. Cross-chunk label reconciliation adds real complexity.
- flow_accumulation requires iterative convergence across chunks, which can limit Dask parallelism.
- Large scope: six functions across multiple files.
Alternatives
- WhiteboxTools: fast CPU-based hydrology, but it requires Rust/C dependencies and has no xarray integration.
- Call .compute() and use the single-GPU CuPy backends: this loses Dask's out-of-core capability.
Unresolved Questions
- Best chunk-boundary reconciliation strategy (iterative sweep vs. graph-based merge).
- Whether flow_path should get Dask+CuPy support here or in a follow-up.
- Whether to do all six functions in one PR or split them into batches.
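On the first unresolved question, the graph-based merge option can be sketched with a union-find over boundary label pairs. This is a hypothetical illustration, not the library's code: per-chunk basin labels are made globally unique as (chunk_id, local_label) tuples, and pairs of labels that meet across a chunk edge and drain to the same outlet are unioned, then relabeled.

```python
class UnionFind:
    """Minimal union-find with path halving for label merging."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_chunk_labels(all_labels, boundary_pairs):
    # boundary_pairs: (chunk_id, local_label) pairs known to be the
    # same basin because they touch across a chunk boundary.
    uf = UnionFind()
    for a, b in boundary_pairs:
        uf.union(a, b)
    merged = {}
    # Assign a compact global id per union-find root.
    return {lab: merged.setdefault(uf.find(lab), len(merged))
            for lab in all_labels}

# Labels (0,1), (1,2), (2,1) chain into one basin; (0,2) stays separate.
final = merge_chunk_labels(
    all_labels=[(0, 1), (0, 2), (1, 2), (2, 1)],
    boundary_pairs=[((0, 1), (1, 2)), ((1, 2), (2, 1))],
)
print(final)
```

The alternative iterative-sweep strategy would instead re-run per-chunk labeling with neighbor halos until labels stabilize; the union-find merge needs only one labeling pass plus a small host-side reduction.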