Add Dask+CuPy backends for hydrology core functions #952

@brendancol

Description

Reason or Problem

Six hydrology functions fall back to CPU when given Dask+CuPy DataArrays: flow_accumulation, watershed, basins, stream_order, stream_link, and snap_pour_point. They all have native CuPy (single-GPU) backends already, but the Dask+CuPy path silently drops to CPU. For large DEMs (continental-scale, high-res LiDAR), this CPU fallback becomes the bottleneck in an otherwise GPU-resident pipeline.

Proposal

Design:

Add Dask+CuPy backends for each function, following the pattern from flow_direction which already supports all four backends:

  • flow_accumulation: accumulate within each Dask chunk on GPU, propagate cross-chunk flows via iterative boundary exchange (like the Dask+NumPy approach but with CuPy kernels)
  • watershed / basins: label within each chunk on GPU using the existing CuPy kernel, merge labels across chunk boundaries
  • stream_order / stream_link: depend on flow_accumulation and flow_direction results; once those are GPU-resident, stream operations run per-chunk on GPU with boundary reconciliation
  • snap_pour_point: needs a native CuPy single-GPU backend first (currently fallback), then extend to Dask+CuPy
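The iterative boundary-exchange idea for flow_accumulation can be illustrated with a toy 1D model (pure Python stand-in; in practice the inner per-chunk pass would be a CuPy kernel and the outer loop would exchange halo values between Dask chunks). Here every cell drains to its right neighbour, so accumulation is a prefix sum:

```python
def accumulate_1d(chunks):
    """Toy 1D flow accumulation over chunked data.

    Each cell drains to its right neighbour. The inner per-chunk pass is
    the part that would run as a CuPy kernel; the outer loop is the
    cross-chunk boundary exchange, repeated until inflows stabilize.
    """
    inflow = [0] * len(chunks)              # accumulation entering each chunk
    while True:
        accs, outflow = [], []
        for boundary_in, chunk in zip(inflow, chunks):
            total, acc = boundary_in, []
            for cell in chunk:              # per-chunk accumulation pass
                total += cell
                acc.append(total)
            accs.append(acc)
            outflow.append(total)
        new_inflow = [0] + outflow[:-1]     # shift outflows downstream
        if new_inflow == inflow:            # converged: no change in inflows
            return accs
        inflow = new_inflow
```

Note that each sweep propagates flow one chunk downstream, so convergence takes on the order of one iteration per chunk along the longest flow path; this is the parallelism limit mentioned under Drawbacks.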

Usage:

No API changes. Functions use GPU automatically when inputs are Dask+CuPy DataArrays instead of falling back to CPU.

Value:

This lets full watershed analysis pipelines run on GPU-equipped Dask clusters. It matters most for large rasters (e.g. national 1 m LiDAR DEMs), where the CPU fallback dominates runtime.

Stakeholders and Impacts

Hydrologists, water resource engineers, and environmental modelers working with large DEMs. The work builds on the existing CuPy backends and requires no API changes.

Drawbacks

  • Graph-traversal algorithms (watershed, basins, stream_order) are harder to parallelize on GPU than embarrassingly-parallel operations. Cross-chunk label reconciliation adds real complexity.
  • flow_accumulation requires iterative convergence across chunks, which can limit Dask parallelism.
  • Large scope: six functions across multiple files.
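The cross-chunk label reconciliation mentioned above is essentially a connected-components merge. A minimal sketch of the graph-based option, using union-find over per-chunk label ids (pure Python illustration; the per-chunk labeling itself would come from the existing CuPy kernels, and only boundary label pairs would move between workers):

```python
def merge_chunk_labels(n_labels, boundary_pairs):
    """Union-find merge of per-chunk labels into global basin labels.

    ``boundary_pairs`` lists pairs of labels that touch across a chunk
    boundary and therefore name the same basin. Returns a list mapping
    each label id to its canonical (smallest) label id.
    """
    parent = list(range(n_labels))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in boundary_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    return [find(x) for x in range(n_labels)]
```

The resulting mapping can then be applied to every chunk in parallel as a cheap GPU relabeling pass.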

Alternatives

  • WhiteboxTools: fast CPU-based hydrology, but it requires Rust/C dependencies and has no xarray integration.
  • Call .compute() and use the single-GPU CuPy backends: this loses Dask's out-of-core capability.

Unresolved Questions

  • Best chunk-boundary reconciliation strategy (iterative sweep vs. graph-based merge).
  • Whether flow_path should get Dask+CuPy support here or in a follow-up.
  • Whether to do all six functions in one PR or split into batches.

Metadata


    Labels

    backend-coverage — Adding missing dask/cupy/dask+cupy backend support
    enhancement — New feature or request
    gpu — CuPy / CUDA GPU support
