Closed
Labels
bug (Something isn't working), oom (Out-of-memory risk with large datasets)
Description
Summary
The KDTree path in `proximity()` calls `.compute()` on each chunk in sequence and accumulates every target coordinate in memory. For dense target data, memory use grows with the number of target pixels rather than with chunk size, and the subsequent `cKDTree` construction needs all coordinates in RAM at once.
OOM Code Paths
| Line | Code | Issue |
|---|---|---|
| `proximity.py:441` | `chunk_data = raster.data.blocks[iy, ix].compute()` | Sequential `.compute()` in loop |
| `proximity.py:452` | `target_list.append(coords)` | Unbounded accumulation of target coords |
For a 30 TB raster with dense target data, target_list could accumulate billions of (y, x) coordinate pairs, and cKDTree construction requires all of them in RAM.
Estimated peak RAM: unbounded (it scales with target density, not with chunk size).
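To put a rough number on this, here is a back-of-envelope sketch. The pixel size (4 bytes), the target density (1%), and the 8-byte coordinate values are illustrative assumptions, not figures taken from `proximity.py`; `target_coord_bytes` is a hypothetical helper.

```python
def target_coord_bytes(raster_bytes, pixel_itemsize, targets_per_million, coord_itemsize=8):
    """Bytes needed to hold one (y, x) pair per target pixel.

    All parameters are illustrative assumptions; the density is given as an
    integer "targets per million pixels" to keep the arithmetic exact.
    """
    n_pixels = raster_bytes // pixel_itemsize
    n_targets = n_pixels * targets_per_million // 1_000_000
    return n_targets * 2 * coord_itemsize  # two coordinates per target


# Assumed: 30 TB raster, 4-byte pixels, 1% of pixels are targets.
estimate = target_coord_bytes(30 * 10**12, 4, 10_000)
print(f"{estimate / 10**12:.1f} TB of coordinates")  # 1.2 TB of coordinates
```

Under these assumptions the coordinate array alone is about 1.2 TB, before any overhead from the tree itself.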
Severity
CRITICAL — OOM for datasets with dense target pixels.
Suggested Fix
- Stream target coordinates to disk or use an out-of-core spatial index
- Limit KDTree construction to per-chunk or tiled regions with overlap
- Consider using a chunked nearest-neighbor approach that doesn't require a global spatial index
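The tiled option could look roughly like the sketch below. It is not the code in `proximity.py`: `tiled_proximity` is a hypothetical function that works on an in-memory boolean NumPy mask in pixel units, and it assumes a finite `max_distance` so that a halo of that width around each tile is enough to see every relevant target. With a Dask-backed raster, the window slice is where a per-tile `.compute()` would go.

```python
import numpy as np
from scipy.spatial import cKDTree


def tiled_proximity(mask, tile, max_distance):
    """Pixel distance from each cell to the nearest True cell in `mask`.

    Cells with no target inside the search radius are left as np.inf.
    Hypothetical sketch: one small KD-tree per tile instead of a global one.
    """
    height, width = mask.shape
    halo = int(np.ceil(max_distance))  # overlap needed around each tile
    out = np.full(mask.shape, np.inf, dtype=np.float64)

    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            y1, x1 = min(y0 + tile, height), min(x0 + tile, width)
            # Expand the tile by the halo so nearby outside targets are seen.
            hy0, hx0 = max(y0 - halo, 0), max(x0 - halo, 0)
            hy1, hx1 = min(y1 + halo, height), min(x1 + halo, width)

            ty, tx = np.nonzero(mask[hy0:hy1, hx0:hx1])
            if ty.size == 0:
                continue  # no targets in reach; tile stays at inf

            # Only this window's targets are held in memory at once.
            tree = cKDTree(np.column_stack((ty + hy0, tx + hx0)))

            qy, qx = np.mgrid[y0:y1, x0:x1]
            queries = np.column_stack((qy.ravel(), qx.ravel()))
            # Neighbors outside the upper bound come back as inf.
            dist, _ = tree.query(queries, distance_upper_bound=max_distance)
            out[y0:y1, x0:x1] = dist.reshape(y1 - y0, x1 - x0)
    return out
```

With this layout, the number of coordinates held at any moment is bounded by the window size, `(tile + 2 * halo) ** 2`, regardless of how dense the targets are. The trade-off is that it relies on a finite search radius; for an unbounded `max_distance` a halo cannot guarantee exact results, and one of the other options would be needed.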