proximity KDTree: sequential .compute() loop + unbounded target accumulation #879

@brendancol

Description

Summary

The KDTree path in proximity() loops over every chunk, calling .compute() on each one sequentially and accumulating all target coordinates in memory. For dense target data this list can grow without bound, and the subsequent cKDTree construction then requires every coordinate to be in RAM simultaneously.

OOM Code Paths

| Line | Code | Issue |
|---|---|---|
| proximity.py:441 | chunk_data = raster.data.blocks[iy, ix].compute() | Sequential .compute() in a loop |
| proximity.py:452 | target_list.append(coords) | Unbounded accumulation of target coordinates |
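For reference, here is a minimal, hypothetical reconstruction of the pattern those two lines belong to. It is not the actual proximity.py source. It assumes raster is a 2-D dask array (the real code reads raster.data, presumably from an xarray DataArray), that targets are matched with np.isin, and that coordinates are pixel indices rather than the raster's x/y coordinate values. The function name collect_targets_sequential is made up for illustration.

```python
import numpy as np
import dask.array as da
from scipy.spatial import cKDTree


def collect_targets_sequential(raster: da.Array, target_values) -> cKDTree:
    """Hypothetical sketch of the pattern described in this issue (not the real code)."""
    # Global pixel offset of each chunk, derived from the dask chunk sizes.
    row_offsets = np.cumsum((0,) + raster.chunks[0])
    col_offsets = np.cumsum((0,) + raster.chunks[1])
    target_list = []
    for iy in range(raster.numblocks[0]):
        for ix in range(raster.numblocks[1]):
            # One blocking .compute() per chunk: chunks are materialized one after another.
            chunk_data = raster.blocks[iy, ix].compute()
            ys, xs = np.nonzero(np.isin(chunk_data, target_values))
            coords = np.column_stack((ys + row_offsets[iy], xs + col_offsets[ix]))
            # Every target pixel in the whole raster stays referenced from this list.
            target_list.append(coords)
    # The tree is built over all target coordinates at once.
    return cKDTree(np.concatenate(target_list))


arr = np.zeros((8, 8))
arr[1, 2] = 1
arr[6, 7] = 1
tree = collect_targets_sequential(da.from_array(arr, chunks=4), [1])
```

In this sketch, peak memory while np.concatenate runs is roughly the per-chunk coords arrays plus their concatenated copy, so it scales with the number of target pixels, not with the chunk size.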

For a 30 TB raster with dense target data, target_list could accumulate billions of (y, x) coordinate pairs, and cKDTree construction requires all of them in RAM.
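A back-of-the-envelope check on the coordinate list alone, assuming each (y, x) pair is stored as two float64 values (16 bytes per target pixel) and, for the second figure, that the 30 TB raster holds float32 pixels that are all targets. Both dtypes are assumptions, and the results are lower bounds that ignore Python list overhead and the tree's own index and node storage.

```python
def coord_list_bytes(n_targets: int, bytes_per_value: int = 8, ndim: int = 2) -> int:
    """Bytes needed to hold n_targets coordinate tuples (default: float64 (y, x) pairs)."""
    return n_targets * ndim * bytes_per_value


# One billion target pixels -> 16 GB of coordinates.
print(coord_list_bytes(1_000_000_000) / 10**9)  # 16.0

# Assumed fully dense case: a 30 TB float32 raster has 7.5e12 pixels.
n_pixels = 30 * 10**12 // 4
print(coord_list_bytes(n_pixels) / 10**12)  # 120.0 (TB), 4x the raster itself
```

Under these assumptions, once targets are dense the coordinate list can outgrow the raster it was extracted from.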

Estimated peak RAM: unbounded; it grows linearly with the number of target pixels.

Severity

CRITICAL — OOM for datasets with dense target pixels.

Suggested Fix

  • Stream target coordinates to disk or use an out-of-core spatial index
  • Limit KDTree construction to per-chunk or tiled regions with overlap
  • Consider using a chunked nearest-neighbor approach that doesn't require a global spatial index
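The second and third suggestions could be combined into a tiled search. The sketch below is one possible approach, not a description of how proximity() works today. It assumes a finite search radius (the parameter name radius is hypothetical), plain pixel-index distances, and a 2-D NumPy-like array such as an np.memmap; a dask version might run the same per-tile logic through map_overlap. Each tile gets its own cKDTree built only from targets inside the tile plus a halo of ceil(radius) pixels. That bounds memory per step by the tile size instead of the total target count, and distances up to radius stay exact because any target within radius of a pixel in the tile must fall inside that halo.

```python
import math

import numpy as np
from scipy.spatial import cKDTree


def tiled_proximity(data, target_values, radius, tile=256):
    """Sketch: per-tile KDTree with a halo; hypothetical helper, not the proximity() API."""
    nrows, ncols = data.shape
    halo = math.ceil(radius)
    out = np.full(data.shape, np.inf)
    for r0 in range(0, nrows, tile):
        for c0 in range(0, ncols, tile):
            r1, c1 = min(r0 + tile, nrows), min(c0 + tile, ncols)
            # Read only the tile expanded by the halo, clipped to the raster edges.
            wr0, wr1 = max(r0 - halo, 0), min(r1 + halo, nrows)
            wc0, wc1 = max(c0 - halo, 0), min(c1 + halo, ncols)
            window = np.asarray(data[wr0:wr1, wc0:wc1])
            ys, xs = np.nonzero(np.isin(window, target_values))
            if ys.size == 0:
                continue  # no target within reach; tile stays inf
            # Small, tile-local index instead of one global tree.
            tree = cKDTree(np.column_stack((ys + wr0, xs + wc0)))
            qy, qx = np.mgrid[r0:r1, c0:c1]
            query_pts = np.column_stack((qy.ravel(), qx.ravel()))
            # Neighbors farther than radius are reported as inf.
            dist, _ = tree.query(query_pts, distance_upper_bound=radius)
            out[r0:r1, c0:c1] = dist.reshape(r1 - r0, c1 - c0)
    return out
```

The trade-off is that pixels with no target within radius come back as inf rather than a true distance; an unbounded search would still need a global index or a fallback pass.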

Labels

  • bug: Something isn't working
  • oom: Out-of-memory risk with large datasets