
improve the fixed chunker for better sparse file support #5565

@ThomasWaldmann

Description


we have had a fixed-blocksize chunker in the master branch for a while; it is good for reading raw disk files (e.g. virtual machine disks). also, it is rather simple Cython code (unlike the C code of the variable-size buzhash chunker, which is hard to change/maintain).

often, these disk files are sparse, so unused blocks are not stored on disk.

for sparse files, there is an os.lseek(..., SEEK_HOLE/SEEK_DATA) api to discover the ranges that actually have data (are stored on disk) and the ranges that are holes and have no data (the fs just generates zeros when reading them).
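
to illustrate the api, here is a minimal sketch (not borg code) that lists the data ranges of an open file. the function name `data_ranges` is hypothetical. it assumes a platform where Python exposes `os.SEEK_DATA` / `os.SEEK_HOLE` (e.g. Linux) and falls back to treating the whole file as data elsewhere. filesystems without hole support usually just report one data range covering the whole file.

```python
import errno
import os


def data_ranges(fd):
    """Yield (start, end) byte ranges of fd that the fs reports as data.

    Hypothetical helper for illustration. Note that os.lseek() moves the
    file offset of fd, so callers should not rely on it afterwards.
    """
    size = os.fstat(fd).st_size
    if not hasattr(os, "SEEK_DATA"):
        # Assumption: no hole discovery available, treat everything as data.
        if size:
            yield 0, size
        return
    pos = 0
    while pos < size:
        try:
            # Find the next offset >= pos that contains data.
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                # No data after pos: the rest of the file is a hole.
                break
            raise
        # Find the end of this data range (EOF counts as an implicit hole).
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        pos = end
```

everything between two reported data ranges (and after the last one, up to the file size) is a hole. the granularity of the reported ranges depends on the filesystem.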

the chunker currently reads input files completely. this could be optimized so it does not read holes from the fs, saving some all-zeros data shuffling from the fs code (kernel) to the borg code (userspace).

with the chunker being adapted to read only some ranges from the file (which have data we want) and not other ranges (which do not have data we want), this could also be used for other purposes in the future (e.g. if we get a "changed blocks" list from some CBT "changed blocks tracking" mechanism outside of borg).
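
a rough sketch of what such a range-driven fixed chunker could look like, in plain Python rather than Cython. this is only an illustration under assumptions, not the actual borg chunker: the names `Chunk` and `chunk_ranges` and the `(start, length, wanted)` range format are made up here, and `os.pread` is only available on Unix-like systems. ranges marked as not wanted (holes, or blocks a CBT list says are unchanged) are never read; the chunker only reports their offset and size.

```python
import os
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Chunk(NamedTuple):
    offset: int
    size: int
    data: Optional[bytes]  # None means the range was skipped, not read


def chunk_ranges(
    fd: int,
    ranges: Iterable[Tuple[int, int, bool]],
    block_size: int,
) -> Iterator[Chunk]:
    """Cut each (start, length, wanted) range into blocks of block_size.

    Hypothetical sketch. Blocks are cut relative to each range's start, so
    ranges are assumed to be block-aligned if global alignment matters.
    """
    for start, length, wanted in ranges:
        offset, end = start, start + length
        while offset < end:
            size = min(block_size, end - offset)
            if wanted:
                # Read only ranges we want; pread does not move the offset.
                data = os.pread(fd, size, offset)
                if not data:
                    return  # unexpected EOF, nothing more to read
                yield Chunk(offset, len(data), data)
                offset += len(data)
            else:
                # Skipped range: no read, the consumer decides what it means
                # (e.g. all zeros for a hole, "unchanged" for CBT).
                yield Chunk(offset, size, None)
                offset += size
```

for the sparse file case, the ranges would come from SEEK_HOLE/SEEK_DATA; for the CBT case, from whatever external mechanism provides the changed blocks list.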

💰 there is a bounty for this
