
Make blobcache compression format aware #731

Merged

nalind merged 1 commit into containers:main from Honny1:blobcache-compression-support on Apr 7, 2026

Conversation

Member

@Honny1 Honny1 commented Mar 31, 2026

Record the compression algorithm in the .compressed sidecar note alongside the digest, so cached blobs are no longer assumed to be gzip-only. Update layerInfoForCopy to select the correct media type and strip zstd:chunked annotations on decompression.

Part of: #205

Related to: containers/buildah#6660

Fixes: https://redhat.atlassian.net/browse/RUN-1124
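
The PR description doesn't spell out the note's on-disk layout. As a hedged sketch of the idea only (the value format and helper names here are hypothetical, not the actual c/image code), the sidecar can carry the algorithm name alongside the compressed digest, with a gzip fallback for notes written by an older cache:

```go
package main

import (
	"fmt"
	"strings"
)

// makeNote builds a hypothetical ".compressed" sidecar value recording both
// the compressed blob's digest and the compression algorithm name, instead
// of the digest alone (which forced a gzip-only assumption).
func makeNote(compressedDigest, algorithm string) string {
	return compressedDigest + "\n" + algorithm
}

// parseNote reads the value back; a single-field note from an older cache
// falls back to gzip, matching the previous behavior.
func parseNote(value string) (digest, algorithm string) {
	parts := strings.SplitN(value, "\n", 2)
	if len(parts) < 2 || parts[1] == "" {
		return parts[0], "gzip" // legacy note: digest only
	}
	return parts[0], parts[1]
}

func main() {
	d, a := parseNote(makeNote("sha256:abc123", "zstd"))
	fmt.Println(d, a)
	d, a = parseNote("sha256:def456") // legacy gzip-only note
	fmt.Println(d, a)
}
```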

@github-actions github-actions bot added the image Related to "image" package label Mar 31, 2026
@Honny1 Honny1 marked this pull request as ready for review March 31, 2026 11:40
Member Author

Honny1 commented Mar 31, 2026

/packit rebuild-failed

@@ -197,9 +203,8 @@ func (d *blobCacheDestination) PutBlobWithOptions(ctx context.Context, stream io
if n >= len(initial) {
compression = archive.DetectCompression(initial[:n])
Member

There's a similar function in pkg/compression that we could call instead, so that we don't have to compare the names of algorithms using archive package.
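
Both the archive detector and c/image's pkg/compression sniff the blob's leading magic bytes. As a stdlib-only illustration of what that detection does for the two formats the cache cares about (this is not the actual library code, whose detectors return richer algorithm values), gzip streams begin with 0x1f 0x8b and zstd frames begin with the magic number 0x28 0xb5 0x2f 0xfd:

```go
package main

import (
	"bytes"
	"fmt"
)

// detectCompression sniffs leading magic bytes the way the real detectors
// do: gzip streams start with 0x1f 0x8b, and zstd frames start with the
// little-endian magic number 0xFD2FB528.
func detectCompression(initial []byte) string {
	switch {
	case bytes.HasPrefix(initial, []byte{0x1f, 0x8b}):
		return "gzip"
	case bytes.HasPrefix(initial, []byte{0x28, 0xb5, 0x2f, 0xfd}):
		return "zstd"
	default:
		return "uncompressed"
	}
}

func main() {
	fmt.Println(detectCompression([]byte{0x1f, 0x8b, 0x08, 0x00}))
	fmt.Println(detectCompression([]byte{0x28, 0xb5, 0x2f, 0xfd, 0x00}))
	fmt.Println(detectCompression([]byte("plain tar bytes")))
}
```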

// Note the relationship between the two files.
if err := ioutils.AtomicWriteFile(decompressedFilename+compressedNote, []byte(compressedDigest.String()), 0o600); err != nil {
logrus.Debugf("error noting that the compressed version of %q is %q: %v", digester.Digest().String(), compressedDigest.String(), err)
if err := ioutils.AtomicWriteFile(decompressedFilename+compressedNote, []byte(compressedNoteValue), 0o600); err != nil {
Member

If I'm reading this right, the cache will retain both the gzip-compressed version of a blob and a zstd-compressed version if the same image is pushed more than once, but it will lose track of one or the other.
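
The hazard described here can be sketched with a map standing in for the cache directory (illustrative only; the real cache writes sidecar files via ioutils.AtomicWriteFile): both compressed variants of a layer share the same decompressed digest, so they compete for one sidecar name, and the second write clobbers the first.

```go
package main

import "fmt"

// recordNote simulates writing the ".compressed" sidecar: it is keyed by
// the *decompressed* digest, which is identical whether the blob arrived
// gzip- or zstd-compressed, so later writes clobber earlier ones.
func recordNote(cache map[string]string, decompressedDigest, compressedDigest, algo string) {
	cache[decompressedDigest+".compressed"] = compressedDigest + " " + algo
}

func main() {
	cache := map[string]string{}
	// The first push stores the gzip variant's note...
	recordNote(cache, "sha256:layer", "sha256:gzipped", "gzip")
	// ...a second push of a zstd variant of the same layer overwrites it,
	// so the cache loses track of the still-present gzip blob.
	recordNote(cache, "sha256:layer", "sha256:zstded", "zstd")
	fmt.Println(cache["sha256:layer.compressed"])
}
```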

Member Author

Good catch, I didn't realize that.

@Honny1 Honny1 force-pushed the blobcache-compression-support branch from 560826f to eac3d32 Compare April 2, 2026 15:51
Member Author

Honny1 commented Apr 2, 2026

PTAL @nalind

Member

@nalind nalind left a comment


Non-blocking nits, but LGTM.

@Honny1 Honny1 force-pushed the blobcache-compression-support branch from eac3d32 to a28b7f2 Compare April 7, 2026 09:33
Member Author

Honny1 commented Apr 7, 2026

@containers/container-libs-maintainers PTAL

Member

@giuseppe giuseppe left a comment


LGTM

Record the compression algorithm in the `.compressed` sidecar note
alongside the digest, so cached blobs are no longer assumed to be
gzip-only. Update `layerInfoForCopy` to select the correct media type
and strip `zstd:chunked` annotations on decompression.

Part of: containers#205

Related to: containers/buildah#6660

Fixes: https://redhat.atlassian.net/browse/RUN-1124

Signed-off-by: Jan Rodák <hony.com@seznam.cz>
@Honny1 Honny1 force-pushed the blobcache-compression-support branch from a28b7f2 to c54c30d Compare April 7, 2026 16:04
@nalind nalind enabled auto-merge April 7, 2026 17:30
@nalind nalind merged commit 18bc25c into containers:main Apr 7, 2026
24 checks passed
Contributor

@mtrmac mtrmac left a comment


Late - LGTM overall, but please do fix the DetectCompressionFormat call.


Should layerInfoForCopy strictly match the one caller-requested algorithm? It’s certainly more useful that way than a gzip-only cache, but it requires the caller to know/guess the compression.

Generally c/image reuses any acceptable compressed version that already exists on the destination, unless ForceCompressionFormat is used. OTOH that does run into complexity, e.g. schema2 images (or OCI images when being converted during the copy to schema2 — the “source” has no idea) should not reuse Zstd layers even if they are the only candidate available [… and, ideally, schema2 images when being converted to OCI should reuse Zstd layers…].

I think it would be worth considering to add support for the “use any pre-existing layer” — IIRC the primary use case is “build things quickly and don’t re-compress”, i.e. the caller cares about speed and not so much about algorithm choice. That would require a private extension of LayerInfosForCopy, and maybe more such infrastructure; we can choose to do that anytime in the future.

Comment on lines +141 to +144
requestedCompressedAlgo := compression.Gzip
if s.reference.compressAlgorithm != nil {
requestedCompressedAlgo = *s.reference.compressAlgorithm
}
Contributor

(Non-blocking: This could be resolved at BlobCache creation time.)
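
The nit suggests normalizing the nil pointer once when the BlobCache reference is constructed, rather than re-deriving the gzip default on every use. A minimal sketch of that pattern, with types and names simplified from the real code:

```go
package main

import "fmt"

// algorithm is a stand-in for the library's compression algorithm type.
type algorithm struct{ name string }

var gzip = algorithm{name: "gzip"}

type blobCacheReference struct {
	// Always non-nil after construction, so callers never need the
	// "fall back to gzip" dance at the point of use.
	compressAlgorithm *algorithm
}

func newBlobCacheReference(requested *algorithm) *blobCacheReference {
	if requested == nil {
		requested = &gzip // resolve the default once, at creation time
	}
	return &blobCacheReference{compressAlgorithm: requested}
}

func main() {
	fmt.Println(newBlobCacheReference(nil).compressAlgorithm.name)
	zstd := algorithm{name: "zstd"}
	fmt.Println(newBlobCacheReference(&zstd).compressAlgorithm.name)
}
```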

Member Author

Honny1 commented Apr 9, 2026

I will take a look when I have time. Should I address the review comments right away, or together with adding support for the 'use any pre-existing layer' feature? @mtrmac

Contributor

mtrmac commented Apr 9, 2026

I would prefer the call to DetectCompressionFormat to be fixed right away.

The other comments from this review are mostly stylistic; if you have time and want to clean that up, I’d like that, but I don’t think any of that is worth tracking or prioritizing.


I think Buildah (+ OpenShift variants?) are the only callers of the cache, so that should drive prioritization of the “any pre-existing layer” feature. Would that be used by Buildah? This PR, I think, already fixed the reported bug, and that’s sufficiently valuable as is.

@mtrmac mtrmac mentioned this pull request Apr 10, 2026