Make blobcache compression format aware#731
Conversation
|
/packit rebuild-failed |
image/pkg/blobcache/dest.go
Outdated
| @@ -197,9 +203,8 @@ func (d *blobCacheDestination) PutBlobWithOptions(ctx context.Context, stream io | |||
| if n >= len(initial) { | |||
| compression = archive.DetectCompression(initial[:n]) | |||
There was a problem hiding this comment.
There's a similar function in pkg/compression that we could call instead, so that we don't have to compare the names of algorithms using archive package.
image/pkg/blobcache/dest.go
Outdated
| // Note the relationship between the two files. | ||
| if err := ioutils.AtomicWriteFile(decompressedFilename+compressedNote, []byte(compressedDigest.String()), 0o600); err != nil { | ||
| logrus.Debugf("error noting that the compressed version of %q is %q: %v", digester.Digest().String(), compressedDigest.String(), err) | ||
| if err := ioutils.AtomicWriteFile(decompressedFilename+compressedNote, []byte(compressedNoteValue), 0o600); err != nil { |
There was a problem hiding this comment.
If I'm reading this right, the cache will retain both the gzip-compressed version of a blob and a zstd-compressed version if the same image is pushed more than once, but it will lose track of one or the other.
There was a problem hiding this comment.
Good catch, I didn't realize that.
b1d0748 to
560826f
Compare
560826f to
eac3d32
Compare
|
PTAL @nalind |
eac3d32 to
a28b7f2
Compare
|
@containers/container-libs-maintainers PTAL |
Record the compression algorithm in the `.compressed` sidecar note alongside the digest, so cached blobs are no longer assumed to be gzip-only. Update `layerInfoForCopy` to select the correct media type and strip `zstd:chunked` annotations on decompression. Part of: containers#205 Related to: containers/buildah#6660 Fixes: https://redhat.atlassian.net/browse/RUN-1124 Signed-off-by: Jan Rodák <hony.com@seznam.cz>
a28b7f2 to
c54c30d
Compare
mtrmac
left a comment
There was a problem hiding this comment.
Late - LGTM overall, please fix do fix the DetectCompressionFormat call.
Should layerInfoForCopy strictly match the one caller-requested algorithm? It’s certainly more useful that way than a gzip-only cache, but it requires the caller to know/guess the compression.
Generally c/image reuses any acceptable compressed version that already exists on the destination, unless ForceCompressionFormat is used. OTOH that does run into complexity, e.g. schema2 images (or OCI images when being converted during the copy to schema2 — the “source” has no idea) should not reuse Zstd layers even if they are the only candidate available [… and, ideally, schema2 images when being converted to OCI should reuse Zstd layers…].
I think it would be worth considering to add support for the “use any pre-existing layer” — IIRC the primary use case is “build things quickly and don’t re-compress”, i.e. the caller cares about speed and not so much about algorithm choice. That would require a private extension of LayerInfosForCopy, and maybe more such infrastructure; we can choose to do that anytime in the future.
| requestedCompressedAlgo := compression.Gzip | ||
| if s.reference.compressAlgorithm != nil { | ||
| requestedCompressedAlgo = *s.reference.compressAlgorithm | ||
| } |
There was a problem hiding this comment.
(Non-blocking: This could be resolved at BlobCache creation time.)
|
I will take a look when I have time. Should I address the review comments right away, or together with adding support for the 'use any pre-existing layer' feature? @mtrmac |
|
I would prefer the call to The other comments from this review are mostly stylistic; if you have time and want to clean that up, I’d like that, but I don’t think any of that is worth tracking or prioritizing. I think Buildah (+ OpenShift variants?) are the only callers of the cache, so that should drive prioritization of the “any pre-existing layer” feature. Would that be used by Buildah? This PR, I think, already fixed the reported bug, and that’s sufficiently valuable as is. |
Record the compression algorithm in the
.compressedsidecar note alongside the digest, so cached blobs are no longer assumed to be gzip-only. UpdatelayerInfoForCopyto select the correct media type and stripzstd:chunkedannotations on decompression.Part of: #205
Related to: containers/buildah#6660
Fixes: https://redhat.atlassian.net/browse/RUN-1124