Description
Summary
- Context: The cagent pull command downloads an agent from an OCI registry and saves it to a local YAML file, using a content store as an intermediary.
- Bug: The command stores artifacts under a normalized reference (without the registry domain) but attempts to retrieve them using the original reference (with the registry domain), causing a reference mismatch.
- Actual vs. expected: When pulling docker.io/user/agent:latest, the artifact is stored as user/agent:latest but retrieved as docker.io/user/agent:latest, which fails with "reference not found".
- Impact: The cagent pull command fails for any registry reference that includes a registry domain (e.g., docker.io/..., myregistry.io/..., registry.example.com/...), making it impossible to pull agents from explicitly specified registries.
Code with bug
In pkg/remote/pull.go, the artifact is stored with a normalized reference:
ref, err := name.ParseReference(registryRef)
if err != nil {
	return "", fmt.Errorf("parsing registry reference %s: %w", registryRef, err)
}
// ... pulling logic ...
localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier() // <-- Strips registry domain
// ... more code ...
digest, err := store.StoreArtifact(img, localRef) // <-- BUG 🔴 Stores with normalized reference
In cmd/root/pull.go, the artifact is retrieved using the original reference:
store, err := content.NewStore()
if err != nil {
	return fmt.Errorf("failed to open content store: %w", err)
}
yamlFile, err := store.GetArtifact(registryRef) // <-- BUG 🔴 Retrieves with original reference
if err != nil {
	return fmt.Errorf("failed to get agent yaml: %w", err)
}
Evidence
Example
Let's trace through what happens when a user runs cagent pull docker.io/myagent:v1.0:
- Storage phase (pkg/remote/pull.go:17-32):
  - registryRef = "docker.io/myagent:v1.0"
  - name.ParseReference() parses this into a structured reference
  - ref.Context().RepositoryStr() returns "library/myagent": the registry domain is stripped, and go-containerregistry adds Docker Hub's implicit library/ namespace to single-component repository names
  - ref.Identifier() returns "v1.0"
  - localRef = "library/myagent:v1.0"
  - Artifact is stored in the content store with key "library/myagent:v1.0"
- Retrieval phase (cmd/root/pull.go:59):
  - registryRef = "docker.io/myagent:v1.0" (unchanged from user input)
  - Calls store.GetArtifact("docker.io/myagent:v1.0")
- Lookup phase (pkg/content/store.go:236-261):
  - resolveIdentifier("docker.io/myagent:v1.0") is called
  - Since the string contains :, it doesn't add :latest
  - resolveReference("docker.io/myagent:v1.0") tries to find this reference
  - Computes the SHA256 hash of "docker.io/myagent:v1.0"
  - Looks for a reference file with this hash
  - Fails because the stored reference was "library/myagent:v1.0", not "docker.io/myagent:v1.0"
Result: User gets error "reference docker.io/myagent:v1.0 not found"
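The three phases can be reproduced without a registry. The sketch below is a minimal stand-in, not the real pkg/content code: memStore and refKey are hypothetical names, and the only behavior assumed is the one described above, namely that artifacts are indexed by the SHA256 of the reference string. It uses a reference of the form myregistry.io/myagent:v1.0, which is stored as myagent:v1.0.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// refKey mimics a store that indexes artifacts by the SHA-256 of the
// reference string. The keying scheme is an assumption taken from the
// description above, not the actual pkg/content implementation.
func refKey(ref string) string {
	sum := sha256.Sum256([]byte(ref))
	return hex.EncodeToString(sum[:])
}

// memStore is a hypothetical in-memory stand-in for the content store.
type memStore map[string]string

// StoreArtifact saves the YAML payload under the hashed reference.
func (s memStore) StoreArtifact(ref, yaml string) {
	s[refKey(ref)] = yaml
}

// GetArtifact looks the payload up by hashing the reference it is given.
func (s memStore) GetArtifact(ref string) (string, error) {
	yaml, ok := s[refKey(ref)]
	if !ok {
		return "", fmt.Errorf("reference %s not found", ref)
	}
	return yaml, nil
}

func main() {
	store := memStore{}

	// Storage phase: the key has already lost its registry domain.
	store.StoreArtifact("myagent:v1.0", "agents: {}")

	// Retrieval phase: the original user input is hashed instead.
	_, err := store.GetArtifact("myregistry.io/myagent:v1.0")
	fmt.Println("lookup with registry domain:", err)

	// Looking up the normalized key succeeds.
	_, err = store.GetArtifact("myagent:v1.0")
	fmt.Println("lookup with stored key:", err)
}
```

Because the key is a hash of the whole string, no partial match is possible: the two references either agree character for character or the lookup misses.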
Failing test
Test script
package root

import (
	"testing"

	"github.com/google/go-containerregistry/pkg/name"
)

// TestPullReferenceStorageRetrievalMismatch demonstrates that there's a mismatch between
// how remote.Pull stores references and how runPullCommand retrieves them.
//
// In remote.Pull (pkg/remote/pull.go:32), the artifact is stored with:
//   localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()
//
// But in runPullCommand (cmd/root/pull.go:59), it tries to retrieve with:
//   yamlFile, err := store.GetArtifact(registryRef)
//
// These don't match when the registryRef includes a registry domain.
func TestPullReferenceStorageRetrievalMismatch(t *testing.T) {
	testCases := []struct {
		name        string
		registryRef string
	}{
		{
			name:        "registry with namespace",
			registryRef: "myregistry.io/myagent:v1.0",
		},
		{
			name:        "docker hub with explicit registry",
			registryRef: "docker.io/library/myagent:latest",
		},
		{
			name:        "private registry with namespace",
			registryRef: "registry.example.com/namespace/myagent:v2.0",
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			// Simulate what remote.Pull does (pkg/remote/pull.go:17-32)
			ref, err := name.ParseReference(tc.registryRef)
			if err != nil {
				t.Fatalf("Failed to parse reference: %v", err)
			}

			// This is what gets stored in the content store
			localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()

			// This is what runPullCommand uses to retrieve (the original registryRef)
			retrievalRef := tc.registryRef

			// They should match, but they don't
			if localRef != retrievalRef {
				t.Errorf("Reference mismatch:\n Stored as (localRef): %s\n Retrieved as (registryRef): %s\nThis will cause GetArtifact to fail with 'reference not found'",
					localRef, retrievalRef)
			}
		})
	}
}
Test output
=== RUN TestPullReferenceStorageRetrievalMismatch
=== RUN TestPullReferenceStorageRetrievalMismatch/registry_with_namespace
pull_reference_bug_test.go:54: Reference mismatch:
Stored as (localRef): myagent:v1.0
Retrieved as (registryRef): myregistry.io/myagent:v1.0
This will cause GetArtifact to fail with 'reference not found'
=== RUN TestPullReferenceStorageRetrievalMismatch/docker_hub_with_explicit_registry
pull_reference_bug_test.go:54: Reference mismatch:
Stored as (localRef): library/myagent:latest
Retrieved as (registryRef): docker.io/library/myagent:latest
This will cause GetArtifact to fail with 'reference not found'
=== RUN TestPullReferenceStorageRetrievalMismatch/private_registry_with_namespace
pull_reference_bug_test.go:54: Reference mismatch:
Stored as (localRef): namespace/myagent:v2.0
Retrieved as (registryRef): registry.example.com/namespace/myagent:v2.0
This will cause GetArtifact to fail with 'reference not found'
--- FAIL: TestPullReferenceStorageRetrievalMismatch (0.00s)
--- FAIL: TestPullReferenceStorageRetrievalMismatch/registry_with_namespace (0.00s)
--- FAIL: TestPullReferenceStorageRetrievalMismatch/docker_hub_with_explicit_registry (0.00s)
--- FAIL: TestPullReferenceStorageRetrievalMismatch/private_registry_with_namespace (0.00s)
FAIL
FAIL github.com/docker/cagent/cmd/root 0.033s
FAIL
Inconsistency within the codebase
Reference code
pkg/remote/pull.go:32
localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()
// ...
digest, err := store.StoreArtifact(img, localRef)
Current code
cmd/root/pull.go:59
yamlFile, err := store.GetArtifact(registryRef)
Contradiction
The remote.Pull function normalizes the registry reference by dropping the registry domain: ref.Context().RepositoryStr() returns only the repository path, without the registry. For example, myregistry.io/myagent:v1.0 becomes myagent:v1.0. This normalized reference is the key under which the artifact is stored in the content store.
However, cmd/root/pull.go attempts to retrieve the artifact using the original registryRef that still contains the registry domain. The content store's GetArtifact method computes a SHA256 hash of the reference string to look up the stored artifact, so docker.io/myagent:v1.0 and myagent:v1.0 produce completely different hashes and are treated as different references.
This mismatch means that any reference containing a registry domain will fail to be retrieved after being stored.
Full context
The cagent pull command is designed to download agent configurations from OCI registries (like Docker Hub or private registries) and save them as local YAML files. The workflow involves three components:
- cmd/root/pull.go: The CLI command handler that coordinates the pull operation
- pkg/remote/pull.go: The function that downloads the OCI image from the registry and stores it in the local content store
- pkg/content/store.go: The content store that manages locally cached artifacts using reference-based lookups
When a user runs cagent pull <registry-ref>, the following happens:
- cmd/root/pull.go:50 calls remote.Pull(ctx, registryRef, f.force, opts...)
- remote.Pull downloads the image and stores it in the content store with a normalized reference (line 32: localRef)
- Control returns to cmd/root/pull.go:59, which attempts to retrieve the artifact using store.GetArtifact(registryRef)
- The retrieval fails because the stored reference doesn't match the lookup reference
The same pattern is also used in pkg/config/sources.go:115-128 when resolving OCI sources for the cagent run command, meaning this bug affects multiple commands that work with OCI registries.
External documentation
ParseReference parses the string as a reference, either by tag or digest.
RepositoryStr returns the repository component of the Repository.
The RepositoryStr() method explicitly strips the registry domain and returns only the repository path. This is by design in the go-containerregistry library, as registries and repositories are separate concepts. However, for the content store lookup to work, both the store and retrieve operations must use the same reference format.
Why has this bug gone undetected?
This bug has gone undetected for several reasons:
- Recent introduction: The bug was introduced in commit fefeb245 on November 26, 2025, only about 2 weeks ago, when the code was refactored to add OCI source support.
- Documentation uses Docker Hub shorthand: The primary documentation example (cagent pull creek/pirate) uses Docker Hub's shorthand notation without an explicit registry domain. These references work correctly because:
  - creek/pirate is stored as creek/pirate:latest
  - It's retrieved as creek/pirate, which becomes creek/pirate:latest after the :latest is added in resolveIdentifier
  - The references match!
- Limited test coverage: The existing tests in pkg/remote/pull_test.go only use simple references without registry domains (e.g., pull-test:latest), which don't trigger the bug.
- Two-step workaround exists: Users who pull without a registry domain and then manually specify the domain might work around the issue without realizing it, though this is not the intended workflow.
- Error message is generic: When the bug occurs, the error is "reference X not found", which could be interpreted as a network issue or a typo in the reference name, rather than an internal storage/retrieval mismatch.
- Docker Hub is special-cased: The go-containerregistry library treats Docker Hub references specially. When you use creek/pirate, it automatically resolves this to Docker Hub, but RepositoryStr() doesn't include the registry. This means the common use case (Docker Hub with shorthand notation) happens to work, while the explicit form (Docker Hub with docker.io/) fails.
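The shorthand case from the list above can be checked in isolation. The function below is a simplified guess at resolveIdentifier based only on the behavior this report describes (append :latest when the string has no colon); the real implementation in pkg/content/store.go may handle more cases, such as digests.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveIdentifier is a simplified sketch of the store's behavior as
// described in this report: append ":latest" only when the input has no
// colon. It is an assumption, not a copy of the real function.
func resolveIdentifier(ref string) string {
	if strings.Contains(ref, ":") {
		return ref
	}
	return ref + ":latest"
}

func main() {
	// Shorthand: the stored key and the resolved lookup key converge.
	stored := "creek/pirate:latest"
	fmt.Println(resolveIdentifier("creek/pirate") == stored) // true

	// Explicit registry: the lookup key keeps the domain and never matches.
	stored = "user/agent:latest"
	fmt.Println(resolveIdentifier("docker.io/user/agent:latest") == stored) // false
}
```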
Recommended fix
The fix should use the same normalized reference format for both storage and retrieval. In cmd/root/pull.go:59, replace:
yamlFile, err := store.GetArtifact(registryRef)
With:
ref, err := name.ParseReference(registryRef) // <-- FIX 🟢 Parse the reference
if err != nil {
	return fmt.Errorf("failed to parse reference: %w", err)
}
localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier() // <-- FIX 🟢 Normalize it
yamlFile, err := store.GetArtifact(localRef) // <-- FIX 🟢 Use normalized reference
This ensures that the same normalization logic is applied for both storage and retrieval, making the reference formats consistent.
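One way to keep the two call sites from drifting apart again is to route both through a single helper rather than repeating the expression. The sketch below illustrates the idea with a stdlib-only stand-in: localRef is a hypothetical name, and its parsing is a deliberate simplification of name.ParseReference (tag references only, no digests, no Docker Hub implicit namespace). Real code would presumably keep the go-containerregistry call inside the helper.

```go
package main

import (
	"fmt"
	"strings"
)

// localRef is a simplified, stdlib-only stand-in for
// ref.Context().RepositoryStr() + ":" + ref.Identifier().
// Assumptions: tag references only (no digests) and no handling of
// Docker Hub's implicit "library/" namespace.
func localRef(registryRef string) string {
	repo := registryRef

	// Treat the first path component as a registry when it looks like a host.
	if i := strings.Index(repo, "/"); i >= 0 {
		host := repo[:i]
		if strings.ContainsAny(host, ".:") || host == "localhost" {
			repo = repo[i+1:]
		}
	}

	// Default to the "latest" tag when the last component has none.
	last := repo[strings.LastIndex(repo, "/")+1:]
	if !strings.Contains(last, ":") {
		repo += ":latest"
	}
	return repo
}

func main() {
	refs := []string{
		"registry.example.com/namespace/myagent:v2.0",
		"docker.io/user/agent:latest",
		"creek/pirate",
	}
	for _, ref := range refs {
		// Both the store call and the lookup call would use localRef(ref),
		// so the keys agree by construction.
		fmt.Printf("%s -> %s\n", ref, localRef(ref))
	}
}
```

With a shared helper, a later change to the normalization rule applies to storage and retrieval at once, which is the property the fix is trying to restore.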
Related bugs
There is a separate but related bug in the filename generation on line 64 of cmd/root/pull.go:
agentName := strings.ReplaceAll(registryRef, "/", "_")
fileName := agentName + ".yaml"
This code only replaces forward slashes but not colons, resulting in filenames like registry.example.com_myagent:v1.0.yaml. The colon character is invalid in Windows filenames, causing the os.WriteFile call on line 67 to fail on Windows systems.
However, this filename bug is secondary to the reference mismatch bug documented here, because the reference mismatch causes the command to fail before it even reaches the filename generation code.