
cagent pull fails due to reference mismatch between storage and retrieval #1080

@jeanlaurent

Description


Summary

  • Context: The cagent pull command downloads an agent from an OCI registry and saves it to a local YAML file, using a content store as an intermediary.
  • Bug: The command stores artifacts under a normalized reference (without registry domain) but attempts to retrieve them using the original reference (with registry domain), causing a reference mismatch.
  • Actual vs. expected: When pulling docker.io/user/agent:latest, the artifact is stored as user/agent:latest but retrieved as docker.io/user/agent:latest, which fails with "reference not found".
  • Impact: The cagent pull command completely fails for any registry reference that includes a registry domain (e.g., docker.io/..., myregistry.io/..., registry.example.com/...), making it impossible to pull agents from explicitly specified registries.

Code with bug

In pkg/remote/pull.go, the artifact is stored with a normalized reference:

ref, err := name.ParseReference(registryRef)
if err != nil {
    return "", fmt.Errorf("parsing registry reference %s: %w", registryRef, err)
}

// ... pulling logic ...

localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()  // <-- Strips registry domain

// ... more code ...

digest, err := store.StoreArtifact(img, localRef)  // <-- BUG 🔴 Stores with normalized reference

In cmd/root/pull.go, the artifact is retrieved using the original reference:

store, err := content.NewStore()
if err != nil {
    return fmt.Errorf("failed to open content store: %w", err)
}
yamlFile, err := store.GetArtifact(registryRef)  // <-- BUG 🔴 Retrieves with original reference
if err != nil {
    return fmt.Errorf("failed to get agent yaml: %w", err)
}

Evidence

Example

Let's trace through what happens when a user runs cagent pull docker.io/myagent:v1.0:

  1. Storage phase (pkg/remote/pull.go:17-32):

    • registryRef = "docker.io/myagent:v1.0"
    • name.ParseReference() parses this into a structured reference
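    • ref.Context().RepositoryStr() returns "library/myagent" (the registry domain is stripped, and because this is a single-component Docker Hub name, the implicit library/ namespace is added)
    • ref.Identifier() returns "v1.0"
    • localRef = "library/myagent:v1.0"
    • Artifact is stored in the content store under the reference "library/myagent:v1.0"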
  2. Retrieval phase (cmd/root/pull.go:59):

    • registryRef = "docker.io/myagent:v1.0" (unchanged from user input)
    • Calls store.GetArtifact("docker.io/myagent:v1.0")
  3. Lookup phase (pkg/content/store.go:236-261):

    • resolveIdentifier("docker.io/myagent:v1.0") is called
    • Since the string contains :, it doesn't add :latest
    • resolveReference("docker.io/myagent:v1.0") tries to find this reference
    • Computes SHA256 hash of "docker.io/myagent:v1.0"
    • Looks for reference file with this hash
    • Fails because the stored reference was "library/myagent:v1.0", not "docker.io/myagent:v1.0"

Result: User gets error "reference docker.io/myagent:v1.0 not found"
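The lookup phase can be modeled in a few lines of standard-library Go. This is a minimal sketch, not the code in pkg/content/store.go: refStore, Store, Get and the placeholder digests are hypothetical, and the only behavior assumed is what the trace above describes (the key is the SHA256 of the reference string, and :latest is appended only when the string contains no colon). The error text is modeled on the message quoted in the result line.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// refStore is a hypothetical, in-memory stand-in for the content store's
// reference index: it maps sha256(reference string) to an artifact digest.
type refStore struct {
	refs map[string]string
}

func newRefStore() *refStore {
	return &refStore{refs: make(map[string]string)}
}

// resolveIdentifier follows the behavior described in the lookup phase:
// ":latest" is appended only when the identifier contains no colon.
func resolveIdentifier(id string) string {
	if !strings.Contains(id, ":") {
		return id + ":latest"
	}
	return id
}

// refKey hashes the exact reference string, so any textual difference
// (such as a leading "docker.io/") produces an unrelated key.
func refKey(ref string) string {
	sum := sha256.Sum256([]byte(ref))
	return hex.EncodeToString(sum[:])
}

// Store records a digest under the reference exactly as given,
// as remote.Pull does with its normalized localRef.
func (s *refStore) Store(ref, digest string) {
	s.refs[refKey(ref)] = digest
}

// Get resolves, hashes, and looks up the reference, as GetArtifact is described to do.
func (s *refStore) Get(ref string) (string, error) {
	resolved := resolveIdentifier(ref)
	digest, ok := s.refs[refKey(resolved)]
	if !ok {
		return "", fmt.Errorf("reference %s not found", resolved)
	}
	return digest, nil
}

func main() {
	s := newRefStore()

	// Storage side: remote.Pull normalizes docker.io/myagent:v1.0.
	s.Store("library/myagent:v1.0", "sha256:placeholder1")
	// Retrieval side: runPullCommand passes the user's original input.
	_, err := s.Get("docker.io/myagent:v1.0")
	fmt.Println(err) // reference docker.io/myagent:v1.0 not found

	// Shorthand happens to work: both sides end up as creek/pirate:latest.
	s.Store("creek/pirate:latest", "sha256:placeholder2")
	digest, err := s.Get("creek/pirate")
	fmt.Println(digest, err) // sha256:placeholder2 <nil>
}
```

Because the key is a hash of the full string, there is no partial or fuzzy match: the two sides must agree on the exact text.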

Failing test

Test script

package root

import (
	"testing"

	"github.com/google/go-containerregistry/pkg/name"
)

// TestPullReferenceStorageRetrievalMismatch demonstrates that there's a mismatch between
// how remote.Pull stores references and how runPullCommand retrieves them.
//
// In remote.Pull (pkg/remote/pull.go:32), the artifact is stored with:
//   localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()
//
// But in runPullCommand (cmd/root/pull.go:59), it tries to retrieve with:
//   yamlFile, err := store.GetArtifact(registryRef)
//
// These don't match when the registryRef includes a registry domain.
func TestPullReferenceStorageRetrievalMismatch(t *testing.T) {
	testCases := []struct {
		name        string
		registryRef string
	}{
		{
			name:        "registry with namespace",
			registryRef: "myregistry.io/myagent:v1.0",
		},
		{
			name:        "docker hub with explicit registry",
			registryRef: "docker.io/library/myagent:latest",
		},
		{
			name:        "private registry with namespace",
			registryRef: "registry.example.com/namespace/myagent:v2.0",
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			// Simulate what remote.Pull does (pkg/remote/pull.go:17-32)
			ref, err := name.ParseReference(tc.registryRef)
			if err != nil {
				t.Fatalf("Failed to parse reference: %v", err)
			}

			// This is what gets stored in the content store
			localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()

			// This is what runPullCommand uses to retrieve (the original registryRef)
			retrievalRef := tc.registryRef

			// They should match, but they don't
			if localRef != retrievalRef {
				t.Errorf("Reference mismatch:\n  Stored as (localRef): %s\n  Retrieved as (registryRef): %s\nThis will cause GetArtifact to fail with 'reference not found'",
					localRef, retrievalRef)
			}
		})
	}
}

Test output

=== RUN   TestPullReferenceStorageRetrievalMismatch
=== RUN   TestPullReferenceStorageRetrievalMismatch/registry_with_namespace
    pull_reference_bug_test.go:54: Reference mismatch:
          Stored as (localRef): myagent:v1.0
          Retrieved as (registryRef): myregistry.io/myagent:v1.0
        This will cause GetArtifact to fail with 'reference not found'
=== RUN   TestPullReferenceStorageRetrievalMismatch/docker_hub_with_explicit_registry
    pull_reference_bug_test.go:54: Reference mismatch:
          Stored as (localRef): library/myagent:latest
          Retrieved as (registryRef): docker.io/library/myagent:latest
        This will cause GetArtifact to fail with 'reference not found'
=== RUN   TestPullReferenceStorageRetrievalMismatch/private_registry_with_namespace
    pull_reference_bug_test.go:54: Reference mismatch:
          Stored as (localRef): namespace/myagent:v2.0
          Retrieved as (registryRef): registry.example.com/namespace/myagent:v2.0
        This will cause GetArtifact to fail with 'reference not found'
--- FAIL: TestPullReferenceStorageRetrievalMismatch (0.00s)
    --- FAIL: TestPullReferenceStorageRetrievalMismatch/registry_with_namespace (0.00s)
    --- FAIL: TestPullReferenceStorageRetrievalMismatch/docker_hub_with_explicit_registry (0.00s)
    --- FAIL: TestPullReferenceStorageRetrievalMismatch/private_registry_with_namespace (0.00s)
FAIL
FAIL	github.com/docker/cagent/cmd/root	0.033s
FAIL

Inconsistency within the codebase

Reference code

pkg/remote/pull.go:32

localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()
// ...
digest, err := store.StoreArtifact(img, localRef)

Current code

cmd/root/pull.go:59

yamlFile, err := store.GetArtifact(registryRef)

Contradiction

The remote.Pull function normalizes the registry reference with ref.Context().RepositoryStr(), which returns only the repository path without the registry domain; for single-component Docker Hub names it also adds the implicit library/ namespace. For example, docker.io/myagent:v1.0 becomes library/myagent:v1.0, and myregistry.io/myagent:v1.0 becomes myagent:v1.0. This normalized reference is used to store the artifact in the content store.

However, cmd/root/pull.go attempts to retrieve the artifact using the original registryRef, which still contains the registry domain. The content store's GetArtifact method computes a SHA256 hash of the reference string to look up the stored artifact, so docker.io/myagent:v1.0 and library/myagent:v1.0 produce completely different hashes and are treated as different references.

This mismatch means that any reference containing a registry domain will fail to be retrieved after being stored.

Full context

The cagent pull command is designed to download agent configurations from OCI registries (like Docker Hub or private registries) and save them as local YAML files. The workflow involves three components:

  1. cmd/root/pull.go: The CLI command handler that coordinates the pull operation
  2. pkg/remote/pull.go: The function that downloads the OCI image from the registry and stores it in the local content store
  3. pkg/content/store.go: The content store that manages locally cached artifacts using reference-based lookups

When a user runs cagent pull <registry-ref>, the following happens:

  1. cmd/root/pull.go:50 calls remote.Pull(ctx, registryRef, f.force, opts...)
  2. remote.Pull downloads the image and stores it in the content store with a normalized reference (line 32: localRef)
  3. Control returns to cmd/root/pull.go:59 which attempts to retrieve the artifact using store.GetArtifact(registryRef)
  4. The retrieval fails because the stored reference doesn't match the lookup reference

The same pattern is also used in pkg/config/sources.go:115-128 when resolving OCI sources for the cagent run command, meaning this bug affects multiple commands that work with OCI registries.

External documentation

ParseReference parses the string as a reference, either by tag or digest.
RepositoryStr returns the repository component of the Repository.

The RepositoryStr() method strips the registry domain and returns only the repository path (adding the implicit library/ namespace for single-component Docker Hub names). This is by design in the go-containerregistry library, as registries and repositories are separate concepts. However, for the content store lookup to work, the store and retrieve operations must use the same reference format.
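For readers without go-containerregistry at hand, the following standard-library sketch approximates what name.ParseReference followed by RepositoryStr() and Identifier() produce for tag references. It is a simplified model, assuming the library's rules that the first path component is a registry only when it contains '.' or ':', that docker.io is rewritten to index.docker.io, and that single-component Docker Hub names get the implicit library/ namespace. approxLocalRef is a hypothetical name; it performs no validation, and digest references are deliberately rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// dockerHub is the registry name that "docker.io" is rewritten to.
const dockerHub = "index.docker.io"

// approxLocalRef is a hypothetical, simplified approximation of
// ref.Context().RepositoryStr() + ":" + ref.Identifier() for tag references.
// It does no validation and is not a substitute for name.ParseReference.
func approxLocalRef(s string) (string, error) {
	if strings.Contains(s, "@") {
		return "", errors.New("digest references are not handled by this sketch")
	}

	// A tag is whatever follows a colon that appears after the last slash;
	// a colon before the last slash belongs to a registry port.
	repoPart, tag := s, "latest"
	if i := strings.LastIndex(s, ":"); i > strings.LastIndex(s, "/") {
		repoPart, tag = s[:i], s[i+1:]
	}

	// The first path component is a registry only if it contains '.' or ':';
	// otherwise the whole path is a Docker Hub repository.
	registry, repo := dockerHub, repoPart
	if parts := strings.SplitN(repoPart, "/", 2); len(parts) == 2 && strings.ContainsAny(parts[0], ".:") {
		registry, repo = parts[0], parts[1]
	}
	if registry == "docker.io" {
		registry = dockerHub
	}

	// Single-component Docker Hub names get the implicit "library/" namespace.
	if registry == dockerHub && !strings.Contains(repo, "/") {
		repo = "library/" + repo
	}

	// The registry itself is dropped, mirroring RepositoryStr().
	return repo + ":" + tag, nil
}

func main() {
	for _, in := range []string{
		"docker.io/myagent:v1.0",
		"myregistry.io/myagent:v1.0",
		"registry.example.com/namespace/myagent:v2.0",
		"creek/pirate",
	} {
		out, err := approxLocalRef(in)
		if err != nil {
			fmt.Println(in, "error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", in, out)
	}
}
```

Run as-is, it maps docker.io/myagent:v1.0 to library/myagent:v1.0, myregistry.io/myagent:v1.0 to myagent:v1.0, registry.example.com/namespace/myagent:v2.0 to namespace/myagent:v2.0, and creek/pirate to creek/pirate:latest. Only the last one equals what the retrieval side looks up after resolveIdentifier, which is why the shorthand form is the only one that works today.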

Why has this bug gone undetected?

This bug has gone undetected for several reasons:

  1. Recent introduction: The bug was introduced in commit fefeb245 on November 26, 2025, only about 2 weeks ago, when the code was refactored to add OCI source support.

  2. Documentation uses Docker Hub shorthand: The primary documentation example (cagent pull creek/pirate) uses Docker Hub's shorthand notation without an explicit registry domain. These references work correctly because:

    • creek/pirate is stored as creek/pirate:latest
    • It's retrieved as creek/pirate, which becomes creek/pirate:latest once resolveIdentifier appends :latest
    • The references match!
  3. Limited test coverage: The existing tests in pkg/remote/pull_test.go only use simple references without registry domains (e.g., pull-test:latest), which don't trigger the bug.

  4. Two-step workaround exists: Users who pull without a registry domain and then manually specify the domain might work around the issue without realizing it, though this is not the intended workflow.

  5. Error message is generic: When the bug occurs, the error is "reference X not found", which could be interpreted as a network issue or a typo in the reference name, rather than an internal storage/retrieval mismatch.

  6. Docker Hub is special-cased: The go-containerregistry library treats Docker Hub references specially. For creek/pirate it infers Docker Hub as the registry, and RepositoryStr() returns creek/pirate with no registry domain. This means the common use case (Docker Hub shorthand, as in the documentation) happens to work, while the equivalent explicit form (docker.io/creek/pirate) fails.

Recommended fix

The fix should use the same normalized reference format for both storage and retrieval. In cmd/root/pull.go:59, replace:

yamlFile, err := store.GetArtifact(registryRef)

With:

ref, err := name.ParseReference(registryRef)  // <-- FIX 🟢 Parse the reference
if err != nil {
    return fmt.Errorf("failed to parse reference: %w", err)
}
localRef := ref.Context().RepositoryStr() + ":" + ref.Identifier()  // <-- FIX 🟢 Normalize it
yamlFile, err := store.GetArtifact(localRef)  // <-- FIX 🟢 Use normalized reference

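This applies the same normalization logic to both storage and retrieval, so the two reference strings match. If cmd/root/pull.go does not already import github.com/google/go-containerregistry/pkg/name, that import must be added. Since the normalization expression would then exist in two places, moving it into a single shared helper would keep the two call sites from drifting apart again.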

Related bugs

There is a separate but related bug in the filename generation on line 64 of cmd/root/pull.go:

agentName := strings.ReplaceAll(registryRef, "/", "_")
fileName := agentName + ".yaml"

This code only replaces forward slashes but not colons, resulting in filenames like registry.example.com_myagent:v1.0.yaml. The colon character is invalid in Windows filenames, causing the os.WriteFile call on line 67 to fail on Windows systems.

However, this filename bug is secondary to the reference mismatch bug documented here, because the reference mismatch causes the command to fail before it even reaches the filename generation code.

Metadata


Assignees

No one assigned

Labels

kind/bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
