feat(podman): deploy workspaces as pods with proxy sidecar#247
feat(podman): deploy workspaces as pods with proxy sidecar#247feloy wants to merge 5 commits intoopenkaiden:mainfrom
Conversation
Implements openkaiden#245. Replaces the single-container model with a Podman pod containing two containers: a workspace container (kdn-<name>-workspace) and a Squid proxy sidecar (kdn-<name>-proxy). The sidecar enforces network isolation by setting an nftables OUTPUT policy of drop with CAP_NET_ADMIN, allowing only loopback and squid-user traffic. The workspace container routes HTTP/HTTPS through the proxy via injected env vars, which cannot be bypassed even if unset. RuntimeInfo.ID changes from the container ID to the pod name (kdn-<name>). Pod lifecycle (start, stop, rm, inspect) replaces the former container-level commands throughout. Co-Authored-By: Claude Code (claude-sonnet-4-6) <noreply@anthropic.com> Signed-off-by: Philippe Martin <phmartin@redhat.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
📝 WalkthroughWalkthroughPodman runtime was changed from single-container workspaces to pod-based workspaces with a Squid proxy sidecar and nftables-based network isolation; create/start/stop/remove/info/terminal flows, docs, and tests were updated to manage pods and the proxy alongside the workspace container. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/runtime/podman/create_test.go`:
- Around line 419-455: The test currently uses strings.Contains(argsStr,
imageName) which falsely passes because imageName (set from podN) also appears
in the --pod flag; update the assertions to ensure the container image is
checked exactly by either (1) using a distinct imageName value (not derived from
podN) so string search is unambiguous, or (2) asserting against the args slice
element that represents the image/command position (e.g., verify
args[len(args)-2] or the specific args index contains the exact imageName and
args[len(args)-1] equals the command) rather than searching the entire argsStr;
adjust both occurrences that reference imageName (variables args, argsStr,
imageName, podN) so the test fails if the final image argument is missing or
different.
In `@pkg/runtime/podman/create.go`:
- Around line 277-304: The current sequence in create.go can leave a pod and
containers orphaned if createPod succeeds but subsequent p.createContainer calls
fail; add a rollback pattern: after p.createPod(ctx, podN) succeeds, register a
deferred cleanup closure (e.g., using a local rollback slice or helper such as
cleanupCreatedResources) that will delete any created containers and the pod if
an error occurs, and cancel/clear that defer when the full creation completes
successfully; track created items by pushing proxyContainerName(podN) and
workspaceContainerName(podN) as each p.createContainer call succeeds (use the
same names used by buildProxyContainerArgs and buildWorkspaceContainerArgs) and
ensure the defer invokes p.removeContainer / p.removePod (or existing delete
methods) only on non-nil error to perform rollback.
In `@pkg/runtime/podman/proxy.go`:
- Around line 67-75: The nftables rules currently use the IPv4-only family "ip"
(lines with "nft add table ip filter", "nft add chain ip filter output", and
rules like "nft add rule ip filter output ... meta skuid squid accept"), so IPv6
traffic bypasses proxying; update the rule generation in proxy.go to either
create equivalent "ip6" rules for each "ip" rule or switch to the "inet" family
so both IPv4/IPv6 are covered, and if you choose "inet" add the necessary
ICMPv6/neighbor-discovery allowances; alternatively ensure pod creation disables
IPv6 if dual-stack support is not desired—apply the change where the
strings/commands for nft are built (look for the functions or variables that
emit the "nft add table/chain/rule" commands).
In `@pkg/runtime/podman/remove.go`:
- Around line 83-88: The code currently treats any error containing "failed to
inspect pod" as a not-found condition, which hides permission/runtime errors; in
the function used by Remove() (and coming from getPodInfo()), remove the broad
strings.Contains(errMsg, "failed to inspect pod") check and only match concrete
not-found variants (e.g., "no such container", "no such pod", "pod not found",
"no such object", "error getting container" or explicit "not found" patterns).
Ensure Remove() relies on these concrete matches so genuine inspect failures
bubble up instead of being treated as success.
In `@skills/working-with-runtime-system/SKILL.md`:
- Around line 133-140: The example implementation for podmanRuntime uses the
wrong argument order when calling Terminal; update all example calls (including
inside func (p *podmanRuntime) Terminal and any other examples) to match the
interface signature Terminal(ctx, instanceID, agent, command) by swapping the
currently passed agent and instanceID arguments so instanceID comes before agent
and ensure the ctx and command positions remain correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2158c63a-2bd7-44f1-88cd-96fc72931dd1
📒 Files selected for processing (17)
README.mdpkg/runtime/podman/create.gopkg/runtime/podman/create_test.gopkg/runtime/podman/info.gopkg/runtime/podman/info_test.gopkg/runtime/podman/proxy.gopkg/runtime/podman/proxy_test.gopkg/runtime/podman/remove.gopkg/runtime/podman/remove_test.gopkg/runtime/podman/start.gopkg/runtime/podman/start_test.gopkg/runtime/podman/stop.gopkg/runtime/podman/stop_test.gopkg/runtime/podman/terminal.gopkg/runtime/podman/terminal_test.gopkg/runtime/runtime.goskills/working-with-runtime-system/SKILL.md
- isNotFoundError: remove broad "failed to inspect pod" match that silently swallowed permission/runtime errors; use case-insensitive comparison of concrete not-found variants only - Proxy start script: add ip6 filter rules mirroring the ip rules so IPv6 traffic cannot bypass the proxy on dual-stack pods - create_test.go: assert image name against exact args slice position to avoid false positives from the --pod flag value containing podN - SKILL.md: fix Terminal example argument order to match the interface signature (instanceID before agent) Co-Authored-By: Claude Code (claude-sonnet-4-6) <noreply@anthropic.com> Signed-off-by: Philippe Martin <phmartin@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/runtime/podman/remove.go (1)
67-74: Function nameremoveContaineris inconsistent with its purpose.The function name says "Container" but the comment, implementation (
pod rm), and error message all reference "pod". Consider renaming toremovePodfor clarity.♻️ Suggested rename
-// removeContainer removes a podman pod by ID. -func (p *podmanRuntime) removeContainer(ctx context.Context, id string) error { +// removePod removes a podman pod by ID. +func (p *podmanRuntime) removePod(ctx context.Context, id string) error { l := logger.FromContext(ctx) if err := p.executor.Run(ctx, l.Stdout(), l.Stderr(), "pod", "rm", id); err != nil { return fmt.Errorf("failed to remove pod: %w", err) } return nil }Also update the call site at line 59:
- if err := p.removeContainer(ctx, id); err != nil { + if err := p.removePod(ctx, id); err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/runtime/podman/remove.go` around lines 67 - 74, The function removeContainer is misnamed for a pod deletion routine: rename the method removeContainer to removePod (and update its comment to "removePod removes a podman pod by ID"), keep the implementation that calls p.executor.Run(..., "pod", "rm", id) and the error message but change any fmt.Errorf text if it still says "pod" to remain consistent, and update all call sites that reference podmanRuntime.removeContainer to call removePod instead (including tests and other package usages).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/working-with-runtime-system/SKILL.md`:
- Around line 145-155: The Podman example must validate that the loaded agent
actually defines a terminal command before assigning it to command; update the
snippet around the p.config.LoadAgent call to check if
len(agentConfig.TerminalCommand) == 0 and return a descriptive error (using
runtime.ErrInvalidParams like the original) instead of assigning an empty value,
so the documented flow matches pkg/runtime/podman/terminal.go and prevents
calling podman exec with no command.
---
Nitpick comments:
In `@pkg/runtime/podman/remove.go`:
- Around line 67-74: The function removeContainer is misnamed for a pod deletion
routine: rename the method removeContainer to removePod (and update its comment
to "removePod removes a podman pod by ID"), keep the implementation that calls
p.executor.Run(..., "pod", "rm", id) and the error message but change any
fmt.Errorf text if it still says "pod" to remain consistent, and update all call
sites that reference podmanRuntime.removeContainer to call removePod instead
(including tests and other package usages).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 59eca1c7-059c-48ba-b76a-44e6e9945ba6
📒 Files selected for processing (6)
pkg/runtime/podman/create_test.gopkg/runtime/podman/proxy.gopkg/runtime/podman/proxy_test.gopkg/runtime/podman/remove.gopkg/runtime/podman/remove_test.goskills/working-with-runtime-system/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/runtime/podman/remove_test.go
- pkg/runtime/podman/proxy.go
- pkg/runtime/podman/proxy_test.go
- pkg/runtime/podman/create_test.go
If proxy or workspace container creation failed after the pod was already created, the pod was left dangling. Introduce a cleanupOnError func() that is updated at each step and deferred so that pod rm is called on any error path after pod creation. Also add tests covering both failure scenarios and the success path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Philippe Martin <phmartin@redhat.com>
removeContainer, startContainer, and stopContainer all operate on pods, not containers. Rename them to removePod, startPod, and stopPod to match what they actually do. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Philippe Martin <phmartin@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/runtime/podman/create.go (1)
140-185:⚠️ Potential issue | 🟠 MajorResolve
SourcePathto an absolute host path before building the bind mount.
validateCreateParams()accepts relative paths, but this helper passesparams.SourcePathstraight into-v. That makes the mount depend on the caller’s current working directory and violates the repo’s host-path rule. Normalize it once withfilepath.Abs()before constructing the Podman args.♻️ Suggested fix
func (p *podmanRuntime) buildWorkspaceContainerArgs(params runtime.CreateParams, podN, containerName, imageName string) ([]string, error) { args := []string{"create", "--pod", podN, "--name", containerName} // Proxy env vars are prepended so they take effect before user-supplied vars. args = append(args, proxyEnvVars...) + + sourcePath, err := filepath.Abs(params.SourcePath) + if err != nil { + return nil, fmt.Errorf("failed to resolve source path: %w", err) + } // Add environment variables from workspace config if params.WorkspaceConfig != nil && params.WorkspaceConfig.Environment != nil { for _, env := range *params.WorkspaceConfig.Environment { if env.Value != nil { @@ // Mount the source directory at /workspace/sources // This allows symlinks to work correctly with dependencies - args = append(args, "-v", fmt.Sprintf("%s:/workspace/sources:Z", params.SourcePath)) + args = append(args, "-v", fmt.Sprintf("%s:/workspace/sources:Z", sourcePath))As per coding guidelines
**/*.go: "Usefilepath.Join()for host paths ... Convert relative paths to absolute withfilepath.Abs()."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/runtime/podman/create.go` around lines 140 - 185, In buildWorkspaceContainerArgs, resolve params.SourcePath to an absolute host path with filepath.Abs and use that absolute path for the bind mounts instead of the raw params.SourcePath; call filepath.Abs at the top of the function, return an error if it fails, then pass absSourcePath into the "-v" mount for /workspace/sources and into mountVolumeArg(...) when constructing additional mounts so mounts no longer depend on the caller's cwd.
🧹 Nitpick comments (1)
pkg/runtime/podman/create_test.go (1)
685-775: Assert the returnedRuntimeInfoon the success path.This PR changes the public return contract from container-based IDs to pod-based IDs, but the success test only checks StepLogger calls. If
Create()accidentally went back to returning a container ID, this test would still pass.♻️ Suggested assertions
- _, err := p.Create(ctx, params) + info, err := p.Create(ctx, params) if err != nil { t.Fatalf("Create() failed: %v", err) } + + if info.ID != "kdn-test-workspace" { + t.Fatalf("expected pod ID %q, got %q", "kdn-test-workspace", info.ID) + } + if info.Info["workspace_container"] != "kdn-test-workspace-workspace" { + t.Fatalf("expected workspace container metadata to be populated, got %#v", info.Info) + } + if info.Info["proxy_container"] != "kdn-test-workspace-proxy" { + t.Fatalf("expected proxy container metadata to be populated, got %#v", info.Info) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/runtime/podman/create_test.go` around lines 685 - 775, The test TestCreate_StepLogger_Success only validated logging; update it to assert the returned runtime info from podmanRuntime.Create uses the pod-based ID and not a container ID: capture the return (e.g. ri, err := p.Create(ctx, params)), assert err == nil, assert ri.PodID == "container-id-123" (or the expected pod ID returned by fakeExec.Output), and assert ri.ContainerID is empty or not set; reference the Create method on podmanRuntime and the runtime.CreateParams / runtime.RuntimeInfo types when making these assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/runtime/podman/create.go`:
- Around line 317-330: Create() sets RuntimeInfo.Info with keys proxy_container,
image_name, source_path, and agent but getPodInfo() repopulates only pod_name
and workspace_container, causing metadata loss on refresh; update getPodInfo()
to return the same key set as Create() (pod_name, workspace_container,
proxy_container, image_name, source_path, agent) by extracting those values from
the pod/container metadata the function already reads (ensure functions
getPodInfo and any helpers return/merge these fields into
runtime.RuntimeInfo.Info) or alternatively remove the extra keys from Create()
so both functions produce the identical subset; modify getPodInfo() (and callers
expecting RuntimeInfo.Info) to preserve stability across Create() and Info().
- Around line 277-289: The rollback uses context.Background() which drops
deadlines and can hang on pod removal; change the cleanupOnError closure to call
p.removePod with a bounded context (use context.WithTimeout in the closure, e.g.
a short timeout like 5s), defer the cancel, and ignore the error as before;
update the closure assigned to cleanupOnError (the one that currently calls
p.removePod(context.Background(), podN)) so it creates a timed context, calls
p.removePod(ctx, podN), cancels the timeout, and disregards the returned error.
---
Outside diff comments:
In `@pkg/runtime/podman/create.go`:
- Around line 140-185: In buildWorkspaceContainerArgs, resolve params.SourcePath
to an absolute host path with filepath.Abs and use that absolute path for the
bind mounts instead of the raw params.SourcePath; call filepath.Abs at the top
of the function, return an error if it fails, then pass absSourcePath into the
"-v" mount for /workspace/sources and into mountVolumeArg(...) when constructing
additional mounts so mounts no longer depend on the caller's cwd.
---
Nitpick comments:
In `@pkg/runtime/podman/create_test.go`:
- Around line 685-775: The test TestCreate_StepLogger_Success only validated
logging; update it to assert the returned runtime info from podmanRuntime.Create
uses the pod-based ID and not a container ID: capture the return (e.g. ri, err
:= p.Create(ctx, params)), assert err == nil, assert ri.PodID ==
"container-id-123" (or the expected pod ID returned by fakeExec.Output), and
assert ri.ContainerID is empty or not set; reference the Create method on
podmanRuntime and the runtime.CreateParams / runtime.RuntimeInfo types when
making these assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 93569e70-4415-4da3-9456-83247edd440a
📒 Files selected for processing (5)
pkg/runtime/podman/create.gopkg/runtime/podman/create_test.gopkg/runtime/podman/remove.gopkg/runtime/podman/start.gopkg/runtime/podman/stop.go
✅ Files skipped from review due to trivial changes (1)
- pkg/runtime/podman/stop.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/runtime/podman/start.go
- pkg/runtime/podman/remove.go
Create() set proxy_container and image_name that getPodInfo() did not, causing metadata loss whenever Info() or Start() refreshed the runtime info. Add the two derivable fields to getPodInfo() and drop source_path and agent from Create(), as those belong in the instance record rather than in pod-level runtime info. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Philippe Martin <phmartin@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/runtime/podman/info.go`:
- Around line 43-50: Info() and related methods (Start, Stop, Remove, Terminal)
now assume the passed id is a pod name, which breaks legacy instance records
that store container IDs in RuntimeData.InstanceID; add a container→pod fallback
in getPodInfo(): first attempt the existing pod lookup by id, and if that fails,
inspect the id as a container (using the Podman container inspect API) to
extract its Pod/PodID and then re-resolve the pod info by that PodID/name;
update getPodInfo() to return a clear error if neither pod nor container→pod
resolution succeeds so legacy container IDs are handled transparently without
requiring a separate migration step.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 31d3db23-ce84-493c-ac1f-00627dd9f7fe
📒 Files selected for processing (3)
pkg/runtime/podman/create.gopkg/runtime/podman/info.gopkg/runtime/podman/info_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/runtime/podman/create.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/runtime/podman/info_test.go
Implements #245. Replaces the single-container model with a Podman pod containing two containers: a workspace container (
kdn-<name>-workspace) and a Squid proxy sidecar (kdn-<name>-proxy). The sidecar enforces network isolation by setting an nftables OUTPUT policy of drop with CAP_NET_ADMIN, allowing only loopback and squid-user traffic. The workspace container routes HTTP/HTTPS through the proxy via injected env vars, which cannot be bypassed even if unset.RuntimeInfo.ID changes from the container ID to the pod name (kdn-). Pod lifecycle (start, stop, rm, inspect) replaces the former container-level commands throughout.
Fixes #245
The proxy container is based on fedora image. This is not optimal, but IMHO make it first work with all features (filtering, secret injection), then find the best image to use
In the proxy container, you can see in
/var/log/squid/access.logthe requests done by the agent.BREAKING CHANGE: remove all previous workspaces as they won't be correctly supported with this new version