kat/docs/plan/phase3.md

Phase 3: Container Runtime Interface & Local Podman Management

  • Goal: Abstract container management operations behind a ContainerRuntime interface and implement it using Podman CLI, enabling an agent to manage containers rootlessly based on (mocked) instructions.
  • RFC Sections Primarily Used: 6.1 (Runtime Interface Definition), 6.2 (Default Implementation: Podman), 6.3 (Rootless Execution Strategy).

Tasks & Sub-Tasks:

  1. Define ContainerRuntime Go Interface (internal/runtime/interface.go)

    • Purpose: Abstract all container operations (build, pull, run, stop, inspect, logs, etc.).
    • Details: Transcribe the Go interface from RFC 6.1 precisely. Include all specified structs (ImageSummary, ContainerStatus, BuildOptions, PortMapping, VolumeMount, ResourceSpec, ContainerCreateOptions, ContainerHealthCheck) and enums (ContainerState, HealthState).
    • Verification: Code compiles. Interface and type definitions match RFC.
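The RFC 6.1 text is authoritative for the transcription; as a non-normative sketch of the interface's likely shape (the method set and struct fields below are abbreviated assumptions for illustration, not the RFC's exact definitions):

```go
package main

import (
	"context"
	"io"
	"time"
)

// ContainerState and HealthState are illustrative enums; the RFC's exact
// values take precedence.
type ContainerState string

const (
	StateCreated ContainerState = "created"
	StateRunning ContainerState = "running"
	StateExited  ContainerState = "exited"
)

type HealthState string

const (
	HealthHealthy   HealthState = "healthy"
	HealthUnhealthy HealthState = "unhealthy"
)

// ContainerStatus mirrors the fields this plan parses out of podman inspect.
type ContainerStatus struct {
	State      ContainerState
	ExitCode   int
	StartedAt  time.Time
	FinishedAt time.Time
	Health     HealthState
	ImageID    string
	ImageName  string
	OverlayIP  string
}

// ContainerCreateOptions is heavily abbreviated here; the RFC specifies the
// full field set (ports, volumes, resources, health checks, etc.).
type ContainerCreateOptions struct {
	InstanceID string
	Hostname   string
	Env        map[string]string
	Labels     map[string]string
}

// ContainerRuntime abstracts container operations (subset shown).
type ContainerRuntime interface {
	PullImage(ctx context.Context, imageName, platform string) (imageID string, err error)
	CreateContainer(ctx context.Context, opts ContainerCreateOptions) (containerID string, err error)
	StartContainer(ctx context.Context, containerID string) error
	StopContainer(ctx context.Context, containerID string, timeoutSeconds int) error
	RemoveContainer(ctx context.Context, containerID string, force, removeVolumes bool) error
	GetContainerStatus(ctx context.Context, containerOrName string) (ContainerStatus, error)
	StreamContainerLogs(ctx context.Context, containerID string, follow bool, since time.Time, stdout, stderr io.Writer) error
}
```

Keeping all Podman-specific behavior behind this interface is what makes the agent testable against a mock in later phases.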
  2. Implement Podman Backend for ContainerRuntime (internal/runtime/podman.go) - Core Lifecycle Methods

    • Purpose: Translate ContainerRuntime calls into podman CLI commands.
    • Details (for each method, focus on these first):
      • PullImage(ctx, imageName, platform):
        • Cmd: podman pull {imageName} (add --platform if specified).
        • Parse output to get image ID (e.g., from podman inspect {imageName} --format '{{.Id}}').
      • CreateContainer(ctx, opts ContainerCreateOptions):
        • Cmd: podman create ...
        • Translate ContainerCreateOptions into podman create flags:
          • --name {opts.InstanceID} (KAT's unique ID for the instance).
          • --hostname {opts.Hostname}.
          • --env for opts.Env.
          • --label for opts.Labels (include KAT ownership labels like kat.dws.rip/workload-name, kat.dws.rip/namespace, kat.dws.rip/instance-id).
          • --restart {opts.RestartPolicy} (map to Podman's "no", "on-failure", "always").
          • Resource mapping: --cpus (for quota), --cpu-shares, --memory.
          • --publish for opts.Ports.
          • --volume for opts.Volumes (source will be host path, destination is container path).
          • --network {opts.NetworkName} and --ip {opts.IPAddress} if specified.
          • --user {opts.User}.
          • --cap-add, --cap-drop, --security-opt.
          • Podman native healthcheck flags from opts.HealthCheck.
          • --systemd={opts.Systemd}.
        • Parse output for container ID.
      • StartContainer(ctx, containerID): Cmd: podman start {containerID}.
      • StopContainer(ctx, containerID, timeoutSeconds): Cmd: podman stop -t {timeoutSeconds} {containerID}.
      • RemoveContainer(ctx, containerID, force, removeVolumes): Cmd: podman rm {containerID} (add --force, --volumes).
      • GetContainerStatus(ctx, containerOrName):
        • Cmd: podman inspect {containerOrName}.
        • Parse JSON output to populate ContainerStatus struct (State, ExitCode, StartedAt, FinishedAt, Health, ImageID, ImageName, OverlayIP if available from inspect).
        • Podman health status needs to be mapped to HealthState.
      • StreamContainerLogs(ctx, containerID, follow, since, stdout, stderr):
        • Cmd: podman logs {containerID} (add --follow, --since).
        • Stream output by wiring the os/exec.Cmd's Stdout and Stderr to the provided io.Writers.
    • Helper: A utility function to run podman commands as a specific rootless user (see Rootless Execution below).
    • Potential Challenges: Correctly mapping all ContainerCreateOptions to Podman flags. Parsing varied podman inspect output. Managing os/exec for logs. Robust error handling from CLI output.
    • Verification:
      • Unit tests for each implemented method, mocking os/exec calls to verify command construction and output parsing.
      • Integration-style tests (requiring Podman installed) that actually execute podman commands (e.g., pull alpine, create, start, inspect, stop, rm) and verify state changes.
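The flag translation can be sketched as pure argument construction, which keeps it unit-testable without mocking os/exec at all. The option struct below is an assumed, abbreviated shape (as is the hypothetical mapHealth helper); the flags themselves are real podman create flags:

```go
package main

import "fmt"

// CreateOpts is an assumed, minimal subset of ContainerCreateOptions
// used only to illustrate the flag mapping.
type CreateOpts struct {
	InstanceID    string
	Hostname      string
	Image         string
	Env           map[string]string
	Labels        map[string]string
	RestartPolicy string // "no" | "on-failure" | "always"
	MemoryLimit   string // e.g. "512m"
}

// buildCreateArgs translates options into podman create arguments.
func buildCreateArgs(o CreateOpts) []string {
	args := []string{"create", "--name", o.InstanceID}
	if o.Hostname != "" {
		args = append(args, "--hostname", o.Hostname)
	}
	for k, v := range o.Env {
		args = append(args, "--env", fmt.Sprintf("%s=%s", k, v))
	}
	for k, v := range o.Labels {
		args = append(args, "--label", fmt.Sprintf("%s=%s", k, v))
	}
	if o.RestartPolicy != "" {
		args = append(args, "--restart", o.RestartPolicy)
	}
	if o.MemoryLimit != "" {
		args = append(args, "--memory", o.MemoryLimit)
	}
	// The image is the final positional argument to podman create.
	return append(args, o.Image)
}

// mapHealth maps podman inspect's health status strings
// ("healthy", "unhealthy", "starting") onto KAT's HealthState names.
func mapHealth(s string) string {
	switch s {
	case "healthy":
		return "Healthy"
	case "unhealthy":
		return "Unhealthy"
	case "starting":
		return "HealthStarting"
	default:
		return "HealthUnknown"
	}
}
```

Building an argument slice rather than a shell string and handing it to os/exec directly also sidesteps quoting and injection problems with user-supplied env values and labels.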
  3. Implement Rootless Execution Strategy (internal/runtime/podman.go helpers, internal/agent/runtime.go)

    • Purpose: Ensure containers are run by unprivileged users using systemd for supervision.
    • Details:
      • User Assumption: For Phase 3, assume the dedicated user (e.g., kat_wl_mywebapp) already exists on the system and loginctl enable-linger <username> has been run manually. The username could be passed in ContainerCreateOptions.User or derived.
      • Podman Command Execution Context:
        • The kat-agent process itself might run as root or a privileged user.
        • When executing podman commands for a workload, it MUST run them as the target unprivileged user.
        • This can be achieved with sudo -u {username} podman ... if the agent runs as root, more directly via nsenter/setuid if the agent holds the needed capabilities, or by setting XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS for the target user when driving Podman through the systemd user-session D-Bus API. sudo -u {username} podman ... is the simplest option for now, provided the agent is root or runs as a user who can switch to the kat_wl_* users.
        • The RFC prefers "systemd user sessions", which usually means systemctl --user .... To control another user's systemd session, a root agent can use machinectl shell {username}@.host /bin/bash -c "systemctl --user ..." or systemd-run --user --machine={username}@.host .... A non-root agent cannot directly control other users' systemd sessions, so how the (potentially root) agent interacts with user-level systemd is a critical design point.
        • RFC: "Agent uses systemctl --user --machine={username}@.host ...". This implies agent has permissions to do this (likely running as root or with specific polkit rules).
      • Systemd Unit Generation & Management:
        • After podman create ... (or instead of a direct create, if podman generate systemd is used to create the definition), generate the systemd unit: podman generate systemd --new --files --time 10 {opts.InstanceID}. Note the positional argument is the container created above, not the image, and --name is a boolean flag (naming the unit after the container instead of its ID) rather than one taking a value. This produces a unit file for the container (with the default container- prefix, e.g. container-{opts.InstanceID}.service).
        • The ContainerRuntime implementation needs to:
          1. Execute podman create to establish the container definition (this allows Podman to manage its internal state for the container ID).
          2. Execute podman generate systemd --new {containerID} (using the ID from create; without --files the unit is printed to stdout) to get the unit file content.
          3. Place this unit file in the target user's systemd path (e.g., /home/{username}/.config/systemd/user/{opts.InstanceID}.service or /etc/systemd/user/{opts.InstanceID}.service if agent is root and wants to enable for any user).
          4. Run systemctl --user --machine={username}@.host daemon-reload.
          5. Start/Enable: systemctl --user --machine={username}@.host enable --now {opts.InstanceID}.service.
        • To stop: systemctl --user --machine={username}@.host stop {opts.InstanceID}.service.
        • To remove: systemctl --user --machine={username}@.host disable {opts.InstanceID}.service, then podman rm {opts.InstanceID}, then remove the unit file.
        • Status: systemctl --user --machine={username}@.host status {opts.InstanceID}.service (parse output), or rely on podman inspect which should reflect systemd-managed state.
    • Potential Challenges: Managing permissions for interacting with other users' systemd sessions. Correctly placing and cleaning up systemd unit files. Ensuring XDG_RUNTIME_DIR is set correctly for rootless Podman if not using systemd units for direct podman run. Systemd unit generation nuances.
    • Verification:
      • A test in internal/agent/runtime_test.go (or similar) can take mock ContainerCreateOptions.
      • It calls the (mocked or real) ContainerRuntime implementation.
      • Verify:
        • Podman commands are constructed to run as the target unprivileged user.
        • A systemd unit file is generated for the container.
        • systemctl --user --machine... commands are invoked correctly to manage the service.
        • The container is actually started (verify with podman ps -a --filter label=kat.dws.rip/instance-id={instanceID} as the target user).
        • Logs can be retrieved.
        • The container can be stopped and removed, including its systemd unit.
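Like the create flags, the systemctl plumbing reduces to argument construction that can be unit-tested without a live systemd session. A minimal sketch (the helper names are hypothetical; the flags are the ones the RFC cites):

```go
package main

// userSystemctlArgs builds the argument vector for controlling a unit in
// another user's systemd session, following the RFC's
// `systemctl --user --machine={username}@.host ...` form.
func userSystemctlArgs(username, verb, unit string) []string {
	args := []string{"--user", "--machine", username + "@.host", verb}
	if unit != "" {
		args = append(args, unit)
	}
	return args
}

// userUnitPath returns where to place a generated unit file in the target
// user's systemd search path.
func userUnitPath(home, instanceID string) string {
	return home + "/.config/systemd/user/" + instanceID + ".service"
}
```

The slice is then passed straight to exec.CommandContext(ctx, "systemctl", userSystemctlArgs(user, "enable", unit)...), so the privileged-execution concern stays in one place.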
  • Milestone Verification:
    • The ContainerRuntime Go interface is fully defined as per RFC 6.1.
    • The Podman implementation for core lifecycle methods (PullImage, CreateContainer (leading to systemd unit generation), StartContainer (via systemd enable/start), StopContainer (via systemd stop), RemoveContainer (via systemd disable + podman rm + unit file removal), GetContainerStatus, StreamContainerLogs) is functional.
    • An internal/agent test (or a temporary main.go test harness) can:
      1. Define ContainerCreateOptions for a simple image like docker.io/library/alpine with a command like sleep 30.
      2. Specify a (manually pre-created and linger-enabled) unprivileged username.
      3. Call the ContainerRuntime methods.
      4. Result:
        • The alpine image is pulled (if not present).
        • A systemd user service unit is generated and placed correctly for the specified user.
        • The service is started using systemctl --user --machine....
        • podman ps --all --filter label=kat.dws.rip/instance-id=... (run as the target user or by root seeing all containers) shows the container running or having run.
        • Logs can be retrieved using the StreamContainerLogs method.
        • The container can be stopped and removed (including its systemd unit file).
    • All container operations are verifiably performed by the specified unprivileged user.

This detailed plan should provide a clearer path for implementing these initial crucial phases. Remember to keep testing iterative and focused on the RFC specifications.