Phase 3: Container Runtime Interface & Local Podman Management
- Goal: Abstract container management operations behind a `ContainerRuntime` interface and implement it using the Podman CLI, enabling an agent to manage containers rootlessly based on (mocked) instructions.
- RFC Sections Primarily Used: 6.1 (Runtime Interface Definition), 6.2 (Default Implementation: Podman), 6.3 (Rootless Execution Strategy).
Tasks & Sub-Tasks:
- Define `ContainerRuntime` Go Interface (`internal/runtime/interface.go`)
  - Purpose: Abstract all container operations (build, pull, run, stop, inspect, logs, etc.).
  - Details: Transcribe the Go interface from RFC 6.1 precisely. Include all specified structs (`ImageSummary`, `ContainerStatus`, `BuildOptions`, `PortMapping`, `VolumeMount`, `ResourceSpec`, `ContainerCreateOptions`, `ContainerHealthCheck`) and enums (`ContainerState`, `HealthState`).
  - Verification: Code compiles. Interface and type definitions match RFC.
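As a rough illustration of the shape `interface.go` might take, the sketch below declares the lifecycle methods named in this plan. RFC 6.1 remains authoritative: the return types, the struct field types, the `ImageName` field, and the `noopRuntime` and `demo` helpers are all assumptions made for this example, and several structs from the RFC list are omitted.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"time"
)

// Enums named in the plan; their concrete values are defined by the RFC.
type ContainerState string
type HealthState string

// ContainerStatus carries the fields the plan lists for GetContainerStatus.
// The field types are assumptions.
type ContainerStatus struct {
	State      ContainerState
	ExitCode   int
	StartedAt  time.Time
	FinishedAt time.Time
	Health     HealthState
	ImageID    string
	ImageName  string
	OverlayIP  string
}

// ContainerCreateOptions is abbreviated here: ports, volumes, resources
// and health checks are left out of this sketch.
type ContainerCreateOptions struct {
	InstanceID    string
	ImageName     string // assumed field name
	Hostname      string
	Env           map[string]string
	Labels        map[string]string
	RestartPolicy string
	User          string
}

// ContainerRuntime lists the core lifecycle methods from this phase.
// Parameter names follow the plan; return types are assumptions.
type ContainerRuntime interface {
	PullImage(ctx context.Context, imageName, platform string) (imageID string, err error)
	CreateContainer(ctx context.Context, opts ContainerCreateOptions) (containerID string, err error)
	StartContainer(ctx context.Context, containerID string) error
	StopContainer(ctx context.Context, containerID string, timeoutSeconds int) error
	RemoveContainer(ctx context.Context, containerID string, force, removeVolumes bool) error
	GetContainerStatus(ctx context.Context, containerOrName string) (*ContainerStatus, error)
	StreamContainerLogs(ctx context.Context, containerID string, follow bool, since time.Time, stdout, stderr io.Writer) error
}

// noopRuntime is a hypothetical stand-in that runs no containers. It only
// shows that the interface is satisfiable.
type noopRuntime struct{}

func (noopRuntime) PullImage(context.Context, string, string) (string, error) { return "", nil }
func (noopRuntime) CreateContainer(_ context.Context, o ContainerCreateOptions) (string, error) {
	return o.InstanceID, nil
}
func (noopRuntime) StartContainer(context.Context, string) error              { return nil }
func (noopRuntime) StopContainer(context.Context, string, int) error          { return nil }
func (noopRuntime) RemoveContainer(context.Context, string, bool, bool) error { return nil }
func (noopRuntime) GetContainerStatus(context.Context, string) (*ContainerStatus, error) {
	return &ContainerStatus{}, nil
}
func (noopRuntime) StreamContainerLogs(context.Context, string, bool, time.Time, io.Writer, io.Writer) error {
	return nil
}

// Compile-time check that noopRuntime implements ContainerRuntime.
var _ ContainerRuntime = noopRuntime{}

// demo makes one call through the interface.
func demo() (string, error) {
	var rt ContainerRuntime = noopRuntime{}
	return rt.CreateContainer(context.Background(), ContainerCreateOptions{InstanceID: "demo-1"})
}

func main() {
	id, err := demo()
	fmt.Println(id, err)
}
```

The `var _ ContainerRuntime = noopRuntime{}` line is a cheap way to make the "code compiles" verification step also catch signature drift between the interface and an implementation.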
- Implement Podman Backend for `ContainerRuntime` (`internal/runtime/podman.go`) - Core Lifecycle Methods
  - Purpose: Translate `ContainerRuntime` calls into `podman` CLI commands.
  - Details (for each method, focus on these first):
    - `PullImage(ctx, imageName, platform)`:
      - Cmd: `podman pull {imageName}` (add `--platform` if specified).
      - Parse output to get image ID (e.g., from `podman inspect {imageName} --format '{{.Id}}'`).
    - `CreateContainer(ctx, opts ContainerCreateOptions)`:
      - Cmd: `podman create ...`
      - Translate `ContainerCreateOptions` into `podman create` flags:
        - `--name {opts.InstanceID}` (KAT's unique ID for the instance).
        - `--hostname {opts.Hostname}`.
        - `--env` for `opts.Env`.
        - `--label` for `opts.Labels` (include KAT ownership labels like `kat.dws.rip/workload-name`, `kat.dws.rip/namespace`, `kat.dws.rip/instance-id`).
        - `--restart {opts.RestartPolicy}` (map to Podman's "no", "on-failure", "always").
        - Resource mapping: `--cpus` (for quota), `--cpu-shares`, `--memory`.
        - `--publish` for `opts.Ports`.
        - `--volume` for `opts.Volumes` (source will be host path, destination is container path).
        - `--network {opts.NetworkName}` and `--ip {opts.IPAddress}` if specified.
        - `--user {opts.User}`.
        - `--cap-add`, `--cap-drop`, `--security-opt`.
        - Podman native healthcheck flags from `opts.HealthCheck`.
        - `--systemd={opts.Systemd}`.
      - Parse output for container ID.
    - `StartContainer(ctx, containerID)`: Cmd: `podman start {containerID}`.
    - `StopContainer(ctx, containerID, timeoutSeconds)`: Cmd: `podman stop -t {timeoutSeconds} {containerID}`.
    - `RemoveContainer(ctx, containerID, force, removeVolumes)`: Cmd: `podman rm {containerID}` (add `--force`, `--volumes`).
    - `GetContainerStatus(ctx, containerOrName)`:
      - Cmd: `podman inspect {containerOrName}`.
      - Parse JSON output to populate `ContainerStatus` struct (State, ExitCode, StartedAt, FinishedAt, Health, ImageID, ImageName, OverlayIP if available from inspect).
      - Podman health status needs to be mapped to `HealthState`.
    - `StreamContainerLogs(ctx, containerID, follow, since, stdout, stderr)`:
      - Cmd: `podman logs {containerID}` (add `--follow`, `--since`).
      - Stream `os/exec.Cmd.Stdout` and `os/exec.Cmd.Stderr` to the provided `io.Writer`s.
  - Helper: A utility function to run `podman` commands as a specific rootless user (see Rootless Execution below).
  - Potential Challenges: Correctly mapping all `ContainerCreateOptions` to Podman flags. Parsing varied `podman inspect` output. Managing `os/exec` for logs. Robust error handling from CLI output.
  - Verification:
    - Unit tests for each implemented method, mocking `os/exec` calls to verify command construction and output parsing.
    - Integration-style tests (require Podman installed): tests that actually execute `podman` commands (e.g., pull alpine, create, start, inspect, stop, rm) and verify state changes.
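The flag translation listed for `CreateContainer` could be sketched as a pure function that turns options into an argument slice, which keeps it easy to unit test without running Podman. The struct layouts, the `CPUs`, `MemoryBytes`, `ImageName` and `Command` fields, and the helper names are assumptions for this example; only a subset of the flags from the list is covered.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Field names and types below are assumptions for illustration;
// RFC 6.1 defines the real structs.
type PortMapping struct {
	HostPort, ContainerPort int
	Protocol                string
}

type VolumeMount struct {
	HostPath, ContainerPath string
	ReadOnly                bool
}

type ContainerCreateOptions struct {
	InstanceID, ImageName, Hostname string
	Env, Labels                     map[string]string
	RestartPolicy                   string
	CPUs                            float64
	MemoryBytes                     int64
	Ports                           []PortMapping
	Volumes                         []VolumeMount
	Command                         []string
}

// keyValueFlags emits "flag key=value" pairs in sorted key order so the
// generated command line is deterministic and easy to assert on in tests.
func keyValueFlags(flag string, m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		out = append(out, flag, k+"="+m[k])
	}
	return out
}

// buildCreateArgs translates the options into arguments for `podman create`.
func buildCreateArgs(o ContainerCreateOptions) []string {
	args := []string{"create", "--name", o.InstanceID}
	if o.Hostname != "" {
		args = append(args, "--hostname", o.Hostname)
	}
	args = append(args, keyValueFlags("--env", o.Env)...)
	args = append(args, keyValueFlags("--label", o.Labels)...)
	if o.RestartPolicy != "" {
		args = append(args, "--restart", o.RestartPolicy)
	}
	if o.CPUs > 0 {
		args = append(args, "--cpus", strconv.FormatFloat(o.CPUs, 'f', -1, 64))
	}
	if o.MemoryBytes > 0 {
		args = append(args, "--memory", strconv.FormatInt(o.MemoryBytes, 10)+"b")
	}
	for _, p := range o.Ports {
		proto := p.Protocol
		if proto == "" {
			proto = "tcp"
		}
		args = append(args, "--publish", fmt.Sprintf("%d:%d/%s", p.HostPort, p.ContainerPort, proto))
	}
	for _, v := range o.Volumes {
		spec := v.HostPath + ":" + v.ContainerPath
		if v.ReadOnly {
			spec += ":ro"
		}
		args = append(args, "--volume", spec)
	}
	// The image and the optional command always come last.
	args = append(args, o.ImageName)
	return append(args, o.Command...)
}

// exampleOptions returns a small, made-up workload definition.
func exampleOptions() ContainerCreateOptions {
	return ContainerCreateOptions{
		InstanceID:    "web-1",
		ImageName:     "docker.io/library/alpine",
		Env:           map[string]string{"B": "2", "A": "1"},
		Labels:        map[string]string{"kat.dws.rip/instance-id": "web-1"},
		RestartPolicy: "on-failure",
		Ports:         []PortMapping{{HostPort: 8080, ContainerPort: 80}},
		Command:       []string{"sleep", "30"},
	}
}

// commandLine joins the arguments for display only; pass the slice itself
// to exec.Command so that no shell quoting is involved.
func commandLine(o ContainerCreateOptions) string {
	return "podman " + strings.Join(buildCreateArgs(o), " ")
}

func main() {
	fmt.Println(commandLine(exampleOptions()))
}
```

Sorting the map keys is a design choice made here so that tests can compare the whole argument list; Podman itself does not care about flag order.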
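For the `GetContainerStatus` step, parsing the `podman inspect` JSON and mapping the health status might look like the sketch below. The JSON layout shown (`Image`, `ImageName`, `State.Status`, `State.Health.Status`) is an assumption based on typical inspect output and can differ between Podman versions; the `HealthState` constant names and the sample document are invented for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

type ContainerState string
type HealthState string

// Hypothetical HealthState values; the RFC defines the real ones.
const (
	HealthUnknown   HealthState = "unknown"
	HealthStarting  HealthState = "starting"
	HealthHealthy   HealthState = "healthy"
	HealthUnhealthy HealthState = "unhealthy"
)

type ContainerStatus struct {
	State      ContainerState
	ExitCode   int
	StartedAt  time.Time
	FinishedAt time.Time
	Health     HealthState
	ImageID    string
	ImageName  string
}

// inspectEntry mirrors only the parts of the inspect output used here.
type inspectEntry struct {
	Image     string `json:"Image"`
	ImageName string `json:"ImageName"`
	State     struct {
		Status     string    `json:"Status"`
		ExitCode   int       `json:"ExitCode"`
		StartedAt  time.Time `json:"StartedAt"`
		FinishedAt time.Time `json:"FinishedAt"`
		Health     struct {
			Status string `json:"Status"`
		} `json:"Health"`
	} `json:"State"`
}

// mapHealth converts Podman's health string to a HealthState. An empty or
// unrecognized value (e.g. no health check configured) maps to HealthUnknown.
func mapHealth(s string) HealthState {
	switch s {
	case "healthy":
		return HealthHealthy
	case "unhealthy":
		return HealthUnhealthy
	case "starting":
		return HealthStarting
	default:
		return HealthUnknown
	}
}

// parseInspect decodes the JSON array printed by `podman inspect` and
// returns the status of its first entry.
func parseInspect(out []byte) (*ContainerStatus, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(out, &entries); err != nil {
		return nil, fmt.Errorf("decode podman inspect output: %w", err)
	}
	if len(entries) == 0 {
		return nil, errors.New("podman inspect returned no entries")
	}
	e := entries[0]
	return &ContainerStatus{
		State:      ContainerState(e.State.Status), // passed through unmapped in this sketch
		ExitCode:   e.State.ExitCode,
		StartedAt:  e.State.StartedAt,
		FinishedAt: e.State.FinishedAt,
		Health:     mapHealth(e.State.Health.Status),
		ImageID:    e.Image,
		ImageName:  e.ImageName,
	}, nil
}

// sampleInspect is a hand-written stand-in for real inspect output.
const sampleInspect = `[{"Image":"abc123","ImageName":"docker.io/library/alpine:latest",
"State":{"Status":"running","ExitCode":0,"StartedAt":"2024-01-02T03:04:05.5Z",
"FinishedAt":"0001-01-01T00:00:00Z","Health":{"Status":"healthy"}}}]`

func main() {
	st, err := parseInspect([]byte(sampleInspect))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(st.State, st.Health, st.ImageName)
}
```

Decoding into a narrow struct rather than a generic map keeps the parser tolerant of the many extra fields that inspect emits.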
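One way to mock `os/exec` for the unit tests described above is to route every CLI invocation through a small interface and substitute a recording fake in tests. The `commandRunner`, `execRunner`, `fakeRunner` and `podmanRuntime` names are hypothetical, and the sketch ignores the rootless user switch for brevity.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// commandRunner is a hypothetical seam between the runtime and os/exec.
type commandRunner interface {
	Run(ctx context.Context, name string, args ...string) ([]byte, error)
}

// execRunner is the real implementation, backed by os/exec.
type execRunner struct{}

func (execRunner) Run(ctx context.Context, name string, args ...string) ([]byte, error) {
	return exec.CommandContext(ctx, name, args...).CombinedOutput()
}

// fakeRunner records each invocation and returns canned output, so tests
// can assert on command construction without Podman being installed.
type fakeRunner struct {
	calls  []string
	output []byte
}

func (f *fakeRunner) Run(_ context.Context, name string, args ...string) ([]byte, error) {
	f.calls = append(f.calls, strings.Join(append([]string{name}, args...), " "))
	return f.output, nil
}

// podmanRuntime shows two of the lifecycle methods built on the runner.
type podmanRuntime struct {
	runner commandRunner
}

func (p *podmanRuntime) StartContainer(ctx context.Context, containerID string) error {
	if out, err := p.runner.Run(ctx, "podman", "start", containerID); err != nil {
		return fmt.Errorf("podman start %s: %w (output: %s)", containerID, err, out)
	}
	return nil
}

func (p *podmanRuntime) StopContainer(ctx context.Context, containerID string, timeoutSeconds int) error {
	out, err := p.runner.Run(ctx, "podman", "stop", "-t", strconv.Itoa(timeoutSeconds), containerID)
	if err != nil {
		return fmt.Errorf("podman stop %s: %w (output: %s)", containerID, err, out)
	}
	return nil
}

// recordedCalls drives the runtime against the fake and returns what it ran.
func recordedCalls() ([]string, error) {
	fake := &fakeRunner{}
	rt := &podmanRuntime{runner: fake}
	ctx := context.Background()
	if err := rt.StartContainer(ctx, "abc123"); err != nil {
		return nil, err
	}
	if err := rt.StopContainer(ctx, "abc123", 10); err != nil {
		return nil, err
	}
	return fake.calls, nil
}

func main() {
	calls, err := recordedCalls()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range calls {
		fmt.Println(c)
	}
}
```

In production code the runtime would be constructed with `execRunner{}`; the integration-style tests can reuse the same methods with the real runner.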
- Implement Rootless Execution Strategy (`internal/runtime/podman.go` helpers, `internal/agent/runtime.go`)
  - Purpose: Ensure containers are run by unprivileged users using systemd for supervision.
  - Details:
    - User Assumption: For Phase 3, assume the dedicated user (e.g., `kat_wl_mywebapp`) already exists on the system and `loginctl enable-linger <username>` has been run manually. The username could be passed in `ContainerCreateOptions.User` or derived.
    - Podman Command Execution Context:
      - The `kat-agent` process itself might run as root or a privileged user.
      - When executing `podman` commands for a workload, it MUST run them as the target unprivileged user.
      - This can be achieved using `sudo -u {username} podman ...`, more directly via `nsenter`/`setuid` if the agent has the required capabilities, or by setting `XDG_RUNTIME_DIR` and `DBUS_SESSION_BUS_ADDRESS` appropriately for the target user when invoking `podman` via the systemd user session D-Bus API. The simplest option for now might be `sudo -u {username} podman ...` if the agent is root, or ensuring the agent itself runs as a user who can switch to other `kat_wl_*` users.
      - The RFC prefers "systemd user sessions". This usually means `systemctl --user ...`. To control another user's systemd session, the agent process (if root) can use `machinectl shell {username}@.host /bin/bash -c "systemctl --user ..."` or `systemd-run --user --machine={username}@.host ...`. If the agent is not root, it cannot control other users' systemd sessions without additional authorization. This is a critical design point: how does the agent (potentially root) interact with user-level systemd?
      - RFC: "Agent uses `systemctl --user --machine={username}@.host ...`". This implies the agent has permission to do this (likely running as root or with specific polkit rules).
    - Systemd Unit Generation & Management:
      - After `podman create ...` (or instead of a direct create, if `podman generate systemd` is used to create the definition), generate the systemd unit: `podman generate systemd --new --name --files --stop-timeout 10 {opts.InstanceID}`. Note that `--name` is a boolean flag (use the container name instead of its ID in the unit), the positional argument is the container rather than the image, and `--stop-timeout` was called `--time` in older Podman releases. By default the generated file is named `container-{opts.InstanceID}.service`, so it has to be renamed on install if the unit is to be called `{opts.InstanceID}.service`.
      - The `ContainerRuntime` implementation needs to:
        - Execute `podman create` to establish the container definition (this allows Podman to manage its internal state for the container ID).
        - Execute `podman generate systemd --name {containerID}` (using the ID from create) to get the unit file content; without `--files`, the unit is printed to stdout.
        - Place this unit file in the target user's systemd path (e.g., `/home/{username}/.config/systemd/user/{opts.InstanceID}.service`, or `/etc/systemd/user/{opts.InstanceID}.service` if the agent is root and wants to make the unit available to any user).
        - Run `systemctl --user --machine={username}@.host daemon-reload`.
        - Start/Enable: `systemctl --user --machine={username}@.host enable --now {opts.InstanceID}.service`.
      - To stop: `systemctl --user --machine={username}@.host stop {opts.InstanceID}.service`.
      - To remove: `systemctl --user --machine={username}@.host disable {opts.InstanceID}.service`, then `podman rm {opts.InstanceID}`, then remove the unit file. With `--new`, the unit creates the container on start and removes it on stop, so `podman rm` may find nothing left to remove and should tolerate that.
      - Status: `systemctl --user --machine={username}@.host status {opts.InstanceID}.service` (parse output), or rely on `podman inspect`, which should reflect systemd-managed state.
  - Potential Challenges: Managing permissions for interacting with other users' systemd sessions. Correctly placing and cleaning up systemd unit files. Ensuring `XDG_RUNTIME_DIR` is set correctly for rootless Podman if not using systemd units for direct `podman run`. Systemd unit generation nuances.
  - Verification:
    - A test in `internal/agent/runtime_test.go` (or similar) can take mock `ContainerCreateOptions`.
    - It calls the (mocked or real) `ContainerRuntime` implementation.
    - Verify:
      - Podman commands are constructed to run as the target unprivileged user.
      - A systemd unit file is generated for the container.
      - `systemctl --user --machine...` commands are invoked correctly to manage the service.
      - The container is actually started (verify with `podman ps -a --filter label=kat.dws.rip/instance-id={instanceID}` as the target user).
      - Logs can be retrieved.
      - The container can be stopped and removed, including its systemd unit.
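The rootless command construction described in this task could be sketched as a handful of pure helpers that build the argument vectors for `systemctl --user --machine=...` and for `sudo -u ... podman`, plus the per-user unit path. All function names are hypothetical, and the `sudo -u` wrapper is only the "simplest for now" option from the plan; depending on the host, `XDG_RUNTIME_DIR` may also need to be set for the target user.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// unitName returns the systemd unit name used for a workload instance.
func unitName(instanceID string) string {
	return instanceID + ".service"
}

// userUnitPath returns where the unit file would live for a given user.
// homeDir is passed in because this sketch does no user lookups.
func userUnitPath(homeDir, instanceID string) string {
	return filepath.Join(homeDir, ".config", "systemd", "user", unitName(instanceID))
}

// systemctlUser builds a systemctl command line that targets the user
// manager of username via --machine={username}@.host.
func systemctlUser(username string, action ...string) []string {
	argv := []string{"systemctl", "--user", "--machine=" + username + "@.host"}
	return append(argv, action...)
}

// podmanAsUser wraps a podman invocation so it runs as the target user.
func podmanAsUser(username string, podmanArgs ...string) []string {
	argv := []string{"sudo", "-u", username, "podman"}
	return append(argv, podmanArgs...)
}

// startPlan lists the commands run after the unit file has been written.
func startPlan(username, instanceID string) [][]string {
	return [][]string{
		systemctlUser(username, "daemon-reload"),
		systemctlUser(username, "enable", "--now", unitName(instanceID)),
	}
}

// removePlan lists the commands run before the unit file is deleted.
func removePlan(username, instanceID string) [][]string {
	return [][]string{
		systemctlUser(username, "disable", unitName(instanceID)),
		podmanAsUser(username, "rm", instanceID),
	}
}

// line renders an argument vector for display and for test assertions.
func line(argv []string) string {
	return strings.Join(argv, " ")
}

func main() {
	const user, id = "kat_wl_mywebapp", "web-1"
	fmt.Println("unit file:", userUnitPath("/home/"+user, id))
	for _, argv := range startPlan(user, id) {
		fmt.Println(line(argv))
	}
	for _, argv := range removePlan(user, id) {
		fmt.Println(line(argv))
	}
}
```

Keeping these helpers free of side effects lets the verification steps above check that every command targets the unprivileged user before anything is executed.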
- Milestone Verification:
  - The `ContainerRuntime` Go interface is fully defined as per RFC 6.1.
  - The Podman implementation for core lifecycle methods (`PullImage`, `CreateContainer` (leading to systemd unit generation), `StartContainer` (via systemd enable/start), `StopContainer` (via systemd stop), `RemoveContainer` (via systemd disable + podman rm + unit file removal), `GetContainerStatus`, `StreamContainerLogs`) is functional.
  - An `internal/agent` test (or a temporary `main.go` test harness) can:
    - Define `ContainerCreateOptions` for a simple image like `docker.io/library/alpine` with a command like `sleep 30`.
    - Specify a (manually pre-created and linger-enabled) unprivileged username.
    - Call the `ContainerRuntime` methods.
    - Result:
      - The alpine image is pulled (if not present).
      - A systemd user service unit is generated and placed correctly for the specified user.
      - The service is started using `systemctl --user --machine...`.
      - `podman ps --all --filter label=kat.dws.rip/instance-id=...`, run as the target user, shows the container running or having run. Rootless containers live in per-user storage, so root's own `podman ps` does not list them.
      - Logs can be retrieved using the `StreamContainerLogs` method.
      - The container can be stopped and removed (including its systemd unit file).
  - All container operations are verifiably performed by the specified unprivileged user.
This detailed plan should provide a clearer path for implementing these initial crucial phases. Remember to keep testing iterative and focused on the RFC specifications.