Init Docs
**docs/plan/filestructure.md** (new file, 134 lines)

# Directory/File Structure

This structure assumes a Go-based project, as hinted by the Go interface definitions in the RFC.

```
kat-system/
├── README.md                      # Project overview, build instructions, contribution guide
├── LICENSE                        # Project license (e.g., Apache 2.0, MIT)
├── go.mod                         # Go modules definition
├── go.sum                         # Go modules checksums
├── Makefile                       # Build, test, lint, generate code, etc.
│
├── api/
│   └── v1alpha1/
│       ├── kat.proto              # Protocol Buffer definitions for all KAT resources (Workload, Node, etc.)
│       └── generated/             # Generated Go code from .proto files (e.g., using protoc-gen-go)
│                                  # Potentially OpenAPI/Swagger specs generated from protos too.
│
├── cmd/
│   ├── kat-agent/
│   │   └── main.go                # Entrypoint for the kat-agent binary
│   └── katcall/
│       └── main.go                # Entrypoint for the katcall CLI binary
│
├── internal/
│   ├── agent/
│   │   ├── agent.go               # Core agent logic, heartbeating, command processing
│   │   ├── runtime.go             # Interface with ContainerRuntime (Podman)
│   │   ├── build.go               # Git-native build process logic
│   │   └── dns_resolver.go        # Embedded DNS server logic
│   │
│   ├── leader/
│   │   ├── leader.go              # Core leader logic, reconciliation loops
│   │   ├── schedule.go            # Scheduling algorithm implementation
│   │   ├── ipam.go                # IP Address Management logic
│   │   ├── state_backup.go        # etcd backup logic
│   │   └── api_handler.go         # HTTP API request handlers (connects to api/v1alpha1)
│   │
│   ├── api/                       # Server-side API implementation details
│   │   ├── server.go              # HTTP server setup, middleware (auth, logging)
│   │   ├── router.go              # API route definitions
│   │   └── auth.go                # Authentication (mTLS, Bearer token) logic
│   │
│   ├── cli/
│   │   ├── commands/              # Subdirectories for each katcall command (apply, get, logs, etc.)
│   │   │   ├── apply.go
│   │   │   └── ...
│   │   ├── client.go              # HTTP client for interacting with the KAT API
│   │   └── utils.go               # CLI helper functions
│   │
│   ├── config/
│   │   ├── types.go               # Go structs for Quadlet file kinds if not directly from proto
│   │   ├── parse.go               # Logic for parsing and validating *.kat files (Quadlets, cluster.kat)
│   │   └── defaults.go            # Default values for configurations
│   │
│   ├── store/
│   │   ├── interface.go           # Definition of StateStore interface (as in RFC 5.1)
│   │   └── etcd.go                # etcd implementation of StateStore, embedded etcd setup
│   │
│   ├── runtime/
│   │   ├── interface.go           # Definition of ContainerRuntime interface (as in RFC 6.1)
│   │   └── podman.go              # Podman implementation of ContainerRuntime
│   │
│   ├── network/
│   │   ├── wireguard.go           # WireGuard setup and peer management logic
│   │   └── types.go               # Network-related internal types
│   │
│   ├── pki/
│   │   ├── ca.go                  # Certificate Authority management (generation, signing)
│   │   └── certs.go               # Certificate generation and handling utilities
│   │
│   ├── observability/
│   │   ├── logging.go             # Logging setup for components
│   │   ├── metrics.go             # Metrics collection and exposure logic
│   │   └── events.go              # Event recording and retrieval logic
│   │
│   ├── types/                     # Core internal data structures if not covered by API protos
│   │   ├── node.go
│   │   ├── workload.go
│   │   └── ...
│   │
│   ├── constants/
│   │   └── constants.go           # Global constants (etcd key prefixes, default ports, etc.)
│   │
│   └── utils/
│       ├── utils.go               # Common utility functions (error handling, string manipulation)
│       └── tar.go                 # Utilities for handling tar.gz Quadlet archives
│
├── docs/
│   ├── rfc/
│   │   └── RFC001-KAT.md          # The source RFC document
│   ├── user-guide/                # User documentation (installation, getting started, tutorials)
│   │   ├── installation.md
│   │   └── basic_usage.md
│   └── api-guide/                 # API usage documentation (perhaps generated)
│
├── examples/
│   ├── simple-service/            # Example Quadlet for a simple service
│   │   ├── workload.kat
│   │   └── VirtualLoadBalancer.kat
│   ├── git-build-service/         # Example Quadlet for a service built from Git
│   │   ├── workload.kat
│   │   └── build.kat
│   ├── job/                       # Example Quadlet for a Job
│   │   ├── workload.kat
│   │   └── job.kat
│   └── cluster.kat                # Example cluster configuration file
│
├── scripts/
│   ├── setup-dev-env.sh           # Script to set up development environment
│   ├── lint.sh                    # Code linting script
│   ├── test.sh                    # Script to run all tests
│   └── gen-proto.sh               # Script to generate Go code from .proto files
│
└── test/
    ├── unit/                      # Unit tests (mirroring internal/ structure)
    ├── integration/               # Integration tests (e.g., agent-leader interaction)
    └── e2e/                       # End-to-end tests (testing full cluster operations via katcall)
        ├── fixtures/              # Test Quadlet files
        └── e2e_test.go
```

**Description of Key Files/Directories and Relationships:**

*   **`api/v1alpha1/kat.proto`**: The source of truth for all resource definitions. `make generate` (or `scripts/gen-proto.sh`) converts this into Go structs in `api/v1alpha1/generated/`. These structs are used across the `internal/` packages.
*   **`cmd/kat-agent/main.go`**: Initializes and runs the `kat-agent`. It instantiates components from `internal/store` (for etcd), `internal/agent`, `internal/leader`, `internal/pki`, `internal/network`, and `internal/api` (for the API server if elected leader); a wiring sketch follows this list.
*   **`cmd/katcall/main.go`**: Entry point for the CLI. It uses `internal/cli` components to parse commands and interact with the KAT API via `internal/cli/client.go`.
*   **`internal/config/parse.go`**: Used by the Leader to parse submitted Quadlet `tar.gz` archives and by `kat-agent init` to parse `cluster.kat`.
*   **`internal/store/etcd.go`**: Implements `StateStore` and manages the embedded etcd instance. Used by both Agent (for watching) and Leader (for all state modifications and leader election).
*   **`internal/runtime/podman.go`**: Implements `ContainerRuntime`. Used by `internal/agent/runtime.go` to manage containers via Podman.
*   **`internal/agent/agent.go`** and **`internal/leader/leader.go`**: Contain the core state machines and logic for the respective roles. The `kat-agent` binary decides which role's logic to activate based on leader election status.
*   **`internal/pki/ca.go`**: Used by `kat-agent init` to create the CA, and by the Leader to sign CSRs from joining agents.
*   **`internal/network/wireguard.go`**: Used by agents to configure their local WireGuard interface based on data synced from etcd (managed by the Leader).
*   **`internal/leader/api_handler.go`**: Implements the HTTP handlers for the API, using other leader components (scheduler, IPAM, store) to fulfill requests.

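As a rough illustration of how these packages fit together, a minimal sketch of `cmd/kat-agent/main.go` could look like the following. The wiring calls are shown as comments because every constructor name (`config.ParseClusterKat`, `store.NewEtcdStore`, `leader.RunIfElected`, `agent.Run`) is a hypothetical placeholder, not something defined by the RFC:

```go
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
)

func main() {
	// Run until SIGINT/SIGTERM, which is roughly how a long-lived agent behaves.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// Hypothetical wiring of the internal/ packages described above:
	//   cfg, _ := config.ParseClusterKat("cluster.kat")        // internal/config
	//   st, _  := store.NewEtcdStore(ctx, cfg.EtcdClientURLs)  // internal/store
	//   go leader.RunIfElected(ctx, st, cfg)                   // internal/leader (+ internal/api when leader)
	//   agent.Run(ctx, st, cfg)                                // internal/agent (runtime, network, pki helpers)
	log.Println("kat-agent: structural sketch only")
	<-ctx.Done()
}
```
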
**docs/plan/overview.md** (new file, 183 lines)

# Implementation Plan

This plan breaks down the implementation into manageable phases, each with a testable milestone.

**Phase 0: Project Setup & Core Types**

*   **Goal**: Basic project structure, version control, build system, and core data type definitions.
*   **Tasks**:
    1.  Initialize Git repository and `go.mod`.
    2.  Create the initial directory structure (as laid out in the directory structure above).
    3.  Define core Proto3 messages in `api/v1alpha1/kat.proto` for: `Workload`, `VirtualLoadBalancer`, `JobDefinition`, `BuildDefinition`, `Namespace`, `Node` (internal representation), `ClusterConfiguration`.
    4.  Set up `scripts/gen-proto.sh` and generate initial Go types.
    5.  Implement parsing and basic validation for `cluster.kat` (`internal/config/parse.go`).
    6.  Implement parsing and basic validation for Quadlet files (`workload.kat`, etc.) and their `tar.gz` packaging/unpackaging (see the sketch after this phase).
*   **Milestone**:
    *   `make generate` successfully creates Go types from protos.
    *   Unit tests pass for parsing `cluster.kat` and a sample Quadlet directory (as `tar.gz`) into their respective Go structs.

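The `tar.gz` handling in Task 6 could start from a helper along these lines (the function name and the rule of keeping only `*.kat` files are assumptions, not RFC requirements):

```go
package config

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"path/filepath"
	"strings"
)

// UnpackQuadletArchive reads a tar.gz stream and returns the contents of the
// *.kat files it contains, keyed by their cleaned relative path.
func UnpackQuadletArchive(r io.Reader) (map[string][]byte, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, fmt.Errorf("not a gzip stream: %w", err)
	}
	defer gz.Close()

	files := make(map[string][]byte)
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, fmt.Errorf("reading tar: %w", err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // skip directories, symlinks, etc.
		}
		name := filepath.Clean(hdr.Name)
		// Reject path traversal and keep only Quadlet files.
		if strings.HasPrefix(name, "..") || !strings.HasSuffix(name, ".kat") {
			continue
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, fmt.Errorf("reading %s: %w", name, err)
		}
		files[name] = data
	}
	if len(files) == 0 {
		return nil, fmt.Errorf("archive contains no *.kat files")
	}
	return files, nil
}
```
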
**Phase 1: State Management & Leader Election**

*   **Goal**: A functional embedded etcd and leader election mechanism.
*   **Tasks**:
    1.  Implement the `StateStore` interface (RFC 5.1) with an etcd backend (`internal/store/etcd.go`).
    2.  Integrate the embedded etcd server into `kat-agent` (RFC 2.2, 5.2), configurable via `cluster.kat` parameters.
    3.  Implement leader election using `go.etcd.io/etcd/client/v3/concurrency` (RFC 5.3).
    4.  Basic `kat-agent init` functionality:
        *   Parse `cluster.kat`.
        *   Start a single-node embedded etcd.
        *   Campaign for and become leader.
        *   Store the initial cluster configuration (UID, CIDRs from `cluster.kat`) in etcd.
*   **Milestone**:
    *   A single `kat-agent init --config cluster.kat` process starts, initializes etcd, and logs that it has become the leader.
    *   The cluster configuration from `cluster.kat` can be verified in etcd using an etcd client.
    *   `StateStore` interface methods (`Put`, `Get`, `Delete`, `List`) are testable against the embedded etcd.

**Phase 2: Basic Agent & Node Lifecycle (Init, Join, PKI)**

*   **Goal**: Initial Leader setup, a second Agent joining with mTLS, and heartbeating.
*   **Tasks**:
    1.  Implement the internal PKI (RFC 10.6) in `internal/pki/`:
        *   CA key/cert generation on `kat-agent init`.
        *   CSR generation by the agent on join.
        *   CSR signing by the Leader.
    2.  Implement the initial Node Communication Protocol (RFC 2.3) for join:
        *   Agent (`kat-agent join --leader-api <...> --advertise-address <...>`) sends a CSR to the Leader.
        *   Leader validates, signs, and returns certs & CA. Stores the node registration (name, UID, advertise address, WG pubkey placeholder) in etcd.
    3.  Implement basic mTLS for this join communication.
    4.  Implement the Node Heartbeat (`POST /v1alpha1/nodes/{nodeName}/status`) from Agent to Leader (RFC 4.1.3). The Leader updates node status in etcd.
    5.  Leader implements basic failure detection (marks a Node `NotReady` in etcd if heartbeats cease) (RFC 4.1.4).
*   **Milestone**:
    *   `kat-agent init` establishes a Leader with a CA.
    *   `kat-agent join` allows a second agent to securely register with the Leader, obtain certificates, and store its info in etcd.
    *   The Leader's API receives heartbeats from the joined Agent.
    *   If a joined Agent is stopped, the Leader marks its status as `NotReady` in etcd after `nodeLossTimeoutSeconds`.

**Phase 3: Container Runtime Interface & Local Podman Management**

*   **Goal**: The Agent can manage containers locally via Podman through the `ContainerRuntime` interface.
*   **Tasks**:
    1.  Define the `ContainerRuntime` interface in `internal/runtime/interface.go` (RFC 6.1).
    2.  Implement the Podman backend for `ContainerRuntime` in `internal/runtime/podman.go` (RFC 6.2). Focus on: `CreateContainer`, `StartContainer`, `StopContainer`, `RemoveContainer`, `GetContainerStatus`, `PullImage`, `StreamContainerLogs`.
    3.  Implement the rootless execution strategy (RFC 6.3):
        *   Mechanism to ensure dedicated user accounts (initially, assume pre-existing or manual creation for tests).
        *   Podman systemd unit generation (`podman generate systemd`).
        *   Managing units via `systemctl --user`.
*   **Milestone**:
    *   The Agent process (upon a mocked internal command) can pull a specified image (e.g., `nginx`) and run it rootlessly using Podman and systemd user services.
    *   The Agent can stop, remove, and get the status/logs of this container.
    *   All operations are performed via the `ContainerRuntime` interface.

**Phase 4: Basic Workload Deployment (Single Node, Image Source Only, No Networking)**

*   **Goal**: The Leader can instruct an Agent to run a simple `Service` workload (single container, image source) on itself (if the leader is also an agent) or on a single joined agent.
*   **Tasks**:
    1.  Implement basic API endpoints on the Leader for Workload CRUD (`POST/PUT /v1alpha1/n/{ns}/workloads` accepting `tar.gz`) (RFC 8.3, 4.2). The Leader stores Quadlet files in etcd.
    2.  Simplistic scheduling (RFC 4.4): if there is only one agent node, assign the workload to it. The Leader creates an "assignment" or "task" for the agent in etcd.
    3.  The Agent watches etcd for assigned tasks.
    4.  On receiving a task, the Agent uses `ContainerRuntime` to deploy the container (image from `workload.kat`).
    5.  The Agent reports container instance status in its heartbeat. The Leader updates the overall workload status in etcd.
    6.  Basic `katcall apply -f <dir>` and `katcall get workload <name>` functionality.
*   **Milestone**:
    *   A user can deploy a simple single-container `Service` (e.g., `nginx`) using `katcall apply`.
    *   The container runs on the designated Agent node.
    *   `katcall get workload my-service` shows its status as running.
    *   `katcall logs <instanceID>` streams container logs.

**Phase 5: Overlay Networking (WireGuard) & IPAM**

*   **Goal**: Nodes establish a WireGuard overlay network. The Leader allocates IPs for containers.
*   **Tasks**:
    1.  Implement WireGuard setup on Agents (`internal/network/wireguard.go`) (RFC 7.1):
        *   Key generation and public key reporting to the Leader during join/heartbeat.
        *   The Leader stores node WireGuard public keys and advertise endpoints in etcd.
        *   The Agent configures its `kat0` interface and peers by watching etcd.
    2.  Implement IPAM in the Leader (`internal/leader/ipam.go`) (RFC 7.2); see the sketch after this phase:
        *   Node subnet allocation from `clusterCIDR` (from `cluster.kat`).
        *   Container IP allocation from the node's subnet when a workload instance is scheduled.
    3.  The Agent uses the Leader-assigned IP when creating the container network/container with Podman.
*   **Milestone**:
    *   All joined KAT nodes form a WireGuard mesh; `wg show` on each node confirms peer connections.
    *   The Leader allocates a unique overlay IP for each container instance.
    *   Containers on different nodes can ping each other using their overlay IPs.

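The node-subnet half of the IPAM task could start from a pure function that carves the idx-th fixed-size subnet out of `clusterCIDR`. Everything below (names, the /24-per-node split, IPv4-only handling) is illustrative, not specified by the RFC:

```go
package ipam

import (
	"encoding/binary"
	"fmt"
	"net"
)

// NthNodeSubnet carves the idx-th subnet of size /nodeSubnetBits out of clusterCIDR.
// Example: clusterCIDR "10.100.0.0/16" with nodeSubnetBits 24 yields
// 10.100.0.0/24 for idx 0, 10.100.1.0/24 for idx 1, and so on.
func NthNodeSubnet(clusterCIDR string, nodeSubnetBits, idx int) (*net.IPNet, error) {
	_, cluster, err := net.ParseCIDR(clusterCIDR)
	if err != nil {
		return nil, err
	}
	clusterOnes, bits := cluster.Mask.Size()
	if bits != 32 || nodeSubnetBits <= clusterOnes || nodeSubnetBits > 32 {
		return nil, fmt.Errorf("unsupported CIDR split %s -> /%d", clusterCIDR, nodeSubnetBits)
	}
	maxSubnets := 1 << (nodeSubnetBits - clusterOnes)
	if idx < 0 || idx >= maxSubnets {
		return nil, fmt.Errorf("subnet index %d out of range (max %d)", idx, maxSubnets-1)
	}
	base := binary.BigEndian.Uint32(cluster.IP.To4())
	subnetSize := uint32(1) << (32 - nodeSubnetBits)
	ip := make(net.IP, 4)
	binary.BigEndian.PutUint32(ip, base+uint32(idx)*subnetSize)
	return &net.IPNet{IP: ip, Mask: net.CIDRMask(nodeSubnetBits, 32)}, nil
}
```

The Leader would persist the index-to-node mapping (and per-node container IP allocations) in etcd so allocations survive restarts.
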
**Phase 6: Distributed Agent DNS & Service Discovery**

*   **Goal**: Basic service discovery using agent-local DNS for deployed services.
*   **Tasks**:
    1.  Implement the agent-local DNS server (`internal/agent/dns_resolver.go`) using `miekg/dns` (RFC 7.3); a sketch follows this phase.
    2.  The Leader writes DNS `A` records to etcd (e.g., `<workloadName>.<namespace>.<clusterDomain> -> <containerOverlayIP>`) when service instances become healthy/active.
    3.  The agent DNS server watches etcd for DNS records and updates its local zones.
    4.  The Agent configures `/etc/resolv.conf` in managed containers to use its `kat0` IP as the nameserver.
*   **Milestone**:
    *   A service (`service-a`) deployed on one node can be resolved by its DNS name (e.g., `service-a.default.kat.cluster.local`) by a container on another node.
    *   DNS resolution returns the correct overlay IP(s) of the `service-a` instances.

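To make the resolver task concrete, a minimal `miekg/dns` server answering A queries from an in-memory table (which the etcd watcher from Task 3 would keep up to date) might look like this. The zone name, map layout, and TTL are assumptions:

```go
package dnsresolver

import (
	"net"
	"sync"

	"github.com/miekg/dns"
)

// Resolver serves A records from an in-memory table that an etcd watcher
// is expected to keep up to date.
type Resolver struct {
	mu      sync.RWMutex
	records map[string]net.IP // FQDN (with trailing dot) -> overlay IP
}

func (r *Resolver) handle(w dns.ResponseWriter, req *dns.Msg) {
	m := new(dns.Msg)
	m.SetReply(req)
	m.Authoritative = true

	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, q := range req.Question {
		if q.Qtype != dns.TypeA {
			continue
		}
		if ip, ok := r.records[q.Name]; ok {
			m.Answer = append(m.Answer, &dns.A{
				Hdr: dns.RR_Header{Name: q.Name, Rrtype: dns.TypeA, Class: dns.ClassINET, Ttl: 30},
				A:   ip,
			})
		}
	}
	if len(m.Answer) == 0 {
		m.Rcode = dns.RcodeNameError // NXDOMAIN for unknown names
	}
	_ = w.WriteMsg(m)
}

// ListenAndServe binds the resolver to addr (e.g., the node's kat0 IP, port 53).
func (r *Resolver) ListenAndServe(addr string) error {
	mux := dns.NewServeMux()
	mux.HandleFunc("kat.cluster.local.", r.handle) // assumed cluster domain
	srv := &dns.Server{Addr: addr, Net: "udp", Handler: mux}
	return srv.ListenAndServe()
}
```
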
**Phase 7: Advanced Workload Features & Full Scheduling**

*   **Goal**: Implement `Job`, `DaemonService`, richer scheduling, health checks, volumes, and restart policies.
*   **Tasks**:
    1.  Implement the `Job` type (RFC 3.4, 4.8): scheduling, completion tracking, backoff.
    2.  Implement the `DaemonService` type (RFC 3.2): ensures one instance per eligible node.
    3.  Implement the full scheduling logic in the Leader (RFC 4.4): resource requests (`cpu`, `memory`), `nodeSelector`, taints/tolerations, GPU (basic), "most empty" scoring (see the sketch after this phase).
    4.  Implement `VirtualLoadBalancer.kat` parsing and Agent-side health checks (RFC 3.3, 4.6.3). The Leader uses health status for service readiness and DNS.
    5.  Implement container `restartPolicy` (RFC 3.2, 4.6.4) via systemd unit configuration.
    6.  Implement `volumeMounts` and `volumes` (RFC 3.2, 4.7): `HostMount`, `SimpleClusterStorage`. The Agent ensures paths are set up.
*   **Milestone**:
    *   `Job`s run to completion and their status is tracked.
    *   `DaemonService`s run one instance on every eligible node.
    *   Services are scheduled according to resource requests, selectors, and taints.
    *   Unhealthy service instances are identified by health checks and reflected in status.
    *   Containers restart based on their policy.
    *   Workloads can mount host paths and simple cluster storage.

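The RFC's "most empty" scoring is not reproduced in this plan; one simple interpretation is to score each feasible node by its remaining capacity fraction after placement and pick the highest. Averaging CPU and memory headroom, as below, is an assumption:

```go
package scheduler

// NodeCapacity is an illustrative view of a node's allocatable and
// already-requested resources (units: millicores and bytes).
type NodeCapacity struct {
	AllocatableCPU, RequestedCPU int64
	AllocatableMem, RequestedMem int64
}

// mostEmptyScore returns a value in [0, 1]; higher means the node is emptier
// after placing a workload that requests (reqCPU, reqMem). It returns -1 if
// the workload does not fit on the node.
func mostEmptyScore(n NodeCapacity, reqCPU, reqMem int64) float64 {
	if n.AllocatableCPU == 0 || n.AllocatableMem == 0 {
		return -1
	}
	freeCPU := n.AllocatableCPU - n.RequestedCPU - reqCPU
	freeMem := n.AllocatableMem - n.RequestedMem - reqMem
	if freeCPU < 0 || freeMem < 0 {
		return -1 // infeasible
	}
	cpuFrac := float64(freeCPU) / float64(n.AllocatableCPU)
	memFrac := float64(freeMem) / float64(n.AllocatableMem)
	return (cpuFrac + memFrac) / 2
}
```
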
**Phase 8: Git-Native Builds & Workload Updates/Rollbacks**

*   **Goal**: Enable on-agent builds from Git sources and implement workload update strategies.
*   **Tasks**:
    1.  Implement `BuildDefinition.kat` parsing (RFC 3.5).
    2.  Implement the Git-native build process on the Agent (`internal/agent/build.go`) using Podman (RFC 4.3).
    3.  Implement `cacheImage` pull/push for build caching (the Agent needs registry credentials configured locally).
    4.  Implement workload update strategies in the Leader (RFC 4.5): `Simultaneous`, `Rolling` (with `maxSurge`).
    5.  Implement a manual rollback mechanism (`katcall rollback workload <name>`) (RFC 4.5).
*   **Milestone**:
    *   A workload can be successfully deployed from a Git repository source, with the image built on the agent.
    *   A deployed service can be updated using the `Rolling` strategy, with observable incremental instance replacement.
    *   A workload can be rolled back to its previous version.

**Phase 9: Full API Implementation & CLI (`katcall`) Polish**

*   **Goal**: A robust and comprehensive HTTP API and `katcall` CLI.
*   **Tasks**:
    1.  Implement all remaining API endpoints and features per RFC Section 8. Ensure the Proto3/JSON contracts are met.
    2.  Implement API authentication: bearer tokens for `katcall` (RFC 8.1, 10.1).
    3.  Flesh out `katcall` with all necessary commands and options (RFC 1.5 Terminology: katcall; RFC 8.3 hints):
        *   `drain <nodeName>`, `get nodes/namespaces`, `describe <resource>`, etc.
    4.  Improve error reporting and user feedback in the CLI and API.
*   **Milestone**:
    *   All functionality defined in the RFC can be managed and introspected via the `katcall` CLI interacting with the secure KAT API.
    *   API documentation (e.g., Swagger/OpenAPI generated from protos or code) is available.

**Phase 10: Observability, Backup/Restore, Advanced Features & Security**

*   **Goal**: Implement observability features, state backup/restore, and other advanced functionality.
*   **Tasks**:
    1.  Implement Agent & Leader logging to the systemd journal/files; the API for streaming container logs was already delivered in Phase 4 (RFC 9.1).
    2.  Implement basic metrics exposure (a `/metrics` JSON endpoint on Leader/Agent) (RFC 9.2).
    3.  Implement the events system: the Leader records significant events in etcd, with an API to query them (RFC 9.3).
    4.  Implement Leader-driven etcd state backup (`etcdctl snapshot save`) (RFC 5.4).
    5.  Document and test the etcd state restore procedure (RFC 5.5).
    6.  Implement detached node operation and rejoin (RFC 4.9).
    7.  Provide standard Quadlet files and documentation for the Traefik ingress recipe (RFC 7.4).
    8.  Review and harden security aspects: API security, build security, network security, secrets handling (document current limitations per RFC 10.5).
*   **Milestone**:
    *   Container logs are streamable via `katcall logs`; Agent/Leader logs are accessible.
    *   Basic metrics are available via the API. Cluster events can be listed.
    *   Automated etcd backups are created by the Leader, and the restore procedure is tested.
    *   A detached node can operate locally and rejoin the main cluster.
    *   Traefik can be deployed using the provided Quadlets to achieve ingress.

**Phase 11: Testing, Documentation, and Release Preparation**

*   **Goal**: Ensure KAT v1.0 is robust, well-documented, and ready for release.
*   **Tasks**:
    1.  Write comprehensive unit tests for all core logic.
    2.  Develop integration tests for component interactions (e.g., Leader-Agent, Agent-Podman).
    3.  Create an E2E test suite using `katcall` to simulate real user scenarios.
    4.  Write detailed user documentation: installation, configuration, tutorials for all features, troubleshooting.
    5.  Perform performance testing on key operations (e.g., deployment speed, agent density).
    6.  Conduct a thorough security review/audit against the RFC's security considerations.
    7.  Establish a release process: versioning, changelog, building release artifacts.
*   **Milestone**:
    *   High test coverage.
    *   Comprehensive user and API documentation is complete.
    *   Known critical bugs are fixed.
    *   KAT v1.0 is packaged and ready for its first official release.

**docs/plan/phase1.md** (new file, 81 lines)

# **Phase 1: State Management & Leader Election**

*   **Goal**: Establish the foundational state layer using embedded etcd and implement a reliable leader election mechanism. A single `kat-agent` can initialize a cluster, become its leader, and store the initial configuration.
*   **RFC Sections Primarily Used**: 2.2 (Embedded etcd), 3.9 (ClusterConfiguration), 5.1 (State Store Interface), 5.2 (etcd Implementation Details), 5.3 (Leader Election).

**Tasks & Sub-Tasks:**

1.  **Define `StateStore` Go Interface (`internal/store/interface.go`)**
    *   **Purpose**: Create the abstraction layer for all state operations, decoupling the rest of the system from direct etcd dependencies.
    *   **Details**: Transcribe the Go interface from RFC 5.1 verbatim. Include the `KV`, `WatchEvent`, `EventType`, `Compare`, `Op`, and `OpType` structs/constants. (A rough sketch of the expected shape follows this task.)
    *   **Verification**: Code compiles. Interface definition matches the RFC.

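Since RFC 5.1 is not reproduced in this plan, the following is only an approximation of the shape the interface is expected to take, inferred from the method names used in Task 3 below; the authoritative definition remains the RFC, and `Compare`/`Op` are shown as placeholders:

```go
package store

import "context"

// KV is a stored key/value pair; Version carries the etcd ModRevision.
type KV struct {
	Key     string
	Value   []byte
	Version int64
}

type EventType string

const (
	EventTypePut    EventType = "PUT"
	EventTypeDelete EventType = "DELETE"
)

// WatchEvent is delivered for each change under a watched key or prefix.
type WatchEvent struct {
	Type EventType
	KV   KV
}

// Compare and Op are placeholders; their exact fields come from RFC 5.1.
type Compare struct{ Key, Value string }
type Op struct{ Type, Key, Value string }

// StateStore abstracts the cluster state backend (embedded etcd in KAT).
type StateStore interface {
	Put(ctx context.Context, key string, value []byte) error
	Get(ctx context.Context, key string) (*KV, error)
	Delete(ctx context.Context, key string) error
	List(ctx context.Context, prefix string) ([]KV, error)
	Watch(ctx context.Context, keyOrPrefix string, startRevision int64) (<-chan WatchEvent, error)
	Close() error

	// Campaign blocks until leadership is acquired; the returned context is
	// cancelled when leadership is lost.
	Campaign(ctx context.Context, leaderID string, leaseTTLSeconds int64) (context.Context, error)
	Resign(ctx context.Context) error
	GetLeader(ctx context.Context) (string, error)

	// DoTransaction applies onSuccess ops atomically if all checks pass,
	// otherwise onFailure ops.
	DoTransaction(ctx context.Context, checks []Compare, onSuccess, onFailure []Op) (bool, error)
}
```
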
2.  **Implement Embedded etcd Server Logic (`internal/store/etcd.go`)**
    *   **Purpose**: Allow `kat-agent` to run its own etcd instance for single-node clusters or as part of a multi-node quorum.
    *   **Details**:
        *   Use `go.etcd.io/etcd/server/v3/embed`.
        *   Function to start an embedded etcd server (a start/stop helper sketch follows this task):
            *   Input: configuration parameters (data directory, peer URLs, client URLs, name). These come from `cluster.kat` or defaults.
            *   Output: a running `embed.Etcd` instance or an error.
        *   Graceful shutdown logic for the embedded etcd server.
    *   **Verification**: A test can start and stop an embedded etcd server. The data directory is created and used.

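A minimal start helper could look roughly like this; it leans on the embed package's default listen addresses, and the real implementation would fill in peer/client URLs from `cluster.kat` instead:

```go
package store

import (
	"fmt"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

// StartEmbeddedEtcd launches an embedded etcd server with default listen
// addresses (localhost:2379/2380) and blocks until it is ready to serve.
// Call (*embed.Etcd).Close() for graceful shutdown.
func StartEmbeddedEtcd(name, dataDir string) (*embed.Etcd, error) {
	cfg := embed.NewConfig()
	cfg.Name = name
	cfg.Dir = dataDir // data directory is created on first start

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		return nil, err
	}
	select {
	case <-e.Server.ReadyNotify():
		return e, nil // ready to serve client requests
	case <-time.After(60 * time.Second):
		e.Close() // shut down if startup stalls
		return nil, fmt.Errorf("embedded etcd did not become ready in time")
	}
}
```

Tests can point `dataDir` at a temporary directory and call `Close()` on the returned instance to exercise the start/stop path.
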
3.  **Implement `StateStore` with etcd Backend (`internal/store/etcd.go`)**
    *   **Purpose**: Provide the concrete implementation for interacting with an etcd cluster (embedded or external).
    *   **Details**:
        *   Create a struct that implements the `StateStore` interface and holds an `etcd/clientv3.Client`.
        *   Implement `Put(ctx, key, value)`: use `client.Put()`.
        *   Implement `Get(ctx, key)`: use `client.Get()`. Handle key-not-found. Populate `KV.Version` with `ModRevision`.
        *   Implement `Delete(ctx, key)`: use `client.Delete()`.
        *   Implement `List(ctx, prefix)`: use `client.Get()` with `clientv3.WithPrefix()`.
        *   Implement `Watch(ctx, keyOrPrefix, startRevision)`: use `client.Watch()`. Translate etcd events into `WatchEvent`.
        *   Implement `Close()`: close the `clientv3.Client`.
        *   Implement `Campaign(ctx, leaderID, leaseTTLSeconds)`:
            *   Use `concurrency.NewSession()` to create a lease.
            *   Use `concurrency.NewElection()` and `election.Campaign()`.
            *   Return a context that is cancelled when leadership is lost (e.g., by watching the campaign context or the session's done channel). A sketch follows this task.
        *   Implement `Resign(ctx)`: use `election.Resign()`.
        *   Implement `GetLeader(ctx)`: observe the election or query the leader key.
        *   Implement `DoTransaction(ctx, checks, onSuccess, onFailure)`: use `client.Txn()` with `clientv3.Compare` and `clientv3.Op`.
    *   **Potential Challenges**: Correctly handling etcd transaction semantics, context propagation, and error translation. Efficiently managing watches.
    *   **Verification**:
        *   Unit tests for each `StateStore` method using a real embedded etcd instance (test-scoped).
        *   Verify `Put` then `Get` retrieves the correct value and version.
        *   Verify `List` with a prefix.
        *   Verify `Delete` removes the key.
        *   Verify `Watch` receives correct events for puts/deletes.
        *   Verify `DoTransaction` commits on success and rolls back on failure.

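The trickiest piece above is returning a context that ends with leadership. A sketch using `clientv3/concurrency` follows; the election key prefix and function signature are illustrative choices, not RFC-defined:

```go
package store

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// campaign blocks until this node becomes leader, then returns a context that
// is cancelled when leadership is lost (lease expiry, session close, or parent
// context cancellation).
func campaign(ctx context.Context, cli *clientv3.Client, leaderID string, leaseTTLSeconds int) (context.Context, *concurrency.Election, error) {
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(leaseTTLSeconds))
	if err != nil {
		return nil, nil, err
	}
	election := concurrency.NewElection(session, "/kat/leader_election") // assumed key prefix

	if err := election.Campaign(ctx, leaderID); err != nil {
		session.Close()
		return nil, nil, err
	}

	// Leadership context: cancelled when the session's lease is lost.
	leaderCtx, cancel := context.WithCancel(ctx)
	go func() {
		select {
		case <-session.Done(): // lease expired or session closed
		case <-ctx.Done():
		}
		cancel()
	}()
	return leaderCtx, election, nil
}
```
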
4.  **Integrate Leader Election into `kat-agent` (`cmd/kat-agent/main.go`, possibly a new `internal/leader/election.go`)**
    *   **Purpose**: Enable an agent instance to attempt to become the cluster leader.
    *   **Details**:
        *   The `kat-agent` main function initializes its `StateStore` client.
        *   A dedicated goroutine calls `StateStore.Campaign()`.
        *   The outcome of `Campaign` (e.g., leadership acquired, a context scoped to the leadership term) determines whether the agent activates its Leader-specific logic (Phase 2+).
        *   The leader ID could be the `nodeName` or a UUID. The lease TTL comes from `cluster.kat`.
    *   **Verification**:
        *   Start one `kat-agent` with etcd enabled; it should log "became leader".
        *   Start a second `kat-agent` configured to connect to the first's etcd; it should log "observing leader <leaderID>" or similar, but not become leader itself.
        *   If the first agent (the leader) is stopped, the second agent should eventually log "became leader".

5.  **Implement Basic `kat-agent init` Command (`cmd/kat-agent/main.go`, `internal/config/parse.go`)**
    *   **Purpose**: Initialize a new KAT cluster (single node initially).
    *   **Details**:
        *   Define the `init` subcommand in `kat-agent` using a CLI library (e.g., `cobra`); a sketch follows this task.
        *   Flag: `--config <path_to_cluster.kat>`.
        *   Parse `cluster.kat` (from Phase 0, now used to extract etcd peer/client URLs, data directory, backup paths, etc.).
        *   Generate a persistent cluster UID and store it in etcd (e.g., `/kat/config/cluster_uid`).
        *   Store the relevant `cluster.kat` parameters (or the whole sanitized config) in etcd (e.g., under `/kat/config/cluster_config`).
        *   Start the embedded etcd server using the parsed configuration.
        *   Initiate leader election.
    *   **Potential Challenges**: Ensuring `cluster.kat` parsing is robust. Handling existing data directories.
    *   **Milestone Verification**:
        *   Running `kat-agent init --config examples/cluster.kat` on a clean system:
            *   Starts the `kat-agent` process.
            *   Creates the etcd data directory.
            *   Logs "Successfully initialized etcd".
            *   Logs "Became leader: <nodeName>".
        *   Using `etcdctl` (or a simple `StateStore.Get` test client):
            *   Verify `/kat/config/cluster_uid` exists and holds a UUID.
            *   Verify `/kat/config/cluster_config` (or similar keys) contains data from `cluster.kat` (e.g., `clusterCIDR`, `serviceCIDR`, `agentPort`, `apiPort`).
            *   Verify the leader election key exists for the current leader.

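A skeleton of the `init` subcommand with `cobra` might look like the following; the commented-out calls refer to the hypothetical helpers sketched earlier and are not real APIs:

```go
package main

import (
	"fmt"
	"log"

	"github.com/spf13/cobra"
)

func newInitCmd() *cobra.Command {
	var configPath string

	cmd := &cobra.Command{
		Use:   "init",
		Short: "Initialize a new KAT cluster on this node",
		RunE: func(cmd *cobra.Command, args []string) error {
			if configPath == "" {
				return fmt.Errorf("--config is required")
			}
			// Hypothetical calls into the packages sketched earlier:
			//   cfg, err := config.ParseClusterKat(configPath)
			//   etcd, err := store.StartEmbeddedEtcd(cfg.NodeName, cfg.DataDir)
			//   leaderCtx, _, err := campaign(cmd.Context(), client, cfg.NodeName, cfg.LeaseTTLSeconds)
			log.Printf("kat-agent init: would parse %s, start etcd, and campaign for leadership", configPath)
			return nil
		},
	}
	cmd.Flags().StringVar(&configPath, "config", "", "path to cluster.kat")
	return cmd
}

func main() {
	root := &cobra.Command{Use: "kat-agent"}
	root.AddCommand(newInitCmd())
	if err := root.Execute(); err != nil {
		log.Fatal(err)
	}
}
```
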
**docs/plan/phase2.md** (new file, 98 lines)

# **Phase 2: Basic Agent & Node Lifecycle (Init, Join, PKI)**

*   **Goal**: Implement the secure registration of a new agent node to an existing leader, including PKI for mTLS, and establish periodic heartbeating for status updates and failure detection.
*   **RFC Sections Primarily Used**: 2.3 (Node Communication Protocol), 4.1.1 (Initial Leader Setup - CA), 4.1.2 (Agent Node Join - CSR), 10.1 (API Security - mTLS), 10.6 (Internal PKI), 4.1.3 (Node Heartbeat), 4.1.4 (Node Departure and Failure Detection - basic).

**Tasks & Sub-Tasks:**

1.  **Implement Internal PKI Utilities (`internal/pki/ca.go`, `internal/pki/certs.go`)**
    *   **Purpose**: Create and manage the Certificate Authority and sign certificates for mTLS.
    *   **Details**:
        *   `GenerateCA()`: Creates a new RSA key pair and a self-signed X.509 CA certificate. Saves them to disk (e.g., `/var/lib/kat/pki/ca.key`, `/var/lib/kat/pki/ca.crt`). The path comes from the parent directory of the `cluster.kat` `backupPath`, or a new `pkiPath` parameter. (A sketch follows this task.)
        *   `GenerateCertificateRequest(commonName, keyOutPath, csrOutPath)`: Used by the agent. Generates an RSA key and creates a CSR.
        *   `SignCertificateRequest(caKeyPath, caCertPath, csrData, certOutPath, duration)`: Used by the Leader. Loads the CA key/cert, parses the CSR, and issues a signed certificate.
        *   Helper functions to load keys and certs from disk.
    *   **Potential Challenges**: Handling cryptographic operations correctly and securely. Permissions for key storage.
    *   **Verification**: Unit tests for `GenerateCA`, `GenerateCertificateRequest`, and `SignCertificateRequest`. Generated certs should be verifiable against the CA.

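`GenerateCA` can be built entirely from the standard library. The sketch below uses illustrative defaults (4096-bit RSA key, 10-year validity, 0600/0644 file modes); the real signature is whatever `internal/pki` settles on:

```go
package pki

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"os"
	"time"
)

// GenerateCA creates a self-signed CA key pair and writes both parts as PEM files.
func GenerateCA(keyPath, certPath string) error {
	key, err := rsa.GenerateKey(rand.Reader, 4096)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: "kat-cluster-ca"},
		NotBefore:             time.Now().Add(-5 * time.Minute),
		NotAfter:              time.Now().AddDate(10, 0, 0), // 10-year CA lifetime
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return err
	}
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "RSA PRIVATE KEY", Bytes: x509.MarshalPKCS1PrivateKey(key)})
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	if err := os.WriteFile(keyPath, keyPEM, 0o600); err != nil { // private key: owner-only
		return err
	}
	return os.WriteFile(certPath, certPEM, 0o644)
}
```
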
2.  **Leader: Initialize CA & Its Own mTLS Certs on `init` (`cmd/kat-agent/main.go`)**
    *   **Purpose**: The first leader needs to establish the PKI and secure its own API endpoint.
    *   **Details**:
        *   During `kat-agent init`, after etcd is up and leadership is confirmed:
            *   Call `pki.GenerateCA()` if the CA files don't exist.
            *   Generate the leader's own server key and CSR (e.g., for `leader.kat.cluster.local`).
            *   Sign its own CSR using the CA to obtain its server certificate.
            *   Configure its (future) API HTTP server to use this server key/cert for TLS and to require client certs (mTLS).
    *   **Verification**: After `kat-agent init`, the CA key/cert and the leader's server key/cert exist in the configured PKI path.

3.  **Implement Basic API Server with mTLS on Leader (`internal/api/server.go`, `internal/api/router.go`)**
    *   **Purpose**: Provide the initial HTTP endpoints required for agent join, secured with mTLS. (A sketch follows this task.)
    *   **Details**:
        *   Set up `http.Server` with a `tls.Config`:
            *   `Certificates`: the Leader's server key/cert.
            *   `ClientAuth: tls.RequireAndVerifyClientCert`.
            *   `ClientCAs`: a pool containing the cluster CA certificate.
        *   Minimal router (e.g., `gorilla/mux` or `http.ServeMux`) for:
            *   `POST /internal/v1alpha1/join`: endpoint for the agent to submit its CSR. (Internal, as it is part of bootstrap.)
    *   **Verification**: An HTTPS client (e.g., `curl` with appropriate client certs) can connect to the leader's API port if it presents a cert signed by the cluster CA. The connection fails without a client cert or with a cert from a different CA.

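A minimal sketch of the mTLS server setup, assuming the PEM paths from Task 1; the `:9443` port and `handleJoin` handler are placeholders:

```go
package api

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
	"os"
)

// NewMTLSServer returns an *http.Server that presents the leader's certificate
// and only accepts clients whose certificates chain to the cluster CA.
func NewMTLSServer(addr, serverCert, serverKey, caCert string, handler http.Handler) (*http.Server, error) {
	cert, err := tls.LoadX509KeyPair(serverCert, serverKey)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(caCert)
	if err != nil {
		return nil, err
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caPEM)

	return &http.Server{
		Addr:    addr,
		Handler: handler,
		TLSConfig: &tls.Config{
			Certificates: []tls.Certificate{cert},
			ClientAuth:   tls.RequireAndVerifyClientCert, // enforce mTLS
			ClientCAs:    caPool,
			MinVersion:   tls.VersionTLS13,
		},
	}, nil
}

// Usage sketch:
//   mux := http.NewServeMux()
//   mux.HandleFunc("POST /internal/v1alpha1/join", handleJoin) // Go 1.22+ method pattern
//   srv, _ := NewMTLSServer(":9443", "leader.crt", "leader.key", "ca.crt", mux)
//   srv.ListenAndServeTLS("", "") // certs come from TLSConfig
```
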
4.  **Agent: `join` Command & CSR Submission (`cmd/kat-agent/main.go`, `internal/cli/join.go` or similar for agent logic)**
    *   **Purpose**: Allow a new agent to request to join the cluster and obtain its mTLS credentials.
    *   **Details**:
        *   `kat-agent join` subcommand:
            *   Flags: `--leader-api <ip:port>`, `--advertise-address <ip_or_interface_name>`, `--node-name <name>` (optional; the leader can generate one).
            *   Generate the agent's own key pair and CSR using `pki.GenerateCertificateRequest()`.
            *   Make an HTTP POST to the Leader's `/internal/v1alpha1/join` endpoint:
                *   Payload: CSR data, advertise address, requested node name, initial WireGuard public key (placeholder for now).
            *   Bootstrapping trust for this *initial* join needs care: the agent does not yet have a signed client certificate, so strict mTLS cannot apply to the first request. Options are to ship the cluster CA cert out of band (e.g., a `--leader-ca-cert` flag) so the agent can at least verify the leader, and/or to authorize the CSR signing with a pre-shared token. RFC 4.1.2 states that the "Leader, upon validating the join request (V1 has no strong token validation, relies on network trust)" signs the CSR, so for V1 we assume network trust: the agent connects, sends its CSR, and the leader signs it. This point should be revisited and clarified against the RFC.
            *   Receive the signed certificate and CA certificate from the Leader and store them locally.
    *   **Potential Challenges**: Securely bootstrapping trust for the very first communication with the leader to submit the CSR.
    *   **Verification**: The `kat-agent join` command:
        *   Generates a key/CSR.
        *   Successfully POSTs the CSR to the leader.
        *   Receives and saves its signed certificate and the CA certificate.

5.  **Leader: CSR Signing & Node Registration (Handler for `/internal/v1alpha1/join`)**
    *   **Purpose**: Validate the joining agent, sign its CSR, and record its registration.
    *   **Details**:
        *   Handler for `/internal/v1alpha1/join`:
            *   Parse the CSR, advertise address, and WG pubkey from the request.
            *   Validate (minimal for now).
            *   Generate a unique node name if none was provided. Assign a Node UID.
            *   Sign the CSR using `pki.SignCertificateRequest()`.
            *   Store the node registration data in etcd via `StateStore` (`/kat/nodes/registration/{nodeName}`: UID, advertise address, WG pubkey placeholder, join timestamp).
            *   Return the signed agent certificate and the cluster CA certificate to the agent.
    *   **Verification**:
        *   After an agent joins, its certificate is signed by the cluster CA.
        *   Node registration data appears correctly in etcd under `/kat/nodes/registration/{nodeName}`.

6.  **Agent: Establish mTLS Client for Subsequent Comms & Implement Heartbeating (`internal/agent/agent.go`)**
    *   **Purpose**: The agent uses its new mTLS certs to communicate status to the Leader. (A sketch follows this task.)
    *   **Details**:
        *   The agent configures its HTTP client to use its signed key/cert and the cluster CA cert for all future Leader communications.
        *   Periodic heartbeat (RFC 4.1.3):
            *   Ticker (e.g., every `agentTickSeconds` from `cluster.kat`, default 15s).
            *   On each tick, gather basic node status (node name, timestamp, initial resource capacity stubs).
            *   HTTP `POST` to the Leader's `/v1alpha1/nodes/{nodeName}/status` endpoint using the mTLS-configured client.
    *   **Verification**: The agent logs successful heartbeat POSTs.

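A sketch of the mTLS client plus heartbeat loop; the JSON payload fields are placeholders for whatever node status structure the protos end up defining:

```go
package agent

import (
	"bytes"
	"crypto/tls"
	"crypto/x509"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// newMTLSClient builds an HTTP client that presents the agent's certificate
// and trusts only the cluster CA.
func newMTLSClient(certPath, keyPath, caPath string) (*http.Client, error) {
	cert, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{Certificates: []tls.Certificate{cert}, RootCAs: pool},
		},
	}, nil
}

// heartbeatLoop POSTs a minimal status document every tick until stop closes.
func heartbeatLoop(client *http.Client, leaderAPI, nodeName string, tick time.Duration, stop <-chan struct{}) {
	url := fmt.Sprintf("%s/v1alpha1/nodes/%s/status", leaderAPI, nodeName)
	t := time.NewTicker(tick)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case <-t.C:
			body, _ := json.Marshal(map[string]any{
				"nodeName":  nodeName,
				"timestamp": time.Now().UTC().Format(time.RFC3339),
			})
			resp, err := client.Post(url, "application/json", bytes.NewReader(body))
			if err != nil {
				continue // transient errors: try again on the next tick
			}
			resp.Body.Close()
		}
	}
}
```
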
7.  **Leader: Receive Heartbeats & Basic Failure Detection (Handler for `/v1alpha1/nodes/{nodeName}/status`, `internal/leader/leader.go`)**
    *   **Purpose**: The Leader tracks agent status and detects failures.
    *   **Details**:
        *   API endpoint `/v1alpha1/nodes/{nodeName}/status` (mTLS required):
            *   Receives the status update from the agent.
            *   Updates the node's actual state in etcd (`/kat/nodes/status/{nodeName}/heartbeat`: timestamp, reported status). An etcd lease, renewed by agent heartbeats, could be used for this key.
        *   Failure detection (RFC 4.1.4):
            *   The Leader runs a reconciliation loop or periodic check.
            *   It scans `/kat/nodes/status/` in etcd.
            *   If a node's last heartbeat timestamp is older than `nodeLossTimeoutSeconds` (from `cluster.kat`), it updates the node's status in etcd to `NotReady` (e.g., `/kat/nodes/status/{nodeName}/condition: NotReady`).
    *   **Potential Challenges**: Efficiently scanning for dead nodes without excessive etcd load.
    *   **Milestone Verification**:
        *   `kat-agent init` runs as Leader, the CA is created, and its API is up with mTLS.
        *   A second `kat-agent join ...` process successfully:
            *   Generates a CSR and gets it signed by the Leader.
            *   Saves its cert and the CA cert.
            *   Starts sending heartbeats to the Leader using mTLS.
        *   The Leader logs receipt of heartbeats from the joined Agent.
        *   Node status (last heartbeat time) is updated in etcd by the Leader.
        *   If the joined Agent process is stopped, after `nodeLossTimeoutSeconds` the Leader updates the node's status in etcd to `NotReady`. This can be verified using `etcdctl` or a `StateStore.Get` call.

**docs/plan/phase3.md** (new file, 102 lines)

# **Phase 3: Container Runtime Interface & Local Podman Management**

*   **Goal**: Abstract container management operations behind a `ContainerRuntime` interface and implement it using the Podman CLI, enabling an agent to manage containers rootlessly based on (mocked) instructions.
*   **RFC Sections Primarily Used**: 6.1 (Runtime Interface Definition), 6.2 (Default Implementation: Podman), 6.3 (Rootless Execution Strategy).

**Tasks & Sub-Tasks:**

1.  **Define `ContainerRuntime` Go Interface (`internal/runtime/interface.go`)**
    *   **Purpose**: Abstract all container operations (build, pull, run, stop, inspect, logs, etc.).
    *   **Details**: Transcribe the Go interface from RFC 6.1 precisely. Include all specified structs (`ImageSummary`, `ContainerStatus`, `BuildOptions`, `PortMapping`, `VolumeMount`, `ResourceSpec`, `ContainerCreateOptions`, `ContainerHealthCheck`) and enums (`ContainerState`, `HealthState`).
    *   **Verification**: Code compiles. Interface and type definitions match the RFC.

2.  **Implement Podman Backend for `ContainerRuntime` (`internal/runtime/podman.go`) - Core Lifecycle Methods**
    *   **Purpose**: Translate `ContainerRuntime` calls into `podman` CLI commands. (A sketch of the command-execution helper follows this task.)
    *   **Details** (for each method, focus on these first):
        *   `PullImage(ctx, imageName, platform)`:
            *   Cmd: `podman pull {imageName}` (add `--platform` if specified).
            *   Parse the output to get the image ID (e.g., from `podman inspect {imageName} --format '{{.Id}}'`).
        *   `CreateContainer(ctx, opts ContainerCreateOptions)`:
            *   Cmd: `podman create ...`
            *   Translate `ContainerCreateOptions` into `podman create` flags:
                *   `--name {opts.InstanceID}` (KAT's unique ID for the instance).
                *   `--hostname {opts.Hostname}`.
                *   `--env` for `opts.Env`.
                *   `--label` for `opts.Labels` (include KAT ownership labels such as `kat.dws.rip/workload-name`, `kat.dws.rip/namespace`, `kat.dws.rip/instance-id`).
                *   `--restart {opts.RestartPolicy}` (map to Podman's "no", "on-failure", "always").
                *   Resource mapping: `--cpus` (for quota), `--cpu-shares`, `--memory`.
                *   `--publish` for `opts.Ports`.
                *   `--volume` for `opts.Volumes` (source is a host path, destination is the container path).
                *   `--network {opts.NetworkName}` and `--ip {opts.IPAddress}` if specified.
                *   `--user {opts.User}`.
                *   `--cap-add`, `--cap-drop`, `--security-opt`.
                *   Podman native healthcheck flags from `opts.HealthCheck`.
                *   `--systemd={opts.Systemd}`.
            *   Parse the output for the container ID.
        *   `StartContainer(ctx, containerID)`: Cmd: `podman start {containerID}`.
        *   `StopContainer(ctx, containerID, timeoutSeconds)`: Cmd: `podman stop -t {timeoutSeconds} {containerID}`.
        *   `RemoveContainer(ctx, containerID, force, removeVolumes)`: Cmd: `podman rm {containerID}` (add `--force`, `--volumes`).
        *   `GetContainerStatus(ctx, containerOrName)`:
            *   Cmd: `podman inspect {containerOrName}`.
            *   Parse the JSON output to populate the `ContainerStatus` struct (State, ExitCode, StartedAt, FinishedAt, Health, ImageID, ImageName, OverlayIP if available from inspect).
            *   The Podman health status needs to be mapped to `HealthState`.
        *   `StreamContainerLogs(ctx, containerID, follow, since, stdout, stderr)`:
            *   Cmd: `podman logs {containerID}` (add `--follow`, `--since`).
            *   Stream `os/exec.Cmd.Stdout` and `os/exec.Cmd.Stderr` to the provided `io.Writer`s.
    *   **Helper**: A utility function to run `podman` commands as a specific rootless user (see Rootless Execution below).
    *   **Potential Challenges**: Correctly mapping all `ContainerCreateOptions` to Podman flags. Parsing varied `podman inspect` output. Managing `os/exec` for logs. Robust error handling from CLI output.
    *   **Verification**:
        *   Unit tests for each implemented method, mocking `os/exec` calls to verify command construction and output parsing.
        *   *Requires Podman installed for integration-style unit tests*: tests that actually execute `podman` commands (e.g., pull alpine, create, start, inspect, stop, rm) and verify state changes.

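The helper could be a thin wrapper around `os/exec`, with the inspect parsing kept deliberately partial here; the `State.Status`/`State.ExitCode` fields are standard in `podman inspect` output, but the surrounding struct and error wrapping are illustrative:

```go
package runtime

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
)

// runPodman executes `podman <args...>` and returns stdout.
// The real implementation would also switch to the workload's unprivileged
// user (see the rootless execution task below).
func runPodman(ctx context.Context, args ...string) (string, error) {
	cmd := exec.CommandContext(ctx, "podman", args...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("podman %v: %w: %s", args, err, stderr.String())
	}
	return stdout.String(), nil
}

// inspectState extracts a few fields from `podman inspect` as an example of
// mapping inspect JSON onto a status struct; the field set here is partial.
func inspectState(ctx context.Context, containerID string) (status string, exitCode int, err error) {
	out, err := runPodman(ctx, "inspect", containerID)
	if err != nil {
		return "", 0, err
	}
	// podman inspect returns a JSON array with one element per container.
	var parsed []struct {
		State struct {
			Status   string `json:"Status"`
			ExitCode int    `json:"ExitCode"`
		} `json:"State"`
	}
	if err := json.Unmarshal([]byte(out), &parsed); err != nil || len(parsed) == 0 {
		return "", 0, fmt.Errorf("unexpected inspect output for %s", containerID)
	}
	return parsed[0].State.Status, parsed[0].State.ExitCode, nil
}
```
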
3.  **Implement Rootless Execution Strategy (`internal/runtime/podman.go` helpers, `internal/agent/runtime.go`)**
    *   **Purpose**: Ensure containers are run by unprivileged users, using systemd for supervision.
    *   **Details**:
        *   **User assumption**: For Phase 3, *assume* the dedicated user (e.g., `kat_wl_mywebapp`) already exists on the system and `loginctl enable-linger <username>` has been run manually. The username could be passed in `ContainerCreateOptions.User` or derived.
        *   **Podman command execution context**:
            *   The `kat-agent` process itself might run as root or as a privileged user.
            *   When executing `podman` commands for a workload, it MUST run them as the target unprivileged user.
            *   This can be achieved with `sudo -u {username} podman ...`, more directly via `nsenter`/`setuid` if the agent has the necessary capabilities, or by setting `XDG_RUNTIME_DIR` and `DBUS_SESSION_BUS_ADDRESS` appropriately for the target user when invoking `podman` via the systemd user session D-Bus API. *The simplest option for now is probably `sudo -u {username} podman ...` if the agent runs as root, or ensuring the agent itself runs as a user that can switch to the `kat_wl_*` users.*
            *   The RFC prefers "systemd user sessions", which usually means `systemctl --user ...`. To control another user's systemd session, a root agent can use `machinectl shell {username}@.host /bin/bash -c "systemctl --user ..."` or `systemd-run --user --machine={username}@.host ...`. If the agent is not root, it cannot directly control other users' systemd sessions. *This is a critical design point: how does the (potentially root) agent interact with user-level systemd?*
            *   RFC: "Agent uses `systemctl --user --machine={username}@.host ...`". This implies the agent has permission to do so (it likely runs as root or with specific polkit rules). A sketch of building such commands follows this task.
        *   **Systemd unit generation & management**:
            *   After `podman create ...` (or instead of a direct create, if `podman generate systemd` is used to create the definition), generate the systemd unit:
                `podman generate systemd --new --name {opts.InstanceID} --files --time 10 {imageNameUsedInCreate}`. This creates a `{opts.InstanceID}.service` file.
            *   The `ContainerRuntime` implementation needs to:
                1.  Execute `podman create` to establish the container definition (this lets Podman manage its internal state for the container ID).
                2.  Execute `podman generate systemd --name {containerID}` (using the ID from create) to get the unit file content.
                3.  Place this unit file in the target user's systemd path (e.g., `/home/{username}/.config/systemd/user/{opts.InstanceID}.service`, or `/etc/systemd/user/{opts.InstanceID}.service` if the agent is root and wants to enable it for any user).
                4.  Run `systemctl --user --machine={username}@.host daemon-reload`.
                5.  Start/enable: `systemctl --user --machine={username}@.host enable --now {opts.InstanceID}.service`.
            *   To stop: `systemctl --user --machine={username}@.host stop {opts.InstanceID}.service`.
            *   To remove: `systemctl --user --machine={username}@.host disable {opts.InstanceID}.service`, then `podman rm {opts.InstanceID}`, then remove the unit file.
            *   Status: `systemctl --user --machine={username}@.host status {opts.InstanceID}.service` (parse the output), or rely on `podman inspect`, which should reflect the systemd-managed state.
    *   **Potential Challenges**: Managing permissions for interacting with other users' systemd sessions. Correctly placing and cleaning up systemd unit files. Ensuring `XDG_RUNTIME_DIR` is set correctly for rootless Podman if not using systemd units for direct `podman run`. Systemd unit generation nuances.
    *   **Verification**:
        *   A test in `internal/agent/runtime_test.go` (or similar) can take mock `ContainerCreateOptions`.
        *   It calls the (mocked or real) `ContainerRuntime` implementation.
        *   Verify:
            *   Podman commands are constructed to run as the target unprivileged user.
            *   A systemd unit file is generated for the container.
            *   `systemctl --user --machine...` commands are invoked correctly to manage the service.
            *   The container is actually started (verify with `podman ps -a --filter label=kat.dws.rip/instance-id={instanceID}` as the target user).
            *   Logs can be retrieved.
            *   The container can be stopped and removed, including its systemd unit.

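A small helper for driving another user's systemd session, assuming the agent runs with sufficient privileges (root or suitable polkit rules); the username and unit name in the usage comments are placeholders:

```go
package runtime

import (
	"context"
	"fmt"
	"os/exec"
)

// userSystemctl runs `systemctl --user --machine={username}@.host <args...>`,
// i.e. it drives the target user's systemd session from the agent, as the RFC
// suggests for managing rootless workload units.
func userSystemctl(ctx context.Context, username string, args ...string) error {
	full := append([]string{
		"--user",
		fmt.Sprintf("--machine=%s@.host", username),
	}, args...)
	cmd := exec.CommandContext(ctx, "systemctl", full...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("systemctl %v: %w: %s", full, err, out)
	}
	return nil
}

// Usage sketch for one instance lifecycle (unit file placement not shown):
//   _ = userSystemctl(ctx, "kat_wl_mywebapp", "daemon-reload")
//   _ = userSystemctl(ctx, "kat_wl_mywebapp", "enable", "--now", "myinstance.service")
//   _ = userSystemctl(ctx, "kat_wl_mywebapp", "stop", "myinstance.service")
```
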
*   **Milestone Verification**:
    *   The `ContainerRuntime` Go interface is fully defined as per RFC 6.1.
    *   The Podman implementation of the core lifecycle methods (`PullImage`, `CreateContainer` (leading to systemd unit generation), `StartContainer` (via systemd enable/start), `StopContainer` (via systemd stop), `RemoveContainer` (via systemd disable + `podman rm` + unit file removal), `GetContainerStatus`, `StreamContainerLogs`) is functional.
    *   An `internal/agent` test (or a temporary `main.go` test harness) can:
        1.  Define `ContainerCreateOptions` for a simple image like `docker.io/library/alpine` with a command like `sleep 30`.
        2.  Specify a (manually pre-created and linger-enabled) unprivileged username.
        3.  Call the `ContainerRuntime` methods.
        4.  **Result**:
            *   The alpine image is pulled (if not present).
            *   A systemd user service unit is generated and placed correctly for the specified user.
            *   The service is started using `systemctl --user --machine...`.
            *   `podman ps --all --filter label=kat.dws.rip/instance-id=...` (run as the target user, or by root seeing all containers) shows the container running or having run.
            *   Logs can be retrieved using `StreamContainerLogs`.
            *   The container can be stopped and removed (including its systemd unit file).
            *   All container operations are verifiably performed by the specified unprivileged user.

This detailed plan should provide a clearer path for implementing these initial crucial phases. Remember to keep testing iterative and focused on the RFC specifications.