From b723a004f24edd33f100655870b79de4dde77be1 Mon Sep 17 00:00:00 2001
From: Tanishq Dubey
Date: Sat, 10 May 2025 13:53:29 -0400
Subject: [PATCH] more docs

---
 .gitignore          |   5 +
 docs/plan/phase0.md | 274 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 279 insertions(+)
 create mode 100644 docs/plan/phase0.md

diff --git a/.gitignore b/.gitignore
index 6f72f89..1e21f30 100644
--- a/.gitignore
+++ b/.gitignore
@@ -23,3 +23,8 @@ go.work.sum
 
 # env file
 .env
+
+.DS_Store
+kat-agent
+katcall
+.aider*
diff --git a/docs/plan/phase0.md b/docs/plan/phase0.md
new file mode 100644
index 0000000..6ca75d2
--- /dev/null
+++ b/docs/plan/phase0.md
@@ -0,0 +1,274 @@
+# **Phase 0: Project Setup & Core Types**
+
+* **Goal**: Initialize the project structure, establish version control and build tooling, define the core data structures (primarily through Protocol Buffers as specified in the RFC), and ensure basic parsing/validation capabilities for initial configuration files.
+* **RFC Sections Primarily Used**: Overall project understanding, Section 8.2 (Resource Representation Proto3 & JSON), Section 3 (Resource Model - for identifying initial protos), Section 3.9 (Cluster Configuration - for `cluster.kat`).
+
+**Tasks & Sub-Tasks:**
+
+1.  **Initialize Git Repository & Go Module**
+    * **Purpose**: Establish version control and Go project identity.
+    * **Details**:
+        * Create the root project directory (e.g., `kat-system`).
+        * Navigate into the directory: `cd kat-system`.
+        * Initialize Git: `git init`.
+        * Create an initial `.gitignore` file. Add common Go and OS-specific ignores (e.g., `*.o`, `*.exe`, `*~`, `.DS_Store`, and compiled binaries such as `kat-agent` and `katcall`).
+        * Initialize the Go module: `go mod init github.com/dws-llc/kat-system` (or your chosen module path).
+    * **Verification**:
+        * The `.git` directory exists.
+        * `go.mod` is created with the correct module path.
+        * An initial commit can be made.
+
+2.  **Create Initial Directory Structure**
+    * **Purpose**: Lay out the skeleton of the project for organizing code and artifacts.
+    * **Details**: Create the following top-level directories:
+        ```
+        kat-system/
+        ├── api/
+        │   └── v1alpha1/
+        ├── cmd/
+        │   ├── kat-agent/
+        │   └── katcall/
+        ├── docs/
+        │   └── rfc/
+        ├── examples/
+        ├── internal/
+        ├── pkg/         # (Optional, if you decide to have externally importable library code not part of 'internal')
+        ├── scripts/
+        └── test/
+        ```
+        * Place `RFC001-KAT.md` into `docs/rfc/`.
+    * **Verification**: The directory structure matches the plan.
+
+3.  **Define Initial Protocol Buffer Messages (`api/v1alpha1/kat.proto`)**
+    * **Purpose**: Create the canonical definitions for KAT resources that will be used for API communication and internal state representation.
+    * **Details**:
+        * Create `api/v1alpha1/kat.proto`.
+        * Define initial messages based on RFC Section 3 and Section 8.2. Focus on data structures, not RPC service definitions yet.
+        * **Common Metadata**:
+        ```protobuf
+        message ObjectMeta {
+          string name = 1;
+          string namespace = 2;
+          string uid = 3;
+          int64 generation = 4;
+          string resource_version = 5; // e.g., etcd ModRevision
+          google.protobuf.Timestamp creation_timestamp = 6;
+          map<string, string> labels = 7;
+          map<string, string> annotations = 8; // For future use
+        }
+
+        // Timestamps use the well-known google.protobuf.Timestamp type
+        // (import "google/protobuf/timestamp.proto") rather than a custom message.
+        ```
+        * **`Workload` (RFC 3.2)**:
+        ```protobuf
+        enum WorkloadType {
+          WORKLOAD_TYPE_UNSPECIFIED = 0;
+          SERVICE = 1;
+          JOB = 2;
+          DAEMON_SERVICE = 3;
+        }
+
+        // ... (GitSource, UpdateStrategy, RestartPolicy, Container, VolumeMount, ResourceRequests, GPUSpec, Volume definitions)
+
+        message WorkloadSpec {
+          WorkloadType type = 1;
+          // Source source = 2; // Define GitSource, ImageSource, CacheImage
+          int32 replicas = 3;
+          // UpdateStrategy update_strategy = 4;
+          // RestartPolicy restart_policy = 5;
+          map<string, string> node_selector = 6;
+          // repeated Toleration tolerations = 7;
+          Container container = 8; // Define Container fully
+          repeated Volume volumes = 9; // Define Volume fully (SimpleClusterStorage, HostMount)
+          // ... other spec fields from workload.kat
+        }
+
+        message Workload {
+          ObjectMeta metadata = 1;
+          WorkloadSpec spec = 2;
+          // WorkloadStatus status = 3; // Define later
+        }
+        ```
+        *(Start with core fields and expand. For brevity, not all sub-messages are listed here, but they need to be defined based on the `workload.kat` fields in RFC 3.2.)*
+        * **`VirtualLoadBalancer` (RFC 3.3)**:
+        ```protobuf
+        message VirtualLoadBalancerSpec {
+          // repeated Port ports = 1;
+          // HealthCheck health_check = 2;
+          // repeated IngressRule ingress = 3;
+        }
+
+        message VirtualLoadBalancer { // This might be part of Workload or a separate resource
+          ObjectMeta metadata = 1; // Name likely matches Workload name
+          VirtualLoadBalancerSpec spec = 2;
+        }
+        ```
+        *Consider whether this is embedded in `Workload.spec` or is a truly separate resource associated by name.* The RFC shows it as a separate `*.kat` file, implying a separate resource.
+        * **`JobDefinition` (RFC 3.4)**: Similar structure: a `JobDefinitionSpec` with fields such as `schedule` and `completions`.
+        * **`BuildDefinition` (RFC 3.5)**: Similar structure: a `BuildDefinitionSpec` with fields such as `buildContext` and `dockerfilePath`.
+
+        * **`Namespace` (RFC 3.7)**:
+        ```protobuf
+        message NamespaceSpec {
+          // Potentially finalizers or other future spec fields
+        }
+
+        message Namespace {
+          ObjectMeta metadata = 1;
+          NamespaceSpec spec = 2;
+          // NamespaceStatus status = 3; // Define later
+        }
+        ```
+        * **`Node` (Internal Representation - RFC 3.8)**: (This is for the Leader's internal state, not a user-defined Quadlet.)
+        ```protobuf
+        message NodeResources {
+          string cpu = 1;
+          string memory = 2;
+          // map<string, string> custom_resources = 3; // e.g., for GPUs
+        }
+
+        message NodeStatusDetails { // For status reporting by agent
+          NodeResources capacity = 1;
+          NodeResources allocatable = 2;
+          // repeated WorkloadInstanceStatus workload_instances = 3;
+          // OverlayNetworkStatus overlay_network = 4;
+          string condition = 5; // e.g., "Ready", "NotReady"
+          google.protobuf.Timestamp last_heartbeat_time = 6;
+        }
+
+        message NodeSpec { // Configuration for a node, some set by leader
+          // repeated Taint taints = 1;
+          string overlay_subnet = 2; // Assigned by leader
+        }
+
+        message Node { // Represents a node in the cluster
+          ObjectMeta metadata = 1; // Name is the unique node name
+          NodeSpec spec = 2;
+          NodeStatusDetails status = 3;
+        }
+        ```
+        * **`ClusterConfiguration` (RFC 3.9)**:
+        ```protobuf
+        message ClusterConfigurationSpec {
+          string cluster_cidr = 1;
+          string service_cidr = 2;
+          int32 node_subnet_bits = 3;
+          string cluster_domain = 4;
+          int32 agent_port = 5;
+          int32 api_port = 6;
+          int32 etcd_peer_port = 7;
+          int32 etcd_client_port = 8;
+          string volume_base_path = 9;
+          string backup_path = 10;
+          int32 backup_interval_minutes = 11;
+          int32 agent_tick_seconds = 12;
+          int32 node_loss_timeout_seconds = 13;
+        }
+
+        message ClusterConfiguration {
+          ObjectMeta metadata = 1; // e.g., name of the cluster
+          ClusterConfigurationSpec spec = 2;
+        }
+        ```
+        * Include `syntax = "proto3";` and appropriate `package` and `option go_package` statements.
+        * Import `google/protobuf/timestamp.proto`, since `ObjectMeta` and `NodeStatusDetails` use `google.protobuf.Timestamp`.
+
+    * **Potential Challenges**: Accurately translating all nested YAML structures from Quadlet definitions into Protobuf messages, and deciding on naming conventions.
+    * **Verification**: The `kat.proto` file is syntactically correct and includes initial definitions for the key resources.
+
+4.  **Set Up Protobuf Code Generation (`scripts/gen-proto.sh`, Makefile target)**
+    * **Purpose**: Automate the conversion of `.proto` definitions into Go code.
+    * **Details**:
+        * Install `protoc` (the protobuf compiler) separately, e.g., from a release archive or an OS package manager.
+        * Install the `protoc-gen-go` plugin with `go install google.golang.org/protobuf/cmd/protoc-gen-go@latest`, or pin it in `go.mod` with `go get google.golang.org/protobuf/cmd/protoc-gen-go` and then run `go install google.golang.org/protobuf/cmd/protoc-gen-go`. The install directory (`$(go env GOBIN)`, or `$(go env GOPATH)/bin` when `GOBIN` is unset) must be on `PATH`, because `protoc` looks plugins up there.
+        * Create `scripts/gen-proto.sh`:
+        ```bash
+        #!/bin/bash
+        set -euo pipefail
+
+        # protoc finds plugins on PATH. $(go env GOBIN) is empty unless GOBIN is set,
+        # so check PATH directly instead of building a path from GOBIN.
+        if ! command -v protoc >/dev/null 2>&1; then
+          echo "protoc not found. Install the Protocol Buffers compiler first." >&2
+          exit 1
+        fi
+        if ! command -v protoc-gen-go >/dev/null 2>&1; then
+          echo "protoc-gen-go not found on PATH. Run: go install google.golang.org/protobuf/cmd/protoc-gen-go@latest" >&2
+          echo "and add \$(go env GOPATH)/bin (or \$GOBIN) to PATH." >&2
+          exit 1
+        fi
+
+        API_DIR="./api/v1alpha1"
+        OUT_DIR="${API_DIR}/generated" # Or directly into api/v1alpha1 if preferred
+
+        mkdir -p "$OUT_DIR"
+
+        protoc --proto_path="${API_DIR}" \
+          --go_out="${OUT_DIR}" --go_opt=paths=source_relative \
+          "${API_DIR}/kat.proto"
+
+        echo "Protobuf Go code generated in ${OUT_DIR}"
+        ```
+        *(Adjust paths and options as needed. With `paths=source_relative`, each output file is placed relative to `--go_out` the same way its `.proto` file sits relative to `--proto_path`, so `kat.pb.go` lands directly in `${OUT_DIR}`; the `go_package` option should name the import path of that directory.)*
+        * Make the script executable: `chmod +x scripts/gen-proto.sh`.
+        * (Optional) Add a Makefile target (recipe lines must be indented with a tab):
+        ```makefile
+        .PHONY: generate
+        generate:
+        	@echo "Generating Go code from Protobuf definitions..."
+        	@./scripts/gen-proto.sh
+        ```
+    * **Verification**:
+        * Running `scripts/gen-proto.sh` (or `make generate`) completes without errors.
+        * Go files (e.g., `kat.pb.go`) are generated in the specified output directory (`api/v1alpha1/generated/` or `api/v1alpha1/`).
+        * The generated package builds (`go build ./api/...`) once `google.golang.org/protobuf` is a module dependency (e.g., after `go mod tidy`).
+
+5.  **Implement Basic Parsing and Validation for `cluster.kat` (`internal/config/parse.go`, `internal/config/types.go`)**
+    * **Purpose**: Enable `kat-agent init` to read and understand its initial cluster-wide configuration.
+    * **Details**:
+        * In `internal/config/types.go` (or use the generated proto types directly if preferred for consistency): Define Go structs that mirror `ClusterConfiguration` from `kat.proto`.
+            * If using proto types, the generated `ClusterConfiguration` struct can be used directly.
+        * In `internal/config/parse.go`:
+            * `ParseClusterConfiguration(filePath string) (*ClusterConfiguration, error)`:
+                1.  Read the file content.
+                2.  Unmarshal the YAML into the Go struct (e.g., using `gopkg.in/yaml.v3`).
+                3.  Apply defaults with `SetClusterConfigDefaults`, so optional fields are populated before they are validated.
+                4.  Perform basic validation:
+                    * Check for required fields (e.g., `clusterCIDR`, `serviceCIDR`, ports).
+                    * Validate CIDR formats.
+                    * Ensure ports are within the valid range (1-65535).
+                    * Ensure intervals are positive.
+            * `SetClusterConfigDefaults(config *ClusterConfiguration)`: Apply the default values from RFC 3.9 to fields that are not set.
+    * **Potential Challenges**: YAML unmarshalling intricacies and comprehensive validation logic. In particular, protoc-generated structs carry `protobuf` and `json` struct tags but no `yaml` tags, so `yaml.v3` matches keys only against lowercased field names (e.g., `clustercidr`) and silently ignores a key such as `clusterCIDR`. Either give hand-written structs explicit `yaml:"clusterCIDR"` tags, or convert the YAML to JSON and decode it with `protojson`, setting `json_name` on fields whose key is not plain lowerCamelCase.
+    * **Verification**:
+        * Unit tests for `ParseClusterConfiguration`:
+            * Test with a valid `examples/cluster.kat` file. The parsed struct should match the expected values.
+            * Test with missing required fields; expect an error.
+            * Test with invalid field values (e.g., bad CIDR, invalid port); expect an error.
+            * Test with a file that sets some fields and omits optional ones; verify that `SetClusterConfigDefaults` applies the defaults.
+        * Create an example `examples/cluster.kat` file for testing.
+
+6.  **Implement Basic Parsing/Validation for Quadlet Files (`internal/config/parse.go`, `internal/utils/tar.go`)**
+    * **Purpose**: Enable the Leader to understand submitted Workload definitions.
+
+    * **Details**:
+        * In `internal/utils/tar.go`:
+            * `UntarQuadlets(reader io.Reader) (map[string][]byte, error)`: Takes a `tar.gz` stream, unpacks it in memory (or into a temp dir), and returns a map of `fileName -> fileContent`.
+        * In `internal/config/parse.go`:
+            * `ParseQuadletFile(fileName string, content []byte) (interface{}, error)`:
+                1.  Read the `kind` field, then unmarshal the YAML content into the matching generated proto struct (e.g., `Workload`, `VirtualLoadBalancer`).
+                2.  Perform basic validation on the specific Quadlet type (e.g., a `Workload` must have `metadata.name` and `spec.type`).
+            * `ParseQuadletDirectory(files map[string][]byte) (*Workload, *VirtualLoadBalancer, ..., error)`:
+                1.  Iterate through the files returned by `UntarQuadlets`.
+                2.  Use `ParseQuadletFile` for each.
+                3.  Perform cross-Quadlet file validation (e.g., if `build.kat` exists, `workload.kat` must have `spec.source.git`). This is a placeholder for now; later phases add more checks.
+    * **Potential Challenges**: Handling different Quadlet `kind`s and managing inter-file dependencies.
+    * **Verification**:
+        * Unit tests for `UntarQuadlets` with a sample `tar.gz` archive containing example Quadlet files.
+        * Unit tests for `ParseQuadletFile` for each Quadlet type (`workload.kat`, `VirtualLoadBalancer.kat`, etc.) with valid and invalid content.
+        * An example Quadlet directory (e.g., `examples/simple-service/`) should be created and tarred for testing.
+        * `ParseQuadletDirectory` successfully parses a valid collection of Quadlet files from the tar.
+
+* **Milestone Verification (Overall Phase 0)**:
+    1.  The project repository is set up with Go modules and the initial directory structure.
+    2.  `make generate` (or `scripts/gen-proto.sh`) successfully compiles `api/v1alpha1/kat.proto` into Go source files without errors. The generated Go code includes structs for `Workload`, `VirtualLoadBalancer`, `JobDefinition`, `BuildDefinition`, `Namespace`, internal `Node`, and `ClusterConfiguration`.
+    3.  Unit tests in `internal/config/parse_test.go` demonstrate:
+        * Successful parsing of a valid `cluster.kat` file into the `ClusterConfiguration` struct, including application of default values.
+        * Error handling for invalid or incomplete `cluster.kat` files.
+    4.  Unit tests in `internal/config/parse_test.go` (and potentially `internal/utils/tar_test.go`) demonstrate:
+        * Successful untarring of a sample `tar.gz` Quadlet archive.
+        * Successful parsing of individual Quadlet files (e.g., `workload.kat`, `VirtualLoadBalancer.kat`) into their respective Go structs (using generated proto types).
+        * Basic validation of required fields within individual Quadlet files.
+    5.  All code is committed to Git.
+    6.  (Optional but good practice) A basic `README.md` is started.
\ No newline at end of file
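
The default-then-validate flow described for `ParseClusterConfiguration` and `SetClusterConfigDefaults` in task 5 can be sketched with only the Go standard library. This is a minimal illustration under stated assumptions, not the planned implementation: `ClusterConfigSpec` is a hypothetical plain struct covering a subset of the `ClusterConfigurationSpec` fields, the key names used in error messages (`apiPort`, `agentTickSeconds`, and so on) are assumed, and the default values are placeholders rather than the RFC 3.9 defaults. It needs Go 1.20+ for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// ClusterConfigSpec is a HYPOTHETICAL plain-Go mirror of a subset of the
// ClusterConfigurationSpec message; real code may use the generated type.
type ClusterConfigSpec struct {
	ClusterCIDR            string
	ServiceCIDR            string
	AgentPort              int
	APIPort                int
	EtcdPeerPort           int
	EtcdClientPort         int
	AgentTickSeconds       int
	NodeLossTimeoutSeconds int
}

// SetClusterConfigDefaults fills zero-valued optional fields.
// The numbers are PLACEHOLDERS, not the defaults defined in RFC 3.9.
func SetClusterConfigDefaults(c *ClusterConfigSpec) {
	if c.AgentTickSeconds == 0 {
		c.AgentTickSeconds = 15
	}
	if c.NodeLossTimeoutSeconds == 0 {
		c.NodeLossTimeoutSeconds = 60
	}
}

// parseCIDR requires a non-empty prefix with no host bits set.
func parseCIDR(field, value string) (netip.Prefix, error) {
	if value == "" {
		return netip.Prefix{}, fmt.Errorf("%s is required", field)
	}
	p, err := netip.ParsePrefix(value)
	if err != nil {
		return netip.Prefix{}, fmt.Errorf("%s: %w", field, err)
	}
	if p != p.Masked() {
		return netip.Prefix{}, fmt.Errorf("%s: %s has host bits set (expected %s)", field, value, p.Masked())
	}
	return p, nil
}

// ValidateClusterConfig reports every problem it finds, not just the first.
func ValidateClusterConfig(c *ClusterConfigSpec) error {
	var errs []error
	cluster, err := parseCIDR("clusterCIDR", c.ClusterCIDR)
	if err != nil {
		errs = append(errs, err)
	}
	service, err := parseCIDR("serviceCIDR", c.ServiceCIDR)
	if err != nil {
		errs = append(errs, err)
	}
	// Only compare ranges that parsed; the zero Prefix is not valid.
	if cluster.IsValid() && service.IsValid() && cluster.Overlaps(service) {
		errs = append(errs, errors.New("clusterCIDR and serviceCIDR must not overlap"))
	}
	ports := []struct {
		name  string
		value int
	}{
		{"agentPort", c.AgentPort},
		{"apiPort", c.APIPort},
		{"etcdPeerPort", c.EtcdPeerPort},
		{"etcdClientPort", c.EtcdClientPort},
	}
	for _, p := range ports {
		if p.value < 1 || p.value > 65535 { // 0 also means "not set", so ports stay required
			errs = append(errs, fmt.Errorf("%s: %d is outside 1-65535", p.name, p.value))
		}
	}
	if c.AgentTickSeconds <= 0 {
		errs = append(errs, errors.New("agentTickSeconds must be positive"))
	}
	if c.NodeLossTimeoutSeconds <= 0 {
		errs = append(errs, errors.New("nodeLossTimeoutSeconds must be positive"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	cfg := ClusterConfigSpec{
		ClusterCIDR: "10.244.0.0/16", ServiceCIDR: "10.96.0.0/12",
		AgentPort: 7001, APIPort: 7000, EtcdPeerPort: 2380, EtcdClientPort: 2379,
	}
	SetClusterConfigDefaults(&cfg)           // defaults first, so validation sees final values
	fmt.Println(ValidateClusterConfig(&cfg)) // <nil>

	cfg.ServiceCIDR = "10.244.128.0/17" // overlaps clusterCIDR
	cfg.APIPort = 70000
	fmt.Println(ValidateClusterConfig(&cfg))
}
```

Collecting all errors with `errors.Join` lets `kat-agent init` report every problem in `cluster.kat` at once. The host-bits check is stricter than `netip.ParsePrefix` alone, which accepts a value such as `10.244.0.1/16`.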