kat/docs/plan/phase0.md
2025-05-10 13:53:29 -04:00

Phase 0: Project Setup & Core Types

  • Goal: Initialize the project structure, establish version control and build tooling, define the core data structures (primarily through Protocol Buffers as specified in the RFC), and ensure basic parsing/validation capabilities for initial configuration files.
  • RFC Sections Primarily Used: Overall project understanding, Section 8.2 (Resource Representation Proto3 & JSON), Section 3 (Resource Model - for identifying initial protos), Section 3.9 (Cluster Configuration - for cluster.kat).

Tasks & Sub-Tasks:

  1. Initialize Git Repository & Go Module

    • Purpose: Establish version control and Go project identity.
    • Details:
      • Create the root project directory (e.g., kat-system).
      • Navigate into the directory: cd kat-system.
      • Initialize Git: git init.
      • Create an initial .gitignore file. Add common Go and OS-specific ignores (e.g., *.o, *.exe, *~, .DS_Store, compiled binaries like kat-agent, katcall).
      • Initialize Go module: go mod init github.com/dws-llc/kat-system (or your chosen module path).
    • Verification:
      • .git directory exists.
      • go.mod file is created with the correct module path.
      • Initial commit can be made.
  2. Create Initial Directory Structure

    • Purpose: Lay out the skeleton of the project for organizing code and artifacts.
    • Details: Create the top-level directories according to the proposed directory/file structure:
      kat-system/
      ├── api/
      │   └── v1alpha1/
      ├── cmd/
      │   ├── kat-agent/
      │   └── katcall/
      ├── docs/
      │   └── rfc/
      ├── examples/
      ├── internal/
      ├── pkg/      # (Optional, if you decide to have externally importable library code not part of 'internal')
      ├── scripts/
      └── test/
      • Place RFC001-KAT.md into docs/rfc/.
      
    • Verification: Directory structure matches the plan.
  3. Define Initial Protocol Buffer Messages (api/v1alpha1/kat.proto)

    • Purpose: Create the canonical definitions for KAT resources that will be used for API communication and internal state representation.
    • Details:
      • Create api/v1alpha1/kat.proto.
      • Define initial messages based on RFC Section 3 and Section 8.2. Focus on data structures, not RPC service definitions yet.
      • Common Metadata:
        message ObjectMeta {
          string name = 1;
          string namespace = 2;
          string uid = 3;
          int64 generation = 4;
          string resource_version = 5; // e.g., etcd ModRevision
          google.protobuf.Timestamp creation_timestamp = 6;
          map<string, string> labels = 7;
          map<string, string> annotations = 8; // For future use
        }
        
        message Timestamp { // Only needed if google/protobuf/timestamp.proto is not imported; ObjectMeta above already uses google.protobuf.Timestamp
          int64 seconds = 1;
          int32 nanos = 2;
        }
        
      • Workload (RFC 3.2):
        enum WorkloadType {
          WORKLOAD_TYPE_UNSPECIFIED = 0;
          SERVICE = 1;
          JOB = 2;
          DAEMON_SERVICE = 3;
        }
        
        // ... (GitSource, UpdateStrategy, RestartPolicy, Container, VolumeMount, ResourceRequests, GPUSpec, Volume definitions)
        
        message WorkloadSpec {
          WorkloadType type = 1;
          // Source source = 2; // Define GitSource, ImageSource, CacheImage
          int32 replicas = 3;
          // UpdateStrategy update_strategy = 4;
          // RestartPolicy restart_policy = 5;
          map<string, string> node_selector = 6;
          // repeated Toleration tolerations = 7;
          Container container = 8; // Define Container fully
          repeated Volume volumes = 9; // Define Volume fully (SimpleClusterStorage, HostMount)
          // ... other spec fields from workload.kat
        }
        
        message Workload {
          ObjectMeta metadata = 1;
          WorkloadSpec spec = 2;
          // WorkloadStatus status = 3; // Define later
        }
        
        (Start with core fields and expand. For brevity, not all sub-messages are listed here, but they need to be defined based on workload.kat fields in RFC 3.2)
      • VirtualLoadBalancer (RFC 3.3):
        message VirtualLoadBalancerSpec {
          // repeated Port ports = 1;
          // HealthCheck health_check = 2;
          // repeated IngressRule ingress = 3;
        }
        
        message VirtualLoadBalancer { // This might be part of Workload or a separate resource
          ObjectMeta metadata = 1; // Name likely matches Workload name
          VirtualLoadBalancerSpec spec = 2;
        }
        
        Consider whether this is embedded in Workload.spec or is a truly separate resource associated by name. The RFC shows it as a separate *.kat file, which implies a separate resource.
      • JobDefinition (RFC 3.4): Similar structure, JobDefinitionSpec with fields like schedule, completions.
      • BuildDefinition (RFC 3.5): Similar structure, BuildDefinitionSpec with fields like buildContext, dockerfilePath.
      • Namespace (RFC 3.7):
        message NamespaceSpec {
          // Potentially finalizers or other future spec fields
        }
        
        message Namespace {
          ObjectMeta metadata = 1;
          NamespaceSpec spec = 2;
          // NamespaceStatus status = 3; // Define later
        }
        
      • Node (Internal Representation - RFC 3.8): (This is for Leader's internal state, not a user-defined Quadlet)
        message NodeResources {
          string cpu = 1;
          string memory = 2;
          // map<string, string> custom_resources = 3; // e.g., for GPUs
        }
        
        message NodeStatusDetails { // For status reporting by agent
          NodeResources capacity = 1;
          NodeResources allocatable = 2;
          // repeated WorkloadInstanceStatus workload_instances = 3;
          // OverlayNetworkStatus overlay_network = 4;
          string condition = 5; // e.g., "Ready", "NotReady"
          google.protobuf.Timestamp last_heartbeat_time = 6;
        }
        
        message NodeSpec { // Configuration for a node, some set by leader
          // repeated Taint taints = 1;
          string overlay_subnet = 2; // Assigned by leader
        }
        
        message Node { // Represents a node in the cluster
          ObjectMeta metadata = 1; // Name is the unique node name
          NodeSpec spec = 2;
          NodeStatusDetails status = 3;
        }
        
      • ClusterConfiguration (RFC 3.9):
        message ClusterConfigurationSpec {
          string cluster_cidr = 1;
          string service_cidr = 2;
          int32 node_subnet_bits = 3;
          string cluster_domain = 4;
          int32 agent_port = 5;
          int32 api_port = 6;
          int32 etcd_peer_port = 7;
          int32 etcd_client_port = 8;
          string volume_base_path = 9;
          string backup_path = 10;
          int32 backup_interval_minutes = 11;
          int32 agent_tick_seconds = 12;
          int32 node_loss_timeout_seconds = 13;
        }
        
        message ClusterConfiguration {
          ObjectMeta metadata = 1; // e.g., name of the cluster
          ClusterConfigurationSpec spec = 2;
        }
        
      • Include syntax = "proto3"; and appropriate package and option go_package statements.
      • Import google/protobuf/timestamp.proto if used.
    • Potential Challenges: Accurately translating all nested YAML structures from Quadlet definitions into Protobuf messages. Deciding on naming conventions.
    • Verification: kat.proto file is syntactically correct. It includes initial definitions for the key resources.
  4. Set Up Protobuf Code Generation (scripts/gen-proto.sh, Makefile target)

    • Purpose: Automate the conversion of .proto definitions into Go code.
    • Details:
      • Install protoc (the Protobuf compiler) and the protoc-gen-go plugin. Add the Protobuf runtime to go.mod with go get google.golang.org/protobuf, and install the plugin binary with go install google.golang.org/protobuf/cmd/protoc-gen-go@latest.
      • Create scripts/gen-proto.sh:
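        #!/bin/bash
        set -euo pipefail
        
        # protoc locates plugins through PATH. GOBIN is often unset, so fall
        # back to GOPATH/bin, which is where `go install` puts binaries by default.
        GO_BIN_DIR="$(go env GOBIN)"
        if [ -z "$GO_BIN_DIR" ]; then
            GO_BIN_DIR="$(go env GOPATH)/bin"
        fi
        export PATH="${GO_BIN_DIR}:${PATH}"
        
        if ! command -v protoc-gen-go >/dev/null 2>&1; then
            echo "protoc-gen-go not found. Please run: go install google.golang.org/protobuf/cmd/protoc-gen-go@latest"
            exit 1
        fi
        
        API_DIR="./api/v1alpha1"
        OUT_DIR="${API_DIR}/generated" # Or directly into api/v1alpha1 if preferred
        
        mkdir -p "$OUT_DIR"
        
        # With paths=source_relative, kat.pb.go is written directly into OUT_DIR.
        protoc --proto_path="${API_DIR}" \
               --go_out="${OUT_DIR}" --go_opt=paths=source_relative \
               "${API_DIR}/kat.proto"
        
        echo "Protobuf Go code generated in ${OUT_DIR}"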
        
        (Adjust paths and options as needed. paths=source_relative is common.)
      • Make the script executable: chmod +x scripts/gen-proto.sh.
      • (Optional) Add a Makefile target:
        .PHONY: generate
        generate:
        	@echo "Generating Go code from Protobuf definitions..."
        	@./scripts/gen-proto.sh
        
    • Verification:
      • Running scripts/gen-proto.sh (or make generate) executes without errors.
      • Go files (e.g., kat.pb.go) are generated in the specified output directory (api/v1alpha1/generated/ or api/v1alpha1/).
      • These generated files compile if included in a Go program.
  5. Implement Basic Parsing and Validation for cluster.kat (internal/config/parse.go, internal/config/types.go)

    • Purpose: Enable kat-agent init to read and understand its initial cluster-wide configuration.
    • Details:
      • In internal/config/types.go (or use generated proto types directly if preferred for consistency): Define Go structs that mirror ClusterConfiguration from kat.proto.
        • If using proto types: the generated ClusterConfiguration struct can be used directly.
      • In internal/config/parse.go:
        • ParseClusterConfiguration(filePath string) (*ClusterConfiguration, error):
          1. Read the file content.
          2. Unmarshal YAML into the Go struct (e.g., using gopkg.in/yaml.v3).
          3. Perform basic validation:
            • Check for required fields (e.g., clusterCIDR, serviceCIDR, ports).
            • Validate CIDR formats.
            • Ensure ports are within valid range.
            • Ensure intervals are positive.
        • SetClusterConfigDefaults(config *ClusterConfiguration): Apply default values as per RFC 3.9 if fields are not set.
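    • Potential Challenges: Handling YAML unmarshalling intricacies. In particular, generated proto structs carry protobuf and json struct tags but no yaml tags, so gopkg.in/yaml.v3 will not match camelCase keys such as clusterCIDR unless hand-written structs with yaml tags or a YAML-to-JSON conversion step is used. Comprehensive validation logic is the other main challenge.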
    • Verification:
      • Unit tests for ParseClusterConfiguration:
        • Test with a valid examples/cluster.kat file. Parsed struct should match expected values.
        • Test with missing required fields; expect an error.
        • Test with invalid field values (e.g., bad CIDR, invalid port); expect an error.
        • Test with a file that includes some fields and omits optional ones; verify defaults are applied by SetClusterConfigDefaults.
      • An example examples/cluster.kat file should be created for testing.
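The defaulting and validation steps in this task can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: ClusterSpec is a hand-written stand-in for the generated ClusterConfigurationSpec, ValidateClusterSpec is a hypothetical helper name, and the placeholder constants are invented for the example (the real defaults are whatever RFC 3.9 specifies). YAML reading is omitted so the sketch has no third-party dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// ClusterSpec is a hand-written stand-in for the generated
// ClusterConfigurationSpec; only a few fields are shown.
type ClusterSpec struct {
	ClusterCIDR      string
	ServiceCIDR      string
	AgentPort        int32
	APIPort          int32
	AgentTickSeconds int32
}

// Placeholder defaults for illustration only; the real values are
// whatever RFC 3.9 specifies.
const (
	placeholderAgentPort        = 10001
	placeholderAPIPort          = 10002
	placeholderAgentTickSeconds = 15
)

// SetClusterConfigDefaults fills in zero-valued optional fields.
func SetClusterConfigDefaults(s *ClusterSpec) {
	if s.AgentPort == 0 {
		s.AgentPort = placeholderAgentPort
	}
	if s.APIPort == 0 {
		s.APIPort = placeholderAPIPort
	}
	if s.AgentTickSeconds == 0 {
		s.AgentTickSeconds = placeholderAgentTickSeconds
	}
}

// ValidateClusterSpec collects every problem it finds instead of
// stopping at the first one, then joins them into a single error.
func ValidateClusterSpec(s *ClusterSpec) error {
	var errs []error

	cidrs := []struct{ name, value string }{
		{"clusterCIDR", s.ClusterCIDR},
		{"serviceCIDR", s.ServiceCIDR},
	}
	for _, c := range cidrs {
		if c.value == "" {
			errs = append(errs, fmt.Errorf("%s is required", c.name))
			continue
		}
		if _, _, err := net.ParseCIDR(c.value); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.name, err))
		}
	}

	ports := []struct {
		name  string
		value int32
	}{
		{"agentPort", s.AgentPort},
		{"apiPort", s.APIPort},
	}
	for _, p := range ports {
		if p.value < 1 || p.value > 65535 {
			errs = append(errs, fmt.Errorf("%s %d is outside 1-65535", p.name, p.value))
		}
	}

	if s.AgentTickSeconds <= 0 {
		errs = append(errs, fmt.Errorf("agentTickSeconds must be positive, got %d", s.AgentTickSeconds))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	// Example CIDR values are made up for the demo.
	spec := &ClusterSpec{ClusterCIDR: "10.100.0.0/16", ServiceCIDR: "10.200.0.0/16"}
	SetClusterConfigDefaults(spec)
	if err := ValidateClusterSpec(spec); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Printf("valid: %+v\n", *spec)
}
```

The sketch applies defaults before validating, so an omitted port is filled in rather than rejected. If the RFC treats some of these fields as strictly required, validation would need to run first for those fields.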
  6. Implement Basic Parsing/Validation for Quadlet Files (internal/config/parse.go, internal/utils/tar.go)

    • Purpose: Enable the Leader to understand submitted Workload definitions.
    • Details:
      • In internal/utils/tar.go:
        • UntarQuadlets(reader io.Reader) (map[string][]byte, error): Takes a tar.gz stream, unpacks it in memory (or temp dir), and returns a map of fileName -> fileContent.
      • In internal/config/parse.go:
        • ParseQuadletFile(fileName string, content []byte) (interface{}, error):
          1. Unmarshal YAML content based on kind field (e.g., into Workload, VirtualLoadBalancer generated proto structs).
          2. Perform basic validation on the specific Quadlet type (e.g., Workload must have metadata.name, spec.type).
        • ParseQuadletDirectory(files map[string][]byte) (*Workload, *VirtualLoadBalancer, ..., error):
          1. Iterate through files from UntarQuadlets.
          2. Use ParseQuadletFile for each.
          3. Perform cross-Quadlet file validation (e.g., if build.kat exists, workload.kat must have spec.source.git). Placeholder for now, more in later phases.
    • Potential Challenges: Handling different Quadlet kinds, managing inter-file dependencies.
    • Verification:
      • Unit tests for UntarQuadlets with a sample tar.gz archive containing example Quadlet files.
      • Unit tests for ParseQuadletFile for each Quadlet type (workload.kat, VirtualLoadBalancer.kat etc.) with valid and invalid content.
      • An example Quadlet directory (e.g., examples/simple-service/) should be created and tarred for testing.
      • ParseQuadletDirectory successfully parses a valid collection of Quadlet files from the tar.
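One possible shape for UntarQuadlets, using only archive/tar and compress/gzip, is sketched below. Several choices here are assumptions rather than anything the RFC states: files are keyed by base name (so directory prefixes inside the archive are dropped), duplicate base names are rejected, non-regular entries are skipped, and maxQuadletFileSize is an invented cap to keep in-memory unpacking bounded. The buildArchive helper exists only to make the example runnable.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
	"path"
)

// maxQuadletFileSize is an assumed per-file cap, not a value from the RFC.
const maxQuadletFileSize = 1 << 20 // 1 MiB

// UntarQuadlets reads a gzip-compressed tar stream into memory and
// returns a map of base file name to file content.
func UntarQuadlets(r io.Reader) (map[string][]byte, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, fmt.Errorf("open gzip stream: %w", err)
	}
	defer gz.Close()

	files := make(map[string][]byte)
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break // end of archive
		}
		if err != nil {
			return nil, fmt.Errorf("read tar entry: %w", err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // skip directories, symlinks, and other non-regular entries
		}
		if hdr.Size > maxQuadletFileSize {
			return nil, fmt.Errorf("%s exceeds %d bytes", hdr.Name, maxQuadletFileSize)
		}
		name := path.Base(hdr.Name)
		if _, dup := files[name]; dup {
			return nil, fmt.Errorf("duplicate file name %q in archive", name)
		}
		data, err := io.ReadAll(io.LimitReader(tr, maxQuadletFileSize))
		if err != nil {
			return nil, fmt.Errorf("read %s: %w", hdr.Name, err)
		}
		files[name] = data
	}
	return files, nil
}

// buildArchive creates a small tar.gz in memory for the demo below.
func buildArchive(files map[string]string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, body := range files {
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(body)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	// Close the tar writer before the gzip writer so trailers are flushed in order.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	archive, err := buildArchive(map[string]string{
		"simple-service/workload.kat": "kind: Workload\n",
	})
	if err != nil {
		panic(err)
	}
	files, err := UntarQuadlets(archive)
	if err != nil {
		panic(err)
	}
	for name, content := range files {
		fmt.Printf("%s: %d bytes\n", name, len(content))
	}
}
```

The returned map can be passed straight to ParseQuadletDirectory. A unit test can reuse the same in-memory archive approach instead of checking a binary fixture into the repository.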
  • Milestone Verification (Overall Phase 0):
    1. Project repository is set up with Go modules and initial directory structure.
    2. make generate (or scripts/gen-proto.sh) successfully compiles api/v1alpha1/kat.proto into Go source files without errors. The generated Go code includes structs for Workload, VirtualLoadBalancer, JobDefinition, BuildDefinition, Namespace, internal Node, and ClusterConfiguration.
    3. Unit tests in internal/config/parse_test.go demonstrate:
      • Successful parsing of a valid cluster.kat file into the ClusterConfiguration struct, including application of default values.
      • Error handling for invalid or incomplete cluster.kat files.
    4. Unit tests in internal/config/parse_test.go (and potentially internal/utils/tar_test.go) demonstrate:
      • Successful untarring of a sample tar.gz Quadlet archive.
      • Successful parsing of individual Quadlet files (e.g., workload.kat, VirtualLoadBalancer.kat) into their respective Go structs (using generated proto types).
      • Basic validation of required fields within individual Quadlet files.
    5. All code is committed to Git.
    6. (Optional but good practice) A basic README.md is started.