more docs

Tanishq Dubey 2025-05-10 13:53:29 -04:00
parent 04042795c5
commit b723a004f2
2 changed files with 279 additions and 0 deletions

5
.gitignore vendored

@@ -23,3 +23,8 @@ go.work.sum
# env file
.env
.DS_Store
kat-agent
katcall
.aider*

274
docs/plan/phase0.md Normal file

@@ -0,0 +1,274 @@
# **Phase 0: Project Setup & Core Types**
* **Goal**: Initialize the project structure, establish version control and build tooling, define the core data structures (primarily through Protocol Buffers as specified in the RFC), and ensure basic parsing/validation capabilities for initial configuration files.
* **RFC Sections Primarily Used**: Overall project understanding, Section 8.2 (Resource Representation Proto3 & JSON), Section 3 (Resource Model - for identifying initial protos), Section 3.9 (Cluster Configuration - for `cluster.kat`).
**Tasks & Sub-Tasks:**
1. **Initialize Git Repository & Go Module**
* **Purpose**: Establish version control and Go project identity.
* **Details**:
* Create the root project directory (e.g., `kat-system`).
* Navigate into the directory: `cd kat-system`.
* Initialize Git: `git init`.
* Create an initial `.gitignore` file. Add common Go and OS-specific ignores (e.g., `*.o`, `*.exe`, `*~`, `.DS_Store`, compiled binaries like `kat-agent`, `katcall`).
* Initialize Go module: `go mod init github.com/dws-llc/kat-system` (or your chosen module path).
* **Verification**:
* `.git` directory exists.
* `go.mod` file is created with the correct module path.
* Initial commit can be made.
2. **Create Initial Directory Structure**
* **Purpose**: Lay out the skeleton of the project for organizing code and artifacts.
* **Details**: Create the top-level directories as outlined in the proposed directory/file structure:
```
kat-system/
├── api/
│   └── v1alpha1/
├── cmd/
│   ├── kat-agent/
│   └── katcall/
├── docs/
│   └── rfc/
├── examples/
├── internal/
├── pkg/        # (Optional, if you decide to have externally importable library code not part of 'internal')
├── scripts/
└── test/
```
* Place the `RFC001-KAT.md` into `docs/rfc/`.
* **Verification**: Directory structure matches the plan.
3. **Define Initial Protocol Buffer Messages (`api/v1alpha1/kat.proto`)**
* **Purpose**: Create the canonical definitions for KAT resources that will be used for API communication and internal state representation.
* **Details**:
* Create `api/v1alpha1/kat.proto`.
* Define initial messages based on RFC Section 3 and Section 8.2. Focus on data structures, not RPC service definitions yet.
* **Common Metadata**:
```protobuf
message ObjectMeta {
  string name = 1;
  string namespace = 2;
  string uid = 3;
  int64 generation = 4;
  string resource_version = 5; // e.g., etcd ModRevision
  google.protobuf.Timestamp creation_timestamp = 6;
  map<string, string> labels = 7;
  map<string, string> annotations = 8; // For future use
}
message Timestamp { // Redundant if google.protobuf.Timestamp is used, as in ObjectMeta above
  int64 seconds = 1;
  int32 nanos = 2;
}
```
* **`Workload` (RFC 3.2)**:
```protobuf
enum WorkloadType {
  WORKLOAD_TYPE_UNSPECIFIED = 0;
  SERVICE = 1;
  JOB = 2;
  DAEMON_SERVICE = 3;
}
// ... (GitSource, UpdateStrategy, RestartPolicy, Container, VolumeMount, ResourceRequests, GPUSpec, Volume definitions)
message WorkloadSpec {
  WorkloadType type = 1;
  // Source source = 2; // Define GitSource, ImageSource, CacheImage
  int32 replicas = 3;
  // UpdateStrategy update_strategy = 4;
  // RestartPolicy restart_policy = 5;
  map<string, string> node_selector = 6;
  // repeated Toleration tolerations = 7;
  Container container = 8; // Define Container fully
  repeated Volume volumes = 9; // Define Volume fully (SimpleClusterStorage, HostMount)
  // ... other spec fields from workload.kat
}
message Workload {
  ObjectMeta metadata = 1;
  WorkloadSpec spec = 2;
  // WorkloadStatus status = 3; // Define later
}
```
*(Start with core fields and expand. For brevity, not all sub-messages are listed here, but they need to be defined based on `workload.kat` fields in RFC 3.2)*
* **`VirtualLoadBalancer` (RFC 3.3)**:
```protobuf
message VirtualLoadBalancerSpec {
  // repeated Port ports = 1;
  // HealthCheck health_check = 2;
  // repeated IngressRule ingress = 3;
}
message VirtualLoadBalancer { // This might be part of Workload or a separate resource
  ObjectMeta metadata = 1; // Name likely matches Workload name
  VirtualLoadBalancerSpec spec = 2;
}
```
*Consider whether this is embedded in `Workload.spec` or is a truly separate resource associated by name.* The RFC shows it as a separate `*.kat` file, which implies a separate resource.
* **`JobDefinition` (RFC 3.4)**: Similar structure, `JobDefinitionSpec` with fields like `schedule`, `completions`.
* **`BuildDefinition` (RFC 3.5)**: Similar structure, `BuildDefinitionSpec` with fields like `buildContext`, `dockerfilePath`.
* **`Namespace` (RFC 3.7)**:
```protobuf
message NamespaceSpec {
  // Potentially finalizers or other future spec fields
}
message Namespace {
  ObjectMeta metadata = 1;
  NamespaceSpec spec = 2;
  // NamespaceStatus status = 3; // Define later
}
```
* **`Node` (Internal Representation - RFC 3.8)**: (This is for Leader's internal state, not a user-defined Quadlet)
```protobuf
message NodeResources {
  string cpu = 1;
  string memory = 2;
  // map<string, string> custom_resources = 3; // e.g., for GPUs
}
message NodeStatusDetails { // For status reporting by agent
  NodeResources capacity = 1;
  NodeResources allocatable = 2;
  // repeated WorkloadInstanceStatus workload_instances = 3;
  // OverlayNetworkStatus overlay_network = 4;
  string condition = 5; // e.g., "Ready", "NotReady"
  google.protobuf.Timestamp last_heartbeat_time = 6;
}
message NodeSpec { // Configuration for a node, some set by leader
  // repeated Taint taints = 1;
  string overlay_subnet = 2; // Assigned by leader
}
message Node { // Represents a node in the cluster
  ObjectMeta metadata = 1; // Name is the unique node name
  NodeSpec spec = 2;
  NodeStatusDetails status = 3;
}
```
* **`ClusterConfiguration` (RFC 3.9)**:
```protobuf
message ClusterConfigurationSpec {
  string cluster_cidr = 1;
  string service_cidr = 2;
  int32 node_subnet_bits = 3;
  string cluster_domain = 4;
  int32 agent_port = 5;
  int32 api_port = 6;
  int32 etcd_peer_port = 7;
  int32 etcd_client_port = 8;
  string volume_base_path = 9;
  string backup_path = 10;
  int32 backup_interval_minutes = 11;
  int32 agent_tick_seconds = 12;
  int32 node_loss_timeout_seconds = 13;
}
message ClusterConfiguration {
  ObjectMeta metadata = 1; // e.g., name of the cluster
  ClusterConfigurationSpec spec = 2;
}
```
* Include `syntax = "proto3";` and appropriate `package` and `option go_package` statements.
* Import `google/protobuf/timestamp.proto` if used.
* **Potential Challenges**: Accurately translating all nested YAML structures from Quadlet definitions into Protobuf messages. Deciding on naming conventions.
* **Verification**: `kat.proto` file is syntactically correct. It includes initial definitions for the key resources.
4. **Set Up Protobuf Code Generation (`scripts/gen-proto.sh`, Makefile target)**
* **Purpose**: Automate the conversion of `.proto` definitions into Go code.
* **Details**:
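* Install `protoc` (the Protocol Buffers compiler) and the `protoc-gen-go` plugin. Record the plugin's module in `go.mod` with `go get google.golang.org/protobuf/cmd/protoc-gen-go`, then build the binary with `go install google.golang.org/protobuf/cmd/protoc-gen-go`. The binary is placed in `GOBIN`, or in `GOPATH/bin` when `GOBIN` is unset.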
* Create `scripts/gen-proto.sh`:
```bash
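#!/bin/bash
set -e

# `go install` places binaries in GOBIN, or in GOPATH/bin when GOBIN is unset.
GO_BIN_DIR="$(go env GOBIN)"
if [ -z "$GO_BIN_DIR" ]; then
  GO_BIN_DIR="$(go env GOPATH)/bin"
fi
PROTOC_GEN_GO="${GO_BIN_DIR}/protoc-gen-go"
if [ ! -x "$PROTOC_GEN_GO" ]; then
  echo "protoc-gen-go not found. Please run: go install google.golang.org/protobuf/cmd/protoc-gen-go"
  exit 1
fi

API_DIR="./api/v1alpha1"
OUT_DIR="${API_DIR}/generated" # Or directly into api/v1alpha1 if preferred
mkdir -p "$OUT_DIR"

# Pass the plugin path explicitly so protoc finds it even when the bin dir is not on PATH.
protoc --proto_path="${API_DIR}" \
  --plugin="protoc-gen-go=${PROTOC_GEN_GO}" \
  --go_out="${OUT_DIR}" --go_opt=paths=source_relative \
  "${API_DIR}/kat.proto"
echo "Protobuf Go code generated in ${OUT_DIR}"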
```
*(Adjust paths and options as needed. `paths=source_relative` is common.)*
* Make the script executable: `chmod +x scripts/gen-proto.sh`.
* (Optional) Add a Makefile target:
```makefile
.PHONY: generate
generate:
	@echo "Generating Go code from Protobuf definitions..."
	@./scripts/gen-proto.sh
```
* **Verification**:
* Running `scripts/gen-proto.sh` (or `make generate`) executes without errors.
* Go files (e.g., `kat.pb.go`) are generated in the specified output directory (`api/v1alpha1/generated/` or `api/v1alpha1/`).
* These generated files compile if included in a Go program.
5. **Implement Basic Parsing and Validation for `cluster.kat` (`internal/config/parse.go`, `internal/config/types.go`)**
* **Purpose**: Enable `kat-agent init` to read and understand its initial cluster-wide configuration.
* **Details**:
* In `internal/config/types.go` (or use generated proto types directly if preferred for consistency): Define Go structs that mirror `ClusterConfiguration` from `kat.proto`.
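* If using proto types: the generated `ClusterConfiguration` struct can be used directly. Note that generated structs carry `protobuf` and `json` struct tags but no `yaml` tags, so YAML keys such as `clusterCIDR` will not map onto them automatically; converting the YAML to JSON and decoding with `protojson.Unmarshal` is one way around this.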
* In `internal/config/parse.go`:
* `ParseClusterConfiguration(filePath string) (*ClusterConfiguration, error)`:
1. Read the file content.
2. Unmarshal YAML into the Go struct (e.g., using `gopkg.in/yaml.v3`).
3. Perform basic validation:
* Check for required fields (e.g., `clusterCIDR`, `serviceCIDR`, ports).
* Validate CIDR formats.
* Ensure ports are within valid range.
* Ensure intervals are positive.
* `SetClusterConfigDefaults(config *ClusterConfiguration)`: Apply default values as per RFC 3.9 if fields are not set.
* **Potential Challenges**: Handling YAML unmarshalling intricacies, comprehensive validation logic.
* **Verification**:
* Unit tests for `ParseClusterConfiguration`:
* Test with a valid `examples/cluster.kat` file. Parsed struct should match expected values.
* Test with missing required fields; expect an error.
* Test with invalid field values (e.g., bad CIDR, invalid port); expect an error.
* Test with a file that includes some fields and omits optional ones; verify defaults are applied by `SetClusterConfigDefaults`.
* An example `examples/cluster.kat` file should be created for testing.
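The defaulting and validation steps for `cluster.kat` could look roughly like the sketch below. It is a minimal illustration and not the planned implementation: `ClusterConfiguration` here is a flattened, hand-written stand-in for the generated proto type, `ValidateClusterConfiguration` is a hypothetical name, YAML decoding is left out so the example needs only the standard library, and the default values are placeholders rather than the ones specified in RFC 3.9.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// ClusterConfiguration is a flattened, hand-written stand-in for the generated
// proto type; only a subset of the ClusterConfigurationSpec fields is shown.
type ClusterConfiguration struct {
	ClusterCIDR            string
	ServiceCIDR            string
	AgentPort              int32
	APIPort                int32
	AgentTickSeconds       int32
	NodeLossTimeoutSeconds int32
}

// namedInt pairs a field name with its value for table-driven checks.
type namedInt struct {
	name  string
	value int32
}

// SetClusterConfigDefaults fills fields left at their zero value.
// The numbers are placeholders for illustration, NOT the RFC 3.9 defaults.
func SetClusterConfigDefaults(c *ClusterConfiguration) {
	setIfZero := func(field *int32, value int32) {
		if *field == 0 {
			*field = value
		}
	}
	setIfZero(&c.AgentPort, 9116)
	setIfZero(&c.APIPort, 9115)
	setIfZero(&c.AgentTickSeconds, 15)
	setIfZero(&c.NodeLossTimeoutSeconds, 60)
}

// ValidateClusterConfiguration (hypothetical name) collects every problem it
// finds instead of stopping at the first one.
func ValidateClusterConfiguration(c *ClusterConfiguration) error {
	var errs []error

	cidrs := []struct{ name, value string }{
		{"clusterCIDR", c.ClusterCIDR},
		{"serviceCIDR", c.ServiceCIDR},
	}
	for _, f := range cidrs {
		if f.value == "" {
			errs = append(errs, fmt.Errorf("%s is required", f.name))
		} else if _, err := netip.ParsePrefix(f.value); err != nil {
			errs = append(errs, fmt.Errorf("%s: invalid CIDR %q: %w", f.name, f.value, err))
		}
	}

	for _, p := range []namedInt{{"agentPort", c.AgentPort}, {"apiPort", c.APIPort}} {
		if p.value < 1 || p.value > 65535 {
			errs = append(errs, fmt.Errorf("%s: port %d is outside 1-65535", p.name, p.value))
		}
	}

	intervals := []namedInt{
		{"agentTickSeconds", c.AgentTickSeconds},
		{"nodeLossTimeoutSeconds", c.NodeLossTimeoutSeconds},
	}
	for _, i := range intervals {
		if i.value <= 0 {
			errs = append(errs, fmt.Errorf("%s must be positive, got %d", i.name, i.value))
		}
	}

	return errors.Join(errs...) // nil when no problems were recorded
}

func main() {
	cfg := &ClusterConfiguration{ClusterCIDR: "10.100.0.0/16", ServiceCIDR: "not-a-cidr"}
	SetClusterConfigDefaults(cfg)
	if err := ValidateClusterConfiguration(cfg); err != nil {
		fmt.Println("invalid cluster.kat:", err)
	}
}
```

Applying defaults before validation means an omitted optional field never trips the range checks, while a missing required CIDR still fails. Returning the joined errors lets a unit test assert on several invalid fields in a single pass.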
6. **Implement Basic Parsing/Validation for Quadlet Files (`internal/config/parse.go`, `internal/utils/tar.go`)**
* **Purpose**: Enable the Leader to understand submitted Workload definitions.
* **Details**:
* In `internal/utils/tar.go`:
* `UntarQuadlets(reader io.Reader) (map[string][]byte, error)`: Takes a `tar.gz` stream, unpacks it in memory (or temp dir), and returns a map of `fileName -> fileContent`.
* In `internal/config/parse.go`:
* `ParseQuadletFile(fileName string, content []byte) (interface{}, error)`:
1. Unmarshal YAML content based on `kind` field (e.g., into `Workload`, `VirtualLoadBalancer` generated proto structs).
2. Perform basic validation on the specific Quadlet type (e.g., `Workload` must have `metadata.name`, `spec.type`).
* `ParseQuadletDirectory(files map[string][]byte) (*Workload, *VirtualLoadBalancer, ..., error)`:
1. Iterate through files from `UntarQuadlets`.
2. Use `ParseQuadletFile` for each.
3. Perform cross-Quadlet file validation (e.g., if `build.kat` exists, `workload.kat` must have `spec.source.git`). Placeholder for now, more in later phases.
* **Potential Challenges**: Handling different Quadlet `kind`s, managing inter-file dependencies.
* **Verification**:
* Unit tests for `UntarQuadlets` with a sample `tar.gz` archive containing example Quadlet files.
* Unit tests for `ParseQuadletFile` for each Quadlet type (`workload.kat`, `VirtualLoadBalancer.kat` etc.) with valid and invalid content.
* An example Quadlet directory (e.g., `examples/simple-service/`) should be created and tarred for testing.
* `ParseQuadletDirectory` successfully parses a valid collection of Quadlet files from the tar.
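As a rough illustration of the in-memory variant of `UntarQuadlets`, the sketch below reads a `tar.gz` stream using only the standard library. Several choices are assumptions made for the example and do not come from the RFC: entries are keyed by base name, duplicate names are rejected, non-regular entries are skipped, and each file is capped at the hypothetical `maxQuadletFileSize`. The `makeArchive` helper exists only to produce test input.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
	"path"
)

// maxQuadletFileSize caps each extracted file. The limit is an assumption
// for this sketch, not a value taken from the RFC.
const maxQuadletFileSize = 1 << 20

// UntarQuadlets unpacks a gzip-compressed tar stream in memory and returns
// a map of base file name to file content.
func UntarQuadlets(r io.Reader) (map[string][]byte, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, fmt.Errorf("open gzip stream: %w", err)
	}
	defer gz.Close()

	files := make(map[string][]byte)
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return files, nil // end of archive
		}
		if err != nil {
			return nil, fmt.Errorf("read tar entry: %w", err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // skip directories, symlinks, and other non-regular entries
		}
		if hdr.Size > maxQuadletFileSize {
			return nil, fmt.Errorf("%s: %d bytes exceeds the size limit", hdr.Name, hdr.Size)
		}
		name := path.Base(hdr.Name)
		if _, dup := files[name]; dup {
			return nil, fmt.Errorf("duplicate file name %q in archive", name)
		}
		content, err := io.ReadAll(tr) // reads only the current entry
		if err != nil {
			return nil, fmt.Errorf("read %s: %w", hdr.Name, err)
		}
		files[name] = content
	}
}

// makeArchive builds a tar.gz archive in memory; it is a test helper only.
func makeArchive(files map[string][]byte) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, content := range files {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(content))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(content); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	archive, err := makeArchive(map[string][]byte{
		"simple-service/workload.kat": []byte("kind: Workload\n"),
	})
	if err != nil {
		panic(err)
	}
	files, err := UntarQuadlets(archive)
	if err != nil {
		panic(err)
	}
	for name, content := range files {
		fmt.Printf("%s: %d bytes\n", name, len(content))
	}
}
```

Keeping everything in memory avoids temp-directory cleanup and path traversal concerns, at the cost of holding the whole archive in RAM; the size cap bounds that cost for each file.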
* **Milestone Verification (Overall Phase 0)**:
1. Project repository is set up with Go modules and initial directory structure.
2. `make generate` (or `scripts/gen-proto.sh`) successfully compiles `api/v1alpha1/kat.proto` into Go source files without errors. The generated Go code includes structs for `Workload`, `VirtualLoadBalancer`, `JobDefinition`, `BuildDefinition`, `Namespace`, internal `Node`, and `ClusterConfiguration`.
3. Unit tests in `internal/config/parse_test.go` demonstrate:
* Successful parsing of a valid `cluster.kat` file into the `ClusterConfiguration` struct, including application of default values.
* Error handling for invalid or incomplete `cluster.kat` files.
4. Unit tests in `internal/config/parse_test.go` (and potentially `internal/utils/tar_test.go`) demonstrate:
* Successful untarring of a sample `tar.gz` Quadlet archive.
* Successful parsing of individual Quadlet files (e.g., `workload.kat`, `VirtualLoadBalancer.kat`) into their respective Go structs (using generated proto types).
* Basic validation of required fields within individual Quadlet files.
5. All code is committed to Git.
6. (Optional but good practice) A basic `README.md` is started.