# **Phase 1: State Management & Leader Election**

*   **Goal**: Establish the foundational state layer using embedded etcd and implement a reliable leader election mechanism. A single `kat-agent` can initialize a cluster, become its leader, and store initial configuration.
*   **RFC Sections Primarily Used**: 2.2 (Embedded etcd), 3.9 (ClusterConfiguration), 5.1 (State Store Interface), 5.2 (etcd Implementation Details), 5.3 (Leader Election).

**Tasks & Sub-Tasks:**

1.  **Define `StateStore` Go Interface (`internal/store/interface.go`)**
    *   **Purpose**: Create the abstraction layer for all state operations, decoupling the rest of the system from direct etcd dependencies.
    *   **Details**: Transcribe the Go interface from RFC 5.1 verbatim, including the `KV`, `WatchEvent`, `EventType`, `Compare`, `Op`, and `OpType` structs/constants. A sketch of the expected shape follows this task.
    *   **Verification**: Code compiles. Interface definition matches RFC.
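    The RFC 5.1 text is authoritative for this interface; purely for orientation, here is a minimal sketch of the shape it might take (field and method details beyond the names listed above are assumptions, not RFC wording):

    ```go
    package store

    import "context"

    // KV is a single key-value pair plus the revision it was read at.
    type KV struct {
        Key     string
        Value   []byte
        Version int64 // populated from etcd's ModRevision
    }

    // EventType and WatchEvent describe changes delivered by Watch.
    type EventType int

    const (
        EventTypePut EventType = iota
        EventTypeDelete
    )

    type WatchEvent struct {
        Type EventType
        KV   KV
    }

    // Compare, Op, and OpType model transaction checks and operations.
    type Compare struct {
        Key             string
        ExpectedVersion int64 // asserted against the key's current ModRevision
    }

    type OpType int

    const (
        OpPut OpType = iota
        OpDelete
    )

    type Op struct {
        Type  OpType
        Key   string
        Value []byte
    }

    // StateStore decouples callers from direct etcd dependencies.
    type StateStore interface {
        Put(ctx context.Context, key string, value []byte) error
        Get(ctx context.Context, key string) (*KV, error)
        Delete(ctx context.Context, key string) error
        List(ctx context.Context, prefix string) ([]KV, error)
        Watch(ctx context.Context, keyOrPrefix string, startRevision int64) (<-chan WatchEvent, error)
        Close() error

        // Campaign blocks until leadership is won; the returned context is
        // cancelled when leadership is lost.
        Campaign(ctx context.Context, leaderID string, leaseTTLSeconds int64) (context.Context, error)
        Resign(ctx context.Context) error
        GetLeader(ctx context.Context) (string, error)

        DoTransaction(ctx context.Context, checks []Compare, onSuccess, onFailure []Op) (bool, error)
    }
    ```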

2.  **Implement Embedded etcd Server Logic (`internal/store/etcd.go`)**
    *   **Purpose**: Allow `kat-agent` to run its own etcd instance for single-node clusters or as part of a multi-node quorum.
    *   **Details**:
        *   Use `go.etcd.io/etcd/server/v3/embed`.
        *   Function to start an embedded etcd server (see the sketch after this task):
            *   Input: configuration parameters (data directory, peer URLs, client URLs, name). These will come from `cluster.kat` or defaults.
            *   Output: a running `embed.Etcd` instance or an error.
        *   Graceful shutdown logic for the embedded etcd server.
    *   **Verification**: A test can start and stop an embedded etcd server. Data directory is created and used.
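    A minimal sketch of the start/stop flow, assuming etcd v3.5's `embed` package (the function name and the 60-second timeout are illustrative, and URL wiring from `cluster.kat` is elided; `embed.NewConfig()` supplies localhost defaults suitable for a single node):

    ```go
    package store

    import (
        "fmt"
        "time"

        "go.etcd.io/etcd/server/v3/embed"
    )

    // StartEmbeddedEtcd starts an embedded etcd server and blocks until it is
    // ready to serve, or times out. The caller is responsible for Close().
    func StartEmbeddedEtcd(name, dataDir string) (*embed.Etcd, error) {
        cfg := embed.NewConfig()
        cfg.Name = name
        cfg.Dir = dataDir // created on first start if absent

        e, err := embed.StartEtcd(cfg)
        if err != nil {
            return nil, fmt.Errorf("start embedded etcd: %w", err)
        }
        select {
        case <-e.Server.ReadyNotify():
            return e, nil // ready to serve client requests
        case <-time.After(60 * time.Second):
            e.Close() // graceful shutdown: stops the server, closes listeners
            return nil, fmt.Errorf("embedded etcd did not become ready in time")
        }
    }
    ```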

3.  **Implement `StateStore` with etcd Backend (`internal/store/etcd.go`)**
    *   **Purpose**: Provide the concrete implementation for interacting with an etcd cluster (embedded or external).
    *   **Details**:
        *   Create a struct that implements the `StateStore` interface and holds an `etcd/clientv3.Client`.
        *   Implement `Put(ctx, key, value)`: Use `client.Put()`.
        *   Implement `Get(ctx, key)`: Use `client.Get()`. Handle key-not-found. Populate `KV.Version` with `ModRevision`.
        *   Implement `Delete(ctx, key)`: Use `client.Delete()`.
        *   Implement `List(ctx, prefix)`: Use `client.Get()` with `clientv3.WithPrefix()`.
        *   Implement `Watch(ctx, keyOrPrefix, startRevision)`: Use `client.Watch()`. Translate etcd events to `WatchEvent`.
        *   Implement `Close()`: Close the `clientv3.Client`.
        *   Implement `Campaign(ctx, leaderID, leaseTTLSeconds)`:
            *   Use `concurrency.NewSession()` to create a lease.
            *   Use `concurrency.NewElection()` and `election.Campaign()`.
            *   Return a context that is cancelled when leadership is lost (e.g., by watching the campaign context or the session's done channel); see the sketch after this task's verification list.
        *   Implement `Resign(ctx)`: Use `election.Resign()`.
        *   Implement `GetLeader(ctx)`: Observe the election or query the leader key.
        *   Implement `DoTransaction(ctx, checks, onSuccess, onFailure)`: Use `client.Txn()` with `clientv3.Compare` and `clientv3.Op`, as in the sketch below.
    *   **Potential Challenges**: Correctly handling etcd transaction semantics, context propagation, and error translation. Efficiently managing watches.
    *   **Verification**:
        *   Unit tests for each `StateStore` method using a real embedded etcd instance (test-scoped).
        *   Verify `Put` then `Get` retrieves the correct value and version.
        *   Verify `List` with prefix.
        *   Verify `Delete` removes the key.
        *   Verify `Watch` receives correct events for puts/deletes.
        *   Verify `DoTransaction` applies the success ops when all checks pass and the failure ops otherwise (etcd transactions have no rollback; the `Else` branch simply runs instead).
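    A combined sketch of the election and transaction pieces (the `electionPrefix`, the struct layout, and the error handling are assumptions; `Compare`/`Op` are the hypothetical types from the Task 1 sketch):

    ```go
    package store

    import (
        "context"

        clientv3 "go.etcd.io/etcd/client/v3"
        "go.etcd.io/etcd/client/v3/concurrency"
    )

    const electionPrefix = "/kat/leader" // assumed election key prefix

    type etcdStore struct {
        client   *clientv3.Client
        session  *concurrency.Session
        election *concurrency.Election
    }

    // Campaign blocks until this node wins the election, then returns a
    // context that is cancelled when the lease session expires or is closed.
    func (s *etcdStore) Campaign(ctx context.Context, leaderID string, leaseTTLSeconds int64) (context.Context, error) {
        session, err := concurrency.NewSession(s.client, concurrency.WithTTL(int(leaseTTLSeconds)))
        if err != nil {
            return nil, err
        }
        election := concurrency.NewElection(session, electionPrefix)
        if err := election.Campaign(ctx, leaderID); err != nil { // blocks until elected
            session.Close()
            return nil, err
        }
        s.session, s.election = session, election // retained for Resign/GetLeader

        leaderCtx, cancel := context.WithCancel(ctx)
        go func() {
            <-session.Done() // lease lost (partition, etcd restart) or session closed
            cancel()
        }()
        return leaderCtx, nil
    }

    // DoTransaction maps the interface's checks/ops onto one etcd Txn. etcd
    // transactions never roll back: if the If comparisons fail, the Else ops
    // run instead, and Succeeded reports which branch was taken.
    func (s *etcdStore) DoTransaction(ctx context.Context, checks []Compare, onSuccess, onFailure []Op) (bool, error) {
        cmps := make([]clientv3.Cmp, 0, len(checks))
        for _, c := range checks {
            cmps = append(cmps, clientv3.Compare(clientv3.ModRevision(c.Key), "=", c.ExpectedVersion))
        }
        toOps := func(ops []Op) []clientv3.Op {
            out := make([]clientv3.Op, 0, len(ops))
            for _, o := range ops {
                switch o.Type {
                case OpPut:
                    out = append(out, clientv3.OpPut(o.Key, string(o.Value)))
                case OpDelete:
                    out = append(out, clientv3.OpDelete(o.Key))
                }
            }
            return out
        }
        resp, err := s.client.Txn(ctx).If(cmps...).Then(toOps(onSuccess)...).Else(toOps(onFailure)...).Commit()
        if err != nil {
            return false, err
        }
        return resp.Succeeded, nil
    }
    ```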

4.  **Integrate Leader Election into `kat-agent` (`cmd/kat-agent/main.go`; possibly a new `internal/leader/election.go`)**
    *   **Purpose**: Enable an agent instance to attempt to become the cluster leader.
    *   **Details**:
        *   `kat-agent` main function will initialize its `StateStore` client.
        *   A dedicated goroutine will call `StateStore.Campaign()`.
        *   The outcome of `Campaign` (leadership acquired, plus a context scoping how long it is held) determines whether the agent activates its Leader-specific logic (Phase 2+); see the loop sketch after this task.
        *   Leader ID could be `nodeName` or a UUID. Lease TTL from `cluster.kat`.
    *   **Verification**:
        *   Start one `kat-agent` with etcd enabled; it should log "became leader".
        *   Start a second `kat-agent` configured to connect to the first's etcd; it should log "observing leader <leaderID>" or similar, but not become leader itself.
        *   If the first agent (leader) is stopped, the second agent should eventually log "became leader".
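    A sketch of that loop (the module path `kat-system/internal/store` and `runLeaderTasks` are placeholder names, not decided ones):

    ```go
    package main

    import (
        "context"
        "log"

        "kat-system/internal/store" // hypothetical module path
    )

    // leadershipLoop campaigns repeatedly: each time the agent wins, it runs
    // leader-only logic until the leadership context is cancelled, then it
    // campaigns again. runLeaderTasks stands in for the Phase 2+ logic.
    func leadershipLoop(ctx context.Context, st store.StateStore, nodeName string, leaseTTL int64) {
        for ctx.Err() == nil {
            leaderCtx, err := st.Campaign(ctx, nodeName, leaseTTL)
            if err != nil {
                log.Printf("campaign failed: %v", err)
                continue
            }
            log.Printf("became leader: %s", nodeName)
            runLeaderTasks(leaderCtx) // blocks until leadership is lost
            log.Printf("leadership lost; re-campaigning")
        }
    }

    func runLeaderTasks(ctx context.Context) { <-ctx.Done() } // placeholder
    ```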

5.  **Implement Basic `kat-agent init` Command (`cmd/kat-agent/main.go`, `internal/config/parse.go`)**
    *   **Purpose**: Initialize a new KAT cluster (single node initially).
    *   **Details**:
        *   Define an `init` subcommand in `kat-agent` using a CLI library (e.g., `cobra`); a command sketch follows this task's verification.
        *   Flag: `--config <path_to_cluster.kat>`.
        *   Parse `cluster.kat` (from Phase 0, now used to extract etcd peer/client URLs, the data directory, backup paths, etc.).
        *   Generate a persistent Cluster UID and store it in etcd (e.g., `/kat/config/cluster_uid`).
        *   Store `cluster.kat` relevant parameters (or the whole sanitized config) into etcd (e.g., under `/kat/config/cluster_config`).
        *   Start the embedded etcd server using parsed configurations.
        *   Initiate leader election.
    *   **Potential Challenges**: Ensuring `cluster.kat` parsing is robust. Handling existing data directories.
    *   **Milestone Verification**:
        *   Running `kat-agent init --config examples/cluster.kat` on a clean system:
            *   Starts the `kat-agent` process.
            *   Creates the etcd data directory.
            *   Logs "Successfully initialized etcd".
            *   Logs "Became leader: <nodeName>".
            *   Using `etcdctl` (or a simple `StateStore.Get` test client):
                *   Verify `/kat/config/cluster_uid` exists and has a UUID.
                *   Verify `/kat/config/cluster_config` (or similar keys) contains data from `cluster.kat` (e.g., `clusterCIDR`, `serviceCIDR`, `agentPort`, `apiPort`).
                *   Verify the leader election key exists for the current leader.
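    A sketch of the command wiring (`parseClusterKat`, `startEmbeddedEtcd`, and `newEtcdStore` are hypothetical helpers standing in for the Phase 0/1 code above; the key paths follow the examples in this task):

    ```go
    package main

    import (
        "log"

        "github.com/google/uuid"
        "github.com/spf13/cobra"
    )

    func newInitCmd() *cobra.Command {
        var configPath string
        cmd := &cobra.Command{
            Use:   "init",
            Short: "Initialize a new KAT cluster on this node",
            RunE: func(cmd *cobra.Command, args []string) error {
                cfg, err := parseClusterKat(configPath) // hypothetical Phase 0 parser
                if err != nil {
                    return err
                }
                etcd, err := startEmbeddedEtcd(cfg) // Task 2
                if err != nil {
                    return err
                }
                defer etcd.Close()

                st, err := newEtcdStore(cfg.ClientURLs) // Task 3
                if err != nil {
                    return err
                }
                defer st.Close()

                ctx := cmd.Context()
                // Persist the cluster identity exactly once; a re-run against an
                // existing data directory must not overwrite it.
                existing, err := st.Get(ctx, "/kat/config/cluster_uid")
                if err != nil {
                    return err
                }
                if existing == nil { // assumes Get returns (nil, nil) on key-not-found
                    if err := st.Put(ctx, "/kat/config/cluster_uid", []byte(uuid.NewString())); err != nil {
                        return err
                    }
                }
                log.Println("Successfully initialized etcd")
                // ...store the sanitized cluster config under /kat/config/cluster_config,
                // then enter the leadership loop from Task 4.
                return nil
            },
        }
        cmd.Flags().StringVar(&configPath, "config", "", "path to cluster.kat")
        _ = cmd.MarkFlagRequired("config")
        return cmd
    }
    ```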