Phase 1: State Management & Leader Election
- Goal: Establish the foundational state layer using embedded etcd and implement a reliable leader election mechanism. A single `kat-agent` can initialize a cluster, become its leader, and store initial configuration.
- RFC Sections Primarily Used: 2.2 (Embedded etcd), 3.9 (ClusterConfiguration), 5.1 (State Store Interface), 5.2 (etcd Implementation Details), 5.3 (Leader Election).
Tasks & Sub-Tasks:
- Define `StateStore` Go Interface (`internal/store/interface.go`)
  - Purpose: Create the abstraction layer for all state operations, decoupling the rest of the system from direct etcd dependencies.
  - Details: Transcribe the Go interface from RFC 5.1 verbatim. Include `KV`, `WatchEvent`, `EventType`, `Compare`, `Op`, `OpType` structs/constants.
  - Verification: Code compiles. Interface definition matches RFC.
- Implement Embedded etcd Server Logic (`internal/store/etcd.go`)
  - Purpose: Allow `kat-agent` to run its own etcd instance for single-node clusters or as part of a multi-node quorum.
  - Details:
    - Use `go.etcd.io/etcd/server/v3/embed`.
    - Function to start an embedded etcd server:
      - Input: configuration parameters (data directory, peer URLs, client URLs, name). These will come from `cluster.kat` or defaults.
      - Output: a running `embed.Etcd` instance or an error.
    - Graceful shutdown logic for the embedded etcd server.
  - Verification: A test can start and stop an embedded etcd server. Data directory is created and used.
- Implement `StateStore` with etcd Backend (`internal/store/etcd.go`)
  - Purpose: Provide the concrete implementation for interacting with an etcd cluster (embedded or external).
  - Details:
    - Create a struct that implements the `StateStore` interface and holds an `etcd/clientv3.Client`.
    - Implement `Put(ctx, key, value)`: Use `client.Put()`.
    - Implement `Get(ctx, key)`: Use `client.Get()`. Handle key-not-found. Populate `KV.Version` with `ModRevision`.
    - Implement `Delete(ctx, key)`: Use `client.Delete()`.
    - Implement `List(ctx, prefix)`: Use `client.Get()` with `clientv3.WithPrefix()`.
    - Implement `Watch(ctx, keyOrPrefix, startRevision)`: Use `client.Watch()`. Translate etcd events to `WatchEvent`.
    - Implement `Close()`: Close the `clientv3.Client`.
    - Implement `Campaign(ctx, leaderID, leaseTTLSeconds)`:
      - Use `concurrency.NewSession()` to create a lease.
      - Use `concurrency.NewElection()` and `election.Campaign()`.
      - Return a context that is cancelled when leadership is lost (e.g., by watching the campaign context or session done channel).
    - Implement `Resign(ctx)`: Use `election.Resign()`.
    - Implement `GetLeader(ctx)`: Observe the election or query the leader key.
    - Implement `DoTransaction(ctx, checks, onSuccess, onFailure)`: Use `client.Txn()` with `clientv3.Compare` and `clientv3.Op`.
  - Potential Challenges: Correctly handling etcd transaction semantics, context propagation, and error translation. Efficiently managing watches.
  - Verification:
    - Unit tests for each `StateStore` method using a real embedded etcd instance (test-scoped).
    - Verify `Put` then `Get` retrieves the correct value and version.
    - Verify `List` with prefix.
    - Verify `Delete` removes the key.
    - Verify `Watch` receives correct events for puts/deletes.
    - Verify `DoTransaction` commits on success and rolls back on failure.
- Integrate Leader Election into `kat-agent` (`cmd/kat-agent/main.go`, `internal/leader/election.go` - new file maybe)
  - Purpose: Enable an agent instance to attempt to become the cluster leader.
  - Details:
    - `kat-agent` main function will initialize its `StateStore` client.
    - A dedicated goroutine will call `StateStore.Campaign()`.
    - The outcome of `Campaign` (e.g., leadership acquired, context for leadership duration) will determine if the agent activates its Leader-specific logic (Phase 2+).
    - Leader ID could be `nodeName` or a UUID. Lease TTL from `cluster.kat`.
  - Verification:
    - Start one `kat-agent` with etcd enabled; it should log "became leader".
    - Start a second `kat-agent` configured to connect to the first's etcd; it should log "observing leader " or similar, but not become leader itself.
    - If the first agent (leader) is stopped, the second agent should eventually log "became leader".
- Implement Basic `kat-agent init` Command (`cmd/kat-agent/main.go`, `internal/config/parse.go`)
  - Purpose: Initialize a new KAT cluster (single node initially).
  - Details:
    - Define `init` subcommand in `kat-agent` using a CLI library (e.g., `cobra`).
    - Flag: `--config <path_to_cluster.kat>`.
    - Parse `cluster.kat` (from Phase 0, now used to extract etcd peer/client URLs, data dir, backup paths, etc.).
    - Generate a persistent Cluster UID and store it in etcd (e.g., `/kat/config/cluster_uid`).
    - Store `cluster.kat`-relevant parameters (or the whole sanitized config) into etcd (e.g., under `/kat/config/cluster_config`).
    - Start the embedded etcd server using the parsed configuration.
    - Initiate leader election.
  - Potential Challenges: Ensuring `cluster.kat` parsing is robust. Handling existing data directories.
  - Milestone Verification:
    - Running `kat-agent init --config examples/cluster.kat` on a clean system:
      - Starts the `kat-agent` process.
      - Creates the etcd data directory.
      - Logs "Successfully initialized etcd".
      - Logs "Became leader: ".
    - Using `etcdctl` (or a simple `StateStore.Get` test client):
      - Verify `/kat/config/cluster_uid` exists and has a UUID.
      - Verify `/kat/config/cluster_config` (or similar keys) contains data from `cluster.kat` (e.g., `clusterCIDR`, `serviceCIDR`, `agentPort`, `apiPort`).
      - Verify the leader election key exists for the current leader.