## Phase 1: State Management & Leader Election
- **Goal:** Establish the foundational state layer using embedded etcd and implement a reliable leader election mechanism. A single `kat-agent` can initialize a cluster, become its leader, and store initial configuration.
- **RFC Sections Primarily Used:** 2.2 (Embedded etcd), 3.9 (ClusterConfiguration), 5.1 (State Store Interface), 5.2 (etcd Implementation Details), 5.3 (Leader Election).
**Tasks & Sub-Tasks:**
1. **Define `StateStore` Go Interface (`internal/store/interface.go`)**
   - Purpose: Create the abstraction layer for all state operations, decoupling the rest of the system from direct etcd dependencies.
   - Details: Transcribe the Go interface from RFC 5.1 verbatim. Include the `KV`, `WatchEvent`, `EventType`, `Compare`, `Op`, and `OpType` structs/constants. (A hedged sketch follows this task.)
   - Verification: Code compiles. Interface definition matches the RFC.
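Since RFC 5.1 is not reproduced here, the following is only a rough sketch of the shape such an interface could take; every name and signature below is inferred from the method list in Task 3, not copied from the RFC:

```go
// internal/store/interface.go (sketch only; the authoritative definition
// is RFC 5.1 and should be transcribed verbatim).
package store

import "context"

// KV is a single key-value record. Version carries etcd's ModRevision.
type KV struct {
	Key     string
	Value   []byte
	Version int64
}

// EventType distinguishes the kinds of changes a watch can report.
type EventType int

const (
	EventTypePut EventType = iota
	EventTypeDelete
)

// WatchEvent is one change notification delivered by Watch.
type WatchEvent struct {
	Type EventType
	KV   KV
}

// OpType, Op, and Compare mirror etcd's transaction primitives in a
// backend-neutral form (fields here are illustrative).
type OpType int

const (
	OpPut OpType = iota
	OpDelete
)

type Op struct {
	Type  OpType
	Key   string
	Value []byte
}

type Compare struct {
	Key             string
	ExpectedVersion int64 // 0 is taken to mean "key must not exist"
}

// StateStore abstracts all persistent state operations so the rest of
// the system never depends on etcd directly.
type StateStore interface {
	Put(ctx context.Context, key string, value []byte) error
	Get(ctx context.Context, key string) (*KV, error)
	Delete(ctx context.Context, key string) error
	List(ctx context.Context, prefix string) ([]KV, error)
	Watch(ctx context.Context, keyOrPrefix string, startRevision int64) (<-chan WatchEvent, error)
	Close() error

	// Campaign blocks until leadership is won; the returned context is
	// cancelled when leadership is subsequently lost.
	Campaign(ctx context.Context, leaderID string, leaseTTLSeconds int64) (context.Context, error)
	Resign(ctx context.Context) error
	GetLeader(ctx context.Context) (string, error)

	DoTransaction(ctx context.Context, checks []Compare, onSuccess, onFailure []Op) (bool, error)
}
```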
2. **Implement Embedded etcd Server Logic (`internal/store/etcd.go`)**
   - Purpose: Allow `kat-agent` to run its own etcd instance for single-node clusters or as part of a multi-node quorum.
   - Details:
     - Use `go.etcd.io/etcd/server/v3/embed`.
     - Function to start an embedded etcd server (see the sketch after this task):
       - Input: configuration parameters (data directory, peer URLs, client URLs, name). These come from `cluster.kat` or defaults.
       - Output: a running `embed.Etcd` instance or an error.
     - Graceful shutdown logic for the embedded etcd server.
   - Verification: A test can start and stop an embedded etcd server. The data directory is created and used.
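A minimal sketch of that start function, assuming etcd v3.5's `embed` API; the `EtcdOptions` struct and the `StartEmbeddedEtcd` name are illustrative, not prescribed by the RFC:

```go
// internal/store/etcd.go (sketch): starting and stopping the embedded
// etcd server, assuming etcd v3.5's embed package.
package store

import (
	"fmt"
	"net/url"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

// EtcdOptions is an illustrative carrier for the cluster.kat settings
// the embedded server needs.
type EtcdOptions struct {
	Name      string // node name
	DataDir   string // etcd data directory
	ClientURL string // e.g. "http://127.0.0.1:2379"
	PeerURL   string // e.g. "http://127.0.0.1:2380"
}

// StartEmbeddedEtcd starts an in-process etcd server and blocks until it
// is ready to serve. Callers shut it down with (*embed.Etcd).Close().
func StartEmbeddedEtcd(opts EtcdOptions) (*embed.Etcd, error) {
	cURL, err := url.Parse(opts.ClientURL)
	if err != nil {
		return nil, fmt.Errorf("bad client URL: %w", err)
	}
	pURL, err := url.Parse(opts.PeerURL)
	if err != nil {
		return nil, fmt.Errorf("bad peer URL: %w", err)
	}

	cfg := embed.NewConfig()
	cfg.Name = opts.Name
	cfg.Dir = opts.DataDir
	// etcd v3.5 field names; later releases rename these to
	// ListenClientUrls/AdvertiseClientUrls and friends.
	cfg.LCUrls, cfg.ACUrls = []url.URL{*cURL}, []url.URL{*cURL}
	cfg.LPUrls, cfg.APUrls = []url.URL{*pURL}, []url.URL{*pURL}
	cfg.InitialCluster = fmt.Sprintf("%s=%s", opts.Name, opts.PeerURL)

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		return nil, err
	}
	select {
	case <-e.Server.ReadyNotify():
		return e, nil
	case <-time.After(30 * time.Second):
		e.Close()
		return nil, fmt.Errorf("embedded etcd did not become ready in time")
	}
}
```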
3. **Implement `StateStore` with etcd Backend (`internal/store/etcd.go`)**
   - Purpose: Provide the concrete implementation for interacting with an etcd cluster (embedded or external).
   - Details:
     - Create a struct that implements the `StateStore` interface and holds an `etcd/clientv3.Client`.
     - Implement `Put(ctx, key, value)`: use `client.Put()`.
     - Implement `Get(ctx, key)`: use `client.Get()`. Handle key-not-found. Populate `KV.Version` with `ModRevision`.
     - Implement `Delete(ctx, key)`: use `client.Delete()`.
     - Implement `List(ctx, prefix)`: use `client.Get()` with `clientv3.WithPrefix()`.
     - Implement `Watch(ctx, keyOrPrefix, startRevision)`: use `client.Watch()`. Translate etcd events to `WatchEvent`.
     - Implement `Close()`: close the `clientv3.Client`.
     - Implement `Campaign(ctx, leaderID, leaseTTLSeconds)` (see the sketch after this list):
       - Use `concurrency.NewSession()` to create a lease.
       - Use `concurrency.NewElection()` and `election.Campaign()`.
       - Return a context that is cancelled when leadership is lost (e.g., by watching the campaign context or the session's done channel).
     - Implement `Resign(ctx)`: use `election.Resign()`.
     - Implement `GetLeader(ctx)`: observe the election or query the leader key.
     - Implement `DoTransaction(ctx, checks, onSuccess, onFailure)`: use `client.Txn()` with `clientv3.Compare` and `clientv3.Op`.
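For the trickiest method, `Campaign`, a sketch along the lines the sub-tasks describe, using `go.etcd.io/etcd/client/v3/concurrency` (the `etcdStore` struct and `electionPrefix` constant are assumed names):

```go
// internal/store/etcd.go (sketch, continued): leader election via the
// clientv3 concurrency helpers.
package store

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

const electionPrefix = "/kat/leader" // assumed election key prefix

type etcdStore struct {
	client   *clientv3.Client
	session  *concurrency.Session
	election *concurrency.Election
}

// Campaign blocks until this node wins the election (or ctx ends) and
// returns a context that is cancelled when leadership is lost.
func (s *etcdStore) Campaign(ctx context.Context, leaderID string, leaseTTLSeconds int64) (context.Context, error) {
	session, err := concurrency.NewSession(s.client, concurrency.WithTTL(int(leaseTTLSeconds)))
	if err != nil {
		return nil, err
	}
	election := concurrency.NewElection(session, electionPrefix)

	if err := election.Campaign(ctx, leaderID); err != nil {
		session.Close()
		return nil, err
	}

	leaderCtx, cancel := context.WithCancel(ctx)
	go func() {
		defer cancel()
		<-session.Done() // fires when the lease expires, i.e. leadership is lost
	}()

	s.session, s.election = session, election // retained for Resign/Close
	return leaderCtx, nil
}
```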
   - Potential Challenges: Correctly handling etcd transaction semantics, context propagation, and error translation (see the `DoTransaction` sketch below). Efficiently managing watches.
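Those transaction semantics can be pinned down with a small sketch of `DoTransaction`, assuming the illustrative `Compare`/`Op` types from the Task 1 sketch map one-to-one onto `clientv3` primitives:

```go
// internal/store/etcd.go (sketch, continued): DoTransaction maps the
// backend-neutral Compare/Op values onto one atomic clientv3 transaction.
func (s *etcdStore) DoTransaction(ctx context.Context, checks []Compare, onSuccess, onFailure []Op) (bool, error) {
	cmps := make([]clientv3.Cmp, 0, len(checks))
	for _, c := range checks {
		// Assumed convention: ExpectedVersion == 0 means "key must not
		// exist", which etcd expresses as ModRevision == 0.
		cmps = append(cmps, clientv3.Compare(clientv3.ModRevision(c.Key), "=", c.ExpectedVersion))
	}

	toOps := func(ops []Op) []clientv3.Op {
		out := make([]clientv3.Op, 0, len(ops))
		for _, o := range ops {
			switch o.Type {
			case OpPut:
				out = append(out, clientv3.OpPut(o.Key, string(o.Value)))
			case OpDelete:
				out = append(out, clientv3.OpDelete(o.Key))
			}
		}
		return out
	}

	resp, err := s.client.Txn(ctx).
		If(cmps...).
		Then(toOps(onSuccess)...).
		Else(toOps(onFailure)...).
		Commit()
	if err != nil {
		return false, err
	}
	return resp.Succeeded, nil // false means the Else branch ran
}
```

Note that etcd transactions do not "roll back" in the SQL sense; the `Else` ops simply run instead of the `Then` ops when a comparison fails.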
   - Verification:
     - Unit tests for each `StateStore` method using a real embedded etcd instance (test-scoped); a skeleton follows this list.
     - Verify `Put` then `Get` retrieves the correct value and version.
     - Verify `List` with a prefix.
     - Verify `Delete` removes the key.
     - Verify `Watch` receives correct events for puts and deletes.
     - Verify `DoTransaction` commits on success and rolls back on failure.
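A skeleton for these tests might look like the following; `StartEmbeddedEtcd`, `EtcdOptions`, and `newEtcdStore` are the assumed helper names from the sketches above, not a fixed API:

```go
package store

import (
	"context"
	"testing"
)

// TestPutGetRoundTrip spins up a test-scoped embedded etcd instance and
// checks that Get returns what Put stored, with a non-zero version.
func TestPutGetRoundTrip(t *testing.T) {
	etcd, err := StartEmbeddedEtcd(EtcdOptions{
		Name:      "test-node",
		DataDir:   t.TempDir(), // cleaned up automatically by the test runner
		ClientURL: "http://127.0.0.1:12379",
		PeerURL:   "http://127.0.0.1:12380",
	})
	if err != nil {
		t.Fatalf("start embedded etcd: %v", err)
	}
	defer etcd.Close()

	// newEtcdStore (assumed constructor) dials the client endpoints and
	// returns the etcd-backed StateStore.
	st, err := newEtcdStore([]string{"http://127.0.0.1:12379"})
	if err != nil {
		t.Fatalf("connect store: %v", err)
	}
	defer st.Close()

	ctx := context.Background()
	if err := st.Put(ctx, "/kat/test/key", []byte("value")); err != nil {
		t.Fatalf("put: %v", err)
	}
	kv, err := st.Get(ctx, "/kat/test/key")
	if err != nil {
		t.Fatalf("get: %v", err)
	}
	if string(kv.Value) != "value" || kv.Version == 0 {
		t.Errorf("unexpected KV: %+v", kv)
	}
}
```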
4. **Integrate Leader Election into `kat-agent` (`cmd/kat-agent/main.go`, `internal/leader/election.go`, possibly a new file)**
   - Purpose: Enable an agent instance to attempt to become the cluster leader.
   - Details (see the sketch after this list):
     - The `kat-agent` main function will initialize its `StateStore` client.
     - A dedicated goroutine will call `StateStore.Campaign()`.
     - The outcome of `Campaign` (e.g., leadership acquired, a context scoping the leadership's duration) determines whether the agent activates its leader-specific logic (Phase 2+).
     - The leader ID could be `nodeName` or a UUID. The lease TTL comes from `cluster.kat`.
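A possible shape for that goroutine, reusing the assumed `Campaign` signature from the earlier sketches (`runLeaderElection` itself is an illustrative name, and the import's module path is a placeholder):

```go
// cmd/kat-agent/main.go (excerpt, illustrative): run the campaign in a
// dedicated goroutine and gate leader-only logic on the returned context.
package main

import (
	"context"
	"log"
	"time"

	"example.com/kat/internal/store" // placeholder module path
)

func runLeaderElection(ctx context.Context, st store.StateStore, leaderID string, leaseTTLSeconds int64) {
	for ctx.Err() == nil {
		leaderCtx, err := st.Campaign(ctx, leaderID, leaseTTLSeconds)
		if err != nil {
			log.Printf("campaign failed, retrying: %v", err)
			time.Sleep(2 * time.Second) // crude backoff; tune as needed
			continue
		}
		log.Printf("became leader: %s", leaderID)

		// Leader-specific logic (Phase 2+) would start here and stop
		// when leaderCtx is cancelled, i.e. when leadership is lost.
		<-leaderCtx.Done()
		log.Printf("lost leadership, re-entering campaign")
	}
}
```

`main` would launch this with `go runLeaderElection(ctx, st, nodeName, ttl)` once the `StateStore` client is initialized.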
   - Verification:
     - Start one `kat-agent` with etcd enabled; it should log "became leader".
     - Start a second `kat-agent` configured to connect to the first's etcd; it should log "observing leader " or similar, but not become leader itself.
     - If the first agent (the leader) is stopped, the second agent should eventually log "became leader".
5. **Implement Basic `kat-agent init` Command (`cmd/kat-agent/main.go`, `internal/config/parse.go`)**
   - Purpose: Initialize a new KAT cluster (single node initially).
   - Details:
     - Define the `init` subcommand in `kat-agent` using a CLI library (e.g., `cobra`); see the sketch after this list.
     - Flag: `--config <path_to_cluster.kat>`.
     - Parse `cluster.kat` (from Phase 0, now used to extract etcd peer/client URLs, the data directory, backup paths, etc.).
     - Generate a persistent cluster UID and store it in etcd (e.g., at `/kat/config/cluster_uid`).
     - Store the relevant `cluster.kat` parameters (or the whole sanitized config) in etcd (e.g., under `/kat/config/cluster_config`).
     - Start the embedded etcd server using the parsed configuration.
     - Initiate leader election.
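A sketch of the `cobra` wiring; `initCluster` is a hypothetical helper standing in for the parse/start/store/campaign sequence above:

```go
// cmd/kat-agent/main.go (excerpt): wiring the init subcommand with cobra.
package main

import (
	"log"

	"github.com/spf13/cobra"
)

// initCluster is a stand-in for the real sequence: parse cluster.kat,
// start embedded etcd, persist the cluster UID and config, campaign.
func initCluster(configPath string) error {
	log.Printf("initializing cluster from %s", configPath)
	return nil
}

func main() {
	var configPath string

	rootCmd := &cobra.Command{Use: "kat-agent"}

	initCmd := &cobra.Command{
		Use:   "init",
		Short: "Initialize a new KAT cluster on this node",
		RunE: func(cmd *cobra.Command, args []string) error {
			return initCluster(configPath)
		},
	}
	initCmd.Flags().StringVar(&configPath, "config", "", "path to cluster.kat")
	_ = initCmd.MarkFlagRequired("config")

	rootCmd.AddCommand(initCmd)
	if err := rootCmd.Execute(); err != nil {
		log.Fatal(err)
	}
}
```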
   - Potential Challenges: Ensuring `cluster.kat` parsing is robust. Handling existing data directories.
   - Milestone Verification:
     - Running `kat-agent init --config examples/cluster.kat` on a clean system:
       - Starts the `kat-agent` process.
       - Creates the etcd data directory.
       - Logs "Successfully initialized etcd".
       - Logs "Became leader: ".
     - Using `etcdctl` (or a simple `StateStore.Get` test client):
       - Verify `/kat/config/cluster_uid` exists and holds a UUID.
       - Verify `/kat/config/cluster_config` (or similar keys) contains data from `cluster.kat` (e.g., `clusterCIDR`, `serviceCIDR`, `agentPort`, `apiPort`).
       - Verify the leader election key exists for the current leader.