WIP: Phase 2 #2

Draft
dubey wants to merge 27 commits from phase2 into main
Owner

Phase 2: Basic Agent & Node Lifecycle (Init, Join, PKI)

  • Goal: Initial Leader setup, a second Agent joining with mTLS, and heartbeating.
  • Tasks:
    1. Implement Internal PKI (RFC 10.6) in internal/pki/:
      • CA key/cert generation on kat-agent init.
      • CSR generation by agent on join.
      • CSR signing by Leader.
    2. Implement initial Node Communication Protocol (RFC 2.3) for join:
      • Agent (kat-agent join --leader-api <...> --advertise-address <...>) sends CSR to Leader.
      • Leader validates, signs, returns certs & CA. Stores node registration (name, UID, advertise addr, WG pubkey placeholder) in etcd.
    3. Implement basic mTLS for this join communication.
    4. Implement Node Heartbeat (POST /v1alpha1/nodes/{nodeName}/status) from Agent to Leader (RFC 4.1.3). Leader updates node status in etcd.
    5. Leader implements basic failure detection (marks Node NotReady in etcd if heartbeats cease) (RFC 4.1.4).
  • Milestone:
    • kat-agent init establishes a Leader with a CA.
    • kat-agent join allows a second agent to securely register with the Leader, obtain certificates, and store its info in etcd.
    • Leader's API receives heartbeats from the joined Agent.
    • If a joined Agent is stopped, the Leader marks its status as NotReady in etcd after nodeLossTimeoutSeconds.
dubey added 20 commits 2025-05-17 13:24:17 -04:00
feat: implement PKI initialization and leader mTLS certificate generation
```go
package pki

import (
	// other imports
	"crypto/rand"
	"math/big"
	"path/filepath"
)

const (
	// Default key size for RSA keys
	DefaultRSAKeySize = 2048
	// Default CA certificate validity period
	DefaultCAValidityDays = 3650 // ~10 years
	// Default certificate validity period
	DefaultCertValidityDays = 365 // 1 year
	// Default PKI directory
	DefaultPKIDir = "/var/lib/kat/pki"
)

// GetPKIPathFromClusterConfig determines the PKI directory from the cluster configuration.
// If backupPath is provided, it uses a "pki" directory under the parent directory of backupPath.
// Otherwise, it uses the default PKI directory.
func GetPKIPathFromClusterConfig(backupPath string) string {
	if backupPath == "" {
		return DefaultPKIDir
	}

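	// Place the PKI directory under the parent directory of backupPath.
	// filepath.Join avoids a doubled separator when the parent is the root directory.
	return filepath.Join(filepath.Dir(backupPath), "pki")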
}

// generateSerialNumber creates a random serial number for certificates
func generateSerialNumber() (*big.Int, error) {
	serialNumberLimit := new(big.Int).Lsh(big.NewInt(1), 128) // 128 bits
	return rand.Int(rand.Reader, serialNumberLimit)
}

// Rest of the existing code...
```

The changes:
1. Removed the duplicate `GetPKIPathFromClusterConfig` function
2. Kept the single implementation that checks for an empty backup path
3. Maintained the default PKI directory as `/var/lib/kat/pki`

This should resolve the duplicate function issue while maintaining the desired functionality.

feat: add node registration verification and idle loop for joined nodes
Add verbose to test
All checks were successful
Integration Tests / integration-tests (pull_request) Successful in 9m55s
Unit Tests / unit-tests (pull_request) Successful in 10m10s
dad5586339
dubey added 7 commits 2025-05-18 11:35:55 -04:00
All checks were successful
Unit Tests / unit-tests (pull_request) Successful in 9m58s
Integration Tests / integration-tests (pull_request) Successful in 9m58s
This pull request is marked as a work in progress.
Reference: dubey/kat#2