k3k: agent-less k3s in Kubernetes
Fabian Peter · 8 minutes reading time

container - k3k - k3s - kubernetes - hosted-controlplane - gitops - ci-cd

Summary in Three Points

  • Control plane on demand: With k3k, you can run a fully-fledged k3s control plane as a Kubernetes workload – without agent nodes. This enables lightning-fast spin-up and tear-down of complete, fully functional k3s clusters for testing, tenants, or edge scenarios. (Background: agent-less servers are a k3s feature where the control plane does not join the cluster as a regular node, as it normally would.)
  • On-prem turbo: Where provisioning full VMs for each control plane can take hours or days, k3k creates reproducible environments in minutes – ideal for end-to-end tests in CI, multi-tenant k3s setups, dev clusters, and resource-constrained edge environments.
  • Compatible & open: k3s is a lightweight, CNCF-certified Kubernetes distribution with ARM support, and Helm remains the familiar package manager. k3k builds on both – installed via Helm, declarative, and GitOps-friendly.

k3k on GitHub

Why “Kubernetes in Kubernetes” at All?

Two motives dominate: speed and density.

  1. Speed: In data centers with traditional VM provisioning, control plane rollouts are cumbersome: IaaS tickets, network segmentation, firewalls, load balancer configuration, image hardening, patch scheduling, etc. Even with good automation, many moving parts remain. A control plane as a pod set in the existing management cluster reduces this path to “install chart → create CR → done.” For CI pipelines (feature branches, E2E tests, reproducible benchmarks), this is a game changer.
  2. Density: Multiple fully-fledged k3s clusters side by side – without dedicated master VMs – significantly increase tenant density in dev/stage. Each cluster gets its own API server, controller manager, and datastore; the overhead per control plane decreases. The principle is not new: Projects like Kamaji and Gardener also host the control plane as pods (hosted control plane/seed-shoot) to provide many clusters scalably and cost-efficiently.

k3k implements this idea specifically for k3s – the “lightweight” Kubernetes that has proven itself well in edge/IoT and dev scenarios.

What Does “Agent-less” Mean in k3s – and Why Is It Relevant?

k3s distinguishes between servers (control plane) and agents (nodes that run workloads). In agent-less mode, you operate only the server, without a kubelet – as a result, the control plane does not appear as a worker node in the cluster. This is exactly what you want for embedded control planes: a clean separation of “control” and “workload.” The k3s project explicitly documents agent-less operation, including caveats (e.g., egress selector modes, because without kube-proxy/flannel the API server reaches pods over different paths).
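
For illustration, this is roughly what an agent-less start of a plain k3s server looks like outside of k3k – a minimal sketch; --disable-agent is documented as experimental, and the exact flag set depends on your k3s version:

# Start a k3s server without a local kubelet/agent (experimental flag)
# and pick an egress selector mode so the API server can still reach pods for logs/exec.
k3s server \
  --disable-agent \
  --egress-selector-mode=pod \
  --disable traefik,servicelb,local-storage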

Advantages:

  • Security & Clarity: No application pods on control plane nodes, and no accidental lateral use.
  • Fast Lifecycles: Control planes are rolled/replaced like any other app (e.g., blue/green), without affecting worker scheduling.
  • Resource Efficiency: Minimal footprint; the k3s server is lean, and startup times are short.

Important: If you want to run workloads, add separate agent nodes (or “shared agents” within the management cluster). This remains deliberately decoupled.
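
If you later need workload capacity, agents join the embedded control plane like any other k3s node – a minimal sketch, assuming the load balancer IP and agent token from the chart values further below:

# On the worker machine: join the embedded control plane via its load balancer IP
k3s agent \
  --server https://1.2.3.4:6443 \
  --token "secret" \
  --node-name edge-worker-01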

Overview of k3k

The k3k chart maintained by ayedo models a k3s control plane in Kubernetes. In the “Kubernetes-in-Kubernetes” topology, an isolated control plane namespace with the corresponding pods (api-server, controller-manager, datastore) is created for each desired cluster. The cluster lifecycle (create, upgrade, delete) can be automated via Helm values/CRDs – attractive for CI, multi-tenant dev, staging, and short-lived test environments.

To put it in context: There is also a Rancher project k3k (Kubernetes in Kubernetes) that follows the same basic idea (controller + CLI, Helm deploy, CRD “clusters.k3k.io”). This underscores the general architectural trend to roll out control planes as pods instead of maintaining master VMs for each cluster.
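
For orientation only: the Rancher variant drives cluster lifecycles through that CRD. An illustrative – not verbatim – resource might look roughly like this; the API group is taken from the CRD name above, while the version and spec fields are assumptions and will differ between controller versions:

# Illustrative sketch: creating a cluster via the clusters.k3k.io CRD
# (version and spec fields are assumptions – check the installed CRD for the real schema)
kubectl apply -f - <<EOF
apiVersion: k3k.io/v1alpha1
kind: Cluster
metadata:
  name: demo
  namespace: k3k-demo
spec:
  servers: 1
  agents: 0
EOF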

Speed and Innovation Advantage – Especially On-Prem

On-premises is traditionally VM-driven. Requesting master VMs for each test cluster results in lead times of hours to days (change windows, storage allocation, network approvals, LB config, security controls). This slows down:

  • Development/QA: Feature branches wait for “representative” clusters.
  • SRE/Platform: Effort for maintenance, patching, dismantling “forgotten” test VMs.
  • Innovation: Experiments with API server flags, versions, and add-ons stall or never happen.

k3k turns this around: Helm install in the management cluster, set values, apply CR – a fully functional k3s control plane is ready within minutes. This is not only faster but also reproducible (GitOps, PR-based reviews) and cost-efficient, as the control planes share resources of existing workers. For edge projects (ARM, low RAM/CPU), k3s is already a perfect fit.

Typical Use Cases

  1. End-to-End Tests in CI

    For each PR, a fresh k3s cluster is spun up, and tests run realistically against a real control plane. After merge/close, the cluster is dismantled. No more “YAML mocking” – real API semantics, real admission, real controller paths.

  2. Multi-Tenant K3s

    Teams/tenants get dedicated k3s clusters instead of namespace isolation. Advantages: clear blast radius boundaries, individual API server policies, separate CRDs and admission. Full isolation without new master VMs for each tenant.

  3. Dev Clusters

    Developers start “their” k3s cluster on demand. Test version changes? Evaluate API flags? Compare add-ons? All in minutes – and parallel to existing clusters, without risk to shared control planes.

  4. Edge & Resource-Limited

    k3s is small and ARM-capable. In edge scenarios, a central control layer (management cluster) can quickly provision control planes, connect agents on-site (VPN/ZTNA), and thus orchestrate hundreds of edge clusters.

Comparison & Alternatives

Kamaji (Clastix)

Kamaji operates hosted control planes as pods in the management cluster and scales to many user clusters without master VMs. It integrates with Cluster API and focuses on large-scale multi-cluster operations in enterprise environments. If you want to run many “full” Kubernetes clusters (not k3s) with centralized lifecycle management, Kamaji is a mature option.

When to Use k3k Instead of Kamaji?

Choose k3k when k3s features (single binary, ARM optimization, low RAM footprint) matter, when CI spin-ups need to be lightning-fast, or when you explicitly prefer agent-less k3s to keep the control plane cleanly separated from workload scheduling.

Gardener (Seed/Shoot)

Gardener hosts the control planes of shoot clusters in the seed cluster – also “Kubeception.” Advantage: mature ecosystem, scaling large fleets, multi-provider operation. Disadvantage: higher entry complexity and focus on “full” K8s distributions. For k3s-specific, lightweight, and CI-close use cases, k3k is more agile and smaller.

Technical Key Points & Pitfalls

  • Agent-less Peculiarities: Without kubelet/kube-proxy on the control plane, network paths must be considered (e.g., egress-selector-mode pod/cluster, service access of the API server). The k3s docs list corresponding options and limits.
  • Upgrade Strategy: Control planes are “just” workloads – use rolling/blue-green and pinned container images (k3s versions) for reproducible upgrades.
  • Datastore Choice: k3s can use sqlite or embedded etcd as well as external databases. For short-lived CI clusters, sqlite is sufficient; persistent dev/stage or tenant clusters should plan for etcd or Postgres (see the sketch after this list).
  • RBAC/Network: Dedicated namespaces per embedded cluster, network policies, separate service accounts, and secret scopes – avoids “cross-talk” between control planes.
  • Observability: Each control plane needs its own dashboards/alerts (API latency, etcd health, controller queues), otherwise signals are missing in case of disruptions.
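
To make the datastore bullet concrete: with plain k3s the choice is a single flag – in a k3k setup you would carry the equivalent setting through the chart's values (the exact value key depends on the chart version):

# Default: embedded sqlite – nothing to configure, fine for short-lived CI clusters

# Longer-lived control plane backed by an external Postgres datastore
k3s server \
  --disable-agent \
  --datastore-endpoint="postgres://k3s:password@db.internal:5432/k3s"

# Or embedded etcd (the first server initializes the etcd cluster)
k3s server --disable-agent --cluster-init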

Practice: Installing k3k via Helm and Creating a k3s Cluster

Note: Helm is the standard package manager for Kubernetes; good documentation and quickstart are available.

1) Define Values

The central settings live in the loadbalancer and k3s sections. If loadbalancer.enabled=false, the k3s API is not reachable from outside – in practice, you almost always want true and a fixed IP. Optionally, you can specify a load balancer class (e.g., MetalLB/Cilium LB). In the k3s block, you configure agent-less operation, the components to disable (flannel, kube-proxy, traefik, etc. – relevant if you run Cilium or ingress-nginx instead), and the egress_selector_mode.

loadbalancer:
  enabled: true
  class_name: ""        # e.g., "metallb" or specific LB class
  ip: "1.2.3.4"         # fixed, routable LB IP in the host cluster

k3s:
  # basic secrets/tokens
  token: "secret"
  agent_token: "secret"

  # cluster networks matching your environment
  cluster_cidr: "10.42.0.0/16"
  service_cidr: "10.43.0.0/16"
  cluster_dns: "10.43.0.10"
  cluster_domain: "cluster.local"

  # data path & logging
  data_dir: /var/lib/rancher/k3s
  log_level: "0"
  debug: false

  # Agent-less & keep network stack lean
  disable:
    agent: true            # <— agent-less control plane
    flannel: true
    helm_controller: true
    network_policy: true
    traefik: true
    localstorage: true
    servicelb: true
    kube_proxy: true
    cloud_controller: false

  # Egress handling for the control plane
  egress_selector_mode: "pod"

  # Kubeconfig storage (created in the container)
  kubeconfig:
    filename: kubeconfig.yaml
    mode: "0644"

2) Install Chart

# Install k3k chart from the OCI registry
helm upgrade --install k3k oci://cargo.ayedo.cloud/library/k3k --namespace k3k --values values.yaml
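
Before moving on, it is worth checking that the release is healthy and that the control plane pods actually come up (pod names depend on the chart and release name):

# Check the Helm release status
helm status k3k --namespace k3k

# Watch the control plane pods (api-server, controller-manager, datastore) start
kubectl -n k3k get pods -w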

3) Retrieve kubeconfig

After starting k3k, the chart automatically creates a secret in the same namespace that contains the kubeconfig of the embedded k3s control plane.

# List the secrets in the k3k namespace
kubectl -n k3k get secrets

# Or fetch the kubeconfig secret directly by its name (following the chart's naming):
kubectl -n k3k get secret k3k-k3k-kubeconfig
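
To actually use the embedded cluster, extract and decode the kubeconfig from that secret – a minimal sketch, assuming the data key matches the filename configured in the values (kubeconfig.yaml):

# Decode the kubeconfig from the secret (the data key is an assumption based on the values above)
kubectl -n k3k get secret k3k-k3k-kubeconfig \
  -o jsonpath='{.data.kubeconfig\.yaml}' | base64 -d > kubeconfig.yaml

# Talk to the embedded k3s control plane
kubectl --kubeconfig kubeconfig.yaml get namespaces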

CI/CD with k3k: E2E Tests per Pull Request

A major promise of Kubernetes and cloud-native stacks is to radically accelerate development cycles. But those who want to set up a complete test environment for each pull request today quickly hit limits: shared dev clusters are often overloaded, and dedicated test environments are too expensive or simply not fast enough. This is where k3k comes in: a control plane is just a Helm release away – lightweight, reproducible, and isolated.


Let’s imagine the typical process:

A developer opens a pull request. Instead of testing code in a shared cluster, the pipeline automatically rolls out a new k3s control plane via k3k. This control plane gets a fixed load balancer IP, starts in agent-less mode, and delivers a fully-fledged Kubernetes API within minutes. The pipeline waits until the API server is ready and then retrieves the automatically created kubeconfig from the namespace secret.

Now the actual application – the app’s Helm chart – can be deployed into this fresh cluster. Whether a simple web app or a complex microservice suite: the tests run in a clean, isolated environment.
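
Expressed as a pipeline step, the whole flow fits into a short script – a hedged sketch, assuming one k3k release per pull request, a values file prepared per branch, and the secret naming from the example above (adjust names to your chart version); run-e2e-tests.sh stands in for whatever test runner you use:

#!/usr/bin/env bash
set -euo pipefail

PR_ID="${1:?usage: e2e.sh <pr-id>}"
NS="k3k-pr-${PR_ID}"
SECRET="k3k-k3k-kubeconfig"   # adjust to your chart's/release's naming

# 1) Spin up a dedicated k3s control plane for this PR
helm upgrade --install k3k oci://cargo.ayedo.cloud/library/k3k \
  --namespace "${NS}" --create-namespace \
  --values ci/values-pr.yaml

# 2) Wait for the kubeconfig secret and extract it
until kubectl -n "${NS}" get secret "${SECRET}" >/dev/null 2>&1; do sleep 5; done
kubectl -n "${NS}" get secret "${SECRET}" \
  -o jsonpath='{.data.kubeconfig\.yaml}' | base64 -d > kubeconfig.yaml

# 3) Wait until the embedded API server answers
until kubectl --kubeconfig kubeconfig.yaml get --raw=/readyz >/dev/null 2>&1; do sleep 5; done

# 4) Deploy the application under test and run the E2E suite
helm upgrade --install my-app ./charts/my-app --kubeconfig kubeconfig.yaml
./run-e2e-tests.sh kubeconfig.yaml

# 5) Tear everything down again after merge/close
helm uninstall k3k --namespace "${NS}"
kubectl delete namespace "${NS}"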