WebRTC at Scale: Transitioning from Jitsi to LiveKit on Kubernetes
Real-time video communication today relies almost exclusively on WebRTC. However, WebRTC is not a …

When planning cross-site infrastructure, architects often face a fundamental decision: Do we stretch a single Kubernetes cluster across two geographic locations (Stretched Cluster) or operate an independent cluster in each region?
The idea of a Stretched Cluster initially seems elegant: There is only one control plane, and Kubernetes automatically distributes workloads across both locations. However, what sounds simple in theory often proves to be a risky complexity trap in critical infrastructure environments.
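A Stretched Cluster requires a nearly lossless, low-latency connection between locations. This tight coupling introduces new dependencies: the shared control plane and its quorum now depend on the inter-site link, and a fault or failed update can affect both locations at once.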
For critical infrastructure scenarios, a multi-region architecture with decoupled clusters has proven to be the more robust path. Here, a fully autonomous Kubernetes cluster is operated in each region.
Since each cluster has its own control plane, it is completely independent. A technical issue or failed update in Region A has no direct impact on Region B. This “Shared-Nothing” approach is the strongest form of isolation.
If the network connection between locations fails, both clusters continue to operate locally without restrictions. There is no leadership struggle and no downtime due to missing quorums over long distances.
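The quorum argument can be made concrete with a few lines of arithmetic. The following is a minimal sketch, assuming a Raft-style majority rule as used by etcd and a hypothetical member layout (two control-plane members in one site, one in the other); the site names and member counts are illustrative and not taken from a real deployment.

```python
def quorum(members: int) -> int:
    """Majority required by a Raft-based store such as etcd."""
    return members // 2 + 1


def has_quorum(reachable: int, members: int) -> bool:
    """True if the reachable members still form a majority."""
    return reachable >= quorum(members)


# Stretched cluster (assumed layout): ONE etcd with 3 members split across sites.
stretched = {"site_a": 2, "site_b": 1}
total_members = sum(stretched.values())

# After the inter-site link fails, each site only sees its own members.
stretched_after_partition = {
    site: has_quorum(local, total_members) for site, local in stretched.items()
}

# Independent clusters (assumed layout): each site runs its OWN 3-member etcd,
# so an inter-site failure does not reduce the members a site can reach.
independent = {"site_a": 3, "site_b": 3}
independent_after_partition = {
    site: has_quorum(local, local) for site, local in independent.items()
}

print(stretched_after_partition)    # {'site_a': True, 'site_b': False}
print(independent_after_partition)  # {'site_a': True, 'site_b': True}
```

In the stretched layout, the minority site loses its control plane during the partition, and a complete outage of the majority site would leave the remaining member without quorum as well. With independent clusters, the inter-site link is not part of either quorum.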
To allow the clusters to communicate with each other (e.g., for database replication), modern network layers like Cilium Cluster Mesh are used. This enables secure service-level communication across cluster boundaries without tightly coupling the fate of the two clusters.
While a Stretched Cluster may work for local campus networks with direct fiber connections, it is often too fragile for true geo-redundancy over long distances. The architecture with autonomous clusters per region provides the necessary stability and predictability that critical infrastructure operators need. It trades the illusion of a “single truth” for the reality of two strong, independent pillars.
Isn't the administrative overhead twice as high with two clusters? Technically, yes, but this overhead is largely offset by automation (GitOps). Tools like Argo CD ensure that configurations and applications are rolled out identically in both clusters without manual duplication of work.
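To illustrate what "identical rollout without manual duplication" can look like, here is a minimal sketch that derives one Argo CD `Application` manifest per cluster from a single definition. The cluster API URLs, the Git repository URL, the application name, and the `argocd` namespace are hypothetical placeholders; only the general shape of the `Application` resource follows Argo CD's documented schema.

```python
import json

# Hypothetical cluster endpoints; replace with the real API server URLs.
CLUSTERS = {
    "region-a": "https://k8s.region-a.example.internal",
    "region-b": "https://k8s.region-b.example.internal",
}

# Hypothetical Git repository acting as the single source of truth.
REPO_URL = "https://git.example.internal/platform/deployments.git"


def application(app_name: str, cluster_name: str, api_server: str) -> dict:
    """Build an Argo CD Application that deploys app_name to one cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{app_name}-{cluster_name}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            # Same repository, revision, and path for every cluster.
            "source": {
                "repoURL": REPO_URL,
                "targetRevision": "main",
                "path": f"apps/{app_name}",
            },
            # Only the destination differs between regions.
            "destination": {"server": api_server, "namespace": app_name},
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


manifests = [
    application("video-backend", name, server) for name, server in CLUSTERS.items()
]
print(json.dumps(manifests, indent=2))
```

Because both manifests point at the same source, a change merged to the repository reaches both regions through the same path. In practice, Argo CD's ApplicationSet controller can generate such per-cluster Applications natively, which may be preferable to a custom script.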
How do services in Cluster A find a service in Cluster B? A global service discovery system is used for this purpose (e.g., Cilium Cluster Mesh or external DNS solutions). A service in Region A can thus address a database endpoint in Region B via a standardized name as if it were locally available.
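As a rough illustration of the Cilium Cluster Mesh variant, the sketch below builds a Kubernetes `Service` manifest marked as a global service. It assumes the `service.cilium.io/global` annotation used by recent Cilium releases (older releases used a different annotation key, so this should be checked against the deployed version), the default `cluster.local` cluster domain, and a hypothetical service called `orders-db` in a `databases` namespace. For a global service, a Service with the same name and namespace is expected to exist in each participating cluster.

```python
def global_service(name: str, namespace: str, port: int) -> dict:
    """Service manifest to be applied with the same name/namespace in every cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumed annotation key; verify against the Cilium version in use.
            "annotations": {"service.cilium.io/global": "true"},
        },
        "spec": {
            "type": "ClusterIP",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def cluster_dns_name(name: str, namespace: str, domain: str = "cluster.local") -> str:
    """Standard in-cluster DNS name that clients use in either region."""
    return f"{name}.{namespace}.svc.{domain}"


svc = global_service("orders-db", "databases", 5432)  # hypothetical service
print(cluster_dns_name(svc["metadata"]["name"], svc["metadata"]["namespace"]))
# orders-db.databases.svc.cluster.local
```

Clients keep using the ordinary in-cluster name; whether a request is served by local or remote backends is decided by the mesh, not by the application.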
When does a Stretched Cluster make sense at all? A Stretched Cluster is primarily suitable for very short distances (e.g., two buildings on a campus) where extremely low latency (under 1 to 2 ms) and dedicated lines are guaranteed, and where regulatory requirements for site isolation are less strict.
How is quorum ensured with two autonomous clusters? Each cluster maintains its own etcd quorum within its site, ideally spread across three availability zones, so the problem of cross-site quorum does not arise at all.