Build or Buy Kubernetes? Part 2
The Hidden Costs of Kubernetes: Why Infrastructure is Usually the Smallest Expense Part 2 of our …

Part 1 of our series “Build or Buy Kubernetes”
When discussing the use of Kubernetes today, the conversation often revolves around container orchestration, scalability, or cloud-native architectures. However, the real question begins at a completely different point. It is not the introduction of Kubernetes that determines the success of a platform strategy, but the decision on who will take long-term responsibility for its operation.
This distinction may initially seem trivial. In practice, however, it is the source of many misconceptions.
Almost any company can deploy a functional Kubernetes cluster within a few hours today. Managed offerings from hyperscalers, distributions like RKE2 or Talos Linux, and automation tools like Terraform and Cluster API have significantly lowered the technical entry barrier. What once required specialized knowledge can now be reproducibly automated with just a few infrastructure definitions.
However, this development leads to a dangerous perception: because Kubernetes can be deployed relatively easily, there is an impression that its long-term operation is equally straightforward.
This is precisely where the real challenge begins.
There is a world of difference between a functioning cluster and a production-ready platform.
A cluster initially provides only the runtime environment in which containerized applications can be executed. It answers hardly any of the questions that become truly relevant in production.
How are new Kubernetes versions introduced without endangering production workloads? What security policies apply to deployments? How are secrets managed, certificates automatically renewed, or container images checked for vulnerabilities? What processes are in place in the event of a control plane failure? How is compliance with regulatory requirements like NIS2, DORA, or the Cyber Resilience Act demonstrated? And who ultimately bears the responsibility when all these mechanisms must function under real operating conditions?
The answers to these questions do not arise from Kubernetes itself. They arise from platform engineering.
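To make one of these questions concrete, the following Python sketch shows what "which security policies apply to deployments" can look like once it is written down as an enforceable rule. It is a minimal illustration under stated assumptions, not a real admission controller: the two rules, the `Violation` type, and the `check_pod_spec` function are hypothetical names invented for this example, and the pod spec is passed in as a plain dictionary.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    """One policy violation found in a container definition (hypothetical type)."""
    container: str
    reason: str


def check_pod_spec(pod_spec: dict) -> list[Violation]:
    """Return all policy violations for a pod spec given as a plain dict."""
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        security = container.get("securityContext") or {}

        # Example rule 1 (assumption): privileged containers are not allowed.
        if security.get("privileged") is True:
            violations.append(Violation(name, "privileged containers are not allowed"))

        # Example rule 2 (assumption): images must be pinned by digest,
        # not referenced through a mutable tag such as ":latest".
        if "@sha256:" not in image:
            violations.append(Violation(name, "image is not pinned by digest"))
    return violations


if __name__ == "__main__":
    spec = {
        "containers": [
            {"name": "web", "image": "registry.example.com/web:latest"},
            {
                "name": "agent",
                "image": "registry.example.com/agent@sha256:" + "0" * 64,
                "securityContext": {"privileged": True},
            },
        ]
    }
    for v in check_pod_spec(spec):
        print(f"{v.container}: {v.reason}")
```

The rules themselves are trivial. The effort lies in deciding which rules apply, who may grant exceptions, and who keeps them current as the platform changes, and that is an organizational question rather than a Kubernetes feature.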
This is precisely why the effort is systematically underestimated in many companies. While the technical discussion often revolves around distributions, CNI plugins, or ingress controllers, the actual complexity increasingly shifts to organizational, security-relevant, and operational processes. Kubernetes merely forms the foundation of a platform whose quality is measured not by the number of running pods but by its long-term operational capability.
The classic discussion usually boils down to three operating models.
Should the platform be fully self-operated?
Should a managed Kubernetes offering from a cloud provider be used?
Or should a specialized service provider take over the operation?
This consideration falls short because it implicitly treats Kubernetes as a product.
In reality, Kubernetes is a platform technology whose economic value only emerges through the surrounding processes. The real decision, therefore, is not who installs the cluster, but who takes responsibility for its entire lifecycle.
This responsibility encompasses far more than regular updates or applying security patches.
It begins with architectural decisions and extends through network design, identity management, backup strategies, software supply chain security, observability, incident response, compliance, documentation, and organizational operational processes to the continuous development of the platform itself.
In other words, choosing Kubernetes is not about selecting software. It is about building or utilizing a platform organization.
Many companies start their Kubernetes journey with a self-managed approach, and at first glance, this decision is not irrational. A company that operates Kubernetes itself retains full control over architecture, infrastructure, security model, release cycles, and integration decisions. Especially for technically strong organizations, self-operation initially seems like the most consistent path: open source instead of proprietary platforms, own standards instead of vendor specifications, maximum adaptability instead of productized constraints.
These arguments are not wrong. They are just often incompletely evaluated.
Operating a Kubernetes cluster oneself means more than installing, configuring, and extending a technical platform according to one's own requirements. It means that the company permanently assumes full responsibility for the lifecycle of a complex, distributed operating environment whose stability does not arise from a single component but from the interplay of network, storage, identity, security, observability, automation, governance, and operational experience.
This is precisely where the real misconception arises: many organizations confuse the ability to set up Kubernetes with the ability to operate Kubernetes reliably, securely, auditably, and economically over the years.
A cluster is quickly created. A platform only emerges through the consistent standardization of operational reality.
This includes not only rolling out new Kubernetes versions but systematically evaluating their impact on APIs, controllers, admission policies, custom resource definitions, ingress behavior, network components, and dependent workloads. It involves treating CNI and CSI components not as interchangeable add-ons but as critical parts of the runtime environment, whose misbehavior can have immediate effects on reachability, data consistency, and recoverability. It involves not only taking note of security vulnerabilities but evaluating their relevance to one’s own platform under time pressure, planning maintenance windows, and rolling out changes in a way that keeps production systems stable.
The longer a cluster runs in production, the more evident it becomes that Kubernetes is not an infrastructure project that is ever completed. It evolves into a long-lived operational product with a continuous lifecycle.
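As a small illustration of what evaluating the impact of a new version on APIs can involve, the following Python sketch checks a set of manifests against a table of API versions that are no longer served from a given Kubernetes release onward. It is a sketch under stated assumptions: the table is deliberately incomplete and covers only a few well-known removals, the helper names `parse_minor` and `find_blockers` are hypothetical, and manifests are assumed to be already parsed into plain dictionaries.

```python
# Illustrative, incomplete table (assumption: maintained by the platform team).
# Maps (apiVersion, kind) to the Kubernetes release in which that
# API version stopped being served.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
}


def parse_minor(version: str) -> tuple[int, int]:
    """Parse a version string such as 'v1.25' or '1.25.3' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def find_blockers(manifests: list[dict], target_version: str) -> list[str]:
    """List manifests whose API version is no longer served in the target release."""
    target = parse_minor(target_version)
    blockers = []
    for manifest in manifests:
        api_version = manifest.get("apiVersion")
        kind = manifest.get("kind")
        removed = REMOVED_IN.get((api_version, kind))
        # A manifest blocks the upgrade if the target release is at or
        # beyond the release that removed its API version.
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            blockers.append(
                f"{kind}/{name}: {api_version} is removed in {removed[0]}.{removed[1]}"
            )
    return blockers


if __name__ == "__main__":
    manifests = [
        {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly-report"}},
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    ]
    for line in find_blockers(manifests, "v1.25"):
        print(line)
```

A check like this covers only one of the dimensions named above. Controllers, admission policies, and network components each need their own compatibility review, which is why upgrade preparation tends to grow into a recurring process rather than a one-time script.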
One of the most underestimated consequences of Kubernetes is that not only the technical architecture changes, but also the organizational structure.
As soon as multiple development teams use the same platform, internal customers automatically emerge. Developers expect reproducible development environments, standardized deployment processes, self-service, understandable documentation, and short provisioning times. Security officers expect traceable policies and audit-proof processes. Auditors demand reliable evidence, while management requires availability, predictability, and calculable risks.
This inevitably changes the role of the infrastructure team.
Administrators become platform engineers. Infrastructure becomes an internal product. And an originally technical decision becomes an organizational responsibility that permanently requires personnel resources, governance structures, and clear product responsibility.
This is precisely why the term Platform Engineering has established itself in recent years. It does not describe a new technology but a change in perspective. The platform is no longer understood as a collection of technical components but as a product that provides internal development teams with a standardized, secure, and efficient working framework.
Introducing Kubernetes therefore almost inevitably means introducing Platform Engineering as well, regardless of whether this decision was made consciously.
The discussion about Build or Buy Kubernetes often begins with technical questions. Which distribution should be used? Which cloud is the right one? Which add-ons are needed?
These questions are important, but they are not decisive.
The real decision is whether a company is ready to take on the organizational responsibility for a platform whose lifecycle spans many years and whose complexity goes far beyond operating a cluster.
Today, Kubernetes is no longer merely an infrastructure technology but the foundation of modern software platforms. Those who want to build this foundation themselves are not just choosing a technical architecture but are committing to the long-term development of a platform organization.
In the second part of this series, we leave the technical perspective and look at the economic reality of platform operations. We analyze why hardware and cloud resources often make up the smallest part of the total cost, what role cognitive load, platform teams, and organizational complexity play, and why the most expensive component of a Kubernetes platform is almost always the people who operate it.