From Binary Alerts to Observability: Revolutionizing Capacity Planning

In the history of mid-sized IT infrastructures and system houses, having one’s own data center was considered an undeniable competitive advantage for decades. Those who control the hardware have absolute data sovereignty, manage update cycles independently, and can flexibly address compliance issues. To manage the growing number of servers and customer applications, clever administrators early on adopted automation tools: VMware for virtualization, Ansible for provisioning, and custom shell scripts or cron jobs for recurring Day-2 tasks.
However, these organically grown structures hit an invisible but relentless boundary as the portfolio grows and customer demands increase. What starts as sensible, pragmatic automation gradually becomes operational debt. The risk rarely lies with the tools themselves but with a fundamental conceptual flaw: the lack of a consistent system and platform logic.
When IT infrastructures grow without a standardized platform architecture, control over operational processes also fragments. In practice, three critical weaknesses emerge in such organically grown models:
First, automation is person-dependent. Ansible playbooks or Bash scripts exist, but they are often maintained by individuals and are not centrally versioned. The automation is therefore imperative and tied to specific people. A script written by Administrator A works on their workstation, with their environment variables and under their implicit assumptions. If Administrator B runs it against a slightly different environment in the data center, execution fails: the automation itself becomes an unpredictable risk factor.
Second, Day-2 operations run unmonitored. Maintenance tasks such as backups, log rotation, TLS certificate renewal, or capacity scaling run in isolation via local cron jobs on the individual VMs, without systematic, central monitoring. The result: problems like expired certificates or full disks are not detected by the system but are reported by the customer only after a failure. Operations remain permanently reactive.
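For contrast, a central check could evaluate every host's certificate expiry and disk usage in one place and warn before anything fails. The following is a minimal Python sketch, assuming each VM reports these two values to a central collector; the `HostStatus` record, the `find_problems` function, and the 14-day and 85% thresholds are hypothetical choices for illustration, not features of any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class HostStatus:
    """Hypothetical snapshot that each VM reports to a central collector."""
    name: str
    cert_not_after: datetime  # expiry timestamp of the host's TLS certificate
    disk_used_pct: float      # root filesystem usage in percent


def find_problems(hosts, now, cert_warning=timedelta(days=14), disk_limit_pct=85.0):
    """Return warnings for expired or soon-expiring certificates and full disks."""
    warnings = []
    for host in hosts:
        remaining = host.cert_not_after - now
        if remaining <= timedelta(0):
            warnings.append(f"{host.name}: certificate EXPIRED")
        elif remaining <= cert_warning:
            warnings.append(f"{host.name}: certificate expires in {remaining.days} days")
        if host.disk_used_pct >= disk_limit_pct:
            warnings.append(f"{host.name}: disk at {host.disk_used_pct:.0f}%")
    return warnings


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
fleet = [
    HostStatus("web-01", now + timedelta(days=90), 40.0),
    HostStatus("web-02", now + timedelta(days=5), 92.0),
    HostStatus("db-01", now - timedelta(days=1), 60.0),
]
for warning in find_problems(fleet, now):
    print(warning)
```

The point is not the thresholds but where the check runs: centrally and ahead of the failure, instead of scattered across cron jobs whose results nobody reads.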
Third, changes leave no reliable audit trail. Who changed what in a customer application, and when? With scattered scripts and manual ad-hoc interventions on servers via SSH, there is no consistent, tamper-proof record. Under strict EU rules such as the NIS-2 Directive or the DORA regulation, however, anything that is not documented, versioned, and exportable effectively does not exist in an audit. The risk of contractual penalties or liability issues grows with every manual adjustment.
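One way to illustrate a tamper-evident record is a hash chain, the same basic idea Git relies on, since each commit references its parent's hash. The sketch below is hypothetical (`append`, `verify`, and the record fields are illustrative names) and deliberately simplified: it detects edits to entries, reordering, or removal of entries in the middle, but not the silent removal of the newest entries, so it illustrates the principle rather than replacing a versioned repository.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, entry: dict) -> str:
    # Hash the previous link together with a canonical JSON form of the entry.
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, author: str, change: str, timestamp: str) -> None:
    """Append a change record (who, what, when) chained to its predecessor."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"author": author, "change": change, "timestamp": timestamp}
    log.append({**entry, "prev": prev_hash, "hash": _digest(prev_hash, entry)})


def verify(log: list) -> bool:
    """True if every record still matches its own hash and links to its predecessor."""
    prev_hash = GENESIS
    for record in log:
        entry = {k: record[k] for k in ("author", "change", "timestamp")}
        if record["prev"] != prev_hash or record["hash"] != _digest(prev_hash, entry):
            return False
        prev_hash = record["hash"]
    return True


log: list = []
append(log, "admin-a", "scale shop-frontend to 3 replicas", "2025-01-01T10:00:00Z")
append(log, "admin-b", "renew TLS certificate", "2025-01-02T09:30:00Z")
print(verify(log))  # True

log[0]["change"] = "scale shop-frontend to 1 replica"  # silent edit after the fact
print(verify(log))  # False
```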
To break the vicious cycle of person-dependent administration, a radical departure from the imperative principle (“Do step A, then step B”) is required. The modern solution lies in establishing a declarative operating model, realized through the combination of Kubernetes and GitOps.
This paradigm fundamentally shifts the logic of infrastructure management:
[ Git Repository: Single Source of Truth ]
  (Desired State as declarative code)
              |
              v  (Automatic Pull / Webhook)
[ GitOps Controller (ArgoCD) ] <=================+
              |                                  |
              |  (Continuous Reconciliation)     |  (Current State)
              v                                  |
[ Kubernetes API Server ]                        |
              |                                  |
              v                                  |
[ Central Resource Pool ] =======================+
  (Managed Nodes, Networks, Storage)

In the declarative model, developers and platform engineers describe only the desired target state (Desired State) of an application or infrastructure component in standardized YAML files (e.g., via Helm or Kustomize). They specify how many instances must run, which storage is attached, and which environment variables apply. How this state is reached is decided autonomously by the platform.
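The division of labor between "what" and "how" can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Kubernetes' actual controller logic: `AppSpec` and `plan` are hypothetical names, and the spec covers only the three aspects named above (instance count, attached storage, environment variables). The user supplies only the desired state; the imperative steps are derived from the difference to the actual state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical state of one application: what runs, not how to get there."""
    replicas: int
    volume: str
    env: dict = field(default_factory=dict)


def plan(desired: AppSpec, actual: AppSpec) -> list:
    """Derive the imperative steps the platform would execute on its own."""
    steps = []
    if actual.replicas < desired.replicas:
        steps.append(f"start {desired.replicas - actual.replicas} instance(s)")
    elif actual.replicas > desired.replicas:
        steps.append(f"stop {actual.replicas - desired.replicas} instance(s)")
    if actual.volume != desired.volume:
        steps.append(f"attach volume {desired.volume}")
    # Compare environment variables key by key, in a stable order.
    for key in sorted(desired.env.keys() | actual.env.keys()):
        if key not in desired.env:
            steps.append(f"unset {key}")
        elif actual.env.get(key) != desired.env[key]:
            steps.append(f"set {key}={desired.env[key]}")
    return steps


desired = AppSpec(replicas=3, volume="shop-data", env={"LOG_LEVEL": "info"})
actual = AppSpec(replicas=1, volume="shop-data", env={"LOG_LEVEL": "debug", "DEBUG": "1"})
for step in plan(desired, actual):
    print(step)
```

If the desired and actual states already match, `plan` returns an empty list: there is nothing to do, which is exactly the steady state a declarative platform aims for.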
All configuration files reside in a central, version-controlled Git repository. A GitOps controller (like ArgoCD) within the cluster continuously monitors this repository. Every change to the system - whether app update, scaling, or configuration adjustment - must be documented as a commit or pull request in Git. Manual “quick fixes” via SSH directly on the servers are thus a thing of the past.
The platform continuously compares the desired state defined in the Git repository with the actual state in the data center. If the two diverge, for example because a service crashes or a configuration is manipulated manually, the platform intervenes and restores the state defined in the code on its own (self-healing), without manual intervention and typically within seconds to a few minutes, depending on the reconciliation interval.
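A single reconciliation pass can be sketched as follows. This is a hypothetical, in-memory model: `reconcile` compares two dictionaries rather than talking to the Kubernetes API, and a real controller would repeat the pass on a timer and on change events. It corrects both kinds of drift mentioned above, a crashed instance and a manual change. Removing resources that are not defined in Git corresponds to what ArgoCD calls pruning; in ArgoCD, both pruning and self-healing are opt-in settings (`prune`, `selfHeal`) of an automated sync policy.

```python
import copy


def reconcile(desired: dict, actual: dict) -> list:
    """One reconciliation pass: overwrite drift in `actual` with the Git-defined `desired`."""
    corrections = []
    for name, wanted in desired.items():
        if actual.get(name) != wanted:
            corrections.append(f"{name}: restored to {wanted}")
            actual[name] = copy.deepcopy(wanted)  # copy so later drift never touches `desired`
    for name in list(actual):
        if name not in desired:
            corrections.append(f"{name}: removed (not defined in Git)")
            del actual[name]
    return corrections


desired = {"shop-frontend": {"replicas": 3, "image": "shop:1.4"}}
actual = copy.deepcopy(desired)

actual["shop-frontend"]["replicas"] = 2                      # a crashed instance
actual["debug-shell"] = {"replicas": 1, "image": "busybox"}  # started by hand, outside Git

print(reconcile(desired, actual))  # two corrections
print(reconcile(desired, actual))  # []: nothing left to correct
```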
The transformation from an organically grown infrastructure to a consistent operating platform fundamentally changes the resilience and cost-effectiveness of the entire data center operation.
Scaling a modern portfolio of customer applications cannot be solved by simply stringing together more automation tools. Those who fight complexity with individual scripts reap operational instability. True digital sovereignty in one’s own data center arises only when operations are transformed from a person-dependent service into a standardized, measurable, and auditable operating platform. This architectural step preserves full control over the infrastructure while freeing the engineering team for tomorrow’s innovations.
Existing Ansible automation does not have to be discarded. The transition to declarative platform logic is an evolutionary process. During the transition, Ansible remains well suited to provisioning the underlying operating systems (on bare metal or VMs) and the basic network configuration on which the Kubernetes cluster runs. The management, scaling, and security of the actual customer applications and their Day-2 services, however, are handed over entirely to the declarative Kubernetes and GitOps layer.
Secret management is a core question in a consistent GitOps approach. Because plaintext secrets must never be committed to the Git repository, the platform is extended with a specialized operator such as the External Secrets Operator. The Git repository then contains only declarative placeholder manifests. At deployment time, the operator resolves these placeholders and securely retrieves the real credentials, stored encrypted (e.g., with AES-256), from a central secrets store such as OpenBao or a Key Vault.
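The placeholder principle can be illustrated with a small resolver. The reference format {"secretRef": "<path>#<key>"}, the `resolve` function, and the in-memory store are all hypothetical; the External Secrets Operator uses its own `ExternalSecret` resources and talks to the real backend. The idea is the same, though: Git holds only the reference, and the value is filled in inside the cluster.

```python
from typing import Mapping


def resolve(manifest: dict, store: Mapping[str, Mapping[str, str]]) -> dict:
    """Return a copy of `manifest` with every secret reference replaced by its value.

    References use a hypothetical {"secretRef": "<path>#<key>"} form; only the
    reference is committed to Git, the resolved value exists only in the cluster.
    """
    resolved = {}
    for name, value in manifest.items():
        if isinstance(value, dict) and "secretRef" in value:
            path, _, key = value["secretRef"].partition("#")
            resolved[name] = store[path][key]  # raises KeyError if the secret is missing
        else:
            resolved[name] = value
    return resolved


# What lives in Git: configuration plus a placeholder, never the password itself.
committed = {"DB_HOST": "db.internal", "DB_PASSWORD": {"secretRef": "shop/db#password"}}

# Stand-in for the central secrets store (e.g. OpenBao or a Key Vault).
secret_store = {"shop/db": {"password": "example-only"}}

env = resolve(committed, secret_store)
print(env["DB_HOST"], env["DB_PASSWORD"])  # db.internal example-only
```

Failing loudly on a missing secret is a deliberate choice in this sketch: a deployment with an unresolved credential should stop rather than start with an empty password.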
If the Git repository becomes unreachable, the platform continues to operate autonomously. During a temporary network disruption, all active customer applications and Day-2 processes in the cluster keep running undisturbed in the last known desired state. Only rolling out new software releases and making structural configuration changes is blocked until the connection to the source of truth is restored.
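This behavior can be sketched as a controller that caches the last successfully fetched commit. In this hypothetical model (`Controller`, `GitUnavailable`, and the simulated `fake_fetch` are illustrative names, not an ArgoCD API), a failed fetch leaves the cached desired state untouched, so running workloads continue, while new revisions simply cannot arrive until the repository is reachable again.

```python
class GitUnavailable(Exception):
    """Raised by the fetch function when the repository cannot be reached."""


class Controller:
    """Hypothetical GitOps controller that caches the last known desired state."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable returning (commit_id, desired_state)
        self.revision = None     # last commit successfully fetched
        self.desired = None      # desired state of that commit

    def sync(self) -> str:
        try:
            revision, desired = self._fetch()
        except GitUnavailable:
            # Running workloads are untouched; reconciliation keeps using self.desired.
            return f"git unreachable, holding {self.revision}"
        if revision != self.revision:
            self.revision, self.desired = revision, desired
            return f"rolled out {revision}"
        return f"up to date at {revision}"


# Simulated repository responses: a commit, an outage, then a new commit.
responses = [("a1b2c3", {"replicas": 3}), GitUnavailable(), ("d4e5f6", {"replicas": 4})]


def fake_fetch():
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


controller = Controller(fake_fetch)
print(controller.sync())  # rolled out a1b2c3
print(controller.sync())  # git unreachable, holding a1b2c3
print(controller.sync())  # rolled out d4e5f6
```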