When the Checkout Stops: Why High Availability Decides the Survival of Retail
In retail, timing is ruthless. A system failure on a Saturday afternoon, during the peak sales …

Many IT managers in medium-sized businesses feel secure because they “do backups.” However, in a serious incident—such as a massive cloud provider outage, a ransomware attack, or a human error in the root config—they realize: A backup is not a recovery plan.
In the Cloud-Native world, Disaster Recovery (DR) means more than just restoring data. It means quickly restoring the entire application topology.
A backup is a copy of data. Disaster Recovery is the process framework for restoring business operations within a defined time. Two key metrics play a major role: the Recovery Time Objective (RTO), i.e., how quickly operations must be running again, and the Recovery Point Objective (RPO), i.e., how much data loss, measured in time, is acceptable.
Depending on the criticality of your applications, three different architectural patterns are available:
Backup & Restore: the simplest approach. With tools like Velero, we back up the cluster state (YAML manifests, Persistent Volumes, Secrets) to S3-compatible storage, ideally with another provider or on-premises.
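For illustration, a nightly Velero schedule could look like the following sketch; the schedule, namespace selection, and retention period are assumptions, not values from this article.

```yaml
# Velero Schedule: back up cluster state and volumes every night.
# The target bucket is defined separately in a BackupStorageLocation
# (see the cross-provider example in the FAQ further down).
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-full
  namespace: velero
spec:
  schedule: "0 2 * * *"        # every night at 02:00
  template:
    includedNamespaces: ["*"]  # everything; narrow this down for large clusters
    snapshotVolumes: true      # include Persistent Volumes via CSI snapshots
    ttl: 720h0m0s              # keep backups for 30 days
```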
Pilot Light: a minimal standby cluster runs in a second region or another data center. Only the absolutely critical core components (e.g., database replication) are active.
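A minimal sketch of the idea, assuming the standby cluster receives the same manifests via GitOps: the database replication runs permanently, while stateless workloads stay at zero replicas until failover is declared (names and image are placeholders).

```yaml
# Standby cluster: the application is deployed but dormant ("pilot light").
# replicas is raised only when failover to this cluster is declared.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shop-frontend
  namespace: shop
spec:
  replicas: 0                  # kept cold; the database replica next to it keeps running
  selector:
    matchLabels:
      app: shop-frontend
  template:
    metadata:
      labels:
        app: shop-frontend
    spec:
      containers:
        - name: frontend
          image: registry.example.com/shop-frontend:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
```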
Active-Active: workloads run simultaneously in two clusters. A global load balancer distributes the traffic.
For medium-sized businesses, a combination of Velero and GitOps is often the “sweet spot”: the Git repository rebuilds the application manifests, while Velero brings back the persistent data.
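What this looks like in practice, as a sketch assuming Argo CD as the GitOps tool; repository URL, application name, and backup name are placeholders.

```yaml
# 1) GitOps: Argo CD re-creates the application manifests from Git in the new cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: shop
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/shop/deploy.git   # placeholder repository
    targetRevision: main
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: shop
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
# 2) Velero: restore the persistent data (PVs, Secrets) from the latest backup.
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: shop-data-restore
  namespace: velero
spec:
  backupName: nightly-full-20240101020000   # placeholder backup name
  includedNamespaces: ["shop"]
  restorePVs: true
```

The split keeps responsibilities clean: everything declarative comes back from Git, everything stateful comes back from the backup.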
A DR plan that hasn’t been tested doesn’t work. We recommend regular “Game Days”: Intentionally shut down a test cluster and measure how long it takes your team to restore it with the available tools. Only then will you know if your cloud strategy is truly crisis-proof.
Should backups be stored with the same provider? Absolutely not. If AWS has a massive issue in the Frankfurt region, there’s a good chance your S3 bucket there will be affected too. Use “cross-provider” backups (e.g., backups from AWS to an S3-compatible bucket at Wasabi, IONOS, or on-premises).
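As a hedged sketch, a second Velero BackupStorageLocation pointing at an S3-compatible bucket outside the primary cloud could look like this; endpoint, region, and bucket name are placeholders.

```yaml
# Second backup target at a different provider (e.g., Wasabi, IONOS, or on-prem MinIO).
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: offsite-secondary
  namespace: velero
spec:
  provider: aws                     # the AWS object-store plugin also speaks to S3-compatible endpoints
  objectStorage:
    bucket: dr-backups-offsite      # placeholder bucket name
  config:
    region: eu-central-1
    s3Url: https://s3.example-offsite-provider.com   # placeholder: the provider's S3 endpoint
    s3ForcePathStyle: "true"
```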
How do I handle databases? Kubernetes snapshots (via CSI) are good, but often not application-consistent for databases. Additionally, use native database tools (e.g., pg_dump or Barman for Postgres), which Velero can trigger via a pre-backup hook.
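A minimal sketch of such a hook, assuming a Postgres pod with a container named postgres and local trust authentication for pg_dump; the annotations are Velero’s standard backup-hook annotations, everything else is a placeholder.

```yaml
# Pod excerpt: Velero runs pg_dump inside the container before backing the pod up,
# so a consistent SQL dump sits on the volume that gets snapshotted.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-0
  namespace: shop
  annotations:
    pre.hook.backup.velero.io/container: postgres
    pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "pg_dump -U postgres shop > /var/lib/postgresql/data/dr-dump.sql"]'
    pre.hook.backup.velero.io/timeout: 5m
spec:
  containers:
    - name: postgres
      image: postgres:16            # placeholder image tag; assumes local trust auth for pg_dump
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data    # placeholder PVC name
```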
Is a Git repository sufficient as a recovery source? For application logic, yes. But be cautious: Secrets (passwords, certificates) are often encrypted in the cluster or managed externally (e.g., HashiCorp Vault). Ensure these vaults are also part of the recovery plan.
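To illustrate why this matters, here is a hedged sketch assuming the External Secrets Operator pulls credentials from HashiCorp Vault at runtime: the manifest lives in Git, but it restores nothing if Vault itself is not recoverable. Names and paths are placeholders.

```yaml
# Secrets are not in Git: they are rehydrated from Vault at runtime.
# If Vault is gone after a disaster, this manifest restores nothing.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: shop-db-credentials
  namespace: shop
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend             # placeholder SecretStore pointing at HashiCorp Vault
    kind: ClusterSecretStore
  target:
    name: shop-db-credentials       # Kubernetes Secret created in the cluster
  data:
    - secretKey: password
      remoteRef:
        key: shop/db                # placeholder Vault KV path
        property: password
```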