Storage Design for Database Platforms: Performance vs. Capacity with Ceph
When scaling a DBaaS platform, storage quickly becomes the most critical bottleneck. Databases have …

In a multi-region architecture for critical infrastructure (KRITIS), data consistency is the greatest technical challenge. Computing power is easy to double (more Kubernetes pods), but data cannot be kept “live” in two places at once without effort. The speed of light sets the limit: every synchronous confirmation of a write operation over hundreds of kilometers adds latency that can destabilize an application.
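To put a rough number on that limit, the following sketch estimates the lowest possible round-trip time for one synchronous confirmation. It assumes that light in glass fiber travels at about two thirds of its vacuum speed and that the fiber runs in a straight line; real links add routing detours, switching, and disk latency on top. The function names are illustrative only.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumption: propagation speed in fiber is roughly 2/3 c


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound for one request/confirmation pair over a fiber path."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def max_sequential_commits_per_s(distance_km: float) -> float:
    """Upper bound on strictly sequential synchronous commits per second."""
    return 1000 / min_round_trip_ms(distance_km)


if __name__ == "__main__":
    for km in (10, 100, 500):
        rtt = min_round_trip_ms(km)
        rate = max_sequential_commits_per_s(km)
        print(f"{km:>4} km: at least {rtt:.2f} ms per confirmation, "
              f"at most {rate:.0f} sequential commits/s")
```

At 500 km, this bound is about 5 ms per confirmation, so a single connection that waits for every remote acknowledgment cannot exceed roughly 200 sequential commits per second, regardless of hardware.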
For a resilient platform, we therefore use a differentiated strategy for each data type, from relational databases to caches and message brokers.
For core databases, we use a two-tier model. The goal: maximum write speed during normal operations and minimal data loss in the event of a disaster.
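The text does not spell out the two tiers, so the following sketch rests on an assumption: tier 1 commits a write locally and acknowledges it immediately, while tier 2 ships the change to the standby region asynchronously. `Region` and `TwoTierStore` are hypothetical names, and the in-memory dictionaries stand in for real databases.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    rows: dict = field(default_factory=dict)


@dataclass
class TwoTierStore:
    primary: Region
    standby: Region
    pending: deque = field(default_factory=deque)  # changes not yet shipped

    def write(self, key, value) -> str:
        # Tier 1 (assumed): commit locally and acknowledge without waiting
        # for the remote region.
        self.primary.rows[key] = value
        # Tier 2 (assumed): queue the change for asynchronous shipping.
        self.pending.append((key, value))
        return "committed"

    def ship(self, max_items=None) -> int:
        """Apply queued changes to the standby in order; return the count shipped."""
        shipped = 0
        while self.pending and (max_items is None or shipped < max_items):
            key, value = self.pending.popleft()
            self.standby.rows[key] = value
            shipped += 1
        return shipped

    def unreplicated(self) -> int:
        """Number of changes that would be lost if the primary site failed now."""
        return len(self.pending)
```

The length of `pending` at the moment of a site failure is what the recovery point objective describes: the shorter the shipping delay, the smaller the possible loss.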
In KRITIS systems, a failover must not disrupt the user experience. If a network operator's technician is coordinating a switching operation when the active site changes, they must not be logged out.
For communication between different services and for processing sensor data, we use message brokers. It is crucial that messages are not lost if a connection is interrupted.
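One common way to avoid losing messages during an interruption is a local outbox that releases a message only after the broker has acknowledged it. The sketch below illustrates that idea in plain Python; it is not the API of any particular broker, and `BufferedPublisher` and `FlakyLink` are hypothetical names used for the demonstration.

```python
from collections import deque


class BufferedPublisher:
    """Keeps every message in a local outbox until the broker acknowledges it."""

    def __init__(self, send):
        self._send = send        # callable(message) -> bool, True means acknowledged
        self._outbox = deque()

    def publish(self, message) -> None:
        self._outbox.append(message)
        self.flush()

    def flush(self) -> int:
        """Try to deliver the backlog in order; return how many were delivered."""
        delivered = 0
        while self._outbox:
            message = self._outbox[0]
            try:
                acked = self._send(message)
            except ConnectionError:
                acked = False
            if not acked:
                break            # keep this message and everything behind it
            self._outbox.popleft()
            delivered += 1
        return delivered

    @property
    def backlog(self) -> int:
        return len(self._outbox)


class FlakyLink:
    """Stand-in for a broker connection that can be switched off."""

    def __init__(self):
        self.up = True
        self.received = []

    def send(self, message) -> bool:
        if not self.up:
            raise ConnectionError("link down")
        self.received.append(message)
        return True
```

This gives at-least-once delivery: if an acknowledgment is lost after the broker has already accepted a message, that message is sent again, so consumers should handle duplicates idempotently.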
An often-forgotten aspect of failover is the handling of cryptographic keys and passwords. A cluster that starts up but cannot access its database passwords is worthless. We rely on a replicated HashiCorp Vault instance. All secrets are encrypted and synchronized between regions, so the backup site is always operational.
True geo-redundancy accepts the physical limits of the network. Instead of trying to enforce everything everywhere simultaneously, we prioritize: local performance for everyday operations, asynchronous replication as a safeguard for emergencies. This layered data architecture ensures that the KRITIS platform is not only available but also operates on correct and up-to-date data.
Is there a risk of data loss with asynchronous replication? Yes, in theory: in a hard site crash, the last milliseconds of data can be lost (Recovery Point Objective > 0). For KRITIS systems, however, this controlled trade-off is usually safer than a synchronous system that halts production entirely at every minor network fluctuation.
How is data consistency checked after a failover? We use automated checksum comparisons and point-in-time recovery mechanisms. In addition, we enforce “fencing” so that the old (defective) master can never write at the same time as the new master (split-brain prevention).
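The text does not say how fencing is implemented. One widely used approach is a fencing token: every promotion hands out a higher epoch number, and the storage layer rejects writes that carry an older one. The sketch below shows this under that assumption; `Coordinator`, `FencedStorage`, and `StaleEpochError` are hypothetical names.

```python
class StaleEpochError(Exception):
    """Raised when a demoted master tries to write with an outdated epoch."""


class Coordinator:
    """Hands out a strictly increasing epoch on every promotion."""

    def __init__(self):
        self._epoch = 0

    def promote(self) -> int:
        self._epoch += 1
        return self._epoch


class FencedStorage:
    """Accepts a write only if its epoch is at least the highest one seen so far."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch: int, key, value) -> None:
        if epoch < self.highest_epoch:
            raise StaleEpochError(
                f"epoch {epoch} is older than current epoch {self.highest_epoch}"
            )
        self.highest_epoch = epoch
        self.data[key] = value


if __name__ == "__main__":
    coordinator = Coordinator()
    storage = FencedStorage()

    old_master = coordinator.promote()
    storage.write(old_master, "breaker-17", "closed")

    new_master = coordinator.promote()      # failover: a new master takes over
    storage.write(new_master, "breaker-17", "open")

    try:
        storage.write(old_master, "breaker-17", "closed")  # late write from old master
    except StaleEpochError as err:
        print("rejected:", err)
```

Because the check happens in the storage layer, the old master is blocked even if it has not yet noticed that it was demoted.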
Can we also use NoSQL databases like MongoDB or Cassandra? Absolutely. Many NoSQL systems come with native multi-region features. The choice of database always depends on the specific use case and the consistency requirements of your application.
What happens if the connection between sites is interrupted for a longer period? The systems switch to a “queue” mode. Once the connection is restored, a “re-sync” occurs. The platform is designed so that both sites can continue to fulfill their local tasks independently (island mode).
How does ayedo support the design of the data layer? We analyze your data flows and define the appropriate RPO and RTO (Recovery Time Objective) targets with you. We implement the replication pipelines and verify through regular failover tests that the theory of data security truly holds in practice.