Storage Design for Database Platforms: Performance vs. Capacity with Ceph
David Hussain · 4 minute read

When scaling a DBaaS platform, storage quickly becomes the most critical bottleneck. Databases have two opposing demands on storage infrastructure: on one hand, they require extremely low latencies for read and write operations (I/O), and on the other, backups and transaction logs (WAL) generate massive amounts of data that need to be stored cost-effectively.

Relying on “one-size-fits-all” storage means either paying too much for backup space on expensive SSDs or sacrificing database performance on slow archival disks. The solution for a sovereign European provider lies in an intelligent, software-defined design with Ceph.

1. The Two-Pillar Model: Block vs. Object

Instead of trying to use a single type of storage for everything, we divided the storage into two specialized layers:

A. The Performance Layer: Ceph RBD (Block Storage)

For active database volumes, we use Ceph RBD. This is where the actual data that PostgreSQL operates on resides.

  • Why Block Storage? It offers the performance and consistency required by databases.
  • Fault Tolerance: Ceph automatically replicates data across multiple physical nodes. If a server fails, the data is immediately available on other nodes, and Kubernetes can restart the database instance without data loss.
  • Scalability: We can linearly increase performance by adding more nodes.
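To make the RBD layer concrete, here is a minimal sketch of the Kubernetes StorageClass such a setup could use with the ceph-csi RBD driver. The pool name, cluster ID, and class name are illustrative placeholders, not our actual configuration:

```python
# Sketch of a Kubernetes StorageClass for Ceph RBD via the ceph-csi driver.
# Pool name "db-fast" and cluster ID "ceph-prod" are hypothetical examples.

def rbd_storage_class(name: str, pool: str, cluster_id: str) -> dict:
    """Build a StorageClass manifest that provisions RBD volumes for databases."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "rbd.csi.ceph.com",   # ceph-csi RBD provisioner
        "parameters": {
            "clusterID": cluster_id,
            "pool": pool,                    # replicated pool on fast media
            "imageFeatures": "layering",
        },
        "allowVolumeExpansion": True,        # required for online volume growth
        "reclaimPolicy": "Delete",
    }

sc = rbd_storage_class("db-rbd-fast", "db-fast", "ceph-prod")
```

Setting `allowVolumeExpansion` here is what later enables on-the-fly volume growth for customers.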

B. The Capacity Layer: Ceph RGW (Object Storage)

For backups and the continuous archiving of Write-Ahead Logs (WAL), we use Ceph RGW, an S3-compatible interface.

  • Why Object Storage? It is significantly more cost-effective for large volumes of data. Since backups are written sequentially and rarely read, they do not require the extremely low latencies of block storage.
  • Point-in-Time Recovery: The object store holds the base backups and archived WAL segments needed to restore a database to any chosen moment, down to the second.
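The recovery logic above can be sketched as a toy model: pick the newest base backup taken before the target time, then replay the WAL written between that backup and the target. This is a simplified illustration of the selection step, not a real restore tool:

```python
from datetime import datetime

# Toy model of point-in-time recovery planning: choose the newest base backup
# before the target time, then collect the WAL segments to replay up to it.
# Backup and segment names are illustrative.

def plan_restore(base_backups, wal_segments, target):
    """base_backups / wal_segments: lists of (timestamp, name) tuples."""
    candidates = [b for b in base_backups if b[0] <= target]
    if not candidates:
        raise ValueError("no base backup exists before the target time")
    base = max(candidates)  # newest usable base backup
    replay = [w for w in wal_segments if base[0] <= w[0] <= target]
    return base[1], sorted(name for _, name in replay)
```

A real implementation would also verify the WAL chain is gap-free before starting the restore.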

2. Protection Against “Noisy Neighbors” at the Storage Level

A nightmare for any DBaaS provider: A customer writes massive amounts of data, overloading the entire storage system and slowing down the databases of all other customers.

By using Ceph in combination with Kubernetes resource limits (cgroups), we prevent this effect:

  • IOPS Limits: Each database instance is assigned a fixed budget of I/O operations per second (IOPS).
  • Isolation: Physical resources are managed so that a peak-load customer cannot jeopardize the performance guarantees (SLAs) of other customers.
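The IOPS budget described above is essentially rate limiting. A minimal token-bucket sketch shows the accounting; in production the enforcement happens in the storage/cgroup layer, not in application code:

```python
# Minimal token-bucket sketch of a per-tenant IOPS budget. Purely illustrative:
# real enforcement lives in the cgroup I/O controller / storage driver.

class IopsBucket:
    def __init__(self, iops_limit: int, burst: int):
        self.rate = iops_limit         # tokens refilled per second
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if one I/O operation may proceed at time `now` (seconds)."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A tenant that bursts past its budget simply sees `allow` return False, so its peak load never steals I/O from neighbors.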

3. Geo-Redundancy Without Vendor Lock-In

A key feature for European sovereignty is independence from a single location. Our storage design allows backups to be automatically replicated to a second, geographically separate region. Should an entire data center fail, valuable customer data is securely stored in the S3 storage of the second location and can be used there for a quick restart.
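Cross-region replication boils down to keeping two object listings in sync. Ceph RGW offers multisite replication for this; the sketch below only illustrates the core comparison, with bucket contents modeled as key-to-ETag mappings:

```python
# Sketch of one-way backup replication between two S3 buckets: compare object
# listings (key -> etag) and return what still has to be copied to the
# secondary region. Real setups would use RGW multisite or bucket replication.

def replication_backlog(primary: dict, secondary: dict) -> list:
    """Return keys that are missing or stale on the secondary site."""
    return sorted(
        key for key, etag in primary.items()
        if secondary.get(key) != etag
    )
```

An empty backlog means the second region holds everything needed for a restart after a full data-center failure.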

Conclusion: Storage as a Competitive Advantage

A well-thought-out storage design is an economic lever for a DBaaS provider. It allows high performance where needed while keeping the costs of massive data growth (backups) under control. Solving storage systemically builds a platform that not only convinces technically but also scales profitably.


FAQ: Storage Strategy for DBaaS

Why not just use the cloud provider’s block storage? Local provider storage is often expensive and ties you technically to that provider. With your own Ceph layer, you retain full control over performance profiles and can theoretically move your platform to any infrastructure (multi-cloud capability).

How secure is the data with Ceph against hardware failures? Ceph is “self-healing.” We typically configure triple replication. This means that even if two servers fail simultaneously, the data is still available. After a failure, the system immediately begins restoring redundancy on the remaining servers.
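The cost side of triple replication is simple arithmetic worth spelling out: usable capacity is raw capacity divided by the replica count, and the cluster tolerates one failure fewer than it has copies:

```python
# Back-of-the-envelope math for a replicated Ceph pool.

def usable_tb(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity after replication overhead."""
    return raw_tb / replicas

def tolerated_failures(replicas: int = 3) -> int:
    """Copies that can be lost while at least one replica survives."""
    return replicas - 1
```

So a 300 TB raw cluster yields roughly 100 TB of usable database storage at size 3, which is exactly why cheaper object storage handles the bulky backup data.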

Do backups affect the performance of the running database? By separating RBD (for the DB) and RGW (for backups), we minimize the impact. Writing backups to the S3 layer uses different resource paths than critical database I/O.

Can storage space for customers grow dynamically? Yes. Thanks to Kubernetes integration, customers can increase their storage space through the portal. The platform expands the volume in the background “on-the-fly” without needing to restart the database.
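One detail behind this: Kubernetes PersistentVolumeClaims can only be expanded, never shrunk, so the portal must reject shrink requests before patching the claim. A hypothetical sketch of that check (function name and sizes are illustrative):

```python
# Sketch of the resize check behind the customer portal. Kubernetes PVCs can
# only grow, so shrink requests are rejected before the claim is patched.

def resize_request(current_gib: int, requested_gib: int) -> dict:
    """Validate a resize and return the patch for the PVC spec. Sizes in GiB."""
    if requested_gib <= current_gib:
        raise ValueError("PersistentVolumeClaims can only be expanded, not shrunk")
    return {"spec": {"resources": {"requests": {"storage": f"{requested_gib}Gi"}}}}
```

With `allowVolumeExpansion` enabled on the StorageClass, applying this patch grows the RBD volume and filesystem online, without restarting the database.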
