
In the world of databases, there’s a significant difference between a “backup” and “recoverability.” For a DBaaS provider, a daily snapshot of data is not enough. If a customer accidentally deletes an important table at 2:05 PM, a backup from 2:00 AM is only partially helpful—they would lose an entire morning’s work.
The true product promise of a modern database platform is Point-in-Time Recovery (PITR). It allows restoration to any second within the retention period.
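To implement PITR, we combine two components: periodic base backups of each database, and continuous archiving of its write-ahead log (WAL), which records the changes made after each base backup.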
When a restore is requested, the system first restores the last base backup taken before the target time and then replays the archived WAL files up to the exact second requested. The result is a consistent data state with minimal data loss (RPO near zero).
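The selection logic described above can be sketched in a few lines. This is a simplified illustration, not the platform's actual implementation: the names `BaseBackup`, `WalSegment`, and `plan_restore` are hypothetical, and it assumes that backups and WAL segments are tracked by timestamps, whereas real systems usually track log positions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class BaseBackup:
    backup_id: str
    started_at: datetime
    finished_at: datetime


@dataclass(frozen=True)
class WalSegment:
    name: str
    start: datetime  # time of the first change recorded in the segment
    end: datetime    # time of the last change recorded in the segment


def plan_restore(
    backups: List[BaseBackup],
    wal_segments: List[WalSegment],
    target: datetime,
) -> Tuple[BaseBackup, List[WalSegment]]:
    """Pick the base backup and the WAL segments needed to reach `target`."""
    # Only backups that were complete before the target time are usable.
    candidates = [b for b in backups if b.finished_at <= target]
    if not candidates:
        raise ValueError("no base backup exists before the target time")

    # The most recent usable backup minimizes the amount of WAL to replay.
    base = max(candidates, key=lambda b: b.finished_at)

    # Replay every segment that overlaps the window from backup start to
    # target. The last segment is replayed only partially, up to `target`.
    needed = sorted(
        (s for s in wal_segments if s.end > base.started_at and s.start <= target),
        key=lambda s: s.start,
    )
    return base, needed
```

For the example from the introduction, a call with a target of 2:04 PM would select the 2:00 AM backup and every WAL segment written between the start of that backup and the target time.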
The challenge for our customer was scale: how do you manage this logic for hundreds of databases simultaneously without storage consumption getting out of hand or losing track of each database's backup state?
A backup is only as good as the restore it enables. In practice, many disaster recovery (DR) concepts fail because the restore was never seriously tested.
We have established automated restore tests for the platform. The system regularly creates test instances from the existing backups and verifies their integrity. Only then can the provider confidently offer an availability and security guarantee to their customers.
For a DBaaS provider, PITR is not a “feature” but the lifeline for their customers’ business. By automating restores down to the second and storing the backups geo-redundantly, we create the trust needed to run business-critical workloads on the platform.
How far back can a customer go in time? This depends on the defined retention policy. Periods between 7 and 30 days are common. Since WAL files occupy storage space, the retention period is often a differentiating factor between the provider's pricing tiers.
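To illustrate how a retention policy translates into storage cleanup, here is a minimal sketch. The function name `prune_plan` and the idea of identifying backups by their completion time are assumptions made for this example. The key point is that the newest base backup taken at or before the start of the retention window must be kept, because every point in the window after it is reached by replaying WAL on top of it.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def prune_plan(
    backup_times: List[datetime],
    now: datetime,
    retention: timedelta,
) -> Tuple[Optional[datetime], List[datetime]]:
    """Return (anchor backup to keep, older backups that can be removed)."""
    cutoff = now - retention  # earliest point that must stay recoverable

    # Backups taken at or before the cutoff are candidates for the anchor.
    older = [t for t in backup_times if t <= cutoff]
    if not older:
        # No backup is old enough to cover the cutoff: keep everything.
        return None, []

    # The newest of them anchors the window; WAL from here on is required.
    anchor = max(older)

    # Everything strictly older than the anchor (including its WAL) can go.
    removable = sorted(t for t in backup_times if t < anchor)
    return anchor, removable
```

With daily backups and a 7-day retention, this keeps one backup slightly older than seven days plus all WAL written since, which is why longer retention periods cost noticeably more storage.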
Does WAL streaming affect database performance? Only marginally. With asynchronous archiving and dedicated object storage, the overhead stays very low: the database writes its logs locally anyway, and copying them to S3 storage happens in the background.
What happens in the event of a “corruption” error in the database? This is where PITR shines. If the data has been corrupted (e.g., due to a software bug), the customer can choose the point in time just before the corrupting event and restore a clean instance.
Can customers trigger restores themselves? Yes, that’s the goal of self-service. Through the API or the customer portal, the user selects the time and target instance. The platform takes care of provisioning the resources and applying the data in the background.
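Before a self-service request like this reaches the provisioning stage, it makes sense to validate it. The following sketch shows one possible check. The `RestoreRequest` fields and the `validate_restore_request` function are hypothetical and do not describe the platform's actual API; the sketch only assumes that a request carries a target time and a target instance, as described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class RestoreRequest:
    source_instance: str
    target_instance: str
    target_time: datetime


def validate_restore_request(
    req: RestoreRequest,
    now: datetime,
    retention: timedelta,
) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid.

    `now` is expected to be a time zone aware datetime.
    """
    errors: List[str] = []

    # Naive timestamps are ambiguous when restoring to the second,
    # and cannot be compared with an aware `now`, so stop here.
    if req.target_time.tzinfo is None:
        errors.append("target_time must include a time zone")
        return errors

    if req.target_time > now:
        errors.append("target_time lies in the future")
    if req.target_time < now - retention:
        errors.append("target_time lies outside the retention period")
    if not req.target_instance:
        errors.append("target_instance must not be empty")
    return errors
```

Returning all errors at once, rather than failing on the first, lets a portal show the user everything that needs fixing in a single round trip.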