In a modern data engineering platform, storage needs are not only vast but also diverse. We need space for raw sensor data, finished AI models, container images, and backups. Classic file servers (NFS) quickly reach their limits, especially when it comes to parallel access from hundreds of Kubernetes pods.
The solution for our industrial corporation is CEPH. As a highly available, distributed storage system, CEPH transforms standard server hardware into a powerful storage network. The key feature: it offers an S3-compatible interface directly within the data center.
The S3 protocol (Simple Storage Service) has become the de facto standard for cloud data. Virtually every modern data tool, from Apache Spark and Presto to Python libraries such as pandas, can communicate natively with S3 storage.
By integrating CEPH (often via the Rook operator) directly into Kubernetes, a seamless interplay between computing power and storage is created:
Self-Healing: CEPH automatically replicates data across multiple physical servers. If a hard drive or an entire server fails, CEPH restores data integrity in the background without interrupting the operation of data pipelines.
Unified Storage: CEPH can simultaneously provide three types of storage: block storage (RBD) for databases and virtual machines, a POSIX-compatible file system (CephFS), and S3-compatible object storage via the RADOS Gateway.
Tiering: We can combine fast NVMe storage for “hot” data (current analyses) and cheaper HDD storage for “cold” data (archiving) in one system.
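The self-healing behavior described above rests on deterministic data placement: every client can compute where an object's replicas live, and after a failure the cluster can compute where to rebuild them. The toy sketch below uses rendezvous hashing to mimic that idea; CEPH's actual CRUSH algorithm is far more sophisticated (it understands failure domains like racks and rooms), so treat this purely as an illustration.

```python
import hashlib

def place_replicas(object_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Deterministically pick `replicas` distinct storage daemons (OSDs) for an
    object -- a toy stand-in for CEPH's CRUSH placement algorithm."""
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.sha256(f"{object_name}:{osd}".encode()).hexdigest(),
    )
    return ranked[:replicas]

osds = [f"osd.{i}" for i in range(8)]
primary_set = place_replicas("sensor-readings/2024-06-01.parquet", osds)

# If one OSD fails, re-ranking the remaining healthy OSDs yields the next
# candidate, so the cluster knows where to rebuild the missing copy:
failed = primary_set[0]
healthy = [o for o in osds if o != failed]
repaired_set = place_replicas("sensor-readings/2024-06-01.parquet", healthy)
```

The key property is that no central lookup table is needed: placement is a pure function of the object name and the cluster map, which is what lets recovery run in the background without pausing pipelines.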
A strategic advantage of this architecture is the clean separation of storage and compute. As data volume grows, we simply add more servers with hard drives to the CEPH cluster. If more computing power is needed for AI models, we scale the CPU/GPU nodes. This independence significantly reduces costs, as hardware can be procured exactly as needed.
With CEPH on Kubernetes, we build a “Private Cloud Storage” that is functionally identical to the offerings of the major hyperscalers but remains entirely under the corporation’s control. It is the backbone for a stable data lake that does not falter even with petabytes of data and forms the foundation for any form of advanced analytics.
Isn’t CEPH very complex to administer? It used to be. By using Kubernetes operators like Rook, the management of CEPH is automated. Tasks such as adding new hard drives or updating software are controlled via declarative YAML files, drastically reducing complexity.
How secure is the data in CEPH against total loss? CEPH uses methods like “Erasure Coding” or simple replication (e.g., factor 3). Even if two servers fail simultaneously, the data remains available. Additionally, offsite backups for disaster scenarios can be easily integrated.
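For intuition on erasure coding: the simplest possible code splits data into two chunks and stores one XOR parity chunk (k=2, m=1), so any single lost chunk can be rebuilt from the other two. CEPH's erasure-coded pools use more general codes with configurable k and m, but the principle is the same. A minimal sketch:

```python
def encode(data: bytes) -> tuple[bytes, bytes, bytes]:
    """Split data into two halves and add one XOR parity chunk (k=2, m=1).
    The second half is zero-padded so all three chunks have equal length."""
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\x00")
    parity = bytes(x ^ y for x, y in zip(a, b))
    return a, b, parity

def recover(a, b, parity):
    """Rebuild whichever data chunk is missing (None) by XOR-ing the other two."""
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    if b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return a, b

a, b, p = encode(b"telemetry-batch-0042")
# Simulate losing the chunk stored on a failed server, then rebuilding it:
rebuilt_a, _ = recover(None, b, p)
```

With k=2, m=1 the storage overhead is only 50% instead of the 200% of triple replication, which is why erasure coding is popular for cold data, at the price of more CPU work during recovery.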
Can I use CEPH if I’m already in the cloud? Yes. Many companies use CEPH in the cloud to have a unified storage layer across different environments or to avoid the often expensive egress costs and proprietary storage fees of cloud providers.
How fast is access compared to local storage? Because CEPH distributes load across many hard drives in parallel, sequential access (the typical pattern in data engineering) is often faster than a single local SSD. For databases with many small write accesses, the system is tuned with dedicated caching layers, for example by placing metadata and write-ahead logs on fast NVMe devices.
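A simplified sketch of why parallel distribution helps sequential reads: an object is cut into fixed-size stripes that are spread round-robin across OSDs and read back concurrently, so every drive contributes bandwidth at once. The stripe size and layout below are illustrative, not CEPH's actual defaults.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes here for illustration; real stripes are megabyte-scale

def stripe(data: bytes, osds: int) -> dict[int, list[bytes]]:
    """Cut data into fixed-size stripes and distribute them round-robin
    across `osds` storage daemons."""
    chunks = [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)]
    placement: dict[int, list[bytes]] = {osd: [] for osd in range(osds)}
    for idx, chunk in enumerate(chunks):
        placement[idx % osds].append(chunk)
    return placement

def read_back(placement: dict[int, list[bytes]]) -> bytes:
    """Fetch from all OSDs in parallel, then reassemble stripes in order.
    Stripe idx lived on OSD idx % n at position idx // n."""
    n = len(placement)
    with ThreadPoolExecutor() as pool:
        per_osd = list(pool.map(lambda osd: placement[osd], sorted(placement)))
    total = sum(len(chunks) for chunks in per_osd)
    return b"".join(per_osd[i % n][i // n] for i in range(total))
```

Aggregate read bandwidth scales roughly with the number of drives serving stripes, which is why a large sequential scan can outrun a single local SSD.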
How does ayedo support the setup of CEPH? We plan the hardware sizing, implement the Rook/CEPH stack in your Kubernetes cluster, and configure the S3 endpoints for your applications. We ensure that your storage backend is performant, secure, and future-proof.