From VM Operation to Platform: How ayedo’s Planwerk Led to Scalable, Auditable SaaS Operations


Many SaaS platforms grow as they are built: step by step, pragmatically, “works for now.” This is appropriate in the early phase. But at some point, it tips over. Pragmatic infrastructure becomes a risk—not because it’s bad, but because the product has outgrown it.

Planwerk develops a platform for digital construction planning and project management. Architecture firms, developers, and public clients use it to plan collaboratively, document decisions, and manage construction projects. Around 8,000 active users across approximately 200 clients—from small offices to municipal building authorities with hundreds of users. Additionally, there were three enterprise clients running the platform in a dedicated on-premise instance for regulatory reasons.

Planwerk had a successful product. What was missing was an operating system that could withstand product growth.


Initial Situation: A VM Setup That Was Good for the Prototype—But Too Small for the Product

The platform ran on six VMs at a European cloud provider: two application servers behind a load balancer, two database servers in a primary-replica setup, a worker server for background processes, and a server for file storage. Deployments, configuration, and maintenance were handled via Ansible and SSH.

What worked well for one team and a few dozen clients suddenly became a daily operational burden with 200 clients. Not because of a single weak point, but because multiple issues compounded each other.

Monday morning, when hundreds of construction teams simultaneously updated their weekly plans, the platform noticeably slowed down. There was no elastic scaling; the only reaction was to book larger VMs. This is vertical scaling as a reflex: expensive and sluggish. And most importantly, it does not remove the operational stress, because peak loads still trigger exactly what you want to avoid in SaaS: "we need to react."

Deployments were an event of their own: a manual rollout across both app servers with interim checks, during which the platform was noticeably degraded for one to two minutes. In case of errors, rollback was manual work, including further downtime. This artificially limited deployments to a few evenings per week. A product that should be developed iteratively inevitably slows down this way.

Staging existed—but as a single VM that only roughly resembled production. Different machine size, different network paths, no replica database. Errors that occurred under load or in multi-server operation were not visible on staging. This further reinforced the fear of deployments and extended release cycles.

The most critical issue was backups. Database backups ran as a nightly pg_dump on the same storage as the database itself. A restore was tested exactly once in three years, and it failed because the backup was corrupt. There was no automated backup for file storage at all. This is not a "technical detail" but an existential risk: in audits, what counts is not that backups run, but that restore demonstrably works.

And finally, the on-premise clients: three separate installations, manually maintained, with their own playbooks and update cycles. Updates regularly lagged weeks behind. Each on-premise client permanently tied up DevOps capacity—and made the team defensive: on-premise was not a revenue opportunity, but a burden.

What exacerbated everything was a potential contract with a large municipal purchasing group: potentially 2,000 additional users within six months. The engineering team knew that the existing setup could not absorb this without drastic risks.


ayedo’s Approach: Building an Operating System—Not Just Replacing Infrastructure

We didn’t start with the goal of “replacing VMs with Kubernetes.” We started with a platform goal:

Planwerk needed a unified operating model that

  • scales horizontally instead of growing vertically,
  • enables deployments without downtime,
  • makes staging truly production-like,
  • makes backup/recovery demonstrable,
  • makes disaster recovery testable and documentable,
  • and maps cloud and on-premise through the same process.

The central decision was therefore: Managed Kubernetes as a unified operating platform—with GitOps as the operational standard.


A Platform for Cloud and On-Premise: The Same Process, Different Location

The biggest structural gain was the unification. Previously, there were two worlds: VM-based SaaS in the cloud and separately maintained on-premise installations. After the migration, there is only one process.

The application runs as a container workload: the same images, the same manifests, the same configuration structure. Whether it runs in the ayedo cloud or in an on-premise client's cluster is a matter of location, not of operation.

This eliminates the biggest cost driver in everyday life: “special solutions” for on-premise. Updates are no longer individually followed up but rolled out via the same GitOps path—planned and consistent.
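One common way to realize "the same manifests, different location" is a Kustomize base with thin per-environment overlays. The directory layout and names below are illustrative, not Planwerk's actual repository:

```yaml
# overlays/onprem-client-a/kustomization.yaml
# Inherits all shared manifests from the common base; only what
# genuinely differs on-site (here: replica count) lives in the overlay.
resources:
  - ../../base          # deployment.yaml, service.yaml, ingress.yaml, ...

patches:
  - target:
      kind: Deployment
      name: planwerk-app    # hypothetical workload name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 2
```

Cloud and on-premise then differ only in which overlay a cluster points at; the base stays identical everywhere.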


Scaling and Availability: Load Peaks Become Routine

With Horizontal Pod Autoscaling, the application layer scales automatically based on CPU usage and request volume. The notorious Monday morning peaks are no longer an alarm state but a normal condition that the platform absorbs—without a ticket, without manual VM enlargement, without “we need to stay on it.”
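An application-layer autoscaler of the kind described might look like this; the workload name, replica bounds, and CPU target are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: planwerk-app           # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: planwerk-app
  minReplicas: 3               # baseline for quiet hours
  maxReplicas: 12              # headroom for Monday-morning peaks
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out before saturation
```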

Redis decouples sessions from individual application instances. Pods can be restarted or scaled without logging out users. RabbitMQ shifts demanding background jobs out of the request path: PDF generation, mail sending, exports, and integrations run asynchronously. This reduces latencies and makes the system more stable under load.


Deployments as Routine: GitOps Instead of SSH

ArgoCD became the central rollout engine. Deployments are a Git commit: change the version in the manifest, ArgoCD rolls out—with rolling updates, health checks, and automatic rollbacks if something goes wrong.
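A minimal ArgoCD Application of this shape could look as follows; the repository URL, path, and names are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: planwerk-production        # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/planwerk/deploy.git   # placeholder
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: planwerk
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the declared state
```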

This is not only more convenient. It is a complete change in the risk model: A deployment is no longer an event with downtime but a routine. This decouples releases from “Tuesday and Thursday evenings.” Planwerk can deploy at any time—even several times a day if necessary—without user impact.

GitLab CI/CD builds versioned images, runs tests, and automatically updates the GitOps manifests. This makes the path from commit to production not only fast but also traceable and reproducible.
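Sketched as a `.gitlab-ci.yml`, such a pipeline could look roughly like this; the image names, repository URL, and manifest path are assumptions:

```yaml
stages: [build, deploy]

build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

update-manifest:
  stage: deploy
  image: alpine/git
  script:
    # Bump the image tag in the GitOps repo; ArgoCD picks up the commit
    # and performs the actual rollout.
    - git clone "https://oauth2:${DEPLOY_TOKEN}@git.example.com/planwerk/deploy.git"
    - cd deploy
    - sed -i "s|image:.*|image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA|" overlays/production/deployment.yaml
    - git commit -am "release $CI_COMMIT_SHORT_SHA" && git push
```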


Staging That Deserves Its Name: Production-Identical and Resilient

A frequently underestimated point in such migrations is staging. Not “a VM that roughly fits,” but an environment that is truly identical.

Planwerk’s staging now runs in the same platform model, with identical configuration, identical database structure, and realistic, anonymized test data. This makes errors that previously only occurred in multi-server operation or under load visible before production. Preview environments per pull request complement the setup for feature-specific testing.

This takes pressure out of the process and brings speed back into development.


Backup, Restore, Disaster Recovery: From Hope to Proof

The biggest compliance lever was the professionalization of backup and recovery.

PostgreSQL runs as a cluster with automatic failover, continuous WAL archiving, and daily full backups on geo-redundant storage. Point-in-time recovery enables restoration to any second within the retention window, a crucial difference from "we have a nightly dump."
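The article doesn't name the tooling; with the CloudNativePG operator, one common choice for this pattern, such a cluster could be declared roughly like this (names, sizes, and the object-store endpoint are placeholders, and credentials are omitted):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: planwerk-db
spec:
  instances: 3                 # one primary, two replicas, automatic failover
  storage:
    size: 200Gi
  backup:
    retentionPolicy: "30d"     # PITR window
    barmanObjectStore:
      destinationPath: s3://planwerk-backups/pg   # geo-redundant bucket
      endpointURL: https://s3.example.com
      wal:
        compression: gzip      # continuous WAL archiving for point-in-time recovery
```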

More important than the backup itself are the automated restore tests. Planwerk runs automated restore tests weekly, and the results are logged. Backup functionality is thus no longer an assumption but a verifiable process: exactly what public clients want to see in tenders.
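A weekly restore drill can be as simple as a CronJob that restores the latest backup into a scratch database and runs a sanity check. This is a deliberately simplified sketch; the mount path, connection string, and table name are hypothetical:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restore-drill
spec:
  schedule: "0 3 * * 0"        # Sunday 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: drill
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # Restore the latest backup into a throwaway database,
                  # then verify the data is actually readable. A non-zero
                  # exit marks the Job failed and triggers alerting.
                  pg_restore -d "$SCRATCH_DB_URL" /backups/latest.dump
                  psql "$SCRATCH_DB_URL" -c "SELECT count(*) FROM projects;"
```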

Velero complements this at the platform level: not only databases but also Kubernetes resources, configurations, secrets, and volumes are secured and restorable. This makes disaster recovery not “we rebuild everything and somehow import data,” but a defined, repeatable process.
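A Velero Schedule covering the platform level might look like this; the namespace, cadence, and retention are illustrative:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-platform-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"              # nightly at 02:00
  template:
    includedNamespaces:
      - planwerk                     # illustrative application namespace
    snapshotVolumes: true            # include persistent volumes
    ttl: 720h                        # keep 30 days
```

Resources, configuration, secrets, and volume snapshots land in the same backup, so a disaster recovery starts from a known-complete state instead of a rebuild.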

On this basis, we built a DR concept with defined RTO/RPO values, documented and regularly tested. The recovery time thus drops from “several days if everything goes well” to hours—and most importantly: it is verifiable.


Compliance as a Byproduct of Operations

Harbor brings vulnerability scanning and SBOM generation into the standard process. Every container image is checked before deployment. This is not only security hygiene but a strong compliance proof for public clients.

Authentik ensures consistent access control with SSO and OIDC integration. VictoriaMetrics, VictoriaLogs, and Grafana provide observability: request latencies, error rates, platform health, database metrics—including alerting on anomalies.
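Alerting on anomalies can be expressed as a vmalert rule evaluated against VictoriaMetrics; the metric names below follow common HTTP-server conventions and are assumptions, not Planwerk's actual metrics:

```yaml
groups:
  - name: planwerk-slo
    rules:
      - alert: HighErrorRate
        # Fraction of requests answered with 5xx over the last 5 minutes.
        expr: >
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.02
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 2% for 10 minutes"
```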

This turns many questions that were previously laboriously answered into standard reports: access, backup, version status, vulnerability situation, operational status.


Result: Scalable, Auditable, and Finally Unified

Planwerk deploys today without downtime and not according to a calendar, but as needed. Releases are no longer “an evening project,” but routine.

The platform scales horizontally and can absorb growth without the team having to “rebuild infrastructure” for every new client. The framework agreement with the municipal purchasing group could be signed because the platform can technically absorb the increase in users.

Backup and recovery are not only implemented but tested and documented. Disaster recovery has turned from an unclear risk into a plannable process with defined RTO.

The on-premise clients are no longer an exception. They run on the same deployment process as the cloud. Updates reach them on the same day instead of weeks later. The DevOps effort per on-premise client drops drastically—and on-premise becomes sellable again, instead of being internally resisted.

And last but not least: compliance proofs are now possible at the push of a button. This demonstrability was crucial to winning public tenders where documentation makes the difference.


Why This Approach Works

SaaS platforms rarely fail due to features. They fail in operation when operations do not grow with them.

The switch from VM manual operation to a declarative platform with GitOps, auto-scaling, tested recovery, and unified cloud/on-premise operation turns “we operate software” into a real operational model—reproducible, auditable, and scalable.


Call to Action

If your SaaS platform is still operated via SSH and playbooks, deployments cause downtime, and backups work more “by feel” than demonstrably, this is not an isolated case. It is the typical transition point: from a product that runs to a platform that needs to grow.

ayedo migrates SaaS applications to Managed Kubernetes—including GitOps, auto-scaling, production-identical staging, backup/PITR, disaster recovery, and a unified model for cloud and on-premise.

If you are facing similar challenges or growth is currently being slowed down for infrastructure reasons, let’s talk. We’ll take a look together.
