Margin Killer Cloud Costs: How SaaS Providers Can Maximize Infrastructure Efficiency
In the growth phase of a SaaS company, there is a dangerous curve: the Cost of Goods Sold (COGS). …

The hype around Artificial Intelligence has ushered in a new era of IT spending. Anyone who trains or operates large language models (LLMs) today quickly realizes that the cost of graphics processing units (GPUs) follows entirely different rules than that of traditional CPU instances. A single H100 instance in the cloud can cost as much per month as a small car.
FinOps 2.0 is the evolution of cloud cost management. It is no longer just about shutting down unused instances; it is about managing the company's most expensive resource, AI computing power, with surgical precision.
AI workloads are “voracious” and often unpredictable. Without a specialized FinOps strategy, the AI project risks becoming a financial fiasco before it generates its first dollar of revenue.
Unlike CPUs, GPUs are often binary: either a process occupies them fully, or they sit idle. If a developer books a GPU instance for experiments and forgets it over the weekend, the costs keep accruing linearly.
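A minimal watchdog illustrates the countermeasure: check GPU utilization and stop the instance after a sustained idle period. This Python sketch assumes the pynvml and boto3 libraries; the instance ID and thresholds are hypothetical placeholders.

```python
# Sketch: auto-stop an idle GPU instance. INSTANCE_ID and the thresholds
# below are illustrative assumptions, not recommended production values.
import time
import boto3
import pynvml

INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical instance ID
IDLE_THRESHOLD_PCT = 5                # below this, the GPU counts as idle
IDLE_MINUTES_BEFORE_STOP = 30

def gpu_is_idle() -> bool:
    """Return True if every GPU on this machine is below the threshold."""
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            if util >= IDLE_THRESHOLD_PCT:
                return False
        return True
    finally:
        pynvml.nvmlShutdown()

idle_minutes = 0
while True:
    idle_minutes = idle_minutes + 1 if gpu_is_idle() else 0
    if idle_minutes >= IDLE_MINUTES_BEFORE_STOP:
        # Stop (not terminate) so the experiment can be resumed on Monday.
        boto3.client("ec2").stop_instances(InstanceIds=[INSTANCE_ID])
        break
    time.sleep(60)
```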
AI requires massive amounts of data for training. If you store your data in Cloud A but want to use cheaper GPUs in Cloud B, you pay substantial data-transfer (egress) fees to move it.
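A quick back-of-the-envelope comparison makes the trade-off concrete. All prices in this Python sketch are illustrative assumptions, not real provider quotes:

```python
# Back-of-the-envelope egress check: is it cheaper to move the training data
# to the cheaper GPUs, or to train where the data already lives?
DATASET_TB = 50
EGRESS_PER_GB = 0.09          # assumed egress price in Cloud A, $/GB
GPU_HOURLY_A = 6.00           # assumed GPU price in Cloud A, $/h
GPU_HOURLY_B = 4.00           # assumed GPU price in Cloud B, $/h
TRAINING_HOURS = 2000

egress_cost = DATASET_TB * 1000 * EGRESS_PER_GB
gpu_savings = (GPU_HOURLY_A - GPU_HOURLY_B) * TRAINING_HOURS

print(f"One-time egress: ${egress_cost:,.0f}")   # $4,500
print(f"GPU savings:     ${gpu_savings:,.0f}")   # $4,000
print("Moving pays off" if gpu_savings > egress_cost else "Stay in Cloud A")
```

With these example numbers the cheaper GPUs never recoup the egress bill, which is exactly the trap the paragraph above describes.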
Small models often use only a fraction of the video memory (VRAM) of a large GPU; the rest of the card is paid for but sits unused.
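To find such cards, you first have to measure. A minimal Python sketch using the pynvml bindings (assumed to be installed) reports VRAM usage per GPU, which is the starting point for sharing techniques such as MIG or time-slicing:

```python
# Sketch: report how much of each GPU's VRAM is actually in use,
# to spot cards that could host more than one small model.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # bytes: .used / .total
        used_pct = 100 * mem.used / mem.total
        print(f"GPU {i}: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB "
              f"({used_pct:.0f}% of VRAM in use)")
finally:
    pynvml.nvmlShutdown()
```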
Forget total costs. In the FinOps 2.0 world, what matters is cost per outcome: for example, cost per inference, per customer, or per feature.
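As a sketch, with purely illustrative figures, such unit metrics can be derived directly from the monthly bill:

```python
# Unit-economics sketch: convert a monthly GPU bill into cost per outcome.
# All figures are illustrative assumptions.
monthly_gpu_bill = 30_000.0        # $/month for the inference fleet
inferences_per_month = 12_000_000
active_customers = 400

cost_per_1k_inferences = monthly_gpu_bill / (inferences_per_month / 1000)
cost_per_customer = monthly_gpu_bill / active_customers

print(f"${cost_per_1k_inferences:.2f} per 1,000 inferences")  # $2.50
print(f"${cost_per_customer:.2f} per customer per month")     # $75.00
```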
Why are AI costs so much harder to plan than those of classic web apps? Web apps usually scale linearly with the number of users. AI models, however, have a minimum consumption (baseline): a model must be loaded in memory to respond, and that costs money even if no user is currently querying it. Techniques like scale-to-zero can help here.
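A toy Python sketch of the idea follows; the timeout is an assumed value, and real deployments would use a platform such as Knative or KEDA rather than hand-rolled threads:

```python
# Minimal scale-to-zero sketch: unload the model after an idle timeout and
# reload it lazily on the next request.
import time
import threading

IDLE_TIMEOUT_S = 300   # assumed idle window before unloading
_model = None
_last_request = time.monotonic()
_lock = threading.Lock()

def load_model():
    print("Loading model into memory (the expensive part)...")
    return object()  # stand-in for the real model

def predict(x):
    global _model, _last_request
    with _lock:
        if _model is None:          # cold start: pay the load cost once
            _model = load_model()
        _last_request = time.monotonic()
        return f"prediction for {x}"

def reaper():
    global _model
    while True:
        time.sleep(30)
        with _lock:
            if _model is not None and time.monotonic() - _last_request > IDLE_TIMEOUT_S:
                _model = None        # scale to zero: free the memory/GPU
                print("Model unloaded; baseline cost drops to zero.")

threading.Thread(target=reaper, daemon=True).start()
print(predict("hello"))  # cold start triggers load_model()
```

The trade-off is visible in the sketch: scale-to-zero removes the baseline cost but reintroduces cold-start latency on the first request.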
Is on-premise always cheaper for AI than the cloud? Not necessarily. AI hardware is extremely expensive to acquire and delivery times are long, while the cloud offers flexibility. The rule of thumb for 2026: use the cloud for experiments and peak loads, and your own hardware for the constant baseline load.
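A simple break-even calculation illustrates the rule of thumb. All figures in this Python sketch are assumptions for illustration:

```python
# Break-even sketch for the cloud-vs-on-prem rule of thumb.
cloud_per_gpu_hour = 5.00          # assumed on-demand price, $/h
onprem_capex = 250_000.0           # assumed 8-GPU server, purchase price
amortization_years = 3
onprem_opex_per_year = 30_000.0    # power, cooling, ops (assumed)
gpus = 8

onprem_per_gpu_hour_at_full_load = (
    (onprem_capex / amortization_years + onprem_opex_per_year)
    / (gpus * 24 * 365)
)
# Utilization below which renting by the hour beats owning the hardware:
break_even_utilization = onprem_per_gpu_hour_at_full_load / cloud_per_gpu_hour

print(f"On-prem at 100% load: ${onprem_per_gpu_hour_at_full_load:.2f}/GPU-hour")
print(f"Break-even utilization: {break_even_utilization:.0%}")  # ~32%
```

With these example numbers, hardware that runs above roughly a third of the year is cheaper to own; spiky experimental workloads below that line belong in the cloud.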
What is “spot-instance training”? It means using a cloud provider's surplus capacity at a fraction of the on-demand price (discounts of up to 90%). The risk: the instance can be withdrawn at any time. Modern AI frameworks therefore save “checkpoints” every few minutes so training can resume immediately after an interruption.
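A minimal checkpoint-and-resume loop might look like the following Python sketch; it uses plain pickle as a stand-in for a real framework's checkpoint utilities, and the interval is an assumed value:

```python
# Spot-training sketch: checkpoint periodically so a reclaimed spot instance
# only costs the work done since the last checkpoint.
import os
import pickle
import time

CHECKPOINT = "checkpoint.pkl"
CHECKPOINT_EVERY_S = 300   # assumed checkpoint interval
TOTAL_STEPS = 100_000

# Resume from the last checkpoint if the previous instance was reclaimed.
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT, "rb") as f:
        state = pickle.load(f)
else:
    state = {"step": 0, "weights": [0.0]}

last_save = time.monotonic()
while state["step"] < TOTAL_STEPS:
    state["step"] += 1               # stand-in for one real training step
    if time.monotonic() - last_save >= CHECKPOINT_EVERY_S:
        tmp = CHECKPOINT + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, CHECKPOINT)  # atomic rename: never a half-written file
        last_save = time.monotonic()
```

Writing to a temporary file and renaming atomically matters here: a spot interruption in the middle of a save must not corrupt the only checkpoint you have.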
Do open-source models help save costs? Massively. Instead of paying per request to a commercial provider (such as OpenAI), you run models like Llama 3 or Mistral on your own infrastructure. You pay for the hardware, not for usage volume. Beyond a certain request volume, this becomes the decisive factor for profitability.
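That break-even point is easy to estimate. The prices in this Python sketch are illustrative assumptions, not vendor quotes:

```python
# Break-even sketch: per-token API pricing vs. a self-hosted open-source model.
api_price_per_1m_tokens = 10.00     # assumed commercial API price
selfhost_fixed_per_month = 4000.0   # assumed GPU server (rented or amortized)
tokens_per_month = 600_000_000      # current usage

api_cost = tokens_per_month / 1_000_000 * api_price_per_1m_tokens
break_even_tokens = selfhost_fixed_per_month / api_price_per_1m_tokens * 1_000_000

print(f"API cost at current volume: ${api_cost:,.0f}/month")        # $6,000
print(f"Break-even volume: {break_even_tokens / 1e6:,.0f}M tokens")  # 400M
```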
How do we start with FinOps 2.0? With tagging and labeling. Every GPU workload must be assigned to a project or department; only once you can see who is driving the costs can you talk about optimization. Cloud-native approaches, such as Kubernetes labels, can help here.
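Once labels exist, attribution is a simple roll-up. The record format in this Python sketch is a hypothetical stand-in for a cloud billing export or Kubernetes label data:

```python
# Sketch: roll up cost records by team label and surface untagged spend.
from collections import defaultdict

cost_records = [
    {"labels": {"team": "search", "project": "rag-bot"},     "usd": 812.40},
    {"labels": {"team": "search", "project": "embeddings"},  "usd": 230.10},
    {"labels": {"team": "ml-platform", "project": "llm-ft"}, "usd": 1954.75},
    {"labels": {}, "usd": 499.00},   # untagged: the spend nobody owns
]

by_team = defaultdict(float)
for rec in cost_records:
    by_team[rec["labels"].get("team", "UNTAGGED")] += rec["usd"]

for team, usd in sorted(by_team.items(), key=lambda kv: -kv[1]):
    print(f"{team:12s} ${usd:10,.2f}")
```

The “UNTAGGED” bucket is the point of the exercise: it shows exactly how much spend still has no owner.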