
In modern IT infrastructure, the GPU has become the new CPU. Whether it’s Large Language Models (LLMs), computer vision, or complex data analysis, the demand for computing power on graphics cards has massively increased in the mid-market. However, while CPUs have been efficiently virtualized and shared for decades, GPUs often present platform engineers with a dilemma: A high-end graphics card (like an NVIDIA H100 or A100) is often oversized for a single microservice, yet too expensive to leave idle.
The solution to this problem is GPU Slicing. In this post, you will learn how to partition your expensive hardware in Kubernetes clusters so that multiple workloads can benefit simultaneously without blocking each other.
By default, Kubernetes treats a GPU as an indivisible resource. A pod requests nvidia.com/gpu: 1, and the system assigns the entire hardware exclusively to it. While this might make sense for intensive model training, for inference (running a model), where the GPU might only be 15% utilized, it leads to massive resource wastage.
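For reference, this is what the default exclusive allocation looks like. The pod below claims the entire card, no matter how little of it the workload actually uses (illustrative manifest; the name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-server          # placeholder name
spec:
  containers:
  - name: model
    image: registry.example.com/llm-inference:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1         # claims one ENTIRE physical GPU, exclusively
```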
To increase efficiency, three technical approaches have become established, all of which we use at ayedo in the context of sovereign infrastructures.
NVIDIA MIG (Multi-Instance GPU) is the most robust form of slicing, implemented at the hardware level (available from the Ampere architecture onward). A physical GPU is partitioned into up to seven independent instances, each with its own dedicated memory and compute cores.
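With the GPU Operator's MIG manager, the partition layout is requested declaratively by labeling the node. A minimal sketch, assuming the operator's built-in `all-1g.10gb` profile and a node called `gpu-node-1` (both stand-ins for your environment):

```shell
# Ask the MIG manager to split every GPU on this node into 1g.10gb instances
kubectl label node gpu-node-1 nvidia.com/mig.config=all-1g.10gb --overwrite

# Once reconfiguration is done, the node advertises the MIG slices as resources
kubectl describe node gpu-node-1 | grep nvidia.com/mig
```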
If your hardware does not support MIG (e.g., older T4 or consumer cards), time-slicing is the way to go. Here, the NVIDIA device plugin is configured to advertise the same physical GPU multiple times, so that several pods can be scheduled onto it and share its compute time.
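Time-slicing is enabled through a sharing config for the NVIDIA device plugin (which the GPU Operator deploys for you). A minimal sketch; the `replicas` value is an assumption you should pick based on how many workloads you want on one card:

```yaml
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4    # one physical GPU is advertised as 4 schedulable GPUs
```

With `replicas: 4`, a node with a single physical card reports `nvidia.com/gpu: 4`, and up to four pods can land on it, sharing compute time but without memory isolation.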
The trade-off of time-slicing: memory is not isolated, so a single pod that over-allocates VRAM can trigger an Out-of-Memory (OOM) error for every workload sharing the card.
The third approach is NVIDIA MPS (Multi-Process Service), a software layer that sits between the application and the hardware. It allows you to allocate compute resources as a percentage.
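On the application side, the classic way to express such a percentage share under MPS is the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable. A sketch (pod name and image are placeholders, and an MPS control daemon must already be running on the node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mps-inference             # placeholder name
spec:
  containers:
  - name: model
    image: registry.example.com/inference:latest   # placeholder image
    env:
    - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
      value: "25"                 # this client may use at most ~25% of the GPU's SMs
```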
To utilize GPU slicing, we rely on the NVIDIA GPU Operator. This automates the loading of drivers, the configuration of the container runtime (nvidia-container-runtime), and the labeling of nodes.
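The operator is typically installed via its official Helm chart. A minimal sketch (the namespace name is a common convention, not a requirement):

```shell
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace
```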
The allocation is done as usual via YAML resource definitions. Instead of requesting a whole GPU, we use profiles:
```yaml
resources:
  limits:
    nvidia.com/mig-1g.10gb: 1
```
This approach allows you to consolidate development, staging, and production workloads on the same physical machine, drastically reducing operational costs.
By 2026, GPU slicing is no longer a gimmick but an economic necessity. By intelligently partitioning your hardware, you avoid “resource islands” and ensure that your AI initiatives remain scalable. Especially regarding digital sovereignty, this approach enables you to operate powerful AI services on your own hardware (on-premise or colocation) that can compete price-wise with major public cloud providers.
What is GPU slicing in Kubernetes? GPU slicing refers to various techniques (such as MIG or time-slicing) to divide a physical graphics card into multiple smaller units. This allows multiple containers or Kubernetes pods to access the same GPU simultaneously, improving hardware utilization and reducing costs.
When should I use NVIDIA MIG? MIG (Multi-Instance GPU) is ideal when you need hard isolation between workloads. Since memory and compute cores are separated at the hardware level, it is the safest method for multi-tenant clusters or critical production environments, but it requires Ampere generation hardware (e.g., A100) or newer.
Can I share GPUs on older hardware? Yes, through “time-slicing”. Here, the driver divides the GPU time between the pods. However, since there is no true memory isolation, developers must ensure that applications do not overload the graphics memory (VRAM).
How does GPU slicing affect performance? With MIG, there is virtually no performance loss as resources are dedicated. With time-slicing, minimal latencies can occur due to context switching. In most inference scenarios, however, this effect is negligible compared to the cost savings.
Does ayedo support managed GPU infrastructures? Yes, ayedo integrates GPU support natively into managed Kubernetes environments. We configure the NVIDIA GPU Operator and assist companies in implementing slicing strategies that are both economically efficient and technically stable.