Efficient management of Kubernetes clusters is becoming increasingly important, especially as cluster size grows. One of the biggest challenges with large clusters is the memory overhead caused by list requests.
In the current implementation, the kube-apiserver handles list requests by assembling the entire response in memory before transmitting any data to the client. But what happens when the response is large, say several hundred megabytes? And imagine multiple list requests arriving simultaneously, perhaps after a brief network outage. While API Priority and Fairness has proven effective at protecting the kube-apiserver from CPU overload, it is noticeably less effective at protecting memory. This comes down to the different nature of resource consumption by a single API request: CPU usage at any given time is capped at a constant value, whereas memory, which is not compressible, can grow proportionally with the number of processed objects and is effectively unbounded.
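To make the shape of the problem concrete, here is a minimal client-go sketch (assuming a reachable cluster and a kubeconfig at the default path; Secrets are just an illustrative resource choice). The single unpaginated list below forces the server to assemble the complete collection in memory before the first byte of the response leaves the process:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: a kubeconfig at the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// A single unpaginated LIST across all namespaces: the kube-apiserver
	// buffers the entire collection before sending the response.
	secrets, err := client.CoreV1().Secrets(metav1.NamespaceAll).List(
		context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d secrets in a single response\n", len(secrets.Items))
}
```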
This poses a real risk: a handful of such requests can overwhelm any kube-apiserver within seconds, driving it into an Out-of-Memory (OOM) condition and causing it to crash. To better illustrate the problem, let’s look at the following diagram.

The diagram shows the memory usage of a kube-apiserver during a synthetic test (see the synthetic test section for more details). The results clearly show that the server’s memory usage grows significantly with the number of informers. Notably, the server crashed at around 16:40 while serving only 16 informers.
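The exact setup of that synthetic test is described in the original post. Purely to illustrate the load pattern, the following hedged sketch starts a number of independent informers; each one performs its own initial LIST and therefore adds its own share of server-side memory pressure (the resource type is a placeholder, and only the count of 16 is taken from the text above):

```go
package main

import (
	"context"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	// Each informer factory performs its own initial LIST against the API
	// server, so server-side memory grows with the number of informers started.
	const numInformers = 16
	for i := 0; i < numInformers; i++ {
		factory := informers.NewSharedInformerFactory(client, 0)
		factory.Core().V1().Secrets().Informer() // register a Secret informer
		factory.Start(ctx.Done())
	}

	<-ctx.Done()
}
```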
Our investigation revealed that this substantial memory allocation occurs because the server, before sending the first byte to the client, must fetch the data from the database (etcd), deserialize it from its stored format, and finally construct the response by converting and serializing the data into the format requested by the client.
This sequence results in significant temporary memory consumption. The actual consumption depends on many factors, such as page size, applied filters (e.g., label selectors), query parameters, and the sizes of individual objects.
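Two of these factors, page size and label selectors, can be set by the client today to bound how much the server has to materialize per response. The sketch below (same client-go assumptions as above; the limit of 500 and the selector are illustrative values, not recommendations from the original post) shows a paginated, filtered list:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Paginated, filtered LIST: each page is capped at Limit objects and the
	// label selector is applied server-side; both reduce how much the
	// kube-apiserver has to assemble per response.
	opts := metav1.ListOptions{
		Limit:         500,        // page size (illustrative)
		LabelSelector: "app=demo", // illustrative selector
	}
	for {
		page, err := client.CoreV1().Secrets(metav1.NamespaceAll).List(context.TODO(), opts)
		if err != nil {
			panic(err)
		}
		fmt.Printf("page with %d secrets\n", len(page.Items))
		if page.Continue == "" {
			break // last page reached
		}
		opts.Continue = page.Continue // fetch the next page
	}
}
```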
Unfortunately, neither API Priority and Fairness nor Go’s garbage collection and memory limits can prevent the system from exhausting memory under these conditions. The memory is allocated suddenly and rapidly, and just a few requests can quickly deplete the available memory, leading to resource exhaustion.
Depending on how the API server is run on the node, it is either killed by the kernel due to OOM when the configured memory limits are exceeded during these uncontrolled spikes, or, if no limits are configured, the impact on the control plane node can be even worse. And the worst part: after the first API server fails, the same requests are likely to hit another control plane node in an HA setup, likely with the same result. A situation that is potentially hard to diagnose and hard to recover from.
However, introducing API streaming could yield significant improvements. With API streaming, memory consumption is optimized as data can be processed in smaller, more manageable chunks, reducing the pressure on the kube-apiserver. This is a great example of how ayedo, as a Kubernetes partner, helps enhance the efficiency and stability of clusters.
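At the wire level, API streaming replaces the single bulky list response with a watch that streams the initial state object by object. The following sketch (same assumptions as the earlier examples; field and feature gate names are as of recent Kubernetes releases and may differ in your version) requests the initial events through the watch endpoint instead of issuing a LIST:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Streaming the initial state: instead of one large LIST response, the
	// server sends the existing objects one by one as ADDED watch events,
	// followed by a bookmark that marks the end of the initial data.
	// (Requires the server-side WatchList feature gate; gate names and
	// defaults depend on the Kubernetes version.)
	sendInitial := true
	w, err := client.CoreV1().Secrets(metav1.NamespaceAll).Watch(context.TODO(), metav1.ListOptions{
		SendInitialEvents:    &sendInitial,
		AllowWatchBookmarks:  true,
		ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	// A real client would stop consuming initial data at the bookmark event;
	// here we simply print event types as they arrive.
	for event := range w.ResultChan() {
		fmt.Printf("event: %s\n", event.Type)
	}
}
```

For informer-based applications, recent client-go versions can opt into this behaviour through a client-side feature gate (named WatchListClient at the time of writing, typically enabled via the KUBE_FEATURE_WatchListClient environment variable); the exact gate names and defaults depend on the Kubernetes and client-go versions in use.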
Source: Kubernetes Blog