Optimizing Kubernetes: How API Streaming Enhances Efficiency
ayedo Editorial Team · 3 minute read


Discover how API streaming in Kubernetes reduces memory load and boosts efficiency. Practical tips and use cases.

Efficient management of Kubernetes clusters is becoming increasingly important, especially as cluster size grows. One of the biggest challenges with large clusters is the memory overhead caused by list requests.

In the current implementation, the kube-apiserver handles list requests by assembling the entire response in memory before transmitting any data to the client. But what happens when the response is extensive, say several hundred megabytes? And imagine multiple list requests arriving at the same time, perhaps after a brief network outage. While API Priority and Fairness has proven effective at protecting the kube-apiserver from CPU overload, its effect on memory protection is noticeably weaker. The reason lies in how a single API request consumes resources: CPU usage at any given moment is capped at a constant value, whereas memory, which cannot be compressed, grows in proportion to the number of objects processed and is effectively unbounded.
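
To make the request pattern concrete, here is a minimal sketch (assuming in-cluster credentials and the standard client-go library; the resource and namespace choice are purely illustrative) of the kind of call that triggers the problem: a plain LIST returns the whole collection in a single response, which the kube-apiserver must assemble fully in memory first.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A single LIST: the server buffers the entire serialized response
	// before the first byte reaches the client.
	secrets, err := client.CoreV1().Secrets("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d secrets in one response\n", len(secrets.Items))
}
```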

This poses a real risk: just a few such requests can drive any kube-apiserver into an Out-of-Memory (OOM) condition within seconds and crash it. To better illustrate the problem, let’s look at the following diagram.

[Figure: monitoring graph showing kube-apiserver memory usage during the synthetic test]

The diagram shows the memory usage of a kube-apiserver during a synthetic test (described in more detail in the original Kubernetes blog post linked below). The results clearly show that the number of informers significantly increases the server’s memory usage. Notably, the server crashed around 16:40 while serving only 16 informers.
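
As a rough idea of what drives such a test, here is a simplified sketch (not the original test harness; the resync interval and the choice of secrets are arbitrary) of how each newly started informer issues its own initial LIST, multiplying the load on the kube-apiserver:

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	stop := make(chan struct{})
	for i := 0; i < 16; i++ {
		// Each factory has its own cache, so every secrets informer
		// performs a separate full LIST against the kube-apiserver on start.
		factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
		factory.Core().V1().Secrets().Informer()
		factory.Start(stop)
	}
	select {} // keep the informers running
}
```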

Why does the kube-apiserver need so much memory for list requests?

Our investigation revealed that this substantial memory allocation occurs because the server, before sending the first byte to the client, must:

  • Retrieve data from the database,
  • Deserialize the data from its stored format,
  • And finally construct the response by converting and serializing the data into the format requested by the client.

This sequence results in significant temporary memory consumption. The actual consumption depends on many factors, such as page size, applied filters (e.g., label selectors), query parameters, and the sizes of individual objects.
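
The following heavily simplified, hypothetical sketch (it is not the real kube-apiserver code path; the object type, the JSON encoding, and buildListResponse are stand-ins for illustration only) shows why the temporary allocation is so large: the complete result set is held in memory through every one of these steps before a single byte of the response is written.

```go
package sketch

import "encoding/json"

// object is a stand-in for a decoded API object.
type object map[string]interface{}

// buildListResponse is a hypothetical, simplified version of the steps
// listed above; the real kube-apiserver is far more involved.
func buildListResponse(storedBlobs [][]byte) ([]byte, error) {
	// Step 1 (retrieve) is assumed done: storedBlobs holds every
	// matching record read from the database.

	// Step 2: deserialize every object from its stored encoding.
	objects := make([]object, 0, len(storedBlobs))
	for _, blob := range storedBlobs {
		var obj object
		if err := json.Unmarshal(blob, &obj); err != nil {
			return nil, err
		}
		objects = append(objects, obj)
	}

	// Step 3: convert and serialize the whole collection into the
	// response format requested by the client, all at once.
	return json.Marshal(objects)
}
```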

Unfortunately, neither API Priority and Fairness nor Go’s garbage collector and memory limits can prevent the system from exhausting memory under these conditions. The memory is allocated suddenly and rapidly, and just a few requests can deplete the available memory.

Depending on how the API server is run on the node, it is either killed by the kernel (OOM) when the configured memory limits are exceeded during these uncontrolled spikes, or, if no limits are configured, the impact on the control plane node can be even worse. And the worst part: after the first API server fails, the same requests are likely to hit another control plane node in an HA environment, most likely with the same result. A situation that is potentially hard to diagnose and hard to recover from.

API streaming, however, could bring significant improvements. With API streaming, the server no longer has to assemble one huge response; instead, the data is processed and delivered in smaller, more manageable chunks, which greatly reduces the memory pressure on the kube-apiserver. This is a great example of how ayedo, as a Kubernetes partner, helps enhance the efficiency and stability of clusters.
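
At the request level, API streaming is exposed as a watch that also streams the initial state. The following sketch assumes a cluster and client-go version where the WatchList feature is available; the resource and namespace choice are again illustrative:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	sendInitialEvents := true
	// A WATCH with sendInitialEvents=true streams the current state
	// object by object instead of returning one huge LIST response.
	w, err := client.CoreV1().Secrets("").Watch(context.TODO(), metav1.ListOptions{
		SendInitialEvents:    &sendInitialEvents,
		ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
		AllowWatchBookmarks:  true,
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		fmt.Printf("event: %s\n", event.Type)
	}
}
```

In recent client-go releases the same mechanism can also be enabled for informers via the WatchListClient client-go feature gate; the exact state of the feature depends on your Kubernetes version, so check the Kubernetes documentation for your release.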


Source: Kubernetes Blog
