
The Fundamental Principles of Kubernetes Capacity Management

Roman Yegorov

Solutions Engineer, Intel Granulate

Kubernetes is an excellent tool for improving containerized operations because it provides a unified platform for deploying, scaling, and managing containerized applications. It lets users automate these processes, greatly reducing the time and effort required to manage applications, and its powerful scheduling capabilities allow users to specify complex rules for deploying and scaling workloads based on resource utilization, load balancing, and other factors.

Another key benefit of using Kubernetes is that it enables users to achieve greater reliability and resilience for their containerized applications. Kubernetes provides built-in features for managing container health, self-healing, and fault tolerance, which help to ensure that applications remain highly available and responsive even in the face of failures or disruptions. 


Additionally, Kubernetes supports rolling updates and gradual rollout strategies, so applications can be upgraded and updated without downtime or service interruption, further enhancing the reliability and resilience of containerized applications. Overall, Kubernetes is an essential tool for any organization that wants to optimize its containerized operations and achieve greater agility, efficiency, and reliability in its software delivery pipeline.

However, a few fundamental principles go a long way toward making sure that Kubernetes setups are optimized for business use. Here are some ideas you can apply when designing and engineering enterprise Kubernetes architectures:

Promote Stateful Operations

When setting up Kubernetes, it is important to promote stateful operations, meaning support for workloads that must retain data and identity across restarts and rescheduling. Kubernetes is often described as providing ‘continuous orchestration’ of containers: it continuously manages and monitors containers to ensure that they are running efficiently and securely.

By default, most systems, including a conventional web browser session, operate statelessly: nothing is saved unless it is saved explicitly. Kubernetes, however, provides persistence primitives, such as PersistentVolumes, PersistentVolumeClaims, and StatefulSets, that enable the system to keep appropriate user session and application data for re-use.
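
As a minimal sketch of this idea (the demo-db and app-data names and the Postgres image are illustrative assumptions, not part of the original article), a StatefulSet with a volumeClaimTemplate gives each replica its own PersistentVolumeClaim, so its data survives pod restarts and rescheduling:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: demo-db                  # hypothetical name, for illustration only
    spec:
      serviceName: demo-db
      replicas: 1
      selector:
        matchLabels:
          app: demo-db
      template:
        metadata:
          labels:
            app: demo-db
        spec:
          containers:
            - name: db
              image: postgres:16     # any stateful workload works here
              volumeMounts:
                - name: app-data
                  mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:          # one PersistentVolumeClaim per replica
        - metadata:
            name: app-data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi

Because each replica gets a stable identity and its own claim, its state persists even if the pod is rescheduled onto another node.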

When stateful operations are promoted, teams get a continuous view of the system’s health and security, which is critical when managing complex systems like Kubernetes. This approach also gives teams a more concrete picture of cluster utilization and right-sizing, which is a good basis for advancing deployments.

Optimize Horizontal Pod Autoscaler (HPA)

When running production workloads with autoscaling enabled, there are a few HPA optimization best practices to keep in mind.

  • Define pod requests and limits: The Kubernetes scheduler bases scheduling decisions on the pod’s requests and limits. If they are not set correctly, the scheduler cannot make informed decisions: instead of staying in a Pending state when resources are insufficient (the signal the Cluster Autoscaler uses to add nodes), pods can land on overcommitted nodes and end up in a CrashLoopBackOff, which prevents the Cluster Autoscaler from scaling the nodes. Additionally, HPA computes utilization as a percentage of the pod’s requests, so leaving requests unset deprives scaling decisions of a proper baseline (see the first sketch after this list).
  • Specify PodDisruptionBudgets for mission-critical applications: When a PodDisruptionBudget is defined for an application in the Kubernetes cluster, the critical pods running it are protected from voluntary disruption. The autoscaler avoids scaling down replicas beyond the minimum availability set in the disruption budget (see the second sketch after this list).
  • Keep resource requests close to the pods’ average usage: For new applications, determining an appropriate resource request can be difficult because there is no previous resource-utilization data. The Vertical Pod Autoscaler can help here, as it can be run in recommendation mode: it suggests CPU and memory request values for pods based on observations of the application’s actual usage.
  • Increase CPU limits for slow-starting applications: Certain applications, such as Java Spring services, need an initial CPU burst to start up, even though their CPU usage during normal runtime is usually much lower. To address this, set the CPU limit higher than the request: containers can then start up quickly, while the lower request still matches typical runtime usage. For these kinds of applications, you can also move heavy startup work into an init container, which runs to completion and frees its resources before the main container begins serving load.
  • Don’t mix HPA with VPA: Running the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler simultaneously on the same resource metrics is not advisable. Instead, first run the Vertical Pod Autoscaler in recommendation mode to obtain appropriate CPU and memory values, then run HPA to handle traffic spikes (see the second sketch after this list).
  • Implement a third-party optimization solution: Various solutions on the market help organizations address Kubernetes-related performance and cost issues, including over-provisioning and the limitations of HPA/VPA discussed above. For instance, Granulate’s Capacity Optimization solution uses autonomous workload and pod rightsizing to ensure optimal performance while letting you pay only for what you use, thereby maintaining competitive SLAs.
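
As a first sketch, here is what pod requests and limits together with an HPA might look like. The web Deployment, the nginx image, and every number below are illustrative assumptions rather than recommendations; the key point is that the HPA’s target utilization is computed as a percentage of the CPU request defined on the pod:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                      # hypothetical application
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: nginx:1.27
              resources:
                requests:            # baseline the scheduler and HPA reason about
                  cpu: 250m
                  memory: 256Mi
                limits:              # higher CPU limit helps slow-starting apps
                  cpu: "1"
                  memory: 512Mi
    ---
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # percent of the pod's CPU *request*

Because averageUtilization is relative to the request, a pod with no request gives the HPA nothing to compute against, which is why unset requests undermine scaling decisions.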
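
A second sketch covers the PodDisruptionBudget and recommendation-mode VPA points above. It again targets the hypothetical web Deployment, and the VerticalPodAutoscaler resource assumes the separate VPA controller is installed in the cluster:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: web-pdb
    spec:
      minAvailable: 2                # autoscaler won't evict below this
      selector:
        matchLabels:
          app: web
    ---
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: web-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      updatePolicy:
        updateMode: "Off"            # recommendation mode: report, don't apply

With updateMode set to "Off", the VPA only publishes recommended requests (for example, via kubectl describe vpa web-vpa) rather than applying them, so it will not fight the HPA over the same pods.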

Keep An Eye Out For “Waste”

Identify inefficiencies and places where operations are sub-par. Stay attuned to warning signs such as:

  • Idle cores
  • Container “sprawl” (to use the parlance of VM management)
  • Inefficient namespaces

All of this helps with the overall goal of optimization. Remember that issues like CPU throttling and memory pressure come down to how precisely resources are allocated to each workload in your cluster.
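
One way to keep this kind of waste bounded at the namespace level is a ResourceQuota paired with a LimitRange; in this sketch, the team-a namespace and all values are illustrative assumptions:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a              # hypothetical namespace
    spec:
      hard:
        requests.cpu: "8"            # cap total requested CPU
        requests.memory: 16Gi
        pods: "50"                   # curbs container "sprawl"
    ---
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: team-a-defaults
      namespace: team-a
    spec:
      limits:
        - type: Container
          defaultRequest:            # applied when a pod omits requests
            cpu: 100m
            memory: 128Mi
          default:                   # applied when a pod omits limits
            cpu: 500m
            memory: 512Mi

The quota caps the total resources and pod count a namespace can claim, while the LimitRange injects sane default requests and limits into pods that omit them.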

Desired State Planning

The desired state should always be ‘top of mind.’

In container virtualization engineering or any other kind of digital architecture design, it’s crucial to have a clear understanding of the desired state from the outset of the project. This helps ensure that all efforts are aligned towards achieving the same end goal.
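
In Kubernetes, the desired state is literally what a manifest declares, and the control plane continuously reconciles reality against it. As a small illustrative sketch (the api name and image are placeholders), the replicas: 3 below is a desired state that the Deployment controller restores whenever the observed pod count drifts:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api                      # hypothetical service
    spec:
      replicas: 3                    # the declared desired state
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
            - name: api
              image: ghcr.io/example/api:1.0   # placeholder image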

Capacity optimization is a key component of achieving the desired state, but it’s important to remember that capacity optimization is a means to an end, not an end in itself. The end goal is to create business systems that can support the organization’s day-to-day operations and strategic goals.

To achieve the desired state, it’s important to have a holistic understanding of the system and its requirements. This includes a deep understanding of the organization’s goals and requirements, as well as the technical requirements of the system. It’s also important to consider factors such as scalability, reliability, and security.

One approach to achieving the desired state is to use a capacity optimization platform. Such a platform can help optimize the performance of containerized applications, ensuring that they run efficiently and reliably; by leveraging this technology, organizations can gain a competitive advantage and deliver better business outcomes.
