Kubernetes VPA: How It Works, HPA vs. VPA, and 7 Best Practices

Itay Gershon

Product Manager, Intel Granulate

What Is Kubernetes Vertical Pod Autoscaling (VPA)?

Kubernetes Vertical Pod Autoscaling (VPA) is a feature that automatically adjusts the resources allocated to Kubernetes pods, based on their actual resource usage. Unlike horizontal pod autoscaling, which scales the number of pod replicas, VPA scales the resource requests and limits of individual pods. 

By analyzing the resource usage patterns of the pods, VPA adjusts their resource requests and limits to ensure that they have sufficient resources to operate efficiently while avoiding over-provisioning of resources. This helps to optimize resource utilization and minimize costs while ensuring that applications are running smoothly. 

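VPA is not part of core Kubernetes: it is installed as an add-on that defines a custom resource (VerticalPodAutoscaler) and runs its own controllers. It is a useful tool for managing resource-intensive workloads in Kubernetes clusters.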

This is part of a series of articles about Kubernetes performance.

How Does Kubernetes VPA Work? 

VPA itself consists of three components: the recommender, the updater, and the admission controller. It also relies on two things outside VPA: the metrics server, which provides resource usage data, and the workload controller (for example, a deployment), which manages the scaling and deployment of pods and recreates them after eviction. The recommender analyzes resource usage to make recommendations. The updater evicts pods whose resource requests have drifted from those recommendations. The admission controller applies the recommended requests to pods as they are created.

The VPA workflow starts with the user creating a VPA object, which names the target workload and specifies the update mode and, optionally, bounds on the resources VPA may assign. The recommender then reads this configuration and analyzes the resource usage patterns of the pods to provide recommendations for their resource requests. The updater reads these recommendations and, when a running pod's requests differ enough from them, evicts the pod, which triggers the deployment to create a replacement pod with updated resource requests and limits. 
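To make the configuration step concrete, here is a minimal sketch in Python that builds a VPA object as a plain dictionary and prints it as JSON, a format kubectl also accepts. The object name `web-vpa`, the deployment name `web`, and the helper `build_vpa_manifest` are hypothetical; the field names are meant to follow the `autoscaling.k8s.io/v1` VerticalPodAutoscaler API and should be checked against the VPA version installed in the cluster.

```python
import json


def build_vpa_manifest(name, target_kind, target_name, update_mode="Auto"):
    """Return a VerticalPodAutoscaler manifest as a plain dict.

    Hypothetical helper for illustration; it only assembles data and
    does not talk to a cluster.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose pods VPA should manage.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            # How recommendations are applied to running pods.
            "updatePolicy": {"updateMode": update_mode},
        },
    }


manifest = build_vpa_manifest("web-vpa", "Deployment", "web")
print(json.dumps(manifest, indent=2))
```

Passing `update_mode="Off"` produces an object that only publishes recommendations without evicting pods, which can be a cautious way to start.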

During this process, the admission controller intercepts the creation of the replacement pod and overwrites its resource requests with the recommended values, scaling the limits to preserve the original ratio between requests and limits. This cycle repeats continuously to maintain optimal resource utilization and minimize costs.
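The recommend, evict, and re-admit cycle can be sketched as a toy simulation. This is a simplified illustration, not the actual VPA implementation: the function names, the 90th-percentile rule, and the 10% tolerance are all assumptions chosen for readability, and the real recommender uses a more elaborate statistical model.

```python
def recommend(samples, percentile=90):
    """Toy recommender: return a high percentile of observed usage."""
    ordered = sorted(samples)
    # Nearest-rank percentile using integer arithmetic only.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank - 1, 0)]


def needs_update(current_request, target, tolerance_percent=10):
    """Toy updater: evict when the request drifts too far from the target."""
    drift = abs(current_request - target)
    return drift * 100 > tolerance_percent * target


def admit(pod, target):
    """Toy admission step: rewrite the request on the replacement pod."""
    patched = dict(pod)
    patched["cpu_request_millicores"] = target
    return patched


# Hypothetical CPU usage samples (millicores) for one container.
cpu_samples = [120, 150, 180, 200, 210, 220, 260, 300, 310, 400]
pod = {"name": "web-0", "cpu_request_millicores": 1000}

target = recommend(cpu_samples)
if needs_update(pod["cpu_request_millicores"], target):
    # Eviction and recreation are collapsed into one step here.
    pod = admit(pod, target)

print(target, pod)
```

With these sample values, the request drops from 1000 to 310 millicores, since the original request was far above observed usage.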

Related content: Read our guide to Kubernetes autoscaling

Kubernetes HPA vs. VPA 

Kubernetes HPA (Horizontal Pod Autoscaler) and VPA (Vertical Pod Autoscaler) are both tools used to automatically adjust the capacity available to workloads in a Kubernetes cluster. However, they differ in their approach and in what they change.

The HPA adjusts the number of pod replicas for a workload based on demand. It scales the number of replicas up or down based on CPU utilization or custom metrics defined by the user. This ensures that the application has enough capacity to handle its workload, including spikes in traffic.

The VPA, on the other hand, adjusts the CPU and memory requests and limits of the pods based on their actual resource usage. It ensures that each pod has the necessary resources to perform its tasks without wasting resources. The VPA optimizes the resource utilization of the cluster and helps to reduce waste.

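The HPA and VPA can be used together to optimize the resource utilization of a Kubernetes cluster, provided they do not act on the same metric. If both react to CPU or memory usage, their decisions can conflict, so a common approach is to drive the HPA from custom or external metrics and let the VPA manage CPU and memory requests. In that setup, the HPA ensures that the cluster has enough replicas of the pod to handle the workload, while the VPA ensures that each pod has the necessary resources to perform its tasks efficiently.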
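The difference between the two approaches can be shown with a small Python sketch. The HPA function follows the replica formula given in the Kubernetes documentation (desired replicas = ceil(current replicas × current metric / target metric)). The VPA function is only a stand-in: its name and the 15% headroom are illustrative assumptions, not how the VPA recommender actually computes its targets.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Horizontal scaling: change the replica count, keep pod size."""
    return math.ceil(current_replicas * current_metric / target_metric)


def vpa_resized_request(observed_usage_millicores, headroom_percent=15):
    """Vertical scaling (illustrative): change pod size, keep replica count.

    The headroom value is an assumption made for this example.
    """
    scaled = observed_usage_millicores * (100 + headroom_percent)
    return (scaled + 99) // 100  # integer ceiling division


# Four replicas running at 90% CPU utilization against a 60% target.
replicas = hpa_desired_replicas(4, 90, 60)

# One pod observed using about 300 millicores of CPU.
request = vpa_resized_request(300)

print(replicas, request)
```

Here the HPA responds to load by growing from 4 to 6 replicas, while the vertical approach leaves the replica count alone and sizes each pod's CPU request at 345 millicores.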

Vertical Pod Autoscaling Benefits and Limitations 

VPA offers several benefits to Kubernetes users:

  • It can improve cost efficiency and stability by automatically adjusting the resource allocation for individual pods based on their actual usage, ensuring that resources are only provisioned when needed. This can lead to significant cost savings while maintaining application stability. 
  • VPA promotes efficient use of cluster nodes by ensuring that pods are right-sized, with the appropriate amount of resources allocated. This can reduce resource waste and improve overall cluster utilization. 
  • VPA reduces the need for time-consuming tasks such as manual benchmarking, as it automatically adjusts resources based on actual usage. 
  • It makes it easier to maintain applications by providing automated resource management, freeing up engineering time for other tasks.

While VPA is a powerful tool, it does have some limitations:

  • One of the main limitations is that VPA should not be used alongside Horizontal Pod Autoscaling (HPA) on the same CPU or memory metrics, as the two autoscalers may conflict with each other. Users who want both must scale the HPA on custom or external metrics; otherwise, they must choose between VPA and HPA, depending on their specific needs.
  • Another limitation is that VPA may recommend more resources than are available on the cluster, which can leave pods unschedulable. To address this issue, users can cap recommendations at what the cluster can provide, for example with a limit range or with the maximum allowed values in the VPA resource policy, which prevents pods from being assigned resources that cannot be allocated. 
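Capping recommendations, as described in the last limitation, amounts to clamping each recommended value between a lower and an upper bound. The sketch below shows that logic in Python together with the matching `resourcePolicy` fragment of a VPA object. The helper name and all numeric values are hypothetical; `minAllowed` and `maxAllowed` are intended to match the VPA container policy fields and should be verified against the installed VPA version.

```python
def clamp_recommendation(recommended, min_allowed, max_allowed):
    """Clamp each recommended value into [min_allowed, max_allowed]."""
    return {
        resource: max(min_allowed[resource], min(value, max_allowed[resource]))
        for resource, value in recommended.items()
    }


# Units are millicores for CPU and MiB for memory (example values only).
recommended = {"cpu": 6000, "memory": 64}
min_allowed = {"cpu": 100, "memory": 128}
max_allowed = {"cpu": 2000, "memory": 4096}

bounded = clamp_recommendation(recommended, min_allowed, max_allowed)
print(bounded)

# The same bounds expressed as a VPA resourcePolicy fragment.
resource_policy = {
    "containerPolicies": [
        {
            "containerName": "*",  # apply to every container in the pod
            "minAllowed": {"cpu": "100m", "memory": "128Mi"},
            "maxAllowed": {"cpu": "2", "memory": "4Gi"},
        }
    ]
}
```

In this example the CPU recommendation is cut down to the 2000 millicore ceiling and the memory recommendation is raised to the 128 MiB floor.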

7 Kubernetes VPA Best Practices 

The key consideration when using VPA is to optimize capacity and prevent over- or under-provisioning. 

Here are some Kubernetes VPA best practices to consider:

  1. Enable VPA for critical workloads: Enable VPA for workloads that are critical to the organization’s operations. This will help to ensure that these workloads have the necessary resources to perform their tasks and that they are not impacted by resource constraints.
  2. Set appropriate resource requests and limits: Set appropriate resource requests and limits for each pod based on its actual resource usage. This will help to optimize the resource utilization of the cluster and reduce waste.
  3. Monitor resource utilization: Regularly monitor the resource utilization of the pods and the cluster as a whole. This will help to identify any issues that may affect the performance of the VPA and the overall efficiency of the cluster.
  4. Use pre-defined policies or user-defined thresholds: Use pre-defined policies or user-defined thresholds, such as minimum and maximum allowed values in the VPA resource policy, to control how the resource requests and limits of pods are adjusted automatically. This will help to ensure that the resources are allocated appropriately based on demand.
  5. Test the VPA: Test the VPA regularly to ensure that it is functioning properly. Create load tests and simulate sudden spikes in demand to test the VPA’s ability to adjust the resource requests and limits of pods.
  6. Use in conjunction with other Kubernetes components: Use the VPA in conjunction with other Kubernetes components, such as the Cluster Autoscaler and the Horizontal Pod Autoscaler (HPA), to optimize the resource utilization of the cluster. When combining VPA with HPA, make sure the two do not scale on the same CPU or memory metrics.
  7. Consider application requirements: Consider the specific requirements of the applications running on the cluster when setting resource requests and limits. Some applications may require more resources than others, and it is important to ensure that they have the necessary resources to perform their tasks.

Beyond these seven practices, capacity optimization can also be automated. For example, Intel Tiber App-Level Optimization provides automated capacity optimization capabilities that can significantly reduce the cost of Kubernetes (up to 45%) by continuously orchestrating resources to match actual usage. 
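The monitoring and right-sizing practices above can be approximated with a simple check that compares each pod's observed usage to its request. The following Python sketch is only an illustration: the pod names, the usage figures, and the 50% and 90% thresholds are hypothetical, and in practice the numbers would come from the metrics server or a monitoring system.

```python
def classify(request_millicores, usage_millicores, low=0.5, high=0.9):
    """Label a pod by how much of its CPU request it actually uses."""
    ratio = usage_millicores / request_millicores
    if ratio < low:
        return "over-provisioned"
    if ratio > high:
        return "under-provisioned"
    return "right-sized"


# (pod name, CPU request, observed CPU usage), all in millicores.
pods = [
    ("api", 1000, 200),
    ("worker", 500, 480),
    ("cache", 400, 300),
]

report = {name: classify(request, usage) for name, request, usage in pods}

for name, label in report.items():
    print(f"{name}: {label}")
```

Pods flagged as over- or under-provisioned are natural first candidates for VPA, or for a review of their manually set requests.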

Learn more in our detailed guide to Kubernetes VPA
