Kubernetes Autoscaling: Methods and Best Practices
What is Kubernetes Autoscaling?
Kubernetes provides a set of features to ensure that clusters are appropriately sized to handle any type of load. There are several autoscaling tools provided by Kubernetes—the most important are Horizontal Pod Autoscaling (HPA), Vertical Pod Autoscaling (VPA), and the Cluster Autoscaler, which performs Kubernetes node autoscaling.
Developers use Kubernetes to efficiently scale applications up and down, providing a good experience to users while making effective use of computing resources. You can plan the capacity of your cluster based on the expected load on your workloads. But if a service grows rapidly, you risk running out of computing resources, slowing down services, and hurting the user experience.
Manually allocating resources can slow your response to the changing needs of applications. Kubernetes autoscaling can help by providing multiple layers of autoscaling, which ensure that each pod and cluster has enough performance to meet current requirements:
- Pod-based scaling with HPA and VPA.
- Node-based scaling with Cluster Autoscaler—the cluster automatically scales up when needed and scales back down to its normal size when the load decreases.
This is part of a series of articles about Kubernetes performance.
In this article:
- Why Do You Need Autoscaling in Kubernetes?
- Kubernetes Autoscaling Methods
- Best Practices for Kubernetes Autoscaling
Why Do You Need Autoscaling in Kubernetes?
Consider a scenario where you don’t use Kubernetes’ autoscaling feature. Whenever your needs change, you have to provision resources manually and scale them down later. You end up paying for excess capacity, or services fail because you don’t have enough resources to handle the load.
It is true that you can manually increase the number of pod replicas, but this trial-and-error approach is not sustainable in the long run. Even if administrators can perform the scaling operation quickly, they cannot do it immediately like autoscaling tools can, leading to service disruption and user frustration. Kubernetes autoscaling features can help overcome these challenges.
Related content: Read our guide to Kubernetes HPA
Kubernetes Autoscaling Methods
Kubernetes is inherently extensible. There are many tools that allow you to scale your application and its underlying infrastructure based on demand, efficiency, and many other metrics. Let’s review the main Kubernetes autoscaling mechanisms.
Cluster Autoscaler
Cluster Autoscaler is a Kubernetes tool that increases or decreases the size of a Kubernetes cluster (adding or removing nodes) based on the number of pending pods and node utilization metrics.
Cluster Autoscaler cycles through two main tasks:
- Monitors pods that cannot be scheduled.
- Calculates whether all currently deployed pods could be consolidated onto a smaller number of nodes.
The autoscaler checks for pods in the cluster that cannot be scheduled on any existing node, either because the nodes lack CPU or memory resources, or because the pod’s node affinity rules, tolerations, or taints do not match any existing node. If there are unschedulable pods, the autoscaler checks its managed node pools to see whether adding nodes would unblock them. If so, and if the autoscaling policy allows increasing the node pool size, it adds nodes to the cluster.
The autoscaler also monitors nodes in the node pools it manages. If a node is underutilized and its pods can be rescheduled on other available nodes in the cluster, the autoscaler evicts those pods and then deletes the node.
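To make this concrete, here is a sketch of the relevant fragment of a Cluster Autoscaler Deployment, showing the flags that govern scale-up bounds and scale-down behavior. The cloud provider (AWS), image tag, and node group name and bounds are illustrative assumptions; adapt them to your environment.

```yaml
# Container spec fragment from a cluster-autoscaler Deployment (illustrative).
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0  # example tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Min and max node counts for a hypothetical node group.
      - --nodes=1:10:my-node-group
      # Consider a node for removal when utilization stays below 50%.
      - --scale-down-utilization-threshold=0.5
      # How long a node must be unneeded before it is removed.
      - --scale-down-unneeded-time=10m
```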
Vertical Pod Autoscaler (VPA)
VPA automatically sets container resource requests and limits based on actual usage. VPA is designed to reduce the overhead of maintaining resource request and limit configurations, improving cluster resource utilization. VPA can:
- Decrease the request value for containers whose resource usage is consistently lower than the specified amount.
- Increase the request value for containers that have a consistently high percentage of requested resources.
- Automatically set resource limits based on the limit-to-request ratio specified in the container template.
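A minimal VerticalPodAutoscaler manifest looks like the sketch below, assuming the VPA components are installed in the cluster; the object and Deployment names are placeholders.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical target workload
  updatePolicy:
    # "Auto" lets VPA evict and recreate pods with updated requests;
    # "Off" produces recommendations only, without applying them.
    updateMode: "Auto"
```

Starting with updateMode "Off" is a low-risk way to review VPA’s recommendations before letting it modify running pods.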
Horizontal Pod Autoscaler (HPA)
Many applications that have varying loads over time might need to add or remove pod replicas. HPA can automatically manage the scaling of these workloads.
For HPA-configured workloads, the HPA controller monitors the pods in the workload to determine whether the number of replicas needs to change. In most cases, the controller uses a per-pod utilization metric to calculate whether adding or removing replicas will bring the current value closer to the target value.
By default, HPA uses CPU utilization as its metric, but it can also use custom metrics, such as memory usage or application-specific values, as well as external metrics, which are values independent of any pod, such as the length of a message queue.
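The default CPU-based behavior can be expressed with a manifest like the following sketch, which targets 50% average CPU utilization across 2 to 10 replicas; the object and Deployment names and the bounds are assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical target workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Add or remove replicas to keep average CPU utilization,
          # measured against each pod's CPU request, near 50%.
          averageUtilization: 50
```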
Best Practices for Kubernetes Autoscaling
Make Sure that HPA and VPA Policies Don’t Clash
The Vertical Pod Autoscaler automatically adjusts resource requests and limits, reducing configuration overhead and costs. By contrast, HPA is designed to scale out, adding pod replicas to spread load. If both act on the same workload using the same metric (for example, CPU), they can work against each other: VPA resizes the pods while HPA changes the replica count based on the same signal. Double-check that your VPA and HPA policies do not conflict with each other.
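One way to keep the two from clashing is to split the metrics between them, for example letting HPA scale on CPU while restricting VPA to memory. The sketch below does this with VPA’s resourcePolicy; names are placeholders.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical workload, also targeted by an HPA
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        # Let VPA manage memory only; CPU stays fixed so the
        # HPA's CPU-utilization signal remains meaningful.
        controlledResources: ["memory"]
```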
Use VPA Together With Cluster Autoscaler
The best way to configure a VPA is to use it together with Cluster Autoscaler. VPA’s recommendation component can recommend resource request values that exceed the resources currently available in the cluster. This creates resource pressure and may leave some pods pending. Enabling the Cluster Autoscaler mitigates this by provisioning new nodes in response to pending pods.
Ensure all Pods Have Resource Requests Configured
HPA makes scaling decisions based on observed CPU utilization for pods managed by a Kubernetes controller, such as a Deployment. Utilization is calculated as a percentage of each pod’s resource requests. If resource requests are missing for some containers, the HPA controller cannot compute utilization correctly, resulting in incorrect or absent scaling decisions. Make sure every container in an HPA-managed workload has resource requests configured.
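A well-formed pod template therefore declares requests (and usually limits) for every container, as in this illustrative Deployment fragment; the names, image, and values are placeholders to adapt.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: web
          image: nginx:1.25  # example image
          resources:
            requests:
              cpu: 250m      # HPA computes CPU utilization against this value
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
```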