What Is Kubernetes?
Kubernetes is an open source platform for managing and orchestrating Linux containers. It is commonly used to deploy and manage elements of a microservices architecture, manage containers in the public cloud, and set up private cloud environments.
DevOps, developers, IT architects, and FinOps teams use Kubernetes to automatically deploy, scale, maintain, schedule, and operate groups of containers, known as pods, on a cluster of physical or virtual machines, known as nodes. Organizing containers as a cluster lets teams manage them more efficiently and automatically provision resources to the workloads that need them most.
As Kubernetes becomes a mainstream platform for running all types of workloads, the cost of provisioning and operating Kubernetes clusters becomes a major concern for IT organizations. This article will help you understand the cost of running Kubernetes on managed platforms, how to use automated scaling mechanisms to reduce costs, and additional methods to optimize the cost of production Kubernetes clusters.
In this article:
- Kubernetes Pricing on Different Platforms
- 4 Ways to Optimize Kubernetes Cost
- What Is the Impact of HPA and VPA on Kubernetes Resource Costs?
Kubernetes Pricing on Different Platforms
Amazon EKS Pricing
Amazon Elastic Kubernetes Service (EKS) lets you run and scale Kubernetes applications on-premises or in the cloud. It provides secure, high-availability clusters and automates node provisioning, updating, and patching tasks.
Each EKS cluster costs $0.10 per hour—one cluster can run multiple applications using Kubernetes namespaces and IAM policies. You can use Fargate or EC2 to run Amazon EKS in the cloud or use Outposts to run it on-premises:
- With Amazon EC2: you pay for the AWS resources that run the Kubernetes worker nodes.
- With Fargate: you pay according to memory and vCPU resource usage.
- With Outposts: you pay for the EKS cluster deployment in the cloud, while the worker nodes run on Outposts EC2 capacity at no extra cost.
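As a rough illustration of the per-cluster fee, the sketch below estimates the monthly control-plane charge at the $0.10 hourly rate quoted above. The 730-hour month and the function name `monthly_cluster_fee` are assumptions made for this example, and the result excludes any EC2, Fargate, or Outposts capacity.

```python
from decimal import Decimal

EKS_CLUSTER_FEE_PER_HOUR = Decimal("0.10")  # per-cluster fee quoted above
HOURS_PER_MONTH = 730  # assumption: average month (8,760 hours / 12)


def monthly_cluster_fee(cluster_count: int, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Return the control-plane fee for a number of clusters, excluding worker nodes."""
    return EKS_CLUSTER_FEE_PER_HOUR * hours * cluster_count


print(monthly_cluster_fee(1))  # one shared cluster
print(monthly_cluster_fee(5))  # five single-purpose clusters
```

Under these assumptions, running several applications in one cluster with namespaces avoids paying the cluster fee once per application.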
Amazon ECS Pricing
Amazon Elastic Container Service is free for AWS users and offers scalability and automation when running your application architecture. You pay for the AWS resources that run and store the application. You pay only for the resources you use when you use them—there is no upfront or minimum fee:
- With Fargate: you pay for the memory and vCPU resources the container-based application requests.
- With Outposts: you pay the same as in the cloud, with the control plane running in the cloud, while the container instances use Outposts EC2 capacity at no extra charge.
Azure AKS Pricing
Azure Kubernetes Service (AKS) eliminates the complexities of running and managing Kubernetes, letting you focus on the workloads. AKS offers the following pricing plans:
- Free tier: you can use AKS for free for one year.
- Pay-as-you-go pricing plan: you can use the standard AKS plan to try out different services without a full commitment.
- One-year reserved instances: you can reserve one year of AKS services to ensure predictable prices.
- Three-year reserved instances: you can commit to three years to increase your savings (up to 65%).
- Spot instances: you can leverage unused Azure capacity to save up to 88%. However, these are only suitable for clusters and workloads that tolerate interruptions.
Google GKE Pricing
Google Kubernetes Engine (GKE) is a Kubernetes management platform for deploying, scaling, and managing containerized and stateless applications. It offers a standard or autopilot mode with various pricing options (both have flat fees of $0.10 per hour per cluster). The costs cover memory, CPU, scheduled pods, and storage resources.
GKE has the following pricing options:
- Free tier: gives you monthly credits worth $74.40.
- Commitment-based discounts: you commit to using GKE for one or three years.
- Spot VM discounts: you can cut costs by using Spot virtual machines, with discounts of over 60% compared to standard pay-per-use prices. However, they are not suitable for workloads that cannot tolerate disruptions.
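The free-tier figure lines up with the flat cluster fee: $74.40 divided by $0.10 per hour is 744 hours, which is one cluster running for a full 31-day month. The sketch below works through that arithmetic. It assumes the credit is applied only against the cluster fee, and the function names are made up for this example.

```python
from decimal import Decimal

CLUSTER_FEE_PER_HOUR = Decimal("0.10")  # flat cluster fee quoted above
FREE_TIER_CREDIT = Decimal("74.40")     # monthly free-tier credit quoted above


def covered_cluster_hours() -> Decimal:
    """Hours of cluster fee that the monthly credit pays for."""
    return FREE_TIER_CREDIT / CLUSTER_FEE_PER_HOUR


def net_cluster_fee(cluster_count: int, hours_in_month: int) -> Decimal:
    """Cluster fees left after the credit, never below zero."""
    gross = CLUSTER_FEE_PER_HOUR * hours_in_month * cluster_count
    return max(Decimal("0"), gross - FREE_TIER_CREDIT)


print(covered_cluster_hours())   # hours covered: 31 days x 24 hours
print(net_cluster_fee(1, 744))   # a single cluster in a 31-day month
print(net_cluster_fee(2, 744))   # a second cluster is billed in full
```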
Red Hat OpenShift Pricing
Red Hat OpenShift is a Kubernetes management platform for multi-cloud, edge, and hybrid deployments. It offers the following pricing options:
- Cloud services: you pay for reserved instances starting from $0.076 per hour. Pricing is according to a three-year contract (4 vCPU).
- Self-managed: pricing varies according to subscription and sizing choices. The subscription choices are standard, entry-level, and flagship.
- Dedicated pricing: you pay based on variables such as node sizes, availability zones, and cloud configuration. You pay for services on an hourly, annual, or three-year basis.
4 Ways to Optimize Kubernetes Cost
1. Downsizing Your Clusters
You can reduce costs by decreasing the number and size of your clusters. You might delete a whole cluster or nodes within a cluster. Visualizing the utilization of Kubernetes resources helps identify and scale down unallocated resources. Cutting underutilized resources is the easiest way to cut costs.
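To make "unallocated resources" concrete, here is a minimal sketch that flags nodes where most of the allocatable CPU has not been requested by any pod. The `NodeUsage` type, the node names, and the 50% threshold are all assumptions for illustration. In practice the numbers would come from your monitoring stack.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeUsage:
    name: str
    allocatable_cpu_m: int  # millicores the node can offer to pods
    requested_cpu_m: int    # sum of CPU requests of pods scheduled on the node


def unallocated_fraction(node: NodeUsage) -> float:
    """Share of the node's allocatable CPU that no pod has requested."""
    return 1 - node.requested_cpu_m / node.allocatable_cpu_m


def downsizing_candidates(nodes: List[NodeUsage], threshold: float = 0.5) -> List[str]:
    """Names of nodes whose unallocated share is at or above the threshold."""
    return [n.name for n in nodes if unallocated_fraction(n) >= threshold]


nodes = [
    NodeUsage("node-a", allocatable_cpu_m=4000, requested_cpu_m=3600),
    NodeUsage("node-b", allocatable_cpu_m=4000, requested_cpu_m=800),
]
print(downsizing_candidates(nodes))  # node-b is mostly unallocated
```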
You can connect Kubernetes to a tool like Prometheus for monitoring. A Kubernetes cost management solution should provide a single-pane view of resource data, allowing you to find idle resources. A major advantage of Kubernetes is that it lets you specify your needs to the master node or control plane, and Kubernetes will handle the requests automatically.
Tools like Granulate’s capacity optimization solution also enable a high level of monitoring that allows full visibility into K8s clusters and a simplified, end-to-end overview of Kubernetes objects, including namespaces, deployments, containers, and more.
The downside is that waste accumulates and resources can become orphaned: Kubernetes might provision resources and leave them unused. Because responsibilities are split between workload and cluster managers, these cluster-level optimizations are usually best for infrastructure, platform, and CloudOps teams. However, any team that manages cluster provisioning, usage, and operations can use them.
Downsizing clusters is an effective strategy for any level of organization. You can get the most out of this approach by using an auto-scaler.
2. Rightsizing Your Workloads
While downsizing is a method of reducing unallocated resources, rightsizing involves minimizing the cost of idle resources. In this case, you are not looking for completely unused resources. Instead, you typically look for underutilized resources at the pod level. This approach allows you to move your workloads and create a more accurate profile of the compute resources required to run them.
Rightsizing generally increases the density of pods and further optimizes resource usage across nodes. It requires understanding your past workload or usage patterns. Armed with this information, you can determine if your provisioned resources match your computing demands.
Suppose you have an average CPU utilization of 50%—this indicates that you might not need the compute level you originally envisioned. Thus, you could change your configuration to a more cost-effective one with fewer computing resources.
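The arithmetic behind that example can be sketched as follows. The function name and the 20% headroom are assumptions chosen for illustration. Pick a safety margin that fits your own workload's peaks.

```python
import math


def suggested_cpu_request(current_request_m: int,
                          avg_utilization_pct: int,
                          headroom_pct: int = 20) -> int:
    """Suggest a CPU request (millicores) from average utilization plus headroom."""
    # Observed usage = request x utilization; then add the safety margin.
    scaled = current_request_m * avg_utilization_pct * (100 + headroom_pct)
    return math.ceil(scaled / 10_000)


# A pod requesting 2000m that averages 50% utilization uses about 1000m.
print(suggested_cpu_request(2000, 50))
```

With these inputs the suggestion is 1200m, which is 40% less than the original request while still leaving room above the observed average.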
Challenges of rightsizing
The flipside of rightsizing workloads is that you must ensure Kubernetes allocates the right resources to each node. For example, CPU usage could be 50% while memory is running short. In this case, replacing the underutilized resource with a smaller one is not enough. You must change the whole request/limit profile, lowering the CPU allocation and increasing the memory parameters.
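One way to picture this is to evaluate each resource on its own rather than resizing everything together. The sketch below is only illustrative: the function name and the 60% and 90% bands are assumptions, not Kubernetes defaults.

```python
from typing import Dict


def request_adjustments(utilization_pct: Dict[str, int],
                        low: int = 60,
                        high: int = 90) -> Dict[str, str]:
    """Map each resource to 'lower', 'raise', or 'keep' based on its own utilization."""
    actions = {}
    for resource, pct in utilization_pct.items():
        if pct < low:
            actions[resource] = "lower"   # request is larger than needed
        elif pct > high:
            actions[resource] = "raise"   # workload is close to its allocation
        else:
            actions[resource] = "keep"
    return actions


# CPU is underused while memory is nearly exhausted.
print(request_adjustments({"cpu": 50, "memory": 95}))
```

Here the CPU request should shrink while the memory request should grow, so simply moving the pod to a smaller instance profile would make the memory problem worse.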
However, over-provisioning of memory and CPU is the more common problem. Either way, choosing the appropriate worker node for the specific workload you want to handle is important.
How to rightsize
Like downsizing, rightsizing requires understanding your utilized, unassigned, and idle costs. You can connect to a tool like Prometheus to help visualize this information and focus on identifying idle costs. Next, you decide the best resource sizes for your workloads. Alternatively, you might move the workloads to improve cost-effectiveness by eliminating idle resources.
Resizing is often a technically complex task—it requires an effective system for tracking usage and cost metrics. For example, you might plug into Prometheus to obtain key infrastructure metrics. Expert teams can then move your workloads without disrupting them, separating the overhead safety net from your company’s projected usage to meet your cost and performance objectives.
Optimizing costs at the workload level is usually best for DevOps and application teams. Still, any team can apply optimizations when managing the provisioning, operation, and usage of a Kubernetes workload.
Rightsizing is a clear cost-cutting strategy for teams that want to use existing resources more efficiently. You can benefit more from this strategy by using an auto scaling solution such as the Vertical Pod Autoscaler (VPA). This tool handles optimizations at the macro level and lets you focus on fine-tuning at the micro level.
Achieving the cost-performance sweet spot in dynamic environments can be especially challenging, unless you have access to an autonomous Kubernetes cost optimization tool. Tools like this enable rightsizing of resources automatically and according to the dynamic needs of each workload deployment.
3. Running Kubernetes Nodes on Low-Cost Spot Instances
Spot instances help you save money with Kubernetes, but it is important to determine how well you can implement them before relying on Spot capacity.
Consider the following questions to decide if a workload is suited to Spot instances:
- How long will it take the workload to complete a job?
- Is the workload mission-critical or time-critical?
- Can the workload tolerate disruptions?
- Which tools can move the workload when your provider reclaims the Spot instances?
4. Auto Scaling with Cluster Autoscaler, HPA, and VPA
You don’t need to manually track and optimize everything in Kubernetes, and an automated scaling approach is often the easiest option. An auto scaler lets you define when to provision additional resources and when to terminate them. You set the minimum and maximum limits for your resource configurations, ensuring the system doesn’t accidentally scale up or down too much.
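The role of the minimum and maximum limits can be shown with a short sketch: whatever value the scaling logic proposes, the result is kept inside the configured bounds. The function name is hypothetical.

```python
def clamp_replicas(desired: int, minimum: int, maximum: int) -> int:
    """Keep a proposed replica count within the configured bounds."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(desired, maximum))


print(clamp_replicas(12, minimum=2, maximum=10))  # a traffic spike cannot exceed the cap
print(clamp_replicas(0, minimum=2, maximum=10))   # a quiet period cannot drop below the floor
```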
Kubernetes has auto scaling mechanisms that automatically scale entire clusters, pods, or workloads:
- Cluster Autoscaler
- Horizontal Pod Autoscaler (HPA)
- Vertical Pod Autoscaler (VPA)
- Kubernetes Event-Driven Autoscaler (KEDA)
To automatically scale a workload using predefined metrics, you might use a pod or workload auto scaler (e.g., HPA, VPA, KEDA). If your workload or pod usage exceeds the threshold set for a given metric, the auto scaler will increase the pod resource limits or add more pods. Likewise, if resource utilization is too low, it will scale the pods down.
The only limits to your ability to scale workloads and pods are the available resources on the host node. The auto scaling stops when the node’s resources reach the specified limit.
Auto Scaling for Pods vs. Nodes
Pod scaling impacts the resource provisioning within a node or cluster, but this scaling approach only determines how existing resources are divided between all workloads and pods.
By contrast, node scaling gives pods more resources overall by scaling up the entire cluster.
In short, pod scaling makes better use of the available resources, while node scaling determines the type and amount of resources available.
The Kubernetes Cluster Autoscaler detects when a pod is pending (waiting for a resource) and adjusts the number of nodes. It also identifies when nodes become redundant and reduces resource consumption.
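A much-simplified way to think about the scale-up side is bin packing: given the CPU requests of pending pods, how many new nodes of a given size would fit them? The sketch below uses a first-fit-decreasing heuristic. It is an illustration only and does not reproduce the real Cluster Autoscaler, which also accounts for memory, affinity rules, and node groups.

```python
from typing import List, Sequence


def nodes_needed_for_pending(pending_requests_m: Sequence[int],
                             node_allocatable_m: int) -> int:
    """Estimate new nodes needed for pending pods (first-fit decreasing, CPU only)."""
    free: List[int] = []  # remaining millicores on each hypothetical new node
    for request in sorted(pending_requests_m, reverse=True):
        if request > node_allocatable_m:
            raise ValueError("pod does not fit on any node of this size")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] -= request  # place the pod on an already-added node
                break
        else:
            free.append(node_allocatable_m - request)  # add another node
    return len(free)


print(nodes_needed_for_pending([1500, 1500, 1000, 500], node_allocatable_m=2000))
```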
Using auto scaling in Kubernetes typically requires cloud automation. If you apply an auto scaling policy to AWS, Google Cloud, or Azure, you can also take advantage of auto scaling in Kubernetes. Auto scaling a cluster helps optimize your Kubernetes costs by setting appropriate limits and controls to address resource consumption issues such as zombie resources and snowballing costs.
Horizontal Pod Autoscaler (HPA)
For most workloads, usage changes over time. This means you might need to run more or fewer replicas of the same pod. You can use HPA to automatically scale these workloads.
The Horizontal Pod Autoscaler is a great tool for scaling stateless applications. However, it can also be used to support scaling of StatefulSets, a Kubernetes object that manages a stateful application together with its persistent data.
The HPA controller monitors the pods in the workload to determine if the number of pod replicas needs to change. HPA determines this by averaging the values of a performance metric, such as CPU utilization, for each pod. It estimates whether removing or adding pods would bring the metric’s value closer to the desired value specified in its configuration.
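The Kubernetes documentation describes this calculation as the current replica count multiplied by the ratio of the current metric value to the target value, rounded up, with no change when the ratio is close to 1. The sketch below follows that formula. The function name is made up, and the 0.1 tolerance reflects the documented default, which may differ in your cluster's configuration.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA-style ratio of current to target metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: leave as is
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, current_metric=90, target_metric=60))  # load above target
print(desired_replicas(4, current_metric=30, target_metric=60))  # load below target
```

With four replicas averaging 90% CPU against a 60% target, the result is six replicas. At 30% the same workload drops to two.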
Using HPA can reduce the cost of workloads by reducing the number of running pod replicas as load decreases, which frees capacity on the nodes.
Vertical Pod Autoscaler (VPA)
When a pod temporarily requires more resources, operators might be tempted to increase the pod’s resource requirements in a static manner. This solves the short term problem, but in the long term, wastes CPU or memory resources and limits the number of nodes the pod can run on.
VPA is a mechanism that increases or decreases CPU and memory resource requirements of a pod to match available cluster resources to actual usage. VPA only replaces pods managed by a replication controller. It also requires the Kubernetes metrics-server.
A VPA deployment consists of three components:
- Recommender: monitors resource utilization and estimates desired values.
- Updater: checks if the pod needs a resource limit update.
- Admission Controller: overrides resource requests when pods are created, using admission webhooks.
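To give a feel for how the Recommender and Updater roles divide the work, here is a toy sketch. It is not the actual VPA implementation: the percentile, the 15% margin, the 10% drift threshold, and both function names are assumptions chosen for illustration.

```python
from typing import Sequence


def recommend_request(samples_m: Sequence[int],
                      percentile_pct: int = 90,
                      margin_pct: int = 15) -> int:
    """Recommender role: estimate a CPU request from usage samples (nearest-rank percentile)."""
    if not samples_m:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_m)
    rank = -(-percentile_pct * len(ordered) // 100)   # ceiling division
    value = ordered[max(rank, 1) - 1]
    return -(-value * (100 + margin_pct) // 100)      # add a safety margin, rounded up


def needs_update(current_request_m: int, recommended_m: int, drift_pct: int = 10) -> bool:
    """Updater role: flag a pod whose request drifts too far from the recommendation."""
    return abs(current_request_m - recommended_m) * 100 > drift_pct * recommended_m


usage = [100, 120, 110, 400, 130, 125, 115, 105, 135, 140]  # millicores, one short spike
target = recommend_request(usage)
print(target, needs_update(500, target))
```

Because the estimate uses a percentile rather than the maximum, the single 400m spike does not inflate the recommendation.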
Using VPA ensures that pods are only assigned more resources if they actually need them, conserving cluster resources and reducing costs in the long run.
What Is the Impact of HPA and VPA on Kubernetes Resource Costs?
HPA and VPA are two Kubernetes scaling mechanisms that can be used to optimize the use of resources in Kubernetes. If configured correctly, they can significantly reduce Kubernetes infrastructure costs.
HPA and VPA can only reduce costs under one of two conditions:
- There are other workloads you need to run on the cluster, and by conserving resources, you can use the spare capacity on nodes to run those additional workloads.
- You have a mechanism for down-scaling the cluster if fewer nodes are needed. This can be done with an automated mechanism like Kubernetes Cluster Autoscaler, or manually.
If these conditions are not met, these scaling mechanisms can conserve resources in the cluster, but this will result in idle nodes you will still need to pay for.
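The difference shows up clearly in a small worked example. The sketch below compares the hourly bill for a cluster where HPA has reduced the pod count, with and without a way to remove nodes. The node price, the pods-per-node figure, and the function names are hypothetical.

```python
from decimal import Decimal

NODE_PRICE_PER_HOUR = Decimal("0.20")  # hypothetical price, not a real quote


def nodes_needed(pod_count: int, pods_per_node: int) -> int:
    """Smallest number of nodes that can host the pods (ceiling division)."""
    return -(-pod_count // pods_per_node)


def billed_nodes(pod_count: int, pods_per_node: int,
                 provisioned_nodes: int, can_scale_down: bool) -> int:
    """Nodes you pay for: only the needed ones if the cluster can shrink."""
    needed = nodes_needed(pod_count, pods_per_node)
    if can_scale_down:
        return needed
    return max(needed, provisioned_nodes)


def hourly_cost(nodes: int) -> Decimal:
    return NODE_PRICE_PER_HOUR * nodes


# HPA shrinks a workload from 40 pods to 12 on a 10-node cluster (4 pods per node).
print(hourly_cost(billed_nodes(12, 4, 10, can_scale_down=False)))  # idle nodes still billed
print(hourly_cost(billed_nodes(12, 4, 10, can_scale_down=True)))   # cluster shrinks to 3 nodes
```

In this example the pod-level savings only reach the bill when the seven idle nodes are actually removed.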