EC2 Auto Scaling: How It Works, Examples, and Key Challenges

Alon Roitman

Channels and Cloud Alliances Lead, Intel Granulate

What Is Amazon EC2 Auto Scaling?

Amazon Elastic Compute Cloud (EC2) Auto Scaling is a feature that ensures the right number of Amazon EC2 instances are available to handle an application’s load. You create a collection of EC2 instances (an auto scaling group) and specify the minimum number of instances in the group; EC2 Auto Scaling ensures the group never drops below that size.

You can also specify the maximum number of instances for an auto scaling group, and EC2 Auto Scaling ensures the group never grows beyond that limit. If you specify the group’s desired capacity, either when creating the group or at any time afterward, EC2 Auto Scaling ensures the group has that many instances.
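The relationship between the minimum, maximum, and requested size can be pictured as a simple clamp. The sketch below is a toy model in plain Python, not a call to any AWS API; the function name `bounded_capacity` and its parameters are made up for illustration.

```python
def bounded_capacity(requested: int, min_size: int, max_size: int) -> int:
    """Return the requested instance count, kept within [min_size, max_size].

    Toy model only: it illustrates how a scaling adjustment is bounded
    by the group's limits, and does not talk to AWS.
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(requested, max_size))


# A group with min=2 and max=6 never leaves that range.
print(bounded_capacity(10, min_size=2, max_size=6))
print(bounded_capacity(0, min_size=2, max_size=6))
```

A request inside the range passes through unchanged, which is the case where the desired capacity alone determines the group size.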

You can specify an auto scaling policy to launch and terminate instances based on the application’s changing demands. 

This is part of a series of articles about cloud optimization.

What Are EC2 Auto Scaling Groups? 

An auto scaling group is a collection of Amazon EC2 instances with the same management and auto scaling policies. Auto scaling groups allow you to leverage scaling policy and health check capabilities. The main function of Amazon’s auto scaling service is to control the number of instances in each group and scale automatically. 

Each auto scaling group may have a different size depending on the desired number of EC2 instances (i.e., the desired capacity). The size of a group is adjustable, allowing you to meet capacity demands manually or by automatic scaling.

Auto scaling groups launch enough instances to meet the specified capacity and then maintain that number by running periodic health checks: when an instance becomes unhealthy, the group terminates it and launches a replacement.
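The replace-unhealthy behavior can be sketched as a small reconciliation loop. This is a simulation under stated assumptions: the `Instance` class, the `launch_instance` helper, and the `i-sim…` identifiers are all hypothetical, and the real service performs these steps asynchronously.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


_ids = count(1)


def launch_instance() -> Instance:
    """Pretend to launch an instance (hypothetical helper, no AWS call)."""
    return Instance(instance_id=f"i-sim{next(_ids):04d}")


def reconcile(instances: List[Instance], desired: int) -> List[Instance]:
    """Drop unhealthy instances, then launch or trim to reach `desired`."""
    kept = [inst for inst in instances if inst.healthy]
    kept = kept[:desired]             # trim if the group is too large
    while len(kept) < desired:        # replace whatever is missing
        kept.append(launch_instance())
    return kept


group = [Instance("i-a"), Instance("i-b", healthy=False), Instance("i-c")]
group = reconcile(group, desired=3)
print([inst.instance_id for inst in group])
```

After one pass the group is back at three healthy instances, with the unhealthy one swapped for a newly launched replacement.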

Auto scaling groups can launch both on-demand and Spot instances. When the group uses a launch template, you can combine multiple purchase options within a single auto scaling group.

AWS EC2 Auto Scaling Benefits with an Example 

Here is an example created by AWS and shared in the EC2 documentation, to demonstrate the benefits of EC2 Auto Scaling.

The example use case involves running a basic web application that allows employees to find conference rooms for virtual meetings. In this example scenario, the application’s usage is minimal during the beginning and end of the week. However, since more employees schedule meetings during the middle of the week, the demand increases during that time.

The graph below shows the usage of the application’s capacity over a week:

EC2 Auto Scaling application capacity over a week

Image Source: AWS

You can plan for changes in capacity by adding enough servers to meet peak capacity, ensuring the application always has the capacity needed to meet demand. However, this option means the application has more capacity than needed on some days, and this unused capacity increases the overall cost of running the application.

Alternatively, you can add enough capacity to handle the average demand. Since you are not purchasing equipment for occasional usage, this option is less expensive. However, it might result in a poor customer experience when demand exceeds capacity.

EC2 Auto Scaling solves this issue, enabling you to add new instances to your application when demand increases and terminate them when they are no longer needed. Because the group runs standard EC2 instances, you pay only for the instances you actually run. This helps create a cost-effective architecture that minimizes expenses while improving the customer experience.
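To make the trade-off between the three options concrete, here is a small worked comparison. The daily demand figures are hypothetical (the original graph's values are not reproduced here), and cost is measured simply in instance-days.

```python
import math

# Hypothetical demand, in instances needed per day, Monday through Sunday.
# Usage is low at the start and end of the week and peaks midweek.
DEMAND = [1, 2, 3, 3, 2, 1, 1]

# Option 1: provision for the peak all week.
peak_instance_days = max(DEMAND) * len(DEMAND)

# Option 2: provision for the (rounded-up) average all week.
average_capacity = math.ceil(sum(DEMAND) / len(DEMAND))
days_over_capacity = sum(1 for d in DEMAND if d > average_capacity)

# Option 3: scale with demand, paying only for what each day needs.
scaled_instance_days = sum(DEMAND)

print(peak_instance_days)    # 21 instance-days, never short of capacity
print(average_capacity)      # 2 instances per day
print(days_over_capacity)    # 2 days where demand exceeds capacity
print(scaled_instance_days)  # 13 instance-days, never short of capacity
```

With these assumed numbers, scaling with demand uses 13 instance-days instead of 21 and still covers every day, while average provisioning falls short on the two busiest days.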

EC2 Auto Scaling adjusting capacity

Image Source: AWS

Amazon EC2 Auto Scaling vs. AWS Auto Scaling: What Is the Difference?

AWS Auto Scaling provides a central location to manage the configuration of a variety of scalable resources, including EC2 instances, Elastic Container Service (ECS), DynamoDB tables, and Amazon RDS read replicas.

AWS Auto Scaling allows you to keep your EC2 auto scaling groups within configurable metric targets. Developers can set dynamic DynamoDB read and write capacity units for specific tables based on resource utilization. You can configure an ECS service to start or stop ECS tasks according to CloudWatch metrics. The same applies to Relational Database Service (RDS) Aurora read replicas: AWS Auto Scaling adds or removes replicas according to utilization.

AWS Auto Scaling introduces the concept of a scaling plan that uses scaling policies to manage resource usage. The application owner can choose a utilization target, such as 60% CPU utilization, and AWS Auto Scaling adds or removes capacity to reach that goal.
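One way to picture a utilization target is as a proportional calculation: if the fleet runs hotter than the target, grow it by the same ratio. The sketch below is a simplified model of that idea, not the algorithm AWS actually uses; the function name and parameters are assumptions made for illustration.

```python
import math


def capacity_for_target(current_capacity: int, metric_value: float,
                        target_value: float, min_size: int,
                        max_size: int) -> int:
    """Estimate the capacity that brings the average metric back to target.

    Simplified proportional model: it assumes load spreads evenly, so
    total load = current_capacity * metric_value stays constant.
    """
    if current_capacity <= 0 or target_value <= 0:
        raise ValueError("capacity and target must be positive")
    proposed = math.ceil(current_capacity * metric_value / target_value)
    return max(min_size, min(proposed, max_size))


# 4 instances at 90% CPU with a 60% target: 4 * 90 / 60 = 6 instances.
print(capacity_for_target(4, 90, 60, min_size=1, max_size=10))
# 4 instances at 30% CPU with a 60% target: 4 * 30 / 60 = 2 instances.
print(capacity_for_target(4, 30, 60, min_size=1, max_size=10))
```

The result is always clamped to the group's minimum and maximum, so a utilization spike cannot push the group past its configured limit.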

How AWS Auto Scaling compares to EC2 Auto Scaling

AWS Auto Scaling is a simpler option for scaling multiple AWS cloud services according to your resource usage goals. On the other hand, EC2 Auto Scaling only focuses on EC2 instances, allowing developers to configure finer-grained scaling policies. 

Another major difference lies in how scaling is expressed. With EC2 Auto Scaling, the developer can configure individual actions such as “add X EC2 instances when a metric passes the specified threshold.” AWS Auto Scaling instead lets you set a goal, such as a utilization target, and can rely on predictive scaling, which uses machine learning to forecast the resources needed to maintain that target for your EC2 instances.
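A threshold rule of the “add X EC2 instances when a metric passes the specified threshold” kind can be sketched as a lookup over steps. The thresholds and adjustments below are hypothetical values chosen for illustration, and the function is a plain-Python model rather than an AWS API.

```python
from typing import List, Tuple


def step_adjustment(metric_value: float,
                    steps: List[Tuple[float, int]]) -> int:
    """Return how many instances to add for the highest threshold reached.

    `steps` is a list of (threshold, instances_to_add) pairs. If the
    metric is below every threshold, no instances are added.
    """
    adjustment = 0
    for threshold, instances_to_add in sorted(steps):
        if metric_value >= threshold:
            adjustment = instances_to_add
    return adjustment


# Hypothetical rule: add 1 instance at 70% CPU, add 3 instances at 85% CPU.
CPU_STEPS = [(70.0, 1), (85.0, 3)]

print(step_adjustment(60.0, CPU_STEPS))  # 0
print(step_adjustment(75.0, CPU_STEPS))  # 1
print(step_adjustment(90.0, CPU_STEPS))  # 3
```

Each step is an explicit action the developer has to choose, which is the finer-grained control the comparison above refers to.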

EC2 Auto Scaling emphasizes flexibility, while AWS Auto Scaling emphasizes simplicity. Your choice depends on which features matter most to the development and IT teams that scale your cloud environment.

Amazon EC2 Auto Scaling Instance Lifecycle 

Each EC2 instance in an auto scaling group has a unique lifecycle or path. The lifecycle starts with the launch of an instance and ends with its termination. Here is an illustration of the changes to an instance throughout its lifecycle.

EC2 Autoscaling instance lifecycle

Image Source: AWS

Triggers for Scale Out

The following events tell the auto scaling group to launch more instances and attach them to the group:

  • You manually increase the group size.
  • You create an auto scaling policy to increase the group size based on demand. 
  • You schedule scaling to increase the group size at a specified time.

When the group launches EC2 instances, their state is “Pending.” Once an instance is fully configured and has passed the EC2 health checks, it attaches to the group, and its state is “InService.”

If the group’s configuration accepts traffic from Elastic Load Balancing, Amazon EC2 Auto Scaling will automatically register the instance with the load balancer before marking it “InService.”

InService Instances 

Instances are “InService” until one of these events: 

  • A scale-in event terminates the instance to reduce the group size. 
  • You set the instance to “Standby.”
  • You detach it from the group.
  • The instance fails health checks.

Triggers for Scale In

The following events tell the auto scaling group to detach instances from the group and terminate them: 

  • You manually reduce the group size.
  • You set an auto scaling policy to reduce the group size based on demand. 
  • You schedule scaling to reduce the group size at a specified time.

For every scale-out event, you should create a corresponding scale-in event so that your resources track changing demand; otherwise, capacity added at peak stays in place after demand falls.
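For scheduled scaling, the pairing of scale-out and scale-in can be sketched as a small weekly schedule. The days and capacities below are hypothetical, and the lookup is a plain-Python model rather than how the service stores scheduled actions.

```python
from typing import List, Tuple

# (weekday, desired_capacity), with Monday = 0 as in datetime.date.weekday().
# Hypothetical pairing: scale out on Wednesday, scale back in on Friday.
SCHEDULE: List[Tuple[int, int]] = [(2, 4), (4, 1)]


def capacity_on(weekday: int, schedule: List[Tuple[int, int]],
                default: int) -> int:
    """Return the capacity in effect on `weekday`.

    The most recent scheduled action on or before that day wins;
    `default` applies before the first action of the week.
    """
    capacity = default
    for day, desired in sorted(schedule):
        if day <= weekday:
            capacity = desired
    return capacity


for weekday in range(7):
    print(weekday, capacity_on(weekday, SCHEDULE, default=1))
```

Removing the Friday entry from this schedule would leave the group at four instances for the rest of the week, which is the situation a matching scale-in event avoids.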

The auto scaling group’s termination policy determines which instances to terminate during a scale-in event. Instances that are being terminated are in the “Terminating” state and cannot return to service. Once termination is complete, the state is “Terminated.”

If the group’s configuration accepts load balancer traffic, EC2 Auto Scaling will automatically deregister terminating instances from the load balancer to ensure all requests reach other EC2 instances.
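The lifecycle described in this section can be summarized as a small state machine. The sketch below covers only the states named above and is a deliberate simplification: the full AWS lifecycle has additional intermediate states (for example, around detaching and standby), and the transition table here is an assumption made for illustration.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    IN_SERVICE = "InService"
    STANDBY = "Standby"
    TERMINATING = "Terminating"
    TERMINATED = "Terminated"


# Simplified transition table; intermediate AWS states are omitted.
TRANSITIONS = {
    State.PENDING: {State.IN_SERVICE},
    State.IN_SERVICE: {State.STANDBY, State.TERMINATING},
    State.STANDBY: {State.PENDING},     # returning to service relaunches checks
    State.TERMINATING: {State.TERMINATED},
    State.TERMINATED: set(),
}


def transition(current: State, new: State) -> State:
    """Move to `new` if the simplified lifecycle allows it."""
    if new not in TRANSITIONS[current]:
        raise ValueError(
            f"cannot move from {current.value} to {new.value}")
    return new


state = State.PENDING
for nxt in (State.IN_SERVICE, State.TERMINATING, State.TERMINATED):
    state = transition(state, nxt)
print(state.value)
```

Attempting `transition(State.TERMINATING, State.IN_SERVICE)` raises a `ValueError`, mirroring the rule that a terminating instance cannot return to service.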

AWS EC2 Auto Scaling Challenges 

If a deployment to an EC2 instance in an auto scaling group fails, it could be for one of these reasons:

  • EC2 Auto Scaling continuously launches and terminates EC2 instances: this occurs when CodeDeploy cannot automatically deploy an application revision to new instances. You can address this by disassociating the auto scaling group from your CodeDeploy deployment group, or by changing the group’s configuration so that the desired capacity matches the current number of instances. Either change prevents EC2 Auto Scaling from launching more instances.
  • The CodeDeploy agent is unresponsive: this usually occurs when initialization scripts that run immediately after the EC2 instance launches take longer than an hour to finish, so the agent is not installed and running in time. CodeDeploy waits one hour for the agent to respond to a pending deployment and then times out. You can address this issue by moving the initialization scripts into the CodeDeploy application revision.
  • An instance in the auto scaling group reboots during the deployment: rebooting an EC2 instance or shutting down the CodeDeploy agent while it is processing a deployment command can cause the deployment to fail.
  • Several application revisions are deployed to a single EC2 instance at the same time: this can fail if one of the deployments has long-running scripts (more than a few minutes). Avoid deploying multiple application revisions simultaneously to the EC2 instances in your auto scaling group.
  • A deployment fails on a new EC2 instance launched in your auto scaling group: this occurs when a script that runs during the deployment blocks the launch of the EC2 instance.