Auto Scaling

* Note: This blog post was also published on 90min’s tech blog.

For the past few months we’ve been working on a project to migrate 90min’s infrastructure to a fully automated, auto-scaling one. In the following article you’ll join me on the long journey we took: from designing the auto-scaling pipeline, through the modifications we had to make to our Ruby on Rails deployment, to the actual results – a major reduction of 40% in our AWS bills and calmer nights and weekends.


Like every other web site, 90min has a bell-shaped traffic pattern, with low traffic in the morning and high traffic in the evening.

Bell Shaped Traffic Pattern

The thing that distinguishes 90min from other web sites is its rapid growth. We break traffic records almost every week.

In the months before implementing auto-scaling in our infrastructure, we were constantly having service interruptions because of new traffic records which choked our fixed fleet of X machines on AWS. We were raising new machines as fast as we could (which took approx. 10 minutes each) to meet the new traffic demands every week.

The problem was that we were afraid to put those servers down after the peaks because we did not know what was going to happen the next day. We might see an even higher traffic rate than the record broken today. And so, with every traffic peak and service interruption we found ourselves adding more and more servers, which we were then afraid to put down, fearing the next day’s traffic. Our AWS bill more than doubled, and we felt something had to be done to properly manage our resources on AWS.

Another thing that made us uncomfortable with our infrastructure was its fragility. We counted on a static fleet of servers to always be up and running and to serve our content 24/7. Every server failure, or even a scheduled AWS instance restart, required human intervention to keep our service up.

We wanted to have a fully automated recovery whenever something bad happens. We wanted a 24/7 babysitter for our AWS machines which is always available to replace failed instances as if nothing happened.

Auto Scaling – What is it?

Auto Scaling is the art of automatically adjusting computation power according to measurements you impose on your system so that the minimum computation power is always used.
Or, at least, this is how I see it🙂

When we say “minimum computation power” we have to bear in mind that the service:

  1. Must always stay up
  2. Must respond as fast as possible, or at least within reasonable response times

As I see it, auto-scaling is good for 4 main reasons:

  1. Service robustness – Instances that fail a health check are automatically replaced
  2. Constant response times – Even when traffic volume changes
  3. Cost reduction – As minimum resources are always used
  4. Confidence – When auto-scaling backs me up, I feel a lot more peaceful

There are also cases in which I would not recommend using auto-scaling:

  1. If you have very sudden traffic spikes, auto-scaling might not be your cure. It takes at least 2 minutes from the moment your system detects it should scale until it has actually adjusted its computation power accordingly. This means that if you have a sudden 10-second traffic spike, auto scaling will not be able to react in a timely manner.
  2. If your system is too small you might not want to invest the time to make it auto-scalable. The benefit of auto-scaling grows with the size of the system and with the volume of traffic it serves. If your system is small enough, it might not be worth your while.

How Does It Work?

The general schema of auto scaling is as follows:

Figure 1. Scale-up flow on AWS

On the left side there is an ELB (Elastic Load Balancer) which has some EC2 instances behind it.
When we scale up we actually add more instances to that ELB so that we have more computation power under that endpoint.
Let’s go over the process which makes this happen:

  1. Metrics are constantly going from the ELB and the EC2 instances themselves into AWS CloudWatch.
    CloudWatch is AWS’s monitoring platform. It receives data from within AWS or custom data from outside. It visualises the metrics as graphs and allows you to define alarms that fire when a metric crosses a threshold.
  2. When a certain metric goes above/below the threshold, an alarm is triggered in CloudWatch.
    For example, you can set an alarm when the average CPU usage of your instances is above 60%.
    Alarms can trigger an “Auto Scaling Policy”.
  3. An Auto Scaling Policy is how we scale up or down. It can be set to add a constant number of servers (add 5 servers for example) or to add a certain percentage of our current number of servers (add 20%).
    We can have a few Auto Scaling Policies to handle different alarms. An example usage is: add 20% of current servers if CPU > 50%, but add 75% of current servers if CPU > 85%. This way we can scale up/down faster or slower, depending on the circumstances (a minimal sketch of such an alarm and policy pair follows this list).
  4. The Auto Scaling Policy decides to launch X instances from an AMI. An AMI is an image that should make your service available on boot (more on this later).
    Of course it may take some time to boot and run the proper processes, but eventually it should run the service.
  5. The ELB performs a health check on the newly created instances. For example, the ELB might send an HTTP request to /ping on your instance. You should verify that an instance responds with 200 OK to /ping only when it is up and running with your service.
  6. When the instances pass the health check they are added behind the ELB and start serving requests like all other instances in your service.
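
To make steps 2 and 3 a bit more concrete, here is a minimal sketch of wiring an alarm to a scaling policy with the aws-sdk Ruby gem. The group name, policy name, thresholds and region are illustrative placeholders, not our actual production values:

```ruby
require 'aws-sdk'

autoscaling = Aws::AutoScaling::Client.new(region: 'us-east-1')
cloudwatch  = Aws::CloudWatch::Client.new(region: 'us-east-1')

# A policy that adds 20% to the current capacity of the group (step 3)
policy = autoscaling.put_scaling_policy(
  auto_scaling_group_name: 'web-asg',              # placeholder group name
  policy_name:             'scale-up-20-percent',
  adjustment_type:         'PercentChangeInCapacity',
  scaling_adjustment:      20,
  cooldown:                300
)

# An alarm that triggers the policy when the group's average CPU goes above 50% (step 2)
cloudwatch.put_metric_alarm(
  alarm_name:          'web-asg-cpu-high',
  namespace:           'AWS/EC2',
  metric_name:         'CPUUtilization',
  statistic:           'Average',
  period:              60,
  evaluation_periods:  2,
  threshold:           50,
  comparison_operator: 'GreaterThanThreshold',
  dimensions:          [{ name: 'AutoScalingGroupName', value: 'web-asg' }],
  alarm_actions:       [policy.policy_arn]
)
```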

Making Our Service Auto-Scalable

90min is mostly a Ruby on Rails project. As such, it needed some modifications to its architecture and deployment to support auto-scaling.

Static Asset Serving

On a typical Rails project deployment, assets go through a compilation process which minifies, uglifies and concatenates JavaScript and CSS files. Each file gets an MD5 digest inserted into its name so that older versions of assets remain available.

When we had a static fleet of servers each of the servers had a copy of all assets and all their versions. We served static assets from the same machines that served our Rails application. On each deployment we added the newly generated assets to all servers.
With auto-scaling, machines regularly go up and down. Since every server is considered temporary, you can’t rely on any of them to hold the whole history of your assets. Our web machines couldn’t be the address for static assets anymore. We needed an independent repository for our static assets.

Part of moving to auto-scaling was changing the way we handle asset compilation and serving. We now perform asset compilation on our Jenkins server and then upload the compiled assets to an S3 bucket. Our static assets host points directly to this bucket, so our Rails machines do not hold static assets at all.
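
On the Rails side this change is small: the asset host just has to point at the bucket (or a CDN in front of it). A minimal sketch, with a placeholder bucket name:

```ruby
# config/environments/production.rb (relevant line only)
Rails.application.configure do
  # Compiled assets are uploaded to S3 by the Jenkins job after `rake assets:precompile`,
  # so asset URLs should point there instead of at the Rails machines.
  config.action_controller.asset_host = 'https://my-assets-bucket.s3.amazonaws.com'
end
```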

Bootable Image

Auto Scaling on Amazon works closely with the concept of AMIs. An AMI is an image that an instance can be created from. As we saw in step 4 of Figure 1, when we scale up we actually instantiate an AMI into an instance. This instance, like every other Linux machine, goes through a “boot process”, which is a series of commands that are triggered when the machine starts.

When you work with Auto Scaling, one of the basic things you’ll need is an AMI that can be booted into a working application server. A typical Rails app boot sequence might include:

  1. Pull the right code version from git
  2. Run bundle install to verify all gems are correctly installed
  3. Run your application server (unicorn, puma …)
  4. Run any other auxiliary processes – we run a Logstash agent for example.

Another important thing to pay attention to is the version of the application that this instance will be running. It must be the same version as other machines on that ELB.
To make things clearer, if I have an ELB with 10 instances running v5 of my application, I’d like all machines that are added via Auto Scaling to run v5. If I deploy v6, I’d like new machines to run v6, and so on.

To sum it up, the challenge is to create an AMI that, on boot, will start running the current version of our application. This is how we create this AMI (we call it the “Base AMI”):

Figure 2. Creating an AMI with Packer

We have a Jenkins job which runs a Packer recipe. Packer is an image bakery tool which integrates well with Amazon. When it runs, it attaches an EBS volume (which is a complicated word for a hard drive on Amazon) and provisions this volume to our needs. It supports many provisioners – from shell scripts to Puppet and Ansible. We use Chef at 90min, so it was natural for us to use the Chef provisioner on Packer. The Chef recipe takes an EBS volume containing a clean Ubuntu OS and:

  1. Installs our application dependencies – Ruby, Nginx, MongoDB / MySQL client libs etc.
  2. Downloads the master branch from GitHub and runs bundle install so we have all gems installed on the Base AMI
  3. Sets up the boot sequence so that when the AMI is booted, it will start running the right version of our Rails application.
    We use Upstart in order to manage the boot sequence (a rough sketch of such a recipe follows this list).
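
As a rough illustration (not our actual cookbook – package names, paths and the repository URL are placeholders, and installing Ruby itself is omitted), the recipe Packer runs looks along these lines:

```ruby
# cookbooks/base_ami/recipes/default.rb -- hypothetical recipe
# 1. Application dependencies (Ruby itself would be installed in a similar fashion)
%w[nginx git mysql-client].each do |pkg|
  package pkg
end

# 2. Pre-fetch the code and gems so the AMI boots with a warm bundle
git '/srv/app' do
  repository 'git@github.com:example/app.git'   # placeholder repository
  revision   'master'
  action     :sync
end

execute 'bundle install' do
  cwd     '/srv/app'
  command 'bundle install --deployment --without development test'
end

# 3. Install the Upstart job that runs the boot sequence on instance start
cookbook_file '/etc/init/app.conf' do
  source 'app.upstart.conf'
  mode   '0644'
end
```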

After the run, an AMI is saved to AWS. This AMI will run our application when booted. But how does it know what version of the code to run?
3 simple steps:

  1. At the end of each deployment we create a Git Tag to the deployed commit.
  2. We write the name of the created tag to a file called deployed_tag on AWS S3.
  3. When a machine boots, it queries this file for the latest tag and uses it to pull the right version of the code from git (a rough sketch of this boot logic follows).
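
Putting the three steps together, the boot-time logic run by the Upstart job is roughly the following sketch; the bucket name, paths and app server command are placeholders:

```ruby
#!/usr/bin/env ruby
# boot_app.rb -- illustrative boot-time script
require 'aws-sdk'

APP_DIR = '/srv/app'          # placeholder path
BUCKET  = 'my-deploy-bucket'  # placeholder bucket

# 1. Ask S3 which tag is currently deployed
s3  = Aws::S3::Client.new(region: 'us-east-1')
tag = s3.get_object(bucket: BUCKET, key: 'deployed_tag').body.read.strip

Dir.chdir(APP_DIR) do
  # 2. Pull that exact version of the code
  system('git fetch --tags origin') or abort('git fetch failed')
  system("git checkout #{tag}")     or abort("checkout of #{tag} failed")

  # 3. Make sure gems are in place, then start the app server
  system('bundle install --deployment --quiet') or abort('bundle install failed')
  system('bundle exec unicorn -c config/unicorn.rb -E production -D')
end
```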


Deployment

When we came to decide how to deploy our Rails application to the auto-scalable infrastructure, we had two main options: either we create an AMI for each version, or we keep a single base AMI and have deployment update existing machines. Let’s take a deeper look into these two paradigms and how they would affect our deployment and development process:

AMI Per Version Paradigm

In this paradigm, whenever we want to deploy a new version we create an AMI that contains the relevant version of the code. When we deploy, we replace all currently running machines with new machines spawned from the new AMI. When we auto scale, we don’t have the problem of detecting which version we should run since it already exists on the AMI itself.
There are 2 main advantages to this method of deployment:

  1. No restart process is required on production machines: If on each deployment we replace all of our currently running machines, we don’t need to perform any restart operations on existing machines in order to load new code. Specifically in Rails, restarting large applications is a very expensive operation, that might cause service hiccups if the machine does not have sufficient memory and CPU resources.
  2. Boot sequence is shorter: When a machine is spawned via auto-scaling it does not need to sync the code, since the code already exists in the AMI. This makes the boot sequence shorter, which is crucial for auto-scaling – the shorter the boot time, the faster our infrastructure reacts to traffic changes.

Sadly, nothing is perfect in this world and there are also some big cons to the AMI-per-version deployment method:

  1. Creating an image is a long process: No matter what we tried, it took us 6 minutes from the time we had the EBS volume ready after the Packer run until the time we could launch instances from the AMI. After this we had to wait for the new machines to boot, which is another 2-3 minutes. At 90min, where deployments occur a few times per hour, 9 minutes is a lot of time.
  2. Deployment is more complicated: It is far more complicated to replace dozens of running servers with completely new ones than to just update the code on existing machines. We tried to use Netflix’s Asgard, which is supposed to orchestrate this operation, but we found it lacking documentation and we didn’t want to build our deployment process around a tool we weren’t sure of.

Single Base AMI Paradigm

The other method of deployment is to update code on the running machines. In this paradigm we have a single base AMI which, in contrast to the AMI-per-version paradigm, does not contain any specific code. Instead, the code is pulled when the machine boots up. This way we don’t have to create an AMI per version; we can instead mark the version we want to run somewhere (an S3 bucket for example) and have the machine query for it and pull it when it boots up.

As with every paradigm, there are pros and cons to this one.

Pros:
  1. Fast deployments: Since there is no need to generate an AMI for each version and no need to wait for new machines to boot.
  2. Deployment is less complicated: We don’t replace a whole fleet of machines, we update code and restart processes on existing machines.

Cons:
  1. Deployment restarts processes: Especially in large Rails applications, restarting processes consumes a lot of resources. Practically, our machines need twice the RAM in order to restart gracefully, which obviously costs more money.
  2. Longer AMI boot sequence: Since we have one base AMI, and not AMI per version, the instance has to pull the code when it boots. This delays scale-up operation.
  3. We still need to build a base AMI periodically: Even though we refer to “one base image”, this isn’t quite the case in reality. We do have to build a new base image from time to time so that the base AMI stays close to the current version. This might include gems that were added to the project, system packages that were installed and so on.
  4. Multiple versions problem: We have to prevent a situation where an instance is spawned in the middle of a deployment and fetches the older version of the code:
    Multiple Versions Deployment Flow
    1. Version V1 is on production
    2. We start deployment of V2
    3. An instance is launched, queries for the currently running version and gets V1 in response
    4. Deployment is finished

Now we have all instances with V2, and one instance running V1.

To get around this problem, we stop all auto-scaling operations before starting the deployment and resume them when the deployment ends. This gives us a window of a few minutes in which we don’t launch or terminate instances, but it eliminates the risk of having multiple versions running simultaneously.
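
In practice this can be done with the suspend/resume calls on the Auto Scaling API. A minimal sketch with the aws-sdk Ruby gem, where the group name and the deploy command are placeholders:

```ruby
require 'aws-sdk'

autoscaling = Aws::AutoScaling::Client.new(region: 'us-east-1')
group = 'web-asg'  # placeholder Auto Scaling Group name

# Pause launches and terminations so a fresh instance can't pick up the old tag mid-deploy
autoscaling.suspend_processes(
  auto_scaling_group_name: group,
  scaling_processes: %w[Launch Terminate]
)

begin
  system('bin/deploy') or abort('deploy failed')  # placeholder for the actual deployment
ensure
  # Always resume auto scaling, even if the deploy fails
  autoscaling.resume_processes(
    auto_scaling_group_name: group,
    scaling_processes: %w[Launch Terminate]
  )
end
```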

Single AMI vs. AMI Per Version

Our Decision

As you can probably tell by now, we went with the single base AMI paradigm. Our main consideration was deployment time, which is a lot shorter when updating existing servers. AMI creation plus waiting for a new fleet of servers to boot is a long process to go through on every deploy when you deploy as often as we do at 90min.


CloudFormation

Bootstrapping an auto-scaled environment is a very complicated task. It involves a lot of small configurations and a lot of different AWS resources: ELBs, CloudWatch Alarms, Scaling Policies, AMIs, Auto Scaling Groups and so on. Since the setup is so complicated, we wanted to have the following:

  1. Documentation of all infrastructure parameters
  2. A repeatable process to bootstrap new environments
  3. Have it all under version control

Luckily, AWS offers a great service called CloudFormation. It allows you to define AWS resources and the dependencies among them via JSON files. It can even accept parameters and fill them into the JSON template. An example of such a definition could be a template that receives an AMI ID and:

  1. Defines an Auto Scaling Group ASG-Web running the given AMI
  2. Defines an Auto Scaling Policy POL-Web that adds 50% of current number of machines in ASG-Web
  3. Defines an alarm on ASG-Web which states that if ASG-Web’s average CPU is above 60%, run POL-Web to scale up (a sketch of such a template follows)

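As a rough illustration of the policy and alarm parts (in reality the ASG-Web group and its launch configuration would be defined in the same template; the stack name, region and logical IDs here are placeholders), such a template can be built and created with the aws-sdk Ruby gem:

```ruby
require 'aws-sdk'
require 'json'

template = {
  'AWSTemplateFormatVersion' => '2010-09-09',
  'Resources' => {
    'POLWeb' => {
      'Type' => 'AWS::AutoScaling::ScalingPolicy',
      'Properties' => {
        'AutoScalingGroupName' => 'ASG-Web',   # assumes the group already exists or is defined elsewhere
        'AdjustmentType'       => 'PercentChangeInCapacity',
        'ScalingAdjustment'    => 50
      }
    },
    'CPUHighAlarm' => {
      'Type' => 'AWS::CloudWatch::Alarm',
      'Properties' => {
        'Namespace'          => 'AWS/EC2',
        'MetricName'         => 'CPUUtilization',
        'Statistic'          => 'Average',
        'Period'             => 60,
        'EvaluationPeriods'  => 2,
        'Threshold'          => 60,
        'ComparisonOperator' => 'GreaterThanThreshold',
        'Dimensions'         => [{ 'Name' => 'AutoScalingGroupName', 'Value' => 'ASG-Web' }],
        'AlarmActions'       => [{ 'Ref' => 'POLWeb' }]   # Ref on a scaling policy yields its ARN
      }
    }
  }
}

cloudformation = Aws::CloudFormation::Client.new(region: 'us-east-1')
cloudformation.create_stack(
  stack_name:    'web-autoscaling',
  template_body: JSON.pretty_generate(template)
)
```
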
So you can see how easily you can state your desired resources and have AWS orchestrate it all for you. No need to remember how things were set up. No need to investigate when alarm thresholds were changed. It is all documented and version controlled.


The Results

We’ve been running auto scaling for two months now, and the first significant outcome is our AWS bill, which has dropped by almost 50%. We no longer need to keep the computation power to handle traffic peaks available at all times. We can settle for a lower baseline of computation power and have auto-scaling add machines as needed.

A great graph to look at is the following, which shows request count (in orange) against the number of servers (in blue).

Requests Count Vs. # Of Servers

* NOTE: The number of machines (blue line) is not on the right scale, because of AWS CloudWatch limitations. Consider this a trend graph rather than an absolute one.

Lastly, there is an effect of auto scaling that can’t be shown in a graph – the confidence I have, when I go home every evening, that nothing will bother me at night. Since auto-scaling went live, I know that even if something unpredictable happens – machine failures, traffic spikes and whatnot – there is something out there watching that everything stays within reasonable thresholds and managing my infrastructure as needed.

Sure, it was a long and hard journey to go through. Even now, a few months later, we’re still making modifications and fine-tuning our thresholds. We also paid the learning price when auto scaling didn’t work as expected and we ended up under-provisioned, which led to a service interruption. We have to keep our metrics and thresholds fresh at all times to be able to auto-scale wisely.
Nevertheless, I feel our infrastructure today is much more mature, stable and flexible than it ever was.
