Implementing Horizontal and Vertical Scalability in Cloud Computing


Airbnb, the popular peer-to-peer online marketplace and homestay network, runs infrastructure that serves 50M users with over 800,000 listings in 33,000 cities and 192 countries. In a discussion last year with Mike Curtis, VP of engineering at Airbnb, we learned that the Airbnb site and internal machine learning services run on a pool of 5,000 EC2 instances that scales according to demand.

Startups and even individual developers all start out with a small service running from their laptops, hoping that one day the services and applications they offer will conquer the world and become the most popular thing on the Internet. And when that day comes, a single machine or even a single cluster simply won’t be able to handle such a large workload. If you plan to run your application on an increasingly large scale, you need to think about scaling in cloud computing from the beginning, as part of your planning process.

In this article we will describe both horizontal and vertical scaling implementation methods, including the advantages and downsides of each approach.

Horizontal Scaling (Scale In/Out)

We need to allow our infrastructure to grow and shrink to meet traffic and demand, measured by the number of requests it is supposed to serve. For that purpose, scaling out is a common approach when running on the cloud, leveraging its commodity hardware.

Think of horizontal scaling as a railway system that needs to transport more and more goods every day. In order to cope with the demand, you add locomotives to increase your throughput.

For example, let’s assume you have a Web application that will serve requests from the public Internet. Its Web server will eventually hit a limit in the number of requests it can handle and the amount of traffic it can serve, so you will need to add more servers, each of which can handle a specific load.
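As a rough illustration of the paragraph above, the number of servers you need is the peak request rate divided by what one server can sustain, rounded up. The function name, the headroom parameter, and all figures below are hypothetical and do not come from a real deployment.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, headroom: float = 0.2) -> int:
    """Estimate how many identical servers a peak load requires.

    headroom is the fraction of each server's capacity kept free
    (0.2 means 20%). All inputs are illustrative assumptions.
    """
    if per_server_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("per_server_rps must be > 0 and headroom in [0, 1)")
    usable_rps = per_server_rps * (1 - headroom)
    # Always keep at least one server, even with no traffic.
    return max(1, math.ceil(peak_rps / usable_rps))


# Hypothetical numbers: 4,500 requests/s at peak, 1,000 requests/s per server.
print(servers_needed(peak_rps=4500, per_server_rps=1000))
```

In practice the per-server figure comes from load testing your own application, so treat the result as a starting point rather than a guarantee.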

Of course, you cannot just add another instance without some work happening in the background. Because setting up this ‘magic’ manually would be very time consuming and cumbersome, the cloud providers do it for you. They strive to make their cloud platforms as simple as possible and provide tools, such as AWS Auto Scaling, that let you scale out or in based on, for example, CPU utilization levels.

Let’s take a look at some of the ways available today to enable scalability in cloud computing.

In the examples below we use cloud-native orchestration and provisioning tools, including OpenStack’s Heat and AWS CloudFormation, together with the scaling tools, to demonstrate how users can easily scale complete stacks in the cloud.

Scaling Groups

A scaling group is a logical entity that you create, grouping a number of resources together in a single logical construct.

In OpenStack you would use this resource with a syntax similar to the following.

The constructs below define an instance resource, and then group that resource into a scaling group with the required number of instances that should be available, including how they should scale.

heat_template_version: 2015-04-30
description: A simple auto scaling group.

resources:
  my_server:
    type: OS::Nova::Server
    properties:
      flavor: m1.small
      networks:
        - network: internal001
      user_data_format: RAW
      user_data: |
        while [ 1 ] ; do echo "Hello World" 1>/dev/null 2>&1; done

  my_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      cooldown: 60
      desired_capacity: 2
      max_size: 5
      min_size: 1
      resource:
        type: my_server

  my_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: my_group }
      cooldown: 60
      scaling_adjustment: 1
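To make the interaction between the group bounds and the policy concrete, here is a minimal sketch of how a change_in_capacity adjustment could be applied while staying between min_size and max_size. It is a simplified model written for this article, not Heat's implementation; the function name is hypothetical, and the values mirror the template above.

```python
def next_capacity(current: int, adjustment: int, min_size: int, max_size: int) -> int:
    """Apply a change_in_capacity adjustment, clamped to the group bounds."""
    return max(min_size, min(max_size, current + adjustment))


capacity = 2  # desired_capacity from the template
for _ in range(5):
    # Each triggered scale-up adds one instance (scaling_adjustment: 1),
    # but the group never grows beyond max_size.
    capacity = next_capacity(capacity, adjustment=1, min_size=1, max_size=5)

print(capacity)  # 5
```

The cooldown value is deliberately left out of this sketch; it only limits how often an adjustment may be applied, not the resulting size.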


In this case, scaling is triggered by a Ceilometer alarm that monitors the average CPU utilization of the instances.

As shown in the construct below, when the threshold is reached, the alarm calls the scale-up policy, which spins up an additional instance and adds it to the load balancer. When the load subsides, a matching scale-down policy and alarm (not shown here) remove the instances that are no longer needed.

  # The resource name cpu_alarm_high is illustrative.
  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 50
      alarm_actions:
        - {get_attr: [my_scaleup_policy, alarm_url]}
      comparison_operator: gt
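The alarm settings above (statistic, period, evaluation_periods, threshold, comparison_operator) can be read as a small decision rule. The sketch below models that rule in plain Python under the assumption that CPU samples have already been grouped by period; it is not Ceilometer code, and the function name and sample values are hypothetical.

```python
from statistics import mean
from typing import List


def alarm_fires(periods: List[List[float]], threshold: float = 50.0,
                evaluation_periods: int = 1) -> bool:
    """Return True if the average of each of the last N periods is above threshold.

    periods holds one list of CPU utilization samples (in percent) per period,
    oldest first. This mirrors statistic: avg and comparison_operator: gt.
    """
    recent = periods[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return False  # not enough data collected yet
    # An empty period counts as "no breach" rather than raising an error.
    return all(bool(samples) and mean(samples) > threshold for samples in recent)


# The latest 60-second period averages 75% CPU, so the scale-up policy would be called.
print(alarm_fires([[40.0, 45.0], [70.0, 80.0]]))
```

Raising evaluation_periods makes the alarm less sensitive to short spikes, at the cost of reacting more slowly to sustained load.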

The same can be applied on AWS using CloudFormation.

Here you configure an auto scaling group with a minimum size of 2 and a maximum size of 4, along with the set of group metrics to collect and the granularity at which they are collected.

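asGroup:
  Type: "AWS::AutoScaling::AutoScalingGroup"
  Properties:
    AvailabilityZones:
      Fn::GetAZs: ""
    LaunchConfigurationName:
      Ref: "LaunchConfig"
    MinSize: "2"
    MaxSize: "4"
    LoadBalancerNames:
      - Ref: "ElasticLoadBalancer"
    MetricsCollection:
      - Granularity: "1Minute"
        Metrics:
          - "GroupMinSize"
          - "GroupMaxSize"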

Here is a simple construct for defining an instance.

LaunchConfig:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ImageId: my_ami
    SecurityGroups:
      - Ref: mySecurityGroup
      - myExistingEC2SecurityGroup
    InstanceType: m1.small

As you can see, there is a difference in syntax compared to the OpenStack example above, but the basic principle is the same: create a scaling group based on a server resource, define the minimum and maximum size, and assign a scaling policy and metric to trigger the elasticity of the stack.

And here is a construct to define the metrics and thresholds for how the scaling group will grow.

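ScaleUpPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AdjustmentType: ChangeInCapacity
    AutoScalingGroupName:
      Ref: asGroup
    Cooldown: '1'
    ScalingAdjustment: '1'
# The resource name CPUAlarmHigh is illustrative.
CPUAlarmHigh:
  Type: AWS::CloudWatch::Alarm
  Properties:
    EvaluationPeriods: '1'
    Statistic: Average
    Threshold: '10'
    AlarmDescription: Alarm if CPU too high or metric disappears indicating instance
      is down
    Period: '60'
    AlarmActions:
      - Ref: ScaleUpPolicy
    Namespace: AWS/EC2
    # Assumed metric: the alarm description refers to CPU.
    MetricName: CPUUtilization
    Dimensions:
      - Name: AutoScalingGroupName
        Value:
          Ref: asGroup
    ComparisonOperator: GreaterThanThreshold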

Load Balancing

In order to scale a Web application, you will need a component that knows how to route a request from a client to a server on the backend. This is the simplest form of a Web proxy.

One of the most popular load balancing applications in use today is HAProxy. In its simplest form, you define a frontend and a backend in your haproxy.cfg file.

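frontend web
   bind *:80
   mode http
   default_backend webservers

backend webservers
   mode http
   balance roundrobin
   # The addresses below are placeholders; use the IP addresses of your instances.
   server web01 192.0.2.11:80
   server web02 192.0.2.12:80
   server web03 192.0.2.13:80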


The frontend directive listens on a defined port, and in the backend directive you list the IP addresses of the instances you want to send traffic to. HAProxy has extensive documentation and a substantial number of configuration options, including header manipulation and redirection. The full set of options is well beyond the scope of this article.
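The roundrobin balancing mode used in the backend above simply hands each new request to the next server in the list and wraps around at the end. A minimal Python sketch of that idea follows; it illustrates the rotation only, not HAProxy's actual implementation (which also supports server weights), and the function name is hypothetical.

```python
from itertools import cycle, islice
from typing import Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Return an endless iterator that yields backends in a fixed rotation."""
    if not backends:
        raise ValueError("at least one backend is required")
    return cycle(backends)


# Server names taken from the haproxy.cfg example above.
picker = round_robin(["web01", "web02", "web03"])
assignments = list(islice(picker, 5))  # route five consecutive requests
print(assignments)  # ['web01', 'web02', 'web03', 'web01', 'web02']
```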

But you don’t really have to worry about the nuts and bolts. As mentioned above, the cloud providers want to make it easy for you to consume their services, so all of the above is already built into the scaling group construct. You don’t have to deploy a load balancer or update its configuration; the vendor’s orchestration engine does this for you in the background.

AWS has its ELB (Elastic Load Balancer) and CloudFormation resources such as AWS::ElasticLoadBalancing::LoadBalancer, which provide a simple way to configure a load balancer through a feature-rich API.

There are other options, such as health checks for OpenStack (OS::Neutron::LBaaS::HealthMonitor), that allow you to perform periodic checks on your application instances and automatically remove them from the load balancer pool if they stop working correctly.
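The health-check behavior described above can be sketched as a pool that counts consecutive failed checks per member and stops routing to a member once it fails too often. This is an illustrative model only, not the Neutron LBaaS or ELB implementation; the class name, the max_retries parameter, and the probe function are hypothetical.

```python
from typing import Callable, Dict, List


class MonitoredPool:
    """Track consecutive health-check failures for each pool member."""

    def __init__(self, members: List[str], max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.failures: Dict[str, int] = {member: 0 for member in members}

    def run_checks(self, probe: Callable[[str], bool]) -> None:
        """Run one round of checks; a passing check resets the failure count."""
        for member in self.failures:
            if probe(member):
                self.failures[member] = 0
            else:
                self.failures[member] += 1

    def active(self) -> List[str]:
        """Members that have not yet failed max_retries checks in a row."""
        return [m for m, count in self.failures.items() if count < self.max_retries]


pool = MonitoredPool(["web01", "web02", "web03"])
down = {"web02"}  # pretend web02 has stopped responding
for _ in range(3):
    pool.run_checks(lambda member: member not in down)
print(pool.active())  # ['web01', 'web03']
```

Because a passing check resets the counter, a member that recovers is returned to the active list on the next round without manual intervention.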


Infinite Scale

In theory – if your application can scale properly in a horizontal fashion – you can use an infinite number of instances to enable limitless growth and serve an infinite number of requests.


Cloud providers are extremely accommodating to this deployment architecture and have tools in place already for you to use from day one. You do not need extensive knowledge of how to configure all the options, and how to balance traffic yourself (although knowledge never hurts).

The tools today make it very easy to scale your application.


Architectural Design

Designing your application properly is the biggest challenge you will encounter. So many different elements have to be accounted for, a good number of which are related to preserving state (cookies/persistence) and dealing with security (SSL). In addition, serious architectural design is required to create an application that is capable of 100% horizontal scaling. Although horizontal scaling is a goal that everyone aspires to achieve, not everyone can.

Databases and Persistence

An application does not exist in a vacuum. Usually, you require somewhere to store information about your customers: what they have purchased, when, and where. These details usually go into a database. Depending on the type of database you use, this may require additional architectural work to ensure that not only does your application scale, but that the data layer behind it also scales. Again, this is not a simple task, and it is further complicated when data spans different regions, locations, and tiers.

Vertical Scaling (Scale Up/Down)

Going back to our train example, in this case as the amount of goods grows, so does our single locomotive, including its parts. For example, you replace its engine (the CPU, in the case of a single instance) with a larger one to provide more horsepower.

Instead of creating additional copies of your application, you just add more resources to it: you super-size it.


Software and Architecture Stay the Same

You do not have to make any changes to the way you write or design your software. Referring to the train analogy, if you only upgrade the single locomotive, you won’t have to change the way you distribute the goods (load) or add more lines to support multiple trains as in the horizontal scalability case.

The same is true for scaling up your resource: just add more RAM, CPU, disk space, and network throughput, and you are good to go. You do not make any modifications to the application itself, only to the underlying instance and its size.


Finite Scale

There is a limit to how much you can grow. At a certain point, there are limits to how big a cart you can actually create or how many horses can actually pull it. Sometimes adding RAM, CPU, and disk space to an instance can only go so far before the application becomes the bottleneck, rather than the underlying resources it consumes.


In order to add more resources, you will need to bring your application down. Very few physical servers allow you to add hardware resources on the fly (and they are extremely expensive), and the same applies to instances in the cloud. You typically cannot resize an instance while it is running; you need to power it off.

You will need to plan a maintenance window for this activity and ensure that you have enough running capacity to accommodate your current load; otherwise, your current applications might suffer as a result of this resize.
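The capacity check described above comes down to simple arithmetic: with one instance powered off for the resize, can the remaining instances still carry the current load? The sketch below assumes you know each instance's sustainable request rate; the function name and the numbers are hypothetical.

```python
from typing import Sequence


def can_resize_safely(capacities_rps: Sequence[float], current_load_rps: float,
                      resizing_index: int) -> bool:
    """Check whether the other instances can absorb the load during a resize.

    capacities_rps lists the sustainable requests/s of each running instance;
    resizing_index is the position of the instance that will be powered off.
    """
    remaining = sum(
        capacity for index, capacity in enumerate(capacities_rps)
        if index != resizing_index
    )
    return remaining >= current_load_rps


# Three instances at 1,000 requests/s each, serving 1,800 requests/s in total.
print(can_resize_safely([1000, 1000, 1000], 1800, resizing_index=0))  # True
# With only two instances, taking one down leaves too little capacity.
print(can_resize_safely([1000, 1000], 1800, resizing_index=0))  # False
```

A single instance with no peers always fails this check, which is why a purely vertical design requires downtime for every resize.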

Pets not Cattle

This is the antithesis of the cloud model: you are nurturing a pet, feeding it more and more resources and taking care of it. It is quite probable that this resource will not be replaced over time, meaning you must keep maintaining the instance, including patches, fixes, and improvements. This adds overhead for your Operations team, who will have to maintain legacy deployments and applications.


When designing your application, you must factor a scaling methodology into the design so that you can handle increased load on your system when that time arrives. This should not be done as an afterthought, but rather as part of the initial architecture and its design.

There are many benefits to scaling horizontally and it is the methodology of choice, but that does not mean it is easy. On the contrary, designing an application to scale horizontally can be extremely challenging. Sometimes, vertical scaling is your only option, but you should be aware of the downsides that come with such a choice.


November 16, 2016
