Scalability in the cloud entails the managed process of increasing capacity as load increases. This can be done either by increasing the number of nodes or by keeping the same number of nodes while increasing the capacity of each.

The Vertical Scaling Pattern requires downtime to reconfigure and “scale up” the hardware by increasing the capacity (CPU, memory, disk) of a node. Scaling up also has limits, since a node can only be scaled up as far as its maximum resource capacity allows. Vertical scaling is the least common of the scaling patterns.

Adding Virtual Machine (VM) nodes when load increases, and reducing the number of VMs when the load on the system falls below a certain level, is known as the Horizontal Scaling Compute Pattern. “Scaling out” requires minimal or no downtime to reconfigure the number of nodes. It provides capacity beyond that of a single node by adding nodes of the same size and configuration (homogeneous) as load grows. It is used to handle capacity requirements that vary seasonally or spike unpredictably. Take care not to rely on sticky sessions or session state unless that state is stored in a central location that is not bound to a particular VM instance.
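To illustrate the point about session state, here is a minimal Python sketch. The class names (CentralSessionStore, WebNode) and the shopping-cart example are hypothetical and chosen only for illustration; the in-memory dictionary stands in for whatever shared cache or database a real deployment would use.

```python
class CentralSessionStore:
    """Hypothetical stand-in for a shared store that every node can reach."""

    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._sessions.get(session_id, {}))

    def put(self, session_id, state):
        self._sessions[session_id] = dict(state)


class WebNode:
    """Hypothetical VM instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.get(session_id)
        cart = list(state.get("cart", [])) + [item]
        state["cart"] = cart
        self.store.put(session_id, state)
        return cart


store = CentralSessionStore()
node_a = WebNode("vm-a", store)
node_b = WebNode("vm-b", store)

node_a.add_to_cart("session-1", "book")
# A different node serves the next request and still sees the cart.
print(node_b.add_to_cart("session-1", "pen"))  # ['book', 'pen']
```

Because neither node holds the cart itself, either one can be removed when the system scales back in without losing the user's session.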

The Horizontal Scaling Pattern can be implemented manually or automatically. It's often preferable to minimize human intervention by automating the horizontal scaling process with custom or standard auto-scaling rules. A standard rule can typically be written against CPU utilization, memory usage, average queue length, or average response time, so that a fluctuating resource is monitored continuously. For example, you can set a rule such that if a customer query on a product takes more than 20 seconds to come back, the number of VM instances hosting the database cluster is increased by one.
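The 20-second example can be sketched as a small threshold rule in Python. This is only an illustration under stated assumptions: the ScaleOutRule class, its field names, and the max_nodes cap are hypothetical and do not correspond to any particular cloud provider's auto-scaling API.

```python
from dataclasses import dataclass


@dataclass
class ScaleOutRule:
    """Hypothetical rule: add nodes when a monitored metric exceeds a threshold."""

    metric: str
    threshold: float
    step: int = 1        # how many nodes to add when the rule fires
    max_nodes: int = 10  # assumed upper bound so the rule cannot grow without limit

    def desired_nodes(self, current_nodes: int, observed: float) -> int:
        if observed > self.threshold:
            return min(current_nodes + self.step, self.max_nodes)
        return current_nodes


# Rule from the text: a response slower than 20 seconds adds one instance.
rule = ScaleOutRule(metric="avg_response_seconds", threshold=20.0)

print(rule.desired_nodes(current_nodes=3, observed=24.5))  # 4
print(rule.desired_nodes(current_nodes=3, observed=8.0))   # 3
```

In practice a rule like this would be evaluated on a schedule against averaged measurements rather than a single sample, and it would usually be paired with a matching rule that removes nodes when load tails off.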

Scaling out is not limited to compute nodes; it can also apply to other resources, such as storage, queues, and other components that can grow and shrink dynamically. For instance, queues can be added or removed as needed. Databases can be sharded as load grows or consolidated as it scales back.

When scaling out VM nodes, you may want to pay attention to the “N+1” rule. This deploys N+1 nodes when scaling out even though only N nodes are needed at any one time. Doing this provides additional ready-to-go computing resources if a sudden spike occurs. It also provides an additional instance in case of hardware failure.
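The N+1 arithmetic can be shown with a short Python sketch. The function name nodes_to_deploy and the load and capacity figures are hypothetical; the load is assumed to be measured in the same unit as per-node capacity (for example, requests per second).

```python
import math


def nodes_to_deploy(expected_load: float, capacity_per_node: float, spare: int = 1) -> int:
    """Return N + spare, where N is the node count needed to carry expected_load."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    # N: the smallest whole number of nodes that covers the load (at least one).
    needed = max(math.ceil(expected_load / capacity_per_node), 1)
    return needed + spare


# 900 requests/s at 250 requests/s per node needs N = 4, so deploy 5.
print(nodes_to_deploy(expected_load=900, capacity_per_node=250))  # 5
```

The extra node costs more to run, but it absorbs a sudden spike or the loss of one instance without waiting for a new VM to start.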