
I met with a customer today and the subject of Azure Scale Units (SU) and the limitations on scaling up Azure VMs was discussed.  So I want to provide a simple definition of the issue and how to work around it.

Scale Units

Azure VMs that are part of the same Cloud Service can communicate freely with each other and share the same load balancer and virtual IP (i.e. http://myapp.cloudapp.net).  A Cloud Service is bound to a single Scale Unit, which Azure uses to scale out any of the VMs in that Cloud Service. VMs can only be resized to a size supported in the Scale Unit (SU) where the VM is deployed. At times the word “stamp” is used to refer to the same concept as an Azure scale unit.  You can view a stamp (or scale unit) as a sectioned-off group of hardware in the DC that works together.

As new hardware becomes available MS builds it into a new SU. Currently there are five SU types. But keep in mind these will evolve and change as new hardware and new data centers are added to the Azure Cloud.  Scale Unit 1 is the oldest hardware in the DC, while SU 5 is the newest.

  • Scale Unit 1: A0-A4 (the original VM sizes; Basic VMs have no load balancer or autoscale and can only scale between A0-A4)
  • Scale Unit 2: A0-A7 (like SU1 but adds A5-A7)
  • Scale Unit 3: A8/A9 (“HPC” VMs, with InfiniBand)
  • Scale Unit 4: A0-A7 and D1-D14 (the D-series plus all of A0-A7)
  • Scale Unit 5: G1-G5

How do you know which scale unit you are using?

  1. Go into the VM/Configure tab.
  2. Click on VM Size and drop down all sizes.
  3. If you only see A0-A4, you know you are in SU1, so you cannot scale up to anything above A4 in this case. (A hedged PowerShell version of this check follows below.)
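If you would rather check from PowerShell than the portal, here is a minimal sketch using the classic (Service Management) Azure PowerShell module. The cmdlet names are from that module as I recall them, and the subscription, service, and VM names are placeholders.

    # Placeholder names - substitute your own subscription, cloud service, and VM names.
    Add-AzureAccount
    Select-AzureSubscription -SubscriptionName "MySubscription"

    # Show the size the VM is currently running at
    Get-AzureVM -ServiceName "mycloudsvc" -Name "myvm" |
        Select-Object Name, InstanceSize

    # List the role sizes the subscription knows about (not every size is valid in every scale unit)
    Get-AzureRoleSize | Select-Object InstanceSize, Cores, MemoryInMb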

The Problem

Azure has generational hardware in its datacenters. Scale Units are groups of hardware resources in an Azure DC.  Due to the progressive rollout of constantly improving VM types (such as A8-A9), not all stamps support all of the new hardware. So if you are in an older stamp and you try to scale up by increasing the VM type, you may or may not be able to do so. This depends upon the age and hardware functionality of the particular stamp to which your original VM is assigned. For instance, all the stamps will support A1, but not all support the new A8 and A9 VMs, or the D- and G-series VMs.

The Solution

There is no portal setting or “scale-unit” PowerShell parameter to control the stamp to which your VM is assigned.  So what should you do?

>> If you want to allocate a new VM and make sure you can move up to bigger Scale Units in the future:

  • To ensure you get an SU that will meet your needs for scaling up, make sure the first VM deployed in that Cloud Service (or legacy Affinity Group) is in the upper range of sizes. So if you want SU2, deploy an A5 or above (A6 or A7) and you will be in SU2 at that point for all subsequent allocations (see the sketch after this item).
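For illustration, here is a hedged sketch of deploying that first VM at an A7 size with the classic (ASM) cmdlets as I recall them; the image filter, service name, location, and credentials are all placeholders.

    # Placeholder values - substitute your own image, location, and credentials.
    $imageName = (Get-AzureVMImage |
        Where-Object { $_.Label -like "*Windows Server 2012 R2*" } |
        Select-Object -First 1).ImageName

    $vmParams = @{
        ServiceName   = "myscaleunit2svc"   # the first VM deployed creates the cloud service
        Name          = "anchor-vm"
        ImageName     = $imageName
        InstanceSize  = "A7"                # upper end of SU2, so the service lands in an SU that offers it
        Location      = "West US"
        AdminUsername = "azureadmin"
        Password      = "P@ssw0rd!123"
    }
    New-AzureQuickVM -Windows @vmParams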

>> If you want to move an existing VM to a new bigger size that is not in your current Scale Unit:

  • If you are in SU1 and need to move to a VM size that is not in SU1 (say A5-A7 in SU2), you can’t change it directly from the UI. Instead (a hedged PowerShell version of these steps follows this list):
  1. Find the OS disk name in the Usage Overview.
  2. Delete the VM, but be sure to choose “Keep the Attached Disks”.
  3. Go to VM/Disks and make sure that disk is not attached to any other VMs.
  4. Go to the Gallery and create a new VM using that saved OS disk, selecting the upgraded size to which you want to scale up.
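If you prefer to script those steps, here is a minimal sketch using the classic (Service Management) Azure PowerShell cmdlets as I recall them. The service, VM, and size names are placeholders, so verify the cmdlets and size strings against your module version before relying on this.

    # Placeholder names - substitute your own cloud service, VM name, and target size.
    $serviceName = "mycloudsvc"
    $vmName      = "myvm"
    $targetSize  = "A7"       # a size outside the original VM's scale unit

    # 1. Find the OS disk name before deleting the VM
    $vm     = Get-AzureVM -ServiceName $serviceName -Name $vmName
    $osDisk = ($vm | Get-AzureOSDisk).DiskName

    # 2. Delete the VM but keep the attached disks (no -DeleteVHD switch)
    Remove-AzureVM -ServiceName $serviceName -Name $vmName

    # 3. Wait until the disk shows as unattached before reusing it
    while ((Get-AzureDisk -DiskName $osDisk).AttachedTo) { Start-Sleep -Seconds 15 }

    # 4. Recreate the VM from the saved OS disk at the new size. If other VMs remain in the
    #    original cloud service, you may need to use a new cloud service here so the deployment
    #    can land on a stamp that supports the target size.
    New-AzureVMConfig -Name $vmName -InstanceSize $targetSize -DiskName $osDisk |
        New-AzureVM -ServiceName $serviceName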

Note that once you allocate a G-series VM you can only change the VM size (scale up) to another G-series VM. If you want an A- or D-series VM you need to delete the VM, save the OS disk, and so on, as described above. The D-series scale unit (SU4) is similar in nature, but it also includes A0-A7.

Closing Thoughts

The key here is planning ahead of time what possible up-sizing could occur and in what tiers; it’s a part of procurement design.  “Sizing up” is not a very common process and typically it is done manually, usually because the original planning estimates were undersized.  So if you feel there is a good chance you are eventually going to need a D-series, but are initially planning on allocating an A-series, you should allocate a D-series first (I recommend the lowest of the D-series, D1, just to get the absolute lowest monthly usage cost). Once you allocate, drop down to the A-series and run from there.  Later, if you need to scale up to a D-series you have the capability to do so within that SU4 scale unit (a hedged PowerShell sketch of this approach follows).  You don’t have this option with the G-series Scale Unit, which does not contain any D- or A-series options.
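If you want to script this allocate-high-then-run-low approach, here is a hedged sketch using the classic (ASM) cmdlets and size strings as I recall them; verify the size names (for example with Get-AzureRoleSize), and treat all service, VM, image, and credential values as placeholders.

    # Allocate the first VM as a D1 so the cloud service lands in a D-capable scale unit (SU4).
    $imageName = (Get-AzureVMImage |
        Where-Object { $_.Label -like "*Windows Server 2012 R2*" } |
        Select-Object -First 1).ImageName

    $vmParams = @{
        ServiceName   = "myfuture-d-svc"
        Name          = "app-vm"
        ImageName     = $imageName
        InstanceSize  = "Standard_D1"     # lowest-cost D-series size
        Location      = "West US"
        AdminUsername = "azureadmin"
        Password      = "P@ssw0rd!123"
    }
    New-AzureQuickVM -Windows @vmParams

    # Once allocated, drop down to an A-series size ("Medium" is the classic name for A2)
    # to keep the monthly cost low. Scaling back up to a D-series later is the same
    # pipeline with a D-series size string.
    Get-AzureVM -ServiceName "myfuture-d-svc" -Name "app-vm" |
        Set-AzureVMSize -InstanceSize "Medium" |
        Update-AzureVM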

 

Azure MVP x 2!

Received my 2nd consecutive Azure MVP award today from Microsoft!   I love how Microsoft takes care of its partners… this is an awesome program!  For anyone considering becoming an MVP, it’s definitely worth the time and effort.  For some of my thoughts on becoming an MVP, refer to “How to Become an MVP” at https://michaelmckeownblog.wordpress.com/2014/04/22/how-to-become-an-mvp

And a special thanks to my manager Jeff Nuckolls and Aditi for supporting me in this program, both time- and money-wise, over the past year!


Azure Scale Units

When scaling Azure based upon load, you can scale horizontally (add more compute instances) or vertically (increase the size of the VM). Azure auto-scaling functionality works with horizontal scaling.  For more details about these options see my previous blog post on “Cloud Scaling Patterns”.

Azure Scale Units (SU) apply to vertical scaling, where you want to scale ‘up’ the unit on which your VM is partitioned.  As new hardware becomes available, Microsoft builds it into a new “scale unit”. When scaling up, there are some sizes you can’t resize to, depending upon the SU on which your VM is currently partitioned.  VMs can only be resized to a size supported in the SU where the VM is deployed. Currently there are five scale unit types:

  • Scale Unit 1: A0-A4 (the original VM sizes; Basic VMs have no load balancer or autoscale and can only scale between A0-A4)
  • Scale Unit 2: A0-A7 (like SU1 but adds A5-A7)
  • Scale Unit 3: A8/A9 (“HPC” VMs, optimized networking with InfiniBand)
  • Scale Unit 4: A0-A7 and D1-D14 (the D-series with SSDs and faster CPUs, plus all of A0-A7)
  • Scale Unit 5: G1-G5 (monster powerful VMs with up to 32-core Xeon CPUs/448 GB RAM/6596 GB SSD storage/64 data disks)

How to check the SU to which a VM is currently allocated

  1. Go to the VM/Configure tab.
  2. Click on VM Size and drop down all sizes. If you only see A0-A4, you can tell you are in SU1, so you cannot scale up to anything above A4 in this case.

How to choose a scale unit

  • To ensure you get an SU that will meet your needs for scaling up, ensure the first VM deployed in that Cloud Service (or legacy Affinity Group) is in the upper range of sizes. So if you want SU2, deploy an A5 or above (A6 or A7) and you will be in SU2 at that point for all subsequent allocations.
  • When you deploy a smaller size, like A2, you can be placed into any of several different scale units. You won’t necessarily be in SU1.

How to move to a bigger size VM that is not in your current SU:

  • If you are in SU1 and need to move to a VM size that is not in SU1 (say A5-A7 in SU2), you can’t change it directly from the UI. Instead:
  1. Note the OS disk name in the usage overview.
  2. Delete the VM, but choose “Keep the Attached Disks”.
  3. Go to VM/Disks and make sure that disk is not attached to any other VMs.
  4. Go to the Gallery, create a new VM from that saved OS disk, and select the upgraded size to which you want to scale. (If the target size is already offered in your current SU, a simpler in-place resize works; see the sketch after this list.)
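When the target size is offered in your current SU, there is no need to delete and recreate anything; the resize can be scripted in place. Here is a minimal, hedged sketch with the classic (ASM) cmdlets as I recall them, using placeholder names.

    # In-place resize, valid only when the target size is offered in the VM's current scale unit.
    # Note that the VM is restarted as part of the size change.
    Get-AzureVM -ServiceName "mycloudsvc" -Name "myvm" |
        Set-AzureVMSize -InstanceSize "A5" |
        Update-AzureVM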

Network Security Groups (NSG) are an improvement upon VM endpoint Access Control List (ACL) rules. They provide network segmentation within an Azure VNET at the infrastructure management level. This is an improvement over the earlier endpoint protection, where you define an ACL rule to permit or deny ingress traffic to the public IP port. ACLs only apply to incoming traffic, whereas NSGs apply to both ingress and egress traffic.

  • You must first remove an ACL from any endpoints on a VM to use NSGs. The two (ACL and NSG) cannot co-exist on the same endpoint.
  • Each VM endpoint has both a private and a public port. The private port is used by other VMs in the subnet to communicate with each other and does not use the Azure load balancer. The public port is exposed to the implicit Azure load balancer for ingress Internet traffic. ACLs apply to the public port. For a VM endpoint you’ll need to ensure that the VM firewall allows data to be transferred using the protocol and private port corresponding to the endpoint configuration
  • Today there is no support in the portal for NSGs. It all needs to be done via the REST API or PowerShell.
  • NSGs offer full control over traffic flowing into and out of a virtual machine in a VNET
    • NSGs do not work with VNETs that are linked to an affinity group.
    • Filters requests before they hit the VM (ingress) and before they go out to the Internet (egress)
  • Provides a logical DMZ for database and app servers to be protected
    • Web proxies (mask actual incoming and outgoing IP addresses) and DNS servers are usually put in DMZ since they are exposed to the internet. NSGs can logically create this type of abstraction for your VNET.
  • NSG provides control over traffic flowing in and out of your Azure services in a subnet or a VNET as a set of ACL rules
    • Are applied to all VMs in a subnet in a VNET in the same region. VMs must be in a regional VNET.
    • NSGs can be applied to an individual VM as well.
    • To give additional layers of protection you can associate one NSG with a VM and another NSG with the VM’s subnet
      • For ingress traffic, a packet goes through the access rules specified in the subnet’s NSG first, then through the VM’s NSG rules
      • For egress traffic, the VM’s NSG rules are applied first, then the subnet’s NSG rules.
    • You can now control access coming into and out of the Internet to your VMs for an entire subnet via a single ACL rule
    • ACL rules on multiple VMs can be quickly changed in one operation vs. having to go into the firewall on each VM individually. You make the changes at the NSG level and don’t even need to go into the VM.
  • Azure provides default NSG rules around the VNET, and access to the internet, that you can modify if needed. It also provides default ‘tags’ to identify the VIRTUAL_NETWORK, AZURE_LOADBALANCER and the INTERNET. These tags can be used in the Source/Destination IP address to simplify the process of defining rules.
  • Defined rules are prioritized in their execution (as is the case in most business rules engines) so that rules are evaluated in priority order, with the highest-priority rules applied first.
  • You define the rules (in PowerShell or REST) using the following fields (a hedged PowerShell sketch follows this list):
    • Priority, Access Permissions, Source IP/Source Port, Destination IP/Destination Port, Protocol
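Here is a minimal sketch of creating an NSG, adding a rule, and associating it with a subnet. The cmdlet names and parameters are from the classic (Service Management) networking cmdlets as I recall them, and every name and address below is a placeholder, so treat this as an outline to verify rather than a definitive recipe.

    # Placeholder names - substitute your own NSG, VNET, and subnet names.
    # Create the NSG in the same region as the regional VNET
    New-AzureNetworkSecurityGroup -Name "web-tier-nsg" -Location "West US" -Label "NSG for the web tier subnet"

    # Add an inbound rule: allow HTTP from the Internet to any address behind the NSG.
    # Rules are evaluated in priority order.
    Get-AzureNetworkSecurityGroup -Name "web-tier-nsg" |
        Set-AzureNetworkSecurityRule -Name "allow-http-in" `
            -Type Inbound -Priority 100 -Action Allow `
            -SourceAddressPrefix "INTERNET" -SourcePortRange "*" `
            -DestinationAddressPrefix "*" -DestinationPortRange "80" `
            -Protocol TCP

    # Associate the NSG with a subnet in a regional VNET
    Set-AzureNetworkSecurityGroupToSubnet -Name "web-tier-nsg" `
        -VirtualNetworkName "prod-vnet" -SubnetName "web-subnet"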

For more information on NSGs refer to “About Network Security Groups” at https://msdn.microsoft.com/library/azure/dn848316.aspx.

Well it’s official!  Even though you probably have read the article “Kardashians to Kick Bruce and their Affinity Groups to the Curb” in People magazine, you were not really sure what to believe until you saw the official word from Microsoft.  Check out “About Regional VNets and Affinity Groups” at https://msdn.microsoft.com/library/azure/jj156085.aspx.   Improvements to the Azure virtual network infrastructure have made communication between different parts of the Azure DC very quick.  No longer do you need to explicitly group resources in close proximity to each other to minimize latency between items like VNETs, VMs, and storage.

 

 

Take a look at my new book on the topic of Azure Automation (part of the Azure Essentials series from Microsoft Press): http://blogs.msdn.com/b/microsoft_press/archive/2015/03/06/free-ebook-microsoft-azure-essentials-azure-automation.aspx    This is the PDF format, released ahead of the EPUB and Mobi formats, which will be released later in the month.

Microsoft Azure Essentials: Azure Automation (ISBN 9780735698154), by Michael McKeown. This is the second ebook in Microsoft Press’s free Microsoft Azure Essentials series. Future ebooks will cover specific Azure topics, such as Azure Machine Learning, Azure Websites for Developers, and others.

ScottGu posted new info today on GA and features for the following. For more info on these go to https://weblogs.asp.net/scottgu/azure-machine-learning-service-hadoop-storm-cluster-scaling-linux-support-site-recovery-and-more.

  • Machine Learning: General Availability of the Azure Machine Learning Service
  • Hadoop: General Availability of Apache Storm Support, Hadoop 2.6 support, Cluster Scaling, Node Size Selection and preview of next Linux OS support
  • Site Recovery: General Availability of DR capabilities with SAN arrays
  • SQL Database: General Availability of SQL Database (V12)
  • Web Sites: Support for Slot Settings
  • API Management: New Premium Tier
  • DocumentDB: New Asia and US Regions, SQL Parameterization and Increased Account Limits
  • Search: Portal Enhancements, Suggestions & Scoring, New Regions
  • Media: General Availability of Content Protection Service for Azure Media Services
  • Management: General Availability of the Azure Resource Manager

 

My good buddy and ex co-worker Michael Collier has co-authored a very well-done book on the essentials of Azure. Michael is a great communicator and his passion for community and Azure comes out in his content.  You can download it free at http://blogs.msdn.com/b/microsoft_press/archive/2015/02/03/free-ebook-microsoft-azure-essentials-fundamentals-of-azure.aspx. It’s like finding a $100 bill in your wallet unexpectedly.  A great read not only for the beginner but also for those who want a deeper understanding of the constantly changing Azure platform. Great job, Michael (as always)!

Eventual Consistency Patterns
In the next few blog entries we will examine three popular patterns around the eventual consistency of data – the Compensating Transaction pattern, the Database Sharding pattern, and the Map-Reduce pattern. Each requires an agile partitioning of data into commonly formatted segments to yield benefits in performance, access, scalability, concurrency, and size management. The tradeoff of eventual consistency is that all segments of the data may not show the same values to all consumers at any given time, as they would in a strongly consistent data architecture. However, the strongly consistent data model does not always map well to a Cloud data architecture, where data can be distributed across many geographic locations and multiple types of storage. The transactional requirement that strongly consistent data present the same view of all values at any given time comes with a price of overhead due to locking. When locks need to span large distances to maintain transactional boundaries, the latency and blocking that can occur may not be acceptable for your data architecture in the Cloud. Thus eventual consistency is often the better choice in the Cloud.

Eventual vs. Strong Data Consistency
Before we look at the patterns, let’s get a good understanding of eventual consistency. To those coming from a traditional ACID enterprise data world, the idea that data does not always have to carry the “C” (consistency) part of ACID may be a totally new concept that is hard to fathom. Let’s start out with strong consistency since that is the paradigm with which you are likely most familiar.
Strong data consistency (SC) means the same set of data values is seen by all instances of the application at any time. This is enforced by transactional locks to prevent multiple instances of an app from concurrently changing these values – only the lock holder can change the value.

All updates to strongly consistent data are done in an atomic, transactional manner. In a Cloud where data is often replicated across many locations, strong consistency can be complex and slow, since an atomic update is not complete until all of the replicas have been updated. Given the networked and distributed nature of Cloud resources, and the larger number of failures that result, the strong consistency model is often not very practical in a Cloud environment.

It’s best to implement strong consistency only in apps that are not replicated or split over many data stores or many different data centers, such as a traditional enterprise database. If data is stored across many data stores, use eventual consistency.
Just a note that Azure storage does not enforce transactional strong consistency for transactions that span multiple blobs and tables.

Eventual data consistency (EC) is used to improve performance and avoid contention in data update operations. It is not a simple and straightforward model to use. In fact, if it is possible to architect an application to use the native transactional features for update operations – then do that! Only use eventual consistency (and the compensating operations it requires) when necessary to satisfy needs that a strongly consistent data story cannot.

A typical business process consists of a series of autonomous operations. These steps can be performed in sequence or partially in parallel. While they are being completed the overall data may be in an inconsistent state; only when all operations are complete is the system consistent again. EC operations do not lock the data when it is modified.

EC typically makes better sense in the Cloud, since Cloud solutions commonly use data replicated across many data stores and sites in different geographies. These data stores do not have to be databases, either. For instance, data could be stored in the state of a service. Data may be replicated across different servers to help load balancing, or it could be duplicated and co-located close to the services and their users. Locking/blocking only works effectively when all data is stored in the same data store, which is commonly not the case in a Cloud environment.

You trade strong consistency for attributes such as increased availability (since there is no locking/blocking to prevent concurrent access) when you design your solutions around the notion of eventual consistency. With EC, your data might not be consistent all of the time. But in the end it will be – eventually. Data that does not use transactional semantics only becomes eventually consistent once all replicas have been updated through synchronization.

Compensating Transaction Pattern
So now that you understand a bit of the difference between the two consistency models, you probably want to know more about how the compensation works to make the data eventually consistent. This pattern is best used for eventually consistent operations, where a failure in the work done by such an operation needs to be undone by a series of compensating steps. In ACID “all-or-nothing” strongly consistent transactional operations, a rollback is done via all of the resource managers (RMs) involved in the transaction. In a typical two-phase commit, each of the RMs updates its part of the transaction in the first phase (locking the data in the process). When all of the RMs are complete they each vote to the main transaction coordinator. If all vote “Commit”, the RMs then commit their individual resources. If one or more votes “Abort”, all the RMs roll back their piece of the transaction to its original state.

But with eventually consistent operations, there is no supervising transactional manager. Eventually consistent operations are a non-transactional sequence of steps with each step committing totally once it’s complete. If a failure occurs then all partial changes up to the point of failure should be rolled back. From a high level this rollback is done the same as in a transactional strongly consistent environment. But at a lower level the process is very individualized and not coordinated among the different steps automatically via resource managers.
It’s not always a case of simply overwriting a changed value with the original value to compensate, because the original value may have been changed a few more times before the rollback needs to occur. Replacing the current value with the original value may not be the right thing to do in that case.

One of the best ways to implement a compensating transaction is to run it all in a workflow. And in some cases the only way to recover from a failed, non-transactional, eventually consistent operation is via manual intervention. Compensating operations are often specific and customized to the user and the environment. In fact, the compensating steps do not necessarily need to follow the same sequence as the original operations when a failure occurs and the compensation process is started.

The steps in a compensating transaction should also be idempotent, since they can themselves fail and have to be rerun over and over. Part of the challenge is that it’s not always easy to tell when an eventually consistent operation has failed. Unlike an ACID operation, where the individual resource managers check in with the transaction coordinator and notify it when one of them fails (to begin the transactional rollback), no such coordination occurs in an eventually consistent operation. A small sketch of the pattern follows.
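To make the pattern concrete, here is a small sketch (plain PowerShell, no Azure dependencies) of a runner that executes a series of non-transactional steps and, on failure, runs the compensating action of each completed step in reverse order. The step bodies are placeholders; real compensation logic is specific to your business process and should be idempotent.

    # Each step is a pair of script blocks: the forward action and its compensating action.
    # Compensating actions should be idempotent - safe to run more than once.
    $steps = @(
        @{ Name = "ReserveInventory"; Do = { Write-Host "reserve item" };   Undo = { Write-Host "release reservation" } },
        @{ Name = "ChargeCard";       Do = { Write-Host "charge card" };    Undo = { Write-Host "refund charge" } },
        @{ Name = "CreateShipment";   Do = { throw "carrier API timeout" }; Undo = { Write-Host "cancel shipment" } }
    )

    $completed = @()
    try {
        foreach ($step in $steps) {
            & $step.Do                 # each step commits on its own; there is no global transaction
            $completed += $step
        }
    }
    catch {
        Write-Warning "Step failed: $($_.Exception.Message). Running compensation..."
        # Walk back through the steps that completed, most recent first
        for ($i = $completed.Count - 1; $i -ge 0; $i--) {
            $doneStep = $completed[$i]
            try   { & $doneStep.Undo }
            catch { Write-Warning "Compensation for $($doneStep.Name) failed; retry or escalate to manual intervention." }
        }
    }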

In subsequent blogs on Eventual Consistency patterns, we will address the Database Sharding and the Map-Reduce patterns.

Scalability in the cloud entails the managed process of increasing capacity when the load increases. This can occur either by increasing the number of nodes, or by preserving the same number of nodes while increasing their physical capacity.

The Vertical Scaling Pattern requires downtime to reconfigure and “scale up” the hardware by increasing the capacity (CPU, memory, disk) of a node. Scaling up also has limits, since a node can only be scaled up so far before it reaches its resource capacity limit. Vertical scaling is the least common of the scaling options.

Increasing the number of Virtual Machine (VM) nodes when load increases, and reducing the number of VMs when the stress on a system tails off below a certain level, is known as the Horizontal Scaling Compute Pattern. “Scaling out” requires minimal or no downtime to reconfigure the number of nodes. In the end it provides capacity that exceeds what a single node can offer by increasing the number of nodes of the same size and configuration (homogeneous) as load grows. It’s used to handle capacity requirements that vary seasonally or due to unpredictable load spikes. Care must be exercised not to use sticky sessions or session state unless that state is stored in a central location not bound to a particular VM instance.

The Horizontal Scaling Pattern can be implemented manually or automatically. It’s often preferred to minimize the human intervention required by automating the horizontal scaling process via custom or standard auto-scaling rules. A standard rule can typically be written against CPU utilization, memory usage, average queue length, or average response times to continuously monitor a fluctuating resource. For example, you can set a rule such that if a customer query on a product takes more than 20 seconds to come back, then the number of VM instances hosting the database cluster is increased by one (a hedged sketch of such a rule follows below).
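To make that rule concrete, here is a hedged sketch of a simple monitoring loop. Get-AverageQueryResponseTime and Add-DatabaseClusterNode are hypothetical placeholders for your real metric source and provisioning step; the 20-second threshold mirrors the example above.

    # Hypothetical helper functions - stand-ins for your real metric source and provisioning logic.
    function Get-AverageQueryResponseTime { Get-Random -Minimum 5 -Maximum 30 }   # seconds (stubbed)
    function Add-DatabaseClusterNode      { Write-Host "Scaling out: adding one VM instance to the cluster" }

    $thresholdSeconds = 20
    $checkIntervalSec = 300   # evaluate the rule every 5 minutes

    while ($true) {
        $avgResponse = Get-AverageQueryResponseTime
        if ($avgResponse -gt $thresholdSeconds) {
            # Rule: if a product query takes more than 20 seconds on average, add one instance
            Add-DatabaseClusterNode
        }
        Start-Sleep -Seconds $checkIntervalSec
    }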

Scaling out applies to more than just compute nodes; other resources, such as storage, queues, and other components, can grow and shrink dynamically as well. For instance, queues can be added or removed as needed, and databases can be sharded or consolidated as growth occurs or scales back.

When scaling out VM nodes you may want to pay attention to the “N+1” rule: deploy N+1 nodes even though only N nodes are needed at the time. Doing this provides additional ready-to-go computing resources if a sudden spike occurs, and it also provides an extra instance in case of hardware failure.
