Category: Azure Administration


I met with a customer today, and we discussed Azure Scale Units (SUs) and the limitations they place on scaling up Azure VMs. So I want to provide a simple explanation of the issue and how to work around it.

Scale Units

Azure VMs that are part of the same Cloud Service have free access to each other and share the load balancer and a virtual IP (i.e. http://myapp.cloudapp.net). A Cloud Service is bound to a single Scale Unit, which Azure uses to scale out any of the VMs in that Cloud Service. VMs can only be resized to a size supported in the Scale Unit (SU) where the VM is deployed. The word “stamp” is sometimes used to refer to the same concept as an Azure Scale Unit. You can think of a stamp or scale unit as a sectioned-off group of hardware in the datacenter that works together.

As new hardware becomes available, Microsoft builds it into a new SU. Currently there are five SU types, but keep in mind these will evolve and change as new hardware and new datacenters are added to the Azure cloud. Scale Unit 1 is the oldest hardware in the DC, while SU 5 is the newest.

  • Scale Unit 1: A0-A4 (original VM sizes); Basic tier VMs (no load balancer, no autoscale) can only scale between A0 and A4
  • Scale Unit 2: A0-A7 (like SU1 but adds A5-A7)
  • Scale Unit 3: A8/A9 (“HPC” VMs, with Infiniband)
  • Scale Unit 4: A0-A7 and D1-D14 (the D-series plus all of A0-A7)
  • Scale Unit 5: G1-G5

How do you know which scale unit you are using?

  • Go into the VM's Configure tab.
  • Click on VM Size and drop down the list of available sizes.
  • The sizes offered tell you which SU you are on. If you see only A0-A4, you are in SU 1, so you cannot scale up to anything above A4. (A PowerShell sketch of the same check follows.)
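
If you prefer PowerShell, here is a minimal sketch of the same check using the classic (service management) cmdlets; the service and VM names are placeholders:

# Show the current size of an existing classic VM (placeholder names).
Get-AzureVM -ServiceName "myapp" -Name "myvm1" | Select-Object Name, InstanceSize

# List the role sizes the subscription knows about (not filtered by scale unit);
# the portal's VM Size drop-down remains the authoritative view of what your
# current scale unit actually supports.
Get-AzureRoleSize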

The Problem

Azure has generational hardware in its datacenters. Scale Units are groups of hardware resources in an Azure DC. Due to the progressive rollout of constantly improving VM types (such as A8-A9), not all stamps support all of the new hardware. So if you are in an older stamp and you try to scale up by increasing the VM size, you may or may not be able to do so. This depends upon the age and hardware capabilities of the particular stamp to which your original VM is assigned. For instance, all stamps support A1, but not all support the newer A8 and A9 VMs, or the D- and G-series VMs.

The Solution

There is not a portal setting or a “scale-unit” PowerShell parameter to control the stamp to which your VM is assigned. So what should you do?

>> If you want to allocate a new VM and make sure you can move up to bigger Scale Units in the future:

  • To ensure you get an SU that will meet your needs for scaling up, make sure the first VM deployed in that Cloud Service (or legacy Affinity Group) is in the upper range. So if you want SU 2, deploy an A5 or above (A6 or A7), and all subsequent allocations in that Cloud Service will be in SU 2. (A PowerShell sketch follows.)
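
As a rough illustration, the first deployment could look something like this with the classic Azure PowerShell cmdlets; the service name, image, credentials, and location are placeholder values, and the point is simply that the -InstanceSize of the first VM anchors the Cloud Service in the scale unit you want:

# Sketch only: deploying an A7 first should place the new Cloud Service on
# hardware that supports the full A0-A7 range (SU 2).
New-AzureQuickVM -Windows -ServiceName "myapp" -Name "myvm1" `
    -ImageName $windowsImageName -AdminUsername "azureadmin" -Password $adminPassword `
    -InstanceSize "A7" -Location "East US"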

>> If you want to move an existing VM to a new bigger size that is not in your current Scale Unit:

  • If you are in SU 1 and need to move to a VM size that is not in SU 1 (say A5-A7 in SU 2), you can’t change it directly from the UI. Instead, find the OS disk name in the Usage Overview.
  • Delete the VM, but be sure to choose “Keep the Attached Disks”.
  • Go to VM/Disks and make sure that disk is not attached to any other VMs.
  • Go to the Gallery and create a new VM using that saved OS disk, selecting the larger size to which you want to scale up. (The steps are sketched in PowerShell below.)
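
A rough PowerShell equivalent of those steps, using the classic cmdlets with made-up names (the disk name is whatever you noted in the Usage Overview, and this sketch deploys into a new Cloud Service so the allocation can land on hardware that supports the larger size):

# Delete the VM but keep its disks (Remove-AzureVM does not delete the VHDs).
Remove-AzureVM -ServiceName "myapp" -Name "myvm1"

# Recreate the VM from the saved OS disk at the larger size.
New-AzureVMConfig -Name "myvm1" -InstanceSize "A6" -DiskName "myvm1-os-disk" |
    New-AzureVM -ServiceName "myapp-su2" -Location "East US"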

Note that once you allocate a G-series VM you can only change the VM size (scale up) to another G-series VM. If you want an A- or D-series VM you need to delete the VM, save the OS disk, and so on. The D-series Scale Unit (SU 4) is similar in nature but also includes A0-A7.

Closing Thoughts

The key here is planning ahead of time what up-sizing could occur and in what tiers; it’s part of procurement design. “Sizing up” is not a very common process, and typically it is done manually because the original sizing estimates turned out to be too low. So if you feel there is a good chance you will eventually need a D-series VM but are initially planning to allocate an A-series VM, allocate a D-series instead (I recommend the lowest D-series size, D1, to keep the monthly usage cost as low as possible). Once you allocate, drop down to the A-series and run from there. Later, if you need to scale up to a D-series size, you have the capability to do so within the SU 4 scale unit. You don’t have this option with the G-series Scale Unit, which does not contain any D- or A-series options.
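
A hedged sketch of that round-trip with the classic cmdlets (placeholder names again): allocate the Cloud Service with a D1, drop down to an A-series size for day-to-day running, and resize back up to a D-series size later without redeploying.

# Drop the running VM down to an A-series size (A1 is "Small" in classic naming).
Get-AzureVM -ServiceName "myapp" -Name "myvm1" |
    Set-AzureVMSize -InstanceSize "Small" |
    Update-AzureVM

# Later, scale back up to a D-series size within the same scale unit.
Get-AzureVM -ServiceName "myapp" -Name "myvm1" |
    Set-AzureVMSize -InstanceSize "Standard_D3" |
    Update-AzureVM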

 


Take a look at my new book on the topic of Azure Automation (part of the Azure Essentials series from Microsoft Press): http://blogs.msdn.com/b/microsoft_press/archive/2015/03/06/free-ebook-microsoft-azure-essentials-azure-automation.aspx. This is the PDF format, released ahead of the EPUB and Mobi formats, which will be released later in the month.

Microsoft Azure Essentials: Azure Automation (ISBN 9780735698154), by Michael McKeown. This is the second ebook in Microsoft Press’s free Microsoft Azure Essentials series. Future ebooks will cover specific Azure topics, such as Azure Machine Learning, Azure Websites for Developers, and others.

This is the seventh and last post in the Microsoft Azure Administration mini-series hosted primarily off the Aditi Technologies blog. It is about updates to Microsoft Azure Websites. It parallels my just-released Pluralsight course entitled Microsoft Azure New Administration Features (March 2014).

My company, Aditi Technologies, is working in parallel with me to discuss the admin “topic of the week” in this series on Microsoft Azure New Administration Features. So no need to rehash the work they are doing. For a highlight of the Azure Websites information that is in the actual Pluralsight course, go to http://blog.aditi.com/cloud/learn-from-the-expert-microsoft-azure-websites.

This is the sixth post in the Microsoft Azure Administration mini-series hosted primarily off the Aditi Technologies blog. It is about Big Data with Microsoft HDInsight. It parallels my just-released Pluralsight course entitled Microsoft Azure New Administration Features (March 2014).

My company, Aditi Technologies, is working in parallel with me to discuss the admin “topic of the week” in this series on Microsoft Azure New Administration Features. So no need to rehash the work they are doing. For a highlight of the Azure HDInsight information that is in the actual Pluralsight course, go to http://blog.aditi.com/cloud/learn-from-expert-azure-hdinsight.

This is the fourth post in the Microsoft Azure Administration mini-series hosted primarily off the Aditi Technologies blog. It is about Microsoft Azure BizTalk Services, and it parallels my just-released Pluralsight course entitled Microsoft Azure New Administration Features (March 2014).

My company, Aditi Technologies, is working in parallel with me to discuss the admin “topic of the week” in this series on Microsoft Azure New Administration Features. So no need to rehash the work they are doing. For information around Azure BizTalk Services go to http://blog.aditi.com/cloud/learn-expert-windows-azure-biztalk-services/.

This is the third post of substance in my new Microsoft Azure Administration mini-series hosted primarily off the Aditi Technologies blog, which parallels my just-released Pluralsight course entitled “Microsoft Azure Administration New Features (March 2014)”.

Azure Management Services help you better manage your Azure services.
• The Azure Service Dashboard gives you a 10,000-foot status view of all the globally provided Azure services.
• Are you worried about exceeding spending limits each month on your Azure bill and don’t have the time or desire to check your current costs on a regular basis? Using the Billing Alert Service means you don’t have to fret about rising bill costs like you used to. And since Billing Alerts are an integrated component of Management Services, we will cover them both in unison so it all makes sense as a synergistic management solution.
• Another type of alert, Service Alerts, allows you to monitor a Windows Azure service based on a metric value and be notified when that value reaches a certain limit. New alerts have just been added for SQL Azure databases and Azure Web Sites, of which you can now take advantage.
• The new Auto-scaling feature, currently in preview, allows us to scale instances of Cloud Services and IaaS VMs, and we will look at the latter in this module.
• Operation Logs give you a record of what operations transpired within your subscription. We will open up some Operation Logs and see how to filter and view the log messages of what has gone on within our system at the admin level.
• Web endpoint status allows Azure to monitor the exposed endpoints on both your PaaS Cloud Services and your IaaS VMs, and we will see how that works and how to configure it.
All of these features help you better manage your Azure services and give you more control over what’s going on from both a technical and a spending perspective.

If you go into my course on Pluralsight (Microsoft Azure Administration New Features, March 2014) you will find a lot more information on Microsoft Azure Management Services plus be able to watch videos of how to use these concepts. In addition, the course includes a wealth of information and demos on the Azure Scheduler, Azure Recovery Services, Azure Traffic Manager, Azure BizTalk Services, HDInsight, and improvements to Azure Storage and Web Sites. Hope to see you there!

This is the second module of substance in my new Microsoft Azure Administration mini-series hosted primarily off the Aditi Technologies blog, which parallels my just-released Pluralsight course entitled “Microsoft Azure Administration New Features (March 2014)”.

Traffic Manager enables you to improve the availability of your critical applications by monitoring your hosted Azure services. It provides automatic routing to alternative replica sites based upon your choice of three load-balancing methods, applying an intelligent policy engine to the DNS queries on your domain names. The three methods are the performance, failover, and round-robin algorithms. We’ll talk about all three of these shortly in more detail.

Azure allows you to run cloud services in datacenters located around the world. Traffic Manager can manage your traffic in different ways based upon what routing emphasis you tell it to use. It can improve the responsiveness of your applications and content delivery times by directing end users to the cloud service that is closest to them (in terms of network latency). Or, when one cloud service is brought down, perhaps for maintenance, Traffic Manager will route user traffic to the other available cloud services that you define in the Traffic Manager profile. This helps you maintain and upgrade your services without downtime. Or, if you have no availability or performance problems, you can do simple round-robin load distribution to balance the load evenly among two or more nodes in the configuration.
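
To make those three methods concrete, here is a minimal, hedged sketch using the classic Azure PowerShell Traffic Manager cmdlets; the profile name, domain, and endpoint names are placeholders:

# Create a failover profile and add two cloud service endpoints; for the
# Failover method, the order in which endpoints are added sets the priority.
New-AzureTrafficManagerProfile -Name "myapp-tm" -DomainName "myapp.trafficmanager.net" `
    -LoadBalancingMethod "Failover" -Ttl 30 `
    -MonitorProtocol "Http" -MonitorPort 80 -MonitorRelativePath "/" |
    Add-AzureTrafficManagerEndpoint -DomainName "myapp-east.cloudapp.net" -Type "CloudService" -Status "Enabled" |
    Add-AzureTrafficManagerEndpoint -DomainName "myapp-west.cloudapp.net" -Type "CloudService" -Status "Enabled" |
    Set-AzureTrafficManagerProfile

# Switching the routing behavior is just a different -LoadBalancingMethod value:
# "Performance" or "RoundRobin" instead of "Failover".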

So this gives you a cursory introduction to ATM (Azure Traffic Manager). But for you to be able to leave this module with more than just a few conversational facts about ATM, we need to first go back to school and talk about HA/DR in the cloud. Without that, the ATM concepts will not be applied to your real-life solutions. Skipping this would be like visiting France without learning a few basic French words, such as those for food, bathroom, wine, and airport. So let’s first talk about HA/DR and see how it applies to cloud architectures. Only with this basic understanding of cloud HA/DR will you be able to use ATM optimally for your solution.

High Availability \ Disaster Recovery
I want to park on this slide for a few minutes because I think it’s really important to have some idea of what it means for solutions to be highly available and to recover from a disaster. The reason I say that is when I ask customers if they are prepared for temporary and large-scale failures, most say they are. However, before you answer that question for yourself, does your company rehearse these failures? Do you test the recovery of databases to ensure you have the correct processes in place? Chances are probably not. That’s because successful DR starts with lots of planning and architecting to implement these processes. Just like many other non-functional requirements, such as security, disaster recovery rarely gets the up-front analysis and time allocation required. Also, most customers don’t have the budget for geographically distributed datacenters with redundant capacity. Consequently even mission critical applications are frequently excluded from proper DR planning.
Azure provides geographically dispersed datacenters around the world. The platform also provides capabilities that support availability and a variety of DR scenarios. Now, every mission-critical cloud application can be given due consideration for disaster-proofing of the system. Windows Azure has resiliency and DR built into many of its services. These platform features must be studied carefully and supplemented with application strategies.
Note that a full discussion of HA/DR in Azure would take a few hours to work through, but in the next few slides I will touch on some of the factors to make you aware of them. You also need an awareness of your HA/DR strategy to properly use Azure Traffic Manager to meet the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) requirements of your application.
The recovery time objective is the maximum amount of time allocated for restoring application functionality. This is based on business requirements and is related to the importance of the application. Critical business applications require a low RTO.

The recovery point objective is the acceptable time window of lost data due to the recovery process. For example, if the RPO is one hour, then the data must be completely backed up or replicated at least every hour. Once the application is brought up in an alternate datacenter, the backup data could be missing up to an hour of data. Like RTO, critical applications target a much smaller RPO.

Key Factors – The implementation of the application needs to factor in the probability of a capability outage. It also needs to consider the impact it will have on the application from the business perspective before diving deep into the implementation strategies. Without due consideration of the business impact and the probability of hitting the risk condition, the implementation can be expensive and potentially unnecessary. Determining factors of RTO, RPO, and budget help you outline a strategy that works for you and your applications. And, due to cost management, all applications in your portfolio most likely will not be treated the same with respect to HA/DR.
High Availability – A highly available cloud application implements strategies to absorb the outage of the dependencies like the managed services offered by the cloud platform. In spite of possible failures of the Cloud platform capabilities, this approach permits the application to continue to exhibit the expected functional and non-functional systemic characteristics as defined by the designers. A highly available application absorbs fluctuations in availability, load, and temporary failures in the dependent services and hardware. The application continues to operate at an acceptable user and systemic response level, as defined by business requirements or application service level agreements.

Consider an automotive analogy for high availability. Even quality parts and superior engineering do not prevent occasional failures. For example, when your car gets a flat tire, the car still runs, but it is operating with degraded functionality. If you planned for this potential occurrence, you can use one of those thin-rimmed spare tires until you reach a repair shop. Although the spare tire does not permit fast speeds, you can still operate the vehicle until the tire is replaced. In the same way, a cloud service that plans for potential loss of capabilities can prevent a relatively minor problem from bringing down the entire application. This is true even if the cloud service must run with degraded functionality.

Disaster Recovery –
Unlike the temporary failure management for high availability, disaster recovery (DR) revolves around the catastrophic loss of application functionality. For example, consider the scenario where one or more datacenters go down. In this case you need to have a plan to run your application or access your data outside of the datacenter. Execution of this plan revolves around people, processes, and supporting applications that allow the system to function. The level of functionality for the service during a disaster is determined by business and technology owners who define its disaster operational mode. That can take many forms, from completely unavailable to partially available (degraded functionality or delayed processing) to fully available.

A cloud deployment might cease to function due to a systemic outage of the dependent services or the underlying infrastructure. Under such conditions, a business continuity plan triggers the disaster recovery (DR) process. This process typically involves both operations personnel and automated procedures in order to reactivate the application at a functioning datacenter. This requires the transfer of application users, data, and services to the new datacenter. This involves the use of backup media or ongoing replication.

Consider the previous analogy that compared high availability to the ability to recover from a flat tire through the use of a spare. By contrast, disaster recovery involves the steps taken after a car crash where the car is no longer operational. In that case, the best solution is to find an efficient way to change cars, perhaps by calling a travel service or a friend. In this scenario, there is likely going to be a longer delay in getting back on the road as well as more complexity in repairing and returning to the original vehicle. In the same way, disaster recovery to another datacenter is a complex task that typically involves some downtime and potential loss of data. To better understand and evaluate disaster recovery strategies, it is important to define two terms: recovery time objective and recovery point objective.

If you go into my course on Pluralsight (http://pluralsight.com/training/courses/TableOfContents?courseName=microsoft-azure-administration-new-features) you will find a lot more information on Microsoft Azure Traffic Manager plus be able to watch videos of how to use these concepts. You will also learn how to use Azure Traffic Manager to load balance incoming traffic across multiple hosted Windows Azure Cloud services and Web sites, whether they’re running in the same datacenter or across different datacenters around the world. By effectively managing traffic, you can ensure high performance, availability, and resiliency of your applications. You will see some basic ATM concepts, then spend a good amount of time understanding High Availability/Disaster Recovery in the cloud and how you need to plan intentionally for that. We will also examine three algorithms you can use for load balancing: failover, round-robin, or performance. In addition, the course includes a wealth of information and demos on the Azure Scheduler, Azure Recovery Services, Azure Management Services, Azure BizTalk Services, HDInsight, and improvements to Azure Storage and Web Sites. Hope to see you there!

This is the first module of substance in my new Microsoft Azure Administration mini-series hosted primarily off the Aditi Technologies blog, which parallels my just-released Pluralsight course entitled “Microsoft Azure Administration New Features (March 2014)”.

There are two main components to the Recovery Services functionality. The first is called Backup Recovery Services and is used to perform automated backups of data, primarily from on-premises servers into the Azure cloud. The advantage from a security and disaster recovery standpoint is that the backups are stored off site from your datacenter in Azure cloud storage. Recall that Azure Storage also gives you automatic 3x replication within that same datacenter at no extra cost. The data is encrypted before it is transmitted, just as it is when it is stored in Azure Storage. Backups are done incrementally to allow point-in-time recovery; incremental backups also improve efficiency, minimize transfer time, and reduce storage costs.

The second main part of Azure Recovery Services is the Hyper-V Recovery Manager (HVRM). This works in conjunction with System Center Virtual Machine Manager to protect virtual machines using the Azure cloud. So why would you be interested in the Hyper-V Recovery Manager feature? HVRM is used for Disaster Recovery (DR). With High Availability (HA), your app keeps running in a single datacenter, possibly in a degraded mode or with slower response, but it does not go down. With DR, part or all of your datacenter goes down, so you need a solution that brings up your app in a secondary datacenter in a short amount of time so your business is not affected significantly. HVRM helps you manage your DR from one DC to another; it is the go-between that manages this process.

HVRM works with Hyper-V Replica in Windows Server 2012. A host running VMs in one DC can replicate all of its VMs to another DC. The challenge is that most customers have many VMs, so you have to orchestrate the order and timing in which the VMs come up in the new DC during DR. You could piece together a DR process using PowerShell or System Center Orchestrator, but this is fairly complex, so HVRM gives you a simple solution to this problem for both the VM and the data it uses. As long as the data is in a VM (in a VHD), you can replicate the data transparently.

HVRM lives in Azure, and you use the same Azure account to manage your on-premises VMs. HVRM gives you a custom, orchestrated way to perform recovery; recovery scripts that normally run in your primary DC can now run in Azure. Azure will monitor the two DCs and orchestrate the DR if one of them goes down. You install an HVRM agent on the VMM host machine (not on all the VMs in the group), and it pushes metadata to Azure (your data stays in the DC). The metadata, which is sent to Azure regularly, is what you see on the VM console in Azure.

Before we can fully understand Recovery Services, there are a few important concepts to understand, such as Azure storage vaults, their associated certificates, and the backup agent.

An Azure “vault” is a logical abstraction on top of Azure blob storage. When choosing to back up data to the Azure cloud (blob storage), either through Backup Recovery Services or Hyper-V Recovery Manager, you must create a backup vault in the geographic region where you want to store the data. A vault is created using Windows Azure PowerShell or through the Azure portal using the “Quick Create” mechanism.

You do not set the size of an Azure vault. Since a vault maps to Azure page blob storage, the limit of the entire vault is set at 1 TB, which is the size limit of an Azure page blob. But the limit on actual stored data in an Azure backup vault is capped at 850 GB. That’s because vaults carry metadata associated with the backup, which would consume around 150 GB of storage if the blob were completely full, leaving about 850 GB of actual storage space. You can have more than one server using an Azure storage vault; it’s up to you how you want to architect your storage layer.

Vaults require you to register an X.509 v3 certificate with the servers that use them. You can obtain one of these certificates by getting a valid SSL certificate issued by a Certificate Authority (CA) that is trusted by Microsoft and whose root certificates are distributed via the Microsoft Root Certificate Program. Alternatively, you can create your own self-signed certificate using the MAKECERT tool. Download the latest Windows SDK to get access to this tool, then run it with a command similar to the one below.

makecert.exe -r -pe -n CN=recoverycert -ss my -sr localmachine -eku 1.3.6.1.5.5.7.3.2 -len 2048 "recoverycert.cer"

This creates a self-signed certificate, installs it into the Local Computer \ Personal certificate store, and writes the public key to recoverycert.cer.

To upload the certificate to the Windows Azure Management Portal, you must export the public key as a .CER-formatted file. Whether you purchase a certificate or build your own self-signed one, you end up with a certificate in .CER format, which means it does not contain the private key. The certificate must live in the Personal certificate store of the Local Computer and have a key length of at least 2048 bits.

When the certificate is installed on the server to be backed up, it should contain the private key (a .PFX file) in the case of Hyper-V Recovery Services. So if you will be registering a different server than the one you used to create the certificate, you need to export the .PFX file (which contains the private key), copy it to the other server, and import it into that server’s Personal certificate store.
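
The export and import steps can also be scripted; here is a hedged sketch using the Windows Server 2012 PKI cmdlets, with placeholder paths and password:

# On the machine where the certificate was created: export the public key (.CER)
# for upload to the portal, and the private key (.PFX) for any other server.
$cert = Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.Subject -eq "CN=recoverycert" }
Export-Certificate -Cert $cert -FilePath "C:\certs\recoverycert.cer"
$pfxPassword = ConvertTo-SecureString -String "P@ssw0rd!" -AsPlainText -Force
Export-PfxCertificate -Cert $cert -FilePath "C:\certs\recoverycert.pfx" -Password $pfxPassword

# On the other server to be registered: import the .PFX into its Personal store.
Import-PfxCertificate -FilePath "C:\certs\recoverycert.pfx" -CertStoreLocation Cert:\LocalMachine\My -Password $pfxPassword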

The high-level certificate management steps differ a bit if you are using the certificate for a Backup Vault or a Hyper-V Recovery Vault.
Backup Vault certificate management process:
1. Create or obtain a .CER certificate
2. Upload the .CER file to Windows Azure portal Recovery Services vault

Hyper-V Recovery Vault certificate management process:
1. Create or obtain a .CER certificate
2. Export it as a .PFX file
3. Upload the .CER file to Windows Azure portal Hyper-V Recovery vault
4. Import the .PFX file onto the VMM servers to be backed up

Azure Backup Recovery Services requires an agent to be installed on any source machine for a file transfer operation to a backup vault. An agent is a piece of software that runs on the source client machine to manage what is uploaded to the Azure cloud for backup. Note that Hyper-V Recovery Manager does not require this backup agent, since it works with an entire VM’s metadata, whereas Backup Services can back up as little as one file.

The backup agent is downloaded by connecting to the Azure portal from the server to be backed up. Go to the specific vault and click Install Agent under Quick Glance. There are two versions of the agent, and the tool you will use to manage the backups determines which one you install:
• Windows Server 2012 and System Center 2012 SP1 – Data Protection Manager
• Agent for Windows Server 2012 Essentials
After the installation of the agent is complete, you will configure the specific backup policy for that server. To do this you can use one of the following tools; again, whichever tool you use determines which version of the backup agent you install on that server. (A PowerShell sketch of configuring a backup policy follows the list.)
• Microsoft Management Snap-In Console
• System Center Data Protection Manager Console
• Windows Server Essentials Dashboard
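
Once the agent is installed, the backup policy can also be defined from PowerShell with the agent’s Online Backup (OB) cmdlets; this is a hedged sketch with placeholder paths, schedule, and retention:

# Assumes the Azure Backup agent's OB cmdlets are available on the server.
$policy    = New-OBPolicy
$files     = New-OBFileSpec -FileSpec "D:\Data"
$schedule  = New-OBSchedule -DaysofWeek Monday, Thursday -TimesofDay 22:00
$retention = New-OBRetentionPolicy -RetentionDays 30

$policy = Add-OBFileSpec -Policy $policy -FileSpec $files
$policy = Set-OBSchedule -Policy $policy -Schedule $schedule
$policy = Set-OBRetentionPolicy -Policy $policy -RetentionPolicy $retention
Set-OBPolicy -Policy $policy -Confirm:$false

# Kick off an on-demand backup using the saved policy.
Get-OBPolicy | Start-OBBackup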

If you go into my course on Pluralsight (http://pluralsight.com/training/courses/TableOfContents?courseName=microsoft-azure-administration-new-features) you will find a lot more information on Recovery Services plus be able to watch videos of how to use these concepts. You will also learn how to manage Azure Backup Services, see how to take and restore backups, and explore the Hyper-V Recovery Manager and its use of Hyper-V vaults. In addition, the course includes a wealth of information and demos on the Azure Scheduler, Azure Traffic Manager, Azure Management Services, Azure BizTalk Services, HDInsight, and improvements to Azure Storage and Web Sites. Hope to see you there!

My company, Aditi, and I are publishing an eight-part series on some of the newer administration features of Microsoft Azure. We will publish weekly off the Aditi site, and I will provide a summary and a link to each post from this blog. So if you are following my blog you will get regular updates as each post is published. Take a look at the initial overview post describing all that we will discuss over the next eight weeks. If you want more details and demos, you can watch the course content (which the blog posts parallel) at the Pluralsight.com web site. Search for courses with my last name (McKeown) or for the new course entitled “Microsoft Azure Administration New Features”. This course is to be released, hopefully, sometime this week (the week of May 19th).

Over the eight-week mini-series we will cover new features from Azure Q4 2013 leading up to the BUILD release, focused on advanced topics for the administration of Microsoft Azure. Topics are primarily focused on Azure IaaS but overlap at times into the Azure PaaS space. From a backup standpoint, Azure offers Recovery Services to prevent data loss via Azure Hyper-V Recovery Manager and Backup Recovery Services. The Azure Scheduler gives you the ability to schedule jobs that can run internal and external to Azure. Traffic Manager supports PaaS configurations with routing based upon round-robin, failover, and performance methods. Management Services gives you the ability to set up billing alerts, as well as service alerts for services, storage, and web sites, and to run queries on the results. Virtual machine auto-scaling has been improved to give you configurable options for PaaS Cloud Services, Web Sites, and IaaS VMs. Azure BizTalk Services integrates the enterprise and the Azure cloud by configuring B2B messaging. HDInsight is Azure’s solution for bringing big data processing to the Azure cloud. Improvements to Azure Storage include read-only access to geo-replicated storage, improved storage analytics, and the ability to transfer large amounts of data via physical drive using the Azure Import/Export Service. For Web Sites we look at the recent enhancements of Always On support, WebJobs, and staged publishing.