
With the General Availability (GA) release of Windows Azure Infrastructure as a Service (IaaS) a very powerful Identity Access Management (IAM) architecture is now available.   It brings a simplification to the authentication process for users of Office 365 whose companies utilize an Active Directory deployment for their organizational IAM process.

By leveraging the total cost of ownership advantages of Windows Azure Virtual Machines, or “VMs” (as part of Azure’s IaaS offering) companies can now host their Active Directory deployment in the Cloud in an inexpensive, scalable, and highly-available manner.   This includes their Active Directory Domain Controller (ADDC), and one or more IaaS VM servers running Active Directory Federation Services (ADFS) and ADFS Proxy Services (ADPS).  With the exception of the ADDC, the ADFS and ADPS servers can all be placed into a server farm.  Windows Azure provides “free of cost and hassle” dynamic round-robin load balanced functionality for its VMs.  Additionally through the concept of Azure IaaS availability sets, automatic failover is provided for VMs sharing a common endpoint. This provides not only a scalable architecture but a highly-available one as well.

For unified integration of AD domain accounts with Office 365 the AD deployment utilizes the Directory Synchronization (DirSync) tool.  This is run on a regularly scheduled basis to synchronize your email-enabled user objects from within your ADDC with managed accounts in Office 365. This maps licensed functionality for users to application capabilities within Office 365.

Through seamless integration with Office 365, the Active Directory deployment can subsequently federate identity management for Office 365 users.  Via identity federation, users within your organization log into Office 365 using their organization’s credentials. They don’t have to create and maintain a separate set of credentials unique to their Office 365 account.  This offers a very powerful single sign-on (SSO) experience for your users through a single, consistent identity principal.

In the following diagram you can see how at a high level the identity flows through the federation process.

AzureOffice365

  1. User goes to Office 365 and enters their email address, which contains a domain.  Since you have already configured Office 365 to federate identity for that domain, Office 365 redirects the user to the ADPS login screen for that domain.
  2. ADPS accepts the user’s login ID and password.
  3. ADPS lives within a DMZ and communicates with ADFS through a secure HTTPS channel. ADFS communicates with ADDC to authenticate the user.
  4. The claims-based SAML authentication token is sent back to Office 365 and the user is authenticated.

At that point Office 365 looks at its internal managed account for that user to see what applications that user is licensed to use. Based upon that, Office 365 makes application access permission decisions. The user is allowed to use only those Office 365 applications for which they are licensed.
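
As a rough illustration of step 1, the redirect decision can be thought of as a lookup keyed on the domain part of the email address. The sketch below is a toy model in Python, not Office 365 or ADFS code. The domain table, both URLs, and the function name are made up for the example.

```python
# Toy model of step 1 only: pick a sign-in page from the email domain.
# FEDERATED_DOMAINS and both URLs are hypothetical placeholders.
FEDERATED_DOMAINS = {
    "contoso.com": "https://sts.contoso.com/adfs/ls/",
}
MANAGED_LOGIN_URL = "https://login.example.invalid/managed"


def sign_in_url(email: str) -> str:
    """Return the federated login page for the domain, else the managed one."""
    _, sep, domain = email.rpartition("@")
    if not sep or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return FEDERATED_DOMAINS.get(domain.lower(), MANAGED_LOGIN_URL)


print(sign_in_url("alice@contoso.com"))
print(sign_in_url("bob@unfederated.example"))
```

A domain that has not been configured for federation simply falls through to the managed sign-in page.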

Alternative architectures are possible to take into account an existing ADDC on-premise.   By configuring an Azure IaaS Virtual Network (VNET) and connecting it through a VPN device to an on-premise network, an existing ADDC can remain on-premise while all the other parts of the AD deployment (ADFS, ADPS, and DirSync) can run in Azure IaaS VMs.

The power of identity federation exists through Office 365 and Active Directory integration.  It is now available in a scalable, inexpensive, and highly-available manner through Windows Azure IaaS.

I just published a new whitepaper on the Aditi site on the specifics of how upgrades work in Azure.  Here’s the intro.  Unfortunately, if you want the entire paper you have to go to the Aditi site and fill out a request to have it mailed to you the next day.  Sorry about that…

http://blog.aditi.com/cloud/how-to-manage-deployments-upgrades-on-windows-azure/

==================================================

Managing deployments and updates to Windows Azure applications is very similar to how it is done for on-premise applications.  You can upgrade your application on every server all at once.

Windows Azure offers the concept of a “VIP-swap” where you can deploy or update your Azure application in one shot. Or you can do it incrementally with custom-defined granularity across groups of Azure servers.

Azure provides the concept of Update Domains to help you manage this process of updating your application in a sequential manner.  The option you choose is based upon your business and application requirements for system availability to users, as well as your tolerance for having more than one version of your application available to users during the upgrade process.
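
To make the sequential idea concrete, here is a minimal Python sketch of a rolling upgrade across update domains. It assumes instances are spread across domains round-robin and that one domain is taken offline at a time. That is a simplification for illustration, and the function names are my own, not an Azure API.

```python
from collections import defaultdict


def assign_update_domains(instance_count: int, domain_count: int) -> dict[int, list[int]]:
    """Spread instance indexes across update domains round-robin (assumed)."""
    domains = defaultdict(list)
    for instance in range(instance_count):
        domains[instance % domain_count].append(instance)
    return dict(domains)


def rolling_upgrade(instance_count: int, domain_count: int) -> list[tuple[int, int]]:
    """Return (update_domain, instances_still_serving) for each upgrade step."""
    domains = assign_update_domains(instance_count, domain_count)
    steps = []
    for ud in sorted(domains):
        offline = len(domains[ud])  # only this domain is down during its step
        steps.append((ud, instance_count - offline))
    return steps


for ud, serving in rolling_upgrade(instance_count=6, domain_count=3):
    print(f"upgrading domain {ud}: {serving} of 6 instances still serving")
```

More update domains means fewer instances offline at each step, but a longer window in which two versions of the application are live.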

Common questions from IT Ops personnel new to Windows Azure include:

  • How to manage application changes with respect to deployment, updates, and down times?
  • How does one update a service when only some parts of its configuration need to change?
  • What about down time and do I need to take my whole application down to update just a part of it?
  • Can I update it incrementally over the range of servers?
  • Do I deploy an application directly into user circulation within an environment or can I verify base functionality before making it public?

Answers to these questions are complex even if you are managing all your servers yourself on-premise. On the Azure Cloud, it can get even more confusing. The deployment issues revolve around a few key Azure concepts – VIP swaps, deletion/redeploy of a service, service updates, and update domains.

Click on the link below to download this whitepaper for an in-depth guide on how to manage application changes with respect to deployment, updates, and down times on Windows Azure.

http://blog.aditi.com/cloud/how-to-manage-deployments-upgrades-on-windows-azure/

Microsoft announced its purchase of the startup MetricsHub, which specializes in software to monitor and scale Azure Web and Worker role instances.  For more information on the acquisition refer to here.

Not sure why it took MS this deep into the Azure lifecycle to obtain the functionality of auto-scaling. It’s the single biggest question IT Ops folks tend to ask when viewing the “Instances” tab in the portal.  At the SDR last year they said auto-scaling was coming very soon so not sure why they would wait this long to finally obtain that functionality.   Purchasing MetricsHub is an interesting but late acquisition for Microsoft. Better late than never I guess…

In this post I will share my initial experience and observations of the MetricsHub tool. I will then compare it to one I use regularly – AzureWatch from Paraleap. Doing the comparison helps to show points of improvement for MetricsHub.

Evaluation of MetricsHub

The MetricsHub site does not give a lot of technical info. It’s high-level and written for marketing folks, and even the blogs do not go into detail on how the tool works and what it does. It appears that the functionality of the tool is solely about adjusting scaling based upon rules.  Like AzureWatch it appears to have the ability to monitor endpoints or queue length. The latter is a key factor in loosely-coupled environments for correct scaling.  It also requires an agent (not a favorite of customers) to get metrics on Azure VMs and Websites.  Just to be clear, there is no auto-scaling for VMs and Web sites, just metrics.

Without the agent the service is 100% free, which is very nice. Using normal Azure diagnostics there is a charge of about 50 cents per month.  I am not sure how they arrived at this number, since the tool evidently uses YOUR Azure storage for the WAD tables/blobs.  By contrast, AzureWatch uses THEIR Azure storage and you pay them .01 per hour per service to be monitored.  If you turn the monitoring levels up from light monitoring, YOUR storage costs can grow faster. I am also not sure where the 50 cent figure comes from, since it appears Azure will be doing the charging rather than MetricsHub (the cost to log to WAD tables in YOUR storage).  I was very unclear on that.  To me the quoted rate of 50 cents per month is a bit misleading. If you are adding approximately .50 of storage per month and do not have a policy to truncate after 30 days, your storage bill would be .50 the first month, 1.00 the second month, 1.50 the third month, etc.
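
The arithmetic behind that concern is easy to sketch. The Python below assumes a flat .50 of new diagnostics data per month, which is my reading of the quoted rate and not a published pricing formula, and compares never truncating with a one-month retention policy.

```python
from typing import Optional


def monthly_storage_bill(month: int, added_per_month: float = 0.50,
                         retention_months: Optional[int] = None) -> float:
    """Storage charge in a given month (1-based) for accumulated diagnostics data.

    retention_months=None means nothing is ever truncated.
    """
    months_kept = month if retention_months is None else min(month, retention_months)
    return round(months_kept * added_per_month, 2)


for m in (1, 2, 3):
    print(m, monthly_storage_bill(m), monthly_storage_bill(m, retention_months=1))
```

With a truncation policy the bill stays flat. Without one it grows linearly with the age of the deployment.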

I like the fact that it has default recommended rules for setup which can be tweaked later. That is a very nice feature, since otherwise after you install the tool you have to ‘guess’, based upon experience, what counters to monitor, what their values should be, etc.  For the average Azure customer this is nice.  They also provide alerts in three forms (pager, email, SMS) and give you a detailed breakdown of your monthly bill costs.

When you click signup it takes you to the Azure store.  The only option I was given was to sign up for the “No-Agent” option which is free, select the correct subscription and it’s done. Very simple.  Under contact setting it uses the default login email you use for your Azure portal which you can change but have to first check the box to be emailed promotions to enable the email field. Make the change, and uncheck the promotions box if you want.  The Upgrade button just gives you the original option of free (no agent) so still not sure how to sign up for the Agent option.  The Connection Info button brings up a few “secret” values but not sure where those are used or what I do with them. Obviously used for security purposes but they are not related to my storage key where MetricsHub will store the diagnostics data.

The Manage button is where all the magic happens.  You download the publish settings file first, then upload it to MetricsHub.   All my VMs and Cloud Services are displayed to choose which to monitor.  The VMs are only available using the Agent and still no info on how to sign up for it – these options are not available for me at this point.  Click “start now” to begin the monitoring.

From there you can click on specific roles to view data. You can use the default display or create a custom table or chart.  The default view contains generic CPU, Memory, Disk Read, Disk Write, NetIn, and NetOut values.  The custom table or chart option allows you to only select a subset of the Perfmon counters you have enabled for your application ahead of time.

MetricsHub has a link to Display Issues.  That worked nicely when I clicked it, as my test app was over 80% CPU and it gave me a message on this. However, I did not receive an alert, nor did it show up in bold letters on the main page telling me I had an issue. Unless I clicked on the Issues button I did not know about it.

AzureWatch and MetricsHub Comparison

One of the tools I use to monitor Azure applications is the AzureWatch tool. I like it because it is inexpensive, easy to use, has an incredibly powerful rules engine,  and runs from the Web so no setup is needed on your desktop.  Within AzureWatch you aggregate metrics (such as Perfmon counters). Then you write rules against those metrics and define actions to take when those Boolean rules evaluate to true.  Those actions include scaling up or down the number of compute instances, notification options, etc.
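
To show the general shape of that metrics-rules-actions pipeline, here is a small Python sketch. It is my own toy model and not AzureWatch's rule syntax or API. The metric names, thresholds, and action strings are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict[str, float]], bool]  # Boolean test on aggregates
    action: str                                    # e.g. "scale_up", "scale_down"


def aggregate(samples: dict[str, list[float]]) -> dict[str, float]:
    """Average each raw metric series into a single value."""
    return {metric: mean(values) for metric, values in samples.items()}


def evaluate(rules: list[Rule], samples: dict[str, list[float]]) -> list[str]:
    """Return the actions of every rule whose condition evaluates to true."""
    metrics = aggregate(samples)
    return [rule.action for rule in rules if rule.condition(metrics)]


rules = [
    Rule("busy", lambda m: m["cpu"] > 80 and m["queue_length"] > 100, "scale_up"),
    Rule("idle", lambda m: m["cpu"] < 20, "scale_down"),
]
samples = {"cpu": [85.0, 90.0, 88.0], "queue_length": [150.0, 170.0]}
print(evaluate(rules, samples))
```

Keeping aggregation separate from rule evaluation is what lets one rule combine several counters in a single Boolean expression.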

So being an AzureWatch OM user I was curious how MetricsHub matched up with it. Here are some of the comparisons and observations I made.

Performance Counters

For MetricsHub I did not see an option to add counters beyond what your code or scripts have already told Diagnostics Monitor to track. To me this is a big negative. AzureWatch allows me to add counters on the fly. This is HUGE for an IT Ops person, especially during troubleshooting.  To the average customer doing monitoring, the ability to add additional counters dynamically is a big feature that’s missing from MetricsHub.

Custom Rules

AzureWatch also allows me to write custom rules and complex logic against those rules. I did not see the option to do that with MetricsHub.

User Interface

The MetricsHub UI is richer and more user friendly than that of AzureWatch. However, once someone is used to the AzureWatch UI this difference is minimal.

Coverage

In addition to auto-scaling Web/Worker Roles, AzureWatch also supports active monitoring of SQL Azure, SQL Federations, Service Bus, Azure Storage and URLs (active monitoring means they query these resources every minute rather than read some statistical tables). In contrast, MetricsHub only supports Web/Worker Roles without an agent, and Websites/VMs with an agent running on these machines.

Scaling Engine

This is a significant difference.  AzureWatch supports scaling via an unlimited number of potentially complex rules that can evaluate any number of metrics of all kinds.  It can auto-scale based upon any kind of standard or custom performance counters, combinations thereof, Service Bus queue/topic levels, storage queues, etc.  In addition, customers can even define their own custom feeds of metrics that can be imported and used for auto-scaling or alerts.  In contrast, MetricsHub appears to only auto-scale based on CPU usage and storage queues.  AzureWatch can also execute scaling or alerting actions based on a schedule.

Something I was not able to find in MetricsHub is how to manage how quickly or slowly the scaling occurs. Also there was no way to allow customers to truly save money by scaling down at the end of the clock hour.  AzureWatch supports both of these options. Controlling the pace of scaling prevents a sudden short spike from causing new instances to be allocated for a false surge in load.
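
One simple way to damp scaling so a short spike does not trigger it is to require the load to stay high for several consecutive samples. The Python sketch below shows that idea only. The threshold and window size are arbitrary assumptions, and this is not how either product is known to implement it.

```python
def should_scale_up(cpu_samples: list[float], threshold: float = 80.0,
                    sustained: int = 3) -> bool:
    """True only if the last `sustained` samples all exceed the threshold."""
    recent = cpu_samples[-sustained:]
    return len(recent) == sustained and all(s > threshold for s in recent)


print(should_scale_up([30, 95, 35, 40]))   # a single short spike
print(should_scale_up([30, 85, 90, 92]))   # load that stays high
```

A larger window makes scaling slower to react but less likely to allocate instances for load that has already gone away.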

Performance Extensions

AzureWatch also has ability to show performance data on mobile devices and RSS feeds and export performance data via PDF/XLS/Word. I didn’t find any such option for MetricsHub.

And the Winner Is…

From what I observed, the choice between AzureWatch and MetricsHub is a choice between flash and looks vs. power and flexibility. In general, a company whose sole business is to provide scaling and monitoring is going to do a better job than a company (Microsoft) that is simply trying to fill out its array of tools for customers and check the box to say that it has a monitoring package.  If I were a company picking one of the two products, what I would care about is support for monitoring a full variety of Azure middleware and a sophisticated scaling engine capable of measuring any number of indicators, with the ability to get fancy with complex Boolean logic and order-based rules.

Overall nothing blows me away with MetricsHub, and there is a lot of functionality it does not provide that AzureWatch gives me.  And the winner is…..AzureWatch!

You can install AzureWatch for a free 14-day trial here.

This is the last in a  three-part series on multi-tenancy within Windows Azure applications. Here is a breakdown of the topics for these posts.

Post 1 -  Tenants and Instances

Post 2 – Combinations of Instances and Tenancy

Part 3 – Tenant Strategy and Business Application

Multi-Tenant Strategies

Azure’s approach to scalability is to scale out as the load on an application dictates.  But what parts exactly do you want to scale out? You have different options based upon application specific requirements. You don’t have to use the same instance scaling approach for each layer of your application.

For instance, you can have multiple Web roles which write messages into a Windows Azure Service Bus queue that is managed by a single 1st-level Worker role. The Worker role can process the requests and send them to multiple 2nd level Worker roles. These roles then process the requests and write them to a single SQL Azure database. Or the 2nd level Worker roles can write to multiple SQL Azure databases. Or there could be multiple 1st level roles which read the queue and call a single level 2nd level Worker role which then writes to multiple SQL Azure databases.  The combinations are many but you have to carefully make sure they correctly support your application requirements. Here are just a few of the many tenancy combinations possible in your Azure application.

  • A multi-tenant UI (Web role) which calls out to a multi-tenant service (Worker role) which links to a single tenant data storage layer.
  • A single-tenant UI (Web role) calling into a multi-tenant service (Worker role) calling into a single-tenant data layer.

There are many reasons to group the application nodes in different combinations.  You have the flexibility within your architecture to choose which services can be shared and how much they are shared.  In an SIMT or a MIMT application you can logically group customers in your application domain into dedicated instances that support those common usage patterns.

Suppose you sell a multi-purpose application that is used by businesses in different industries.  Based upon usage patterns you could group customers by vertical markets, since they tend to use similar transient state and work with similar schemas for persistent data.  For instance, you could put all the medical companies into one SIMT app and all the restaurants that use your app into another SIMT app.  Or, based upon different SLA requirements, you can group customers with different SLA levels for availability in different SIMT applications.

Based upon customer security requirements there are different ways to separate or share data.  Multiple tenants can use different databases or schemas to have isolated storage. Coming up one level of isolation tenants can share the same database but have their data stored in different rows in the same table.  Or isolation may not be needed and data is shared at the field level.  Tenants can have different schemas or custom columns or table level permissions based upon requirements.  But however data is configured the multi-tenant application must protect each customer’s data from being visible or accessed by other customers as per application requirements.
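
The shared-table option is the one where mistakes leak data, so here is a minimal Python sketch of row-level isolation by tenant. It uses the standard library's SQLite as a stand-in for SQL Azure. The table, column, and tenant names are hypothetical.

```python
import sqlite3

# SQLite in memory stands in for a shared SQL Azure database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, item TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
    [("clinic-a", "gloves"), ("diner-b", "napkins"), ("clinic-a", "masks")],
)


def orders_for(tenant_id: str) -> list[str]:
    """Return only the rows that belong to the given tenant."""
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item", (tenant_id,)
    )
    return [item for (item,) in rows]


print(orders_for("clinic-a"))
print(orders_for("diner-b"))
```

Routing every query through one function that always filters on the tenant key is a design choice that keeps the isolation rule in a single place.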

State can be isolated or shared as the application dictates just like the database. All tenants in a multi-tenant application can share state, but most likely will have their own state. State can be stored across all customers in one instance or across multiple customers in multiple instances.

Again it all depends upon the application logic and data/business requirements.

In some situations within the services layer or the Web UI layer you may require more than one Worker or Web role respectively.  Dividing the work up by different tasks and assigning it to these roles can be done in many ways.

Case 1:

Caller1

Case 2:

Caller2

Case 1 – You can assign each role a specific task that the other roles do not have.   This means you will need to have at least one instance for each type of role.  For instance, you can have a services layer Worker role A that does number-crunching and another Worker role B that does caching of data.   This requires typically more Worker role instances to support the application’s scalability requirements than if both the number crunching and caching was all done in a combined instance.  If you had only one Worker role service that did all the number crunching and all the dependent Web roles required that service regularly so it was in high contention, you might need to scale out with multiple number crunching instances.  The good news is the code to support only one function tends to be simpler than if the role had to handle multiple tasks.

Case 2 – Alternatively you can have multiple roles that all do the same group of tasks.  For instance Worker role A does number crunching and caching, and Worker role B also does the same number crunching and caching.  This is typically more efficient since you don’t need to host as many nodes. However the code and logic are more complex, since similar operations must be managed across multiple instances.  Your code will need to differentiate when it is permissible to carry out the same task in multiple roles at once.  There may be times when that task can only be done in one of the roles at any given time and you will need to implement a synchronization mechanism. If using Azure storage you can acquire a lease on the storage entity (e.g. a blob) before doing any work. Once that lease is acquired, no other instance can acquire the lease on that resource until it is released or expires.
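
To illustrate the lease idea without tying it to a specific SDK, here is an in-memory Python sketch of a timed, exclusive lease. It mimics the semantics only. The class and method names are my own and it makes no calls to Azure storage.

```python
import time
import uuid
from typing import Optional


class LeasedResource:
    """In-memory stand-in for a storage entity guarded by a timed lease."""

    def __init__(self) -> None:
        self._lease_id: Optional[str] = None
        self._expires_at = 0.0

    def acquire(self, duration: float, now: Optional[float] = None) -> Optional[str]:
        """Return a lease ID, or None if another holder's lease is still active."""
        now = time.monotonic() if now is None else now
        if self._lease_id is not None and now < self._expires_at:
            return None
        self._lease_id = uuid.uuid4().hex
        self._expires_at = now + duration
        return self._lease_id

    def release(self, lease_id: str) -> bool:
        """Release the lease if lease_id matches the current holder."""
        if lease_id != self._lease_id:
            return False
        self._lease_id = None
        return True


blob = LeasedResource()
role_a = blob.acquire(duration=60, now=0.0)
role_b = blob.acquire(duration=60, now=10.0)   # refused: role A holds the lease
print(role_a is not None, role_b is None)
```

The expiry matters as much as the exclusivity. If the role holding the lease crashes, the work becomes available again once the lease times out.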

Business Models and Tenancy

Let’s look at perhaps the most important driving factor of all behind tenancy – how will you charge customers of your application so you make $? Will you charge customers monthly based upon the actual resources they use, a percentage of resource usage for an instance shared among other tenants, or a flat fee?  Can customers of your application run concurrently in a shared instance? Or do they require their own dedicated instance? Can they share application code but not the same database?  Or can they share the same database tables but co-reside in adjacent rows? And so on.

There are many ways your architecture can evolve out of the answers to those business questions. At a high level you can use these answers as a basis to decide which tenancy model is correct for you.  At a more granular level you can decide upon different topologies from these architectures that further support your business model. Here are a few business points to consider when making tenancy decisions.

  • Choosing to sell to a large or small customer base?  Or both?
  • Strict regulatory data storage requirements?  Or data that can be stored anywhere and viewed by anyone?
  • Different performance, availability, and scalability requirements
  • A variety of different customer subscription pricing options

Depending upon the answers to these questions here are a few typical billing options you can use for your application.

Fixed Fee

You could agree up front to charge customers a fixed-fee each month regardless of the variable costs they incur.  This is like your standard cable TV model where whether you want ESPN 30 minutes a month or 10 hours per day your bill is still the same each month.

Actual Usage

For single-tenant applications it’s simpler to measure costs per customer since all the costs incurred belong to that specific customer. You could create a separate Azure account for each customer to simplify this process.  This is like your electric bill where you pay for what you use each month.

Variable Shared Costs

Sharing of costs implies multi-tenancy.  You can do it at a granular level where you try to specifically charge each customer for what they use.  Thus in an app with 200 tenants you would have all sorts of different monthly bills for each customer, all summing up to at least the amount of the total costs.  Each fall season my neighbors and I used to pay approximately $100 each to rent individual pull-behind aerators to attach to our lawn mowers to aerate and seed our lawns.

We finally got smart and instead of us all paying $100 we all went in on one aerator and shared it across the weekend. If three of us went in together it cost us $33 apiece. If only two of us shared, the cost was $50 apiece. Each year we paid a variable fee based upon how many of us shared the cost – a shared cost but at a variable amount.

Fixed Shared Costs

Customers could share the costs using a simpler model by taking the total Azure costs of that instance serving that group of tenants and divide it evenly.   To expand upon my shared aerator example suppose three of us went and purchased a deluxe whiz-bang self-propelled aerator for $2400 that not only aerated but seeded at the same time.  We went on a two-year interest-free payment plan and agreed to share the monthly payment of $100 per month for two years – a shared cost but at a fixed amount.
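
The two aerator examples reduce to simple division, sketched below in Python using the figures from the text. The function names are just for illustration.

```python
def variable_share(total_cost: float, sharers: int) -> float:
    """Each party's share when the cost is split among whoever joins in."""
    return round(total_cost / sharers, 2)


def fixed_share(purchase_price: float, months: int, sharers: int) -> float:
    """Each party's fixed monthly share of an interest-free payment plan."""
    return round(purchase_price / months / sharers, 2)


print(variable_share(100, 3))     # rental split three ways
print(variable_share(100, 2))     # rental split two ways
print(fixed_share(2400, 24, 3))   # purchase over two years, three owners
```

In the variable model each tenant's bill moves whenever the number of sharers changes. In the fixed model it stays the same for the life of the agreement.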

Whatever model you choose you will probably want to build in some profit % to charge customers once the actual Azure costs have been paid.  So it is absolutely critical that you do your homework and establish solid estimates of expected usage costs and profit points before you decide upon your billing strategy. This is especially true if you are using a fixed fee approach or you will end up eating the overage costs yourself. Note that regardless of how much actual CPU time a customer uses the compute cost is a fixed cost per month.  Other charges like storage, SQL Azure, bandwidth, etc. are variable costs and are dependent upon how much the customer uses the Azure infrastructure.

Regardless of the billing model try to maximize the # of tenants in the instance without a degradation of performance. You should consider the price of required resources against the customer’s need for isolation.  The more tenants can share resources the more correspondingly the cost will be minimized for all the tenants and make your application more attractive to a larger group of customers.  You could even have two different deployments of your application.  The more expensive deployment could be dedicated per each high-paying customer who needs isolation and higher performance.  The other deployment could be shared among the other customers who don’t need isolation or the very best performance but want lower pricing.

Provisioning

Provisioning customers for a multi-tenant application is typically a bit more involved than for a single-tenant application.  With a single-tenant application it is probably nothing more than a configuration update. But for a multi-instance application a new Azure instance will need to be configured as each new tenant is added.

A part of provisioning has to do with customizing the UI of the application. For a single-tenant application each customer will have their own instance running a customized version of the UI for that customer.  You can map a custom DNS name (using DNS CNAMEs) to each customer’s instance of the application.  So for Company1 and Company2, you might expose the URLs of http://myAccount.cloudapp.net/Customer1.com and http://myAccount.cloudapp.net/Customer2.com.  This approach works fine for the HTTP protocol.  But for the secure HTTPS protocol a problem occurs with this strategy. When using HTTPS only a single SSL certificate can be associated with the standard HTTPS port 443.  To remedy this you can have different Web sites within the same Web instance. This can be done by adding port numbers onto the URIs. For instance, you could have the following addresses for four of the tenants within the same Web instance.

http://<myAccount>.cloudapp.net:80

http://<myAccount>.cloudapp.net:81

http://<myAccount>.cloudapp.net:82

http://<myAccount>.cloudapp.net:83

You can also use a custom addressing scheme that keeps the same core part of the URL and changes only other parts of it per tenant.  For instance, depending on the configuration of your site and app, you could have something like these two URIs for Company1 and Company2.

https://<myAccount>.cloudapp.net/AppName/Company1

https://<myAccount>.cloudapp.net/AppName/Company2
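
With a path-based scheme like this, the application has to work out which tenant a request belongs to. Here is a minimal Python sketch of that lookup. It assumes the layout /AppName/<tenant> shown above, and the host name is a placeholder.

```python
from urllib.parse import urlparse


def tenant_from_url(url: str, app_name: str = "AppName") -> str:
    """Extract the tenant segment from https://host/<app_name>/<tenant>[/...]."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2 or segments[0] != app_name:
        raise ValueError(f"no tenant in URL: {url!r}")
    return segments[1]


print(tenant_from_url("https://myaccount.cloudapp.net/AppName/Company1"))
print(tenant_from_url("https://myaccount.cloudapp.net/AppName/Company2/orders"))
```

Rejecting URLs that do not match the expected layout, rather than guessing a default tenant, avoids serving one customer's pages under another's address.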

There are other ways to provision and divide functionality among tenants to allow customization of their UI and processing.  You will have to decide just how much liberty to give customers to customize their applications.  It could be a simple change to a part of a page or using cascading style sheets. Or you can allow them to customize entire pages within their namespace.

You will need to ensure that technically a customer’s data is safe within both Azure storage and SQL Azure within a multi-tenant application. More importantly, you will also need to support the perception that the customer’s data is indeed safe in the shared tenant environment.

We mentioned earlier how the cost of using the compute instance is minimized with more customers.  The same applies to SQL Azure and the number of databases used.  For customers that need complete guaranteed data isolation it would make sense to allocate one database for each of them.  Other customers may more economically be able to share rows in the same database table.

 Summary

In summation, tenancy in Azure applications is largely dependent upon your customers’ business and application requirements.  You should carefully examine your business and data storage requirements and weigh the pros and cons of running in a shared vs. isolated environment.

A poor multi-tenant architecture can make the experience of using your application very frustrating for customers.  More importantly, it can also result in corrupted data.  Conversely, covering your eyes, skipping your research, and simply relying upon a simplified single-tenant architecture can make your application cost-prohibitive for certain customers.  Take your time and ensure you get the correct balance of tenancy and instances to allow your application to take full advantage of the Windows Azure platform.

A best practice for SQL Server is to store the data/log/backup files on a disk other than the OS Disk. This transfers over to SQL Server installed on an Azure IaaS VM.  Due to storage requirements it may be necessary to have more than one physical disk compose a logical disk for these files.  Another best practice to increase IOPS (input/output operations per second) on SQL Server running on an Azure VM is to store each of these physical disks in its own Azure blob storage account.

However, the Azure Management portal will not give us the ability during VHD provisioning to select a preferred storage account.  By default the VHD will be created at the storage account where our OSDisk is created.  Therefore you must take the following steps to attach the disks one-by-one to the VM. Once that is done you create one spanned disk for data, and one for logs/backup, using the Computer Management Admin console.  Then within SQL Server Management Studio you replace the default C:\xxx locations for SQL Server data, logs, and backup files with the spanned disk volumes.

To create and attach the data disks to the VM you must do the following:

  1. From the Azure Portal for the VM select “Attach an empty disk” and create a new VHD. By default it will be created at the default OSDisk Azure storage location X for the VM.
  2. Create a new Azure destination storage account Y. As a best practice, create one account for each VHD.
  3. With ClumsyLeaf Cloud Explorer expand both the destination storage account Y (left hand pane) and the source storage container in X with the blob containing the new VHD from step 1 (right hand pane).
  4. Drag/copy the VHD blob from the original storage location X (right hand pane) to the new destination storage location Y (left hand pane).

    An alternative way to do the VHD copy is to use the Azure command line from your local machine as described in this Aditi blog entry http://blog.aditi.com/2012/11/how-to-copy-files-between-windows-azure.html. You will first need to download the NodeJS tool from here to your local machine. A third option is to use the PowerShell command Add-AzureDataDisk with the -ImportFrom parameter.

  5. In the Azure Portal select Detach Disk for the specific VM.
  6. Once the disk is detached go to the Azure Portal VM screen and click on Disks, then select Attach Created Disk.
  7. In the Browse Cloud Storage dialog, in the VHD URL field, locate the storage blob to which you copied the VHD in step 4, then select Open and complete the dialog.
  8. Repeat these steps for as many disks as you need to attach to the VM running SQL Server.
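
When repeating the steps for several disks it helps to plan the destination blob URLs up front, one storage account per VHD. The Python sketch below only builds the URL strings, using the standard blob endpoint form https://<account>.blob.core.windows.net/<container>/<blob>. The account names, container name, and blob naming pattern are hypothetical.

```python
def destination_vhd_urls(vm_name: str, accounts: list[str],
                         container: str = "vhds") -> list[str]:
    """Build one destination blob URL per disk, one storage account each."""
    return [
        f"https://{account}.blob.core.windows.net/{container}/{vm_name}-data{i}.vhd"
        for i, account in enumerate(accounts, start=1)
    ]


for url in destination_vhd_urls("sqlvm", ["sqldata1", "sqldata2", "sqllogs1"]):
    print(url)
```

Each URL is the value you would look for in the VHD URL field in step 7.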

This is the second in a  three-part series on multi-tenancy within Windows Azure applications. Here is a breakdown of the topics for these posts.

Post 1 -  Tenants and Instances

Post 2 – Combinations of Instances and Tenancy

Post 3 – Tenant Strategy and Business Application

Part 2 – Combinations of Instances and Tenancy

SIST (Single-Instance, Single-Tenant) Architecture

SIST is the simplest of designs since there is only one tenant with their own dedicated instance.  However, it is typically the most expensive option since all costs are 100% the responsibility of the one tenant.  Cost savings via shared computing infrastructure are thus non-existent in the SIST case.

Applying the analogy from our previous commuting example means one rider has their own dedicated train getting to work.  Since the driver has their own train they can leave their own personal belongings in the car between days of driving and not worry about anyone having access to it. Any member of their family can use it, but only members of their family.

Applying it to your Windows Azure application would mean one customer of your application with their own dedicated Azure instance.  Application state within the process is dedicated to that one application and no other companies can have access to it.

Single-Instance Single-Tenant

SIMT (Single-Instance, Multi-Tenant) Architecture

SIMT adds complexity to the SIST model by allowing more than one tenant to concurrently execute within one instance.  This design is the first step in realizing cost savings from sharing resources found in that instance between customers.

The commuting example applied here is that of a single train where many riders from different companies share the cost of using that one train.  However unlike SIST, there are times of the day when the seats are hard to find and a rider may need to wait for one to come open.  There is overhead incurred to load and unload many riders from different companies on and off the train.  Since the train is a common shared entity if a rider leaves their personal belongings in the train between rides there is a very good chance it will not be there the next day. Or it will be accessed by someone from a rival company.

Applying this analogy to your SIMT Windows Azure application would mean a customer may have to share and wait for resources during busy times.  The overhead of context switching occurs since the CPU switches between processing threads belonging to different companies. Memory and CPU can be taken up by other tenants in the VM process space. If state is used it must be protected and managed so that each customer’s state makes it appear to them they are running in a dedicated SIST environment.
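To make the point about protected state a little more concrete, here is a minimal sketch in Python of one way to keep each tenant's state in its own namespace inside a single shared process. The class name TenantStateStore and the tenant ids are hypothetical and only for illustration; a real SIMT application would also need authentication and thread safety on top of this.

```python
from collections import defaultdict


class TenantStateStore:
    """Keeps every tenant's state in a separate namespace inside one shared process."""

    def __init__(self):
        # tenant_id -> {key: value}; tenants never share a dictionary
        self._state = defaultdict(dict)

    def put(self, tenant_id, key, value):
        self._state[tenant_id][key] = value

    def get(self, tenant_id, key, default=None):
        # A lookup only ever touches the caller's own namespace.
        return self._state.get(tenant_id, {}).get(key, default)


store = TenantStateStore()
store.put("contoso", "theme", "blue")
store.put("fabrikam", "theme", "green")
```

Because every read and write is keyed by tenant, each customer sees only its own values even though both share the same instance.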

Single-Instance Multi-Tenant

MIST (Multi-Instance, Single-Tenant) Architecture

Moving from single-instance to multi-instance the MIST configuration has multiple instances of the application with one tenant per dedicated instance.  This is really SIST but with more than one instance of the dedicated application per customer.

Applying our commuting example there is still only one customer per one dedicated train but there are now multiple trains for that customer.  Like SIST this does not translate into any shared costs savings for the rider.  But since there are now multiple trains, each with one customer, there is now the potential for more money to be made by the railroad company.  This is because it can now extend dedicated train service to many riders of that company.

Just like SIST, within an MIST Windows Azure application a customer should not have to share or wait for process resources.  Each customer of the application has a dedicated Azure instance space each with their own copy of the application. This could be an issue with respect to upgrades since each individual instance will need to have its application updated when an update occurs.  Application state within the process is dedicated to that one customer so no other customers can have access to it.

Since MIST is an upgrade over SIST there is the added complexity of having multiple application instances possibly accessing the same shared resources outside of the dedicated instance space. That could mean accessing a shared Azure queue or a table in SQL Azure.   So while you can make intra-process assumptions on state, data, and processing based upon one dedicated customer per process, you need to be aware of inter-process operations where customers need to access resources outside of their dedicated instance space.

Multi-Instance Single-Tenant

MIMT (Multi-Instance, Multi-Tenant) Architecture

MIMT rounds out the multi-tenant configurations by allowing multiple tenants to run in multiple application instances.  It offers benefits with “potential” increased scalability and availability.  I use quotes around the word “potential” because contrary to the popular misconception bigger is not always better.   One cannot always assume that the MIMT architecture is the ultimate performance choice. While MIMT allows the best opportunity for optimized performance, too many cooks can spoil the broth.

The applied commuting analogy we are using in this case would mean multiple trains carrying multiple riders from multiple companies.  This has the same issues as a shared single train in the SIMT case.  Additionally there is the added complexity of different train tracks and thus different routes.  This can be a subtle point of confusion if not managed correctly and can negate the benefit of having multiple trains going to the same end point.

For example, suppose a family wants to travel to a certain destination, each coming from a different location after work – sister from college, brother from high school, mom from her work, and dad from his work location. If their goal is to meet for dinner at a certain stop along the route they need to coordinate the times they leave and arrive. This ensures they optimize their travel cost and time to arrive as close together as possible. It would not be ideal if dad arrived an hour before everyone and had to wait for them all to show up over staggered times. Or if brother took a train in the other direction he would go hungry.  Design and coordination of activities is important in making this a successful dinner trip.  If done right, it is more efficient if they all take different trains to the same endpoint (MIMT) and don’t all have to drive to a single train station to ride together on one train (SIMT).  If not done optimally, taking different trains from different locations can actually take longer than all driving to one train.

Applying this analogy to a MIMT Windows Azure application should be a fairly simple transition in your mind. Like an SIMT application, a MIMT Windows Azure application may have to wait for resources, incur context switching overhead, and carefully manage individual state.  There is an additional complexity involved that can either improve on the SIMT situation or actually end up with worse performance than SIMT.  This is the ‘hard’ part of MIMT applications and is really application dependent in nature.

MIMT is an architecture that lends itself to concurrency-based architecture patterns (Event-Based Asynchronous, Lock, Guarded Suspension, Concurrent Data Access, etc.). These allow you to coordinate and isolate work across multiple instances or across different threads in the same instance. Client 2 in the diagram below has two threads of execution in the third Azure Web instance and one thread in both the second and fourth Azure Web instances.  The code for Client 2 needs to manage that complex concurrency.

Multi-Instance Multi-Tenant

Sounds complex? Well that’s because it can be.  So here are some guidelines for coordinating multiple clients across your multi-tenant Azure application:

  • Synchronize access to all persistent and state data.
  • Avoid long periods of locking or blocking while accessing resources.
  • Avoid state as much as possible. Being stateless allows code to run properly on any thread on any server.
  • If you must have state, manage it correctly and in a central location that all threads can access regardless of what server they are running on (Windows Server AppFabric Cache or a central database state server).
  • Minimize serialization and try to avoid deadlock situations.
  • Carefully ensure your processing is optimized over not only multiple threads but also multiple servers.
  • Ensure none of your processing makes any assumptions about which Azure instance or thread any of your code is running on.
  • Don’t assume specific threads or instances will be accessing state or persistent data from across multiple Azure instances.
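A few of these guidelines can be sketched in a short Python example: the workers are stateless, all state lives in one central store, and the lock is held only for the brief moment a value is read and updated. This is an in-process stand-in, not a real Azure deployment; CentralCounterStore, handle_request, and run_workers are hypothetical names, and in a real MIMT application the store would live outside the process (a cache or database) so every instance could reach it.

```python
import threading


class CentralCounterStore:
    """Single shared location for per-tenant state, guarded by a short-lived lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, tenant_id):
        # Hold the lock only for the read-modify-write, never across slow work.
        with self._lock:
            self._counts[tenant_id] = self._counts.get(tenant_id, 0) + 1
            return self._counts[tenant_id]

    def value(self, tenant_id):
        with self._lock:
            return self._counts.get(tenant_id, 0)


def handle_request(store, tenant_id):
    # Stateless handler: it keeps nothing between calls, so any thread can run it.
    return store.increment(tenant_id)


def run_workers(store, tenant_id, workers, requests_per_worker):
    def work():
        for _ in range(requests_per_worker):
            handle_request(store, tenant_id)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Since the handler makes no assumption about which thread runs it, the final count depends only on how many requests were made, not on how they were scheduled.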

The above list is not all-inclusive yet it gives you some considerations of which to be aware. It may give you an appreciation of the potential complexity of ensuring your Azure MIMT app runs optimally across many VMs for many customers.  Your code can run on any thread on any VM at any given time.  Code must be written to be flexible and able to easily transition to any thread on any instance on any VM with the data still maintaining its integrity.  Remember my earlier statement that your MIMT app can be more efficient than an SIMT app if done correctly.  Or it can actually run slower if not designed correctly.

Various combinations of tenancy can exist throughout the spectrum of instances and tenants at the different application layers.  The requirements of your application will be the key factors in these tenancy decisions.  Application components and data should properly manage more than one tenant having access to them.

In the next and final post we will look in detail at tenancy strategy and business application.

This is the first of a  three-part series on multi-tenancy within Windows Azure applications. Here is a breakdown of the topics for this post as well as the final two posts.

Post 1 –  Tenants and Instances

Post 2 – Combinations of Instances and Tenancy

Post 3 – Tenant Strategy and Business Application

Part 1 – Tenants and Instances

One of the prime economic motivations for running your application in a Cloud environment is the ability to distribute the cost of shared resources among multiple customers.  At least that’s what all the evangelists and marketing folks like to tell us, right?  But the harsh reality of life in the Cloud is that an application’s ability to run safely and efficiently with multiple customers does not just magically ‘happen’ simply by deploying it to Azure.  An application’s architecture must be explicitly and carefully designed to support running with multiple customers across multiple virtual machines (VMs).  Its implementation must prevent customer-specific processing from negatively affecting processing for other tenants.

Here I will attempt to simplify the concept of tenancy in Windows Azure by first defining tenants and instances. This discussion could be taken to deeper levels, as entire books have been written on multi-tenancy in shared computing/storage environments (which is what the Cloud is after all). So we will only touch the tip of the iceberg when it comes to the science of instance allocations per tenant and multi-tenant data access.   We will look at various configurations of instances and tenancy and how to structure your application and data.  And finally we will wrap up with some strategies for multi-tenancy and how business models relate to tenancy.

Tenancy

Let’s first agree upon a definition for the term tenancy and how it applies to single- and multi-tenancy.  There are a few different definitions of this concept that can cloud (no pun intended) one’s understanding. Some define it as the relationship of clients to application instances.  Wikipedia defines it as “…a single instance of the software runs on a server, serving multiple client organizations (tenants)”. While I appreciate this definition for the sake of practicality I want to expand upon it if I may.   There are architectures where you can also have “more than one instance of the software running on a server”.   How you provision tenants to those multiple instances is your choice. You could have one tenant (customer or company) per instance and all the users from that company dedicated to that instance.  Or you could have more than one customer sharing that one instance but logically partitioned from each other’s data and processing. Each of those customers (or tenants) can have one or more users accessing the application.

So for the sake of this paper let’s agree that tenancy refers to a “customer/billing” relationship.   Suppose we have a SaaS Azure application that we sell to ten different companies (our “customers”).  Each company has 5,000 employees using our Cloud application.  Using our billing-relationship definition, if we sell that service to 10 different companies we don’t have 50,000 tenants. Rather we have 10 tenants because of our 10 customer/billing relationships.  Later in this paper we will look more closely at this type of relationship in the Business Model and Tenancy section.

Tenant Types

There are two types of tenant environments we will need to consider. The simplest type is a single-tenant application where one customer has 100% dedicated access to an application’s process space.  A single tenant application is much more predictable and stable by its nature since there will never be more than one dedicated customer at any point in time in that VM.  That customer has all of its users accessing that dedicated instance of the application.

Contrast that with a multi-tenant environment where more than one customer shares the application’s process space and data.   Due to requirements of security and performance isolation, it’s more difficult to build a multi-tenant application.  You may have to plan for added complexity and development/test time to synchronize access to shared data and resources.   For a Windows Azure multi-tenant application similar complexity is required to ensure data and resources do not get corrupted with multiple tenants in the same or multiple process spaces.

Realize there are variations on this theme.  You could theoretically have only one user per instance of a VM.  So if a company had 2000 users it would need 2000 VMs. Not a very practical application of resources or costs.  More realistically, with our tenancy definition of customer/billing relationship, you would have 2000 users from that same company sharing the same VM instance. So if you have 20 companies you’d have 20 VMs.  Each VM would handle one or more users for that specific company only.  Getting more complex in tenancy we could have more than one company sharing an instance. So if you had 100 companies, you could provision 10 of them per instance and thus use 10 VMs. Each company would share a VM with 9 other companies, and you would have 10 VM instances of this. Confused? Hold that thought. We will break this down with pictures and stick figures (okay… maybe no stick figures!) as we progress in this article.
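The provisioning arithmetic above can be written down as a small Python sketch. The function name vms_needed is hypothetical, and rounding up when the companies do not divide evenly is my own assumption; the text only covers the evenly divisible cases.

```python
import math


def vms_needed(companies, companies_per_vm):
    """Number of VM instances needed when each VM hosts a fixed number of companies."""
    if companies_per_vm < 1:
        raise ValueError("each VM must host at least one company")
    # Round up so a partially filled VM still counts as a whole VM (assumption).
    return math.ceil(companies / companies_per_vm)


dedicated = vms_needed(20, 1)   # one company per VM
shared = vms_needed(100, 10)    # ten companies per VM
```

Here dedicated comes out to 20 VMs and shared to 10 VMs, matching the two scenarios described in the text.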

Why Multi-Tenancy?

Why would one want to go through the trouble of making a Windows Azure application multi-tenant? Wouldn’t it just be so much easier to make an application single-tenant and not have to worry about complicated synchronization and a more complex billing model? From an infrastructure and development standpoint, it’s the difference in cost vs. simplicity.  Whether you choose the simplicity of driving your own private car to work or reduce your commuting cost by sharing the public mass-transit system with other riders, you have to decide what’s right for you.  For various reasons you may decide not to share your ride with others and pay more for your own custom commuting environment.  Maybe your schedule is unique, or you are scared of what could happen health-wise or crime-wise when sharing a commuting vehicle with strangers. Or you don’t want your commute time to be affected in any way by other riders and like to leave your briefcase in the car each night when you get home.  So you choose to pay more and have a dedicated way to work.

But for those commuters who can’t or won’t spend the increased cost of private commuting the choice of shared public transportation is their best option.  It offers a simple pay as you go model where the cost you pay is minimized since it is distributed across many customers. So there is no right or wrong answer – it depends upon the rider requirements.

Compare this to a software standpoint where there could be various requirements – government regulations or demanding SLAs – that you need to support for your customers.  It’s a tradeoff, right?  You want the benefits of operation isolation and simplified design found with a dedicated single-tenant application.  But it will cost you more since you aren’t sharing the cost of the resources over many customers.  So the choice when deciding to use a multi- or single-tenant environment depends upon your application requirements.

Instances and Tenancy

In the case of Windows Azure a new instance means running on a new VM instance isolated from the other instances.  These instances can communicate with each other typically through queues or WCF calls.  Let’s see how these terms combine to define the different types of applications running under Windows Azure.

There are four possible combinations of instances and tenancy.  The standard convention is “INSTANCE-TENANT” format when describing these combinations. These are listed in approximate complexity of architecture with SIST being the simplest.

  1. SIST (single-instance, single-tenant)
  2. SIMT (single-instance, multi-tenant)
  3. MIST (multi-instance, single-tenant)
  4. MIMT (multi-instance, multi-tenant)

A quirky way that helps me remember these in terms of Windows Azure is to equate the following:

“Instance” = “Azure virtual machine (VM) instance”

And

“Tenant” = “Customers sharing the VM”

So in my personal Azure terms, I like to define the four combinations as follows. These are listed in order of increasingly complex development.

1. SIST (Single-Instance, Single-Tenant)

  • “Only one Azure VM dedicated to one company”

2.  SIMT (Single-Instance, Multi-Tenant)

  • “Only one Azure VM shared among multiple companies”

3.  MIST (Multi-Instance, Single-Tenant)

  • “Many Azure VMs each dedicated to one company”

4.  MIMT (Multi-Instance, Multi-Tenant)

  • “Many Azure VMs shared among multiple companies”
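As a memory aid, the INSTANCE-TENANT naming convention can be expressed as a tiny Python function. The name tenancy_model and its parameters are hypothetical; it simply maps "one or many VMs" and "one or many companies per VM" onto the four acronyms.

```python
def tenancy_model(vm_instances, companies_per_instance):
    """Return the INSTANCE-TENANT acronym for a deployment shape."""
    if vm_instances < 1 or companies_per_instance < 1:
        raise ValueError("need at least one instance and one tenant")
    instance_part = "S" if vm_instances == 1 else "M"
    tenant_part = "S" if companies_per_instance == 1 else "M"
    return f"{instance_part}I{tenant_part}T"
```

For example, one VM dedicated to one company yields "SIST", while many VMs each shared among several companies yields "MIMT".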

 In the next post we will look in detail at all four of these combinations of tenancy.

Learned something interesting today about how Azure handles federation for an Azure Web Role. I thought Web.config was no longer applicable for Azure Web applications now that ServiceConfiguration.cscfg exists. I was wrong.

In the Azure Portal I configured an Access Control Service (ACS) namespace along with Identity Providers, Rules, and Relying Party for my Azure Web application.  In Visual Studio for my Azure ASP.NET Web role I chose “Add STS Reference” to invoke the Federation Utility Wizard. Went through all the dialogs to successfully link my app to the ACS Namespace I had created.  Opened up the ServiceConfiguration.cscfg file to view the changes made by the wizard – and there were none. Snooping around I opened up Web.config and found a number of entries written by the Federation Utility Wizard.  I thought the ServiceConfiguration.cscfg file was to replace Web.config for Azure-only applications. So how would these become visible to the ServiceConfiguration.cscfg file? How would my Azure application make use of these settings in the Web.config file?

Here’s how it works.  Federation support in the ServiceConfiguration.cscfg file is not implemented as of yet. For now, Azure will use the entries in Web.config to manage federation of an application using ACS.  If you need to change values in Web.config relating to federation after the Azure Web app has been deployed without having to repackage and redeploy, here is a tip on how to do that.

Duplicate the settings found in Web.config in ServiceConfiguration.cscfg. (Note you also have to declare those settings in ServiceDefinition.csdef for them to be valid in the cscfg file.)  In the OnStart method, when your Azure Web role first loads, have code read the federation elements from ServiceConfiguration.cscfg.  That code can then in turn write those values out to their matching elements in the Web.config file.  The Web.config file handles the actual federation, not the ServiceConfiguration.cscfg file. The role of ServiceConfiguration.cscfg relating to Web.config in this case is to act as a conduit in case federation values need to be changed in the Web.config file of a deployed Azure Web application. This can be done by uploading an updated ServiceConfiguration.cscfg file in the Azure portal without redeploying the entire application.
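The "conduit" idea can be illustrated with a short, language-neutral sketch. A real Web role would do this in .NET inside OnStart; Python is used here only to show the shape of the logic. The function name apply_federation_settings is hypothetical, the settings dictionary stands in for values read from ServiceConfiguration.cscfg, and the sample Web.config fragment (a wsFederation element with issuer and realm attributes) is an assumption about what the wizard typically writes, so check your own file.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified shape of the federation section in Web.config.
SAMPLE_WEB_CONFIG = """<configuration>
  <microsoft.identityModel>
    <service>
      <federatedAuthentication>
        <wsFederation issuer="https://old.example/issue" realm="http://old.example/" />
      </federatedAuthentication>
    </service>
  </microsoft.identityModel>
</configuration>"""


def apply_federation_settings(web_config_xml, settings):
    """Copy setting values (as if read from the cscfg) onto the wsFederation element."""
    root = ET.fromstring(web_config_xml)
    ws_federation = next(root.iter("wsFederation"), None)
    if ws_federation is None:
        raise ValueError("no wsFederation element found in Web.config")
    for name, value in settings.items():
        ws_federation.set(name, value)  # overwrite or add the attribute
    return ET.tostring(root, encoding="unicode")


updated = apply_federation_settings(
    SAMPLE_WEB_CONFIG, {"realm": "http://new.example/"}
)
```

Only the attributes named in the settings are touched; everything else in the Web.config content is carried through unchanged.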

Since the Windows Azure work of SCOM 2012 is not complete yet you cannot directly view custom performance counters in SCOM 2012. Rather you have to first use SCOM 2007 R2 Operations Manager and the Authoring Console to assist with this.  Here is information on how to do this.

DISCOVERY AND MONITORING FOR WINDOWS AZURE APPLICATIONS

Install the System Center Monitoring Pack for Windows Azure Applications from http://www.microsoft.com/en-us/download/details.aspx?id=11324.  Once you have installed the Azure Management pack on your SCOM server, you will have to follow these steps:

  1. In SCOM 2007 R2 create the appropriate accounts in Operations Manager to connect to your Azure environment for Azure applications discovery.
  2. Configure Performance Monitoring for Windows Azure applications.
  3. Export the pack into SCOM 2012 and view the results of the performance counters collection.

CONFIGURE ACCOUNTS FOR DISCOVERY STEPS

Step 1 – Create Run As Accounts:

You will need three “Run As” Accounts in System Center Operations Manager 2012:

  • One for Binary Authentication. This account will use the Management Certificate to connect to Azure.
  • One for Basic Authentication. This account will be used for the Certificate and will store the password for the Certificate.
  • One that will be used for the proxy agent (Optional).

For detailed steps refer to the following documents:

http://oakleafblog.blogspot.fr/2011/09/installing-systems-center-monitoring.html

http://blogs.technet.com/b/dcaro/archive/2012/05/03/how-to-monitor-your-windows-azure-application-with-system-center-2012-part-2.aspx

Step 2 – Configure Windows Azure Management Pack Template:

1. Click the Authoring button in the left pane, select the Authoring \ Management Pack Templates \ Windows Azure Application and click the Add Monitoring Wizard task in the right pane to open the Monitoring Wizard’s Select Monitoring Type dialog with Windows Azure Application selected.

2. Click Next to open the Name and Description dialog. Type a Name and Description for the service and click the New button to open the Create a new Management Pack dialog. Type a Name, e.g. MY AZURE MP, which fills in the ID value, Version number, and Description.

3. Click Next to Open the Application Details dialog, type the service’s DNS prefix, e.g. MYDEVAPP1, copy the Subscription ID from the Developer portal and paste it in the text box, accept the default Production as the Environment to Monitor. Using the accounts you previously created, select the Binary Authentication account in the Azure Certificate Run As Account and the Basic Authentication account for the Azure Certificate Password Run As Account.

4. Click Next to open the Select Proxy Agent dialog and click Browse to open another Select Proxy Agent dialog. Click Search to list computers on your network and select an agent-managed computer to act as a proxy agent for the Windows Azure application, e.g. ITPRODC.

5. Click OK to close the dialog and click Next, which displays a message. Click Yes to distribute the account to the selected Proxy Agent and open the Summary dialog.

6. Click Create to create the new Management Pack for the MYDEVAPP1 with Azure Applications Discovery.

7. Click the Monitoring button in the left pane, select the Monitoring \ Distributed Applications node to open the Distributed Applications list, and select the new MYDEVAPP1 hosted service monitor (Healthy state indicates monitoring is occurring):

8. Verify that Detail Views of the Deployment State, Hosted Service State, Role Instance State and Role State correspond to the known current state of the service.

CONFIGURE PERFORMANCE MONITORING FOR WINDOWS AZURE APPLICATIONS STEPS

  1. Create Performance Collection Rule using existing Performance Counters
  2. Create A Performance Rule using Custom Azure Counters
  3. Export the SCOM 2007 R2 Management Pack into SCOM 2012

Step 1 – Create Performance Collection Rule using Standard Performance Counters

In Operations Manager Console for 2007 R2 open Rules and select the following Targets:

  • Windows Azure Deployment
  • Windows Azure Hosted Service
  • Windows Azure Role
  • Windows Azure Role Instance
  1. Open Create Rule Wizard and select Collection Rules Node.
  2. Under the Performance Collection Node select the Windows Performance Node and select the Custom Management Pack (i.e. My Azure MP in our case).
  3. Provide the rule name e.g. “ASP.NET Applications Requests/sec (Custom)”.
  4. Select Performance Collection as the rule category.
  5. Select the Target as Windows Azure Role Instance (This will collect data for all role instances across hosted services you are monitoring).
  6. To collect data for instances within a particular hosted service, select the Windows Azure Role Instance (i.e. MYDEVAPP1 in our case).
  7. On the next Page select Performance Object, Counter, and Instance as:

Object Name:     ASP.NET Applications

Counter Name:  Requests/Sec

Instance Name: __Total__

All Instances: True or False

  8. Leave the Optimize Performance Collection Settings at the default, which means it will not use optimization.
  9. Save the performance collection rules.

Step 2 – Import the Management Pack into Authoring Console

  1. Open Authoring Console for Operations Manager 2007 R2.
  2. Connect to Management Group (i.e. “DEVMANAGEMENTGROUP”).
  3. Click on Tools and select Import Management Pack.
  4. Select the Custom MP created earlier (i.e. “My Azure MP”).  The Management Pack should successfully load in the Authoring Console.
  5. Open the Health Model Tab and click on Rules.
  6. You should see the same performance collection rule (Check display name to display correct rule names).
  7. Open the Rule Properties and click on Modules tab.
  8. Under Data Sources delete the existing Data Source values.
  9. Click on Create and select “Microsoft.SystemCenter.Azure.RoleInstance.PerformanceCounter.CollectData.DS” type entry.
  10. Give a custom name of ModuleID (i.e.  AzureDS).
  11. Edit the Data Source just created to adjust values for Counters and InstanceName.
  12. Provide the following values in Data Source module:

<Configuration>
  <IntervalSeconds>300</IntervalSeconds>
  <TimeoutSeconds>120</TimeoutSeconds>
  <CounterName>Requests/Sec</CounterName>
  <ObjectName>ASP.NET Applications</ObjectName>
  <InstanceName>__Total__</InstanceName>
  <AllInstances>false</AllInstances>
</Configuration>

13. Save those values and save the Management Pack in SCOM 2007 R2 Authoring Console.

14. Export the Management Pack back to the SCOM 2007 R2 Operations Manager Console.

15. Make sure you can see the Performance Data in the Console by creating a new Performance View.

Step 3 – Create Performance Collection Rule using Custom Azure Counters

  1. Create another Performance Collection Rule for Azure in the same manner mentioned above for standard counters but this time use a custom counter.
  2. Ensure the values under the Data Source Modules are as follows:

Object Name:     CustomCategory

Counter Name:  TotalnumberofFileUpload

Instance Name:

All Instances: False

OR

<Configuration>
  <IntervalSeconds>300</IntervalSeconds>
  <TimeoutSeconds>120</TimeoutSeconds>
  <CounterName>TotalnumberofFileUpload</CounterName>
  <ObjectName>CustomCategory</ObjectName>
  <InstanceName></InstanceName>
  <AllInstances>false</AllInstances>
</Configuration>

(This means the InstanceName should be empty and AllInstances set to false).
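Since the standard and custom counter configurations differ only in a few values, it can help to generate the fragment rather than type it by hand. Below is a minimal Python sketch that builds the same Configuration element; the function name build_counter_config is hypothetical, and the interval and timeout defaults are simply the values used in this post.

```python
import xml.etree.ElementTree as ET


def build_counter_config(counter_name, object_name, instance_name="",
                         all_instances=False, interval_seconds=300,
                         timeout_seconds=120):
    """Build the <Configuration> fragment for a performance counter data source."""
    config = ET.Element("Configuration")
    fields = [
        ("IntervalSeconds", str(interval_seconds)),
        ("TimeoutSeconds", str(timeout_seconds)),
        ("CounterName", counter_name),
        ("ObjectName", object_name),
        ("InstanceName", instance_name),  # left empty for the custom counter
        ("AllInstances", "true" if all_instances else "false"),
    ]
    for tag, text in fields:
        ET.SubElement(config, tag).text = text
    # short_empty_elements=False writes <InstanceName></InstanceName>
    # instead of the self-closing form.
    return ET.tostring(config, encoding="unicode", short_empty_elements=False)


custom_xml = build_counter_config("TotalnumberofFileUpload", "CustomCategory")
standard_xml = build_counter_config(
    "Requests/Sec", "ASP.NET Applications", instance_name="__Total__"
)
```

The output is a single line of XML; paste the values into the Data Source module as described in the steps above.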

NOTE – You need to follow all these steps in the Operations Manager 2007 R2 setup or Management Group.

  3. Save the Management Pack and export it to the Operations Manager 2007 R2 Management Group.
  4. Create a new Performance Collection View under Monitoring to see the Data related to Custom Counters.
  5. Once all the Rules are created and you can see the Data under the Performance Collection View for Custom Counters, save the Management Pack as an XML file using the Authoring Console.

EXPORT THE SCOM 2007 R2 MANAGEMENT PACK INTO SCOM 2012 STEPS

  1. Copy the saved Management Pack file to Operations Manager 2012 Management Server. Import the Management Pack using Operations Manager 2012 Console.
  2. Open Authoring Console on the same Management Server and select “import from Management Group” and select the Custom Azure MP in the list.
  3. Now open the Health Model and click on Rules. Select the Custom Counter Performance Collection Rule.
  4. Check the Data Sources in the Module and confirm that the correct Modules for Azure, Operations Manager and Operations Manager DW Database are selected.
  5. Change the Data Sources if required and save the Management Pack in the Console.
  6. Now export the Management Pack to Operations Manager 2012 Management Group using Authoring Console.
  7. Be sure to change the Azure Run As Accounts (if different from those in Operations Manager 2007 R2) used in the Windows Azure Management Pack Template for the Azure Discovery process using Operations Manager 2012.
  8. Also change the Proxy Agent in Azure MP Template to new Operations Manager 2012 Management Server.
  9. Save all the settings made in the Azure MP Template in same custom MP.
  10. Restart the Health Service on the Management Server and make sure all Azure Role Instances, Hosted Services and Deployment are successfully discovered.
  11. You should also see the same Performance Collection View in the Console as in Operations Manager 2007 R2 Console.
  12. Now you should see the same Performance Data for Custom Counters in the View.
  13. Similarly you can create more Rules for different Custom Performance Counters.

Important: In case you get the Event ID 34024, make sure to apply an Override and change the Diagnostic Connection String used in Windows Azure with your Custom Connection String.

ALL THIS IS POSSIBLE IF YOU CAN SEE THE CUSTOM PERFORMANCE COUNTERS ON THE AZURE VM OR SERVER WHERE THE AZURE ROLE INSTANCES ARE RUNNING.

While putting a slide deck together for some customers I pulled info from a number of sources, added some of my own info, and created a good flow. I decided to then take the deck and the supporting notes pages I had used and put them into a blog post. It is rather long, and it duplicates info found in other documents (see links at the end).  I like the way the deck blended it all together so thought you might want to see it in text form, simplified down to key points.  Enjoy!

——————-

One of the common fears I hear from companies who are hesitant to move to the Cloud is the terror of their data not being 100% under their control living within the walls of their own private data center servers.   I can understand those feelings.   And a few of these fears may be well-founded in some situations due to regulatory requirements. If legally you are required to keep your data on premise, stored in a certain format, keep it from being transferred through certain geographic areas, or stored away from certain other data (for instance, can’t store data for different companies in the same database) then storing your data in the Cloud is probably not for you.

However, that does not mean the Cloud as a whole is not for you.  With the advent of Virtual Networks with the new IaaS feature of Windows Azure, and the technology of Azure Connect, you may be able to run your application in the Cloud keeping your data on-premise. Or you can move your non-critical data to the Cloud and leave the important data on-premise.

Microsoft has invested a significant amount of planning and technology into securing a customer’s data and access within Azure. Here I will try my best to address some common questions and concerns that customers often raise about Azure, such as:

  • How can I ensure my data remains private?
  • Can Azure prevent non-authorized calls into my code?
  • Do I need to encrypt my data for Azure?
  • What about multi-tenant operations and protecting shared data?
  • Are calls between Azure and customer, and internal to Azure, secure?
  • How is my data backed up?
  • When data is deleted logically can I ensure it is also done physically?
  • What about regulatory data requirements?

Access Security

Access control for Hosted Services and Storage Accounts is governed by the subscription. The ability to authenticate with the Live ID associated with the subscription grants full control to all of the Hosted Services and Storage Accounts within that subscription.  Administrators can create Co-Administrators who then have access to all the services in that subscription.

Customers upload developed applications and manage their Hosted Services and Storage Accounts through the Windows Azure Portal web site or programmatically through the Service Management API (SMAPI). Customers access the Windows Azure Portal through a web browser or access SMAPI through standalone command line tools, either programmatically or using Visual Studio.

SMAPI authentication is based on a user-generated public/private key pair and self-signed certificate registered through the Windows Azure Portal. The certificate is then used to authenticate subsequent access to SMAPI. SMAPI queues requests to the Windows Azure Fabric which then provisions, initializes, and manages the required application. Customers can monitor and manage their applications via the Portal or programmatically through SMAPI using the same authentication mechanism.

Access to Windows Azure storage is governed by a storage account key (SAK) that is associated with each Storage Account. A more sophisticated access control model can be achieved by creating a custom application “front end” to the storage, giving the application the storage key, and letting the application authenticate remote users and even authorize individual storage requests.
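To make the “front end” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the StorageFrontEnd class, the permissions table, and the simplified string that gets signed are all hypothetical, and the signing step only imitates the general idea of an HMAC computed with the account key rather than the exact format the storage service expects. The point is simply that the key stays inside the application, and each request is checked against the caller’s permissions before anything is signed.

```python
import base64
import hashlib
import hmac


class StorageFrontEnd:
    """Hypothetical front end that holds the storage account key so remote users never see it."""

    def __init__(self, account_key_b64, permissions):
        self._key = base64.b64decode(account_key_b64)
        # permissions maps a user name to a set of (operation, container) pairs
        self._permissions = permissions

    def authorize(self, user, operation, container):
        """Return True only if this user was granted this operation on this container."""
        return (operation, container) in self._permissions.get(user, set())

    def sign_request(self, user, operation, container, blob):
        """Sign a single storage request on behalf of an authorized user."""
        if not self.authorize(user, operation, container):
            raise PermissionError(f"{user} may not {operation} in {container}")
        # Simplified stand-in for a string-to-sign; not the real service format.
        string_to_sign = f"{operation}\n{container}\n{blob}"
        digest = hmac.new(self._key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
        return base64.b64encode(digest).decode("ascii")


# Demo values only; a real key would come from the storage account.
demo_key = base64.b64encode(b"demo-key-not-a-real-secret").decode("ascii")
front_end = StorageFrontEnd(demo_key, {"alice": {("read", "reports")}})
signature = front_end.sign_request("alice", "read", "reports", "q1.csv")
```

With this shape, a user such as "alice" can read from the "reports" container but cannot write to it, and a user with no entry in the table gets nothing signed at all.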

Windows Azure VM Protection

Windows Azure creates a virtual machine (VM) for each role instance, then runs the role in one of those VMs. These VMs in turn run on a hypervisor that’s specifically designed for use in the Cloud (the Windows Azure Hypervisor).  One VM is special: it runs a hardened operating system called the root OS that hosts a fabric agent (FA). FAs are used in turn to manage guest agents (GA) within guest operating systems on customer VMs. FAs also manage storage nodes. The collection of Windows Azure hypervisor, root OS/FA, and customer VMs/GAs comprises a compute node.

After a compute node is booted, it starts the fabric agent (FA) and awaits connections and commands from the fabric controller (FC). The FC connects to the newly booted node over SSL, with both sides authenticating each other. Communication between the FC and the FAs is a one-way push from the FC: an FA cannot make direct requests to components higher in the chain of command, which makes those components difficult to attack from a compromised node. Combined with the other mechanisms described in this post, these features help maintain the Fabric in a safe and clean state for customers.

All of these layers of VM management, together with the hardened root OS, make it very tough for anyone to gain unauthorized access.

Confidentiality of Windows Azure Customer Data

Confidentiality ensures that a customer’s data is only accessible by authorized entities. Windows Azure provides confidentiality via the following mechanisms. I will discuss each of them in subsequent sections.

  1. Identity and Access Management – Ensures that only properly authenticated entities are allowed access.
  2. Isolation – Minimizes interaction with data by keeping appropriate containers logically or physically separate.
  3. Encryption – Used internally within Windows Azure for protecting control channels and is provided optionally for customers who need rigorous data protection capabilities.
  4. Deletion of extraneous customer data – Removed and cleaned up when no longer needed.
  5. Integrity of customer data – Protected by the Fabric VM design and tightly controlled access to the virtual hard drives, where the only way to the data is through the Fabric.
  6. Service Operations – Data center personnel and processes along with enhanced network security.

 

1.    Azure Identity Access Management

Credential and key management are critical components of the security design and implementation of Windows Azure. Azure uses them to ensure only authenticated users get access to Azure resources.

Running applications with “least privilege” is widely regarded as an information security best practice. To align with the principle of least privilege, customers are not granted administrative access to their VMs, and customer software in Windows Azure is restricted to running under a low-privilege account by default (in future versions, customers may select different privilege models at their option). This reduces the potential impact and increases the necessary sophistication of any attack, requiring privilege elevation in addition to other exploits. It also protects the customer’s service from attack by its own end users.

All communications between Windows Azure internal components are protected with SSL. In most cases, the SSL certificates are self-signed.

2.    Isolation of Key Components

A critical boundary is the isolation of the root VM from the guest VMs and the guest VMs from one another, managed by the hypervisor and the root OS. The hypervisor/root OS pairing leverages Microsoft’s decades of operating system security experience, as well as more recent learning from Microsoft’s Hyper-V, to provide strong isolation of guest VMs.

Because the fabric controller (FC) is the central orchestrator of much of the Windows Azure Fabric, significant controls are in place to mitigate threats to FCs, especially from potentially compromised FAs on nodes that run customer applications. Communication from FC to FA is unidirectional – the FA implements an SSL-protected service that is accessed from the FC and replies to requests only. It cannot initiate connections to the FC or other privileged internal nodes. The FC strongly parses all responses as though they were untrusted communications. In addition, the FCs and devices incapable of implementing SSL are on separate VLANs. This limits exposure of their authentication interfaces to a compromised node that hosts VMs.
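As a rough illustration of what “strongly parses all responses as though they were untrusted” can look like in practice, here is a small Python sketch. The message format is entirely made up for this example (a JSON object with a request_id and a status); the real FC/FA protocol is not public in this post, so treat the field names, the size limit, and the allowed status values as assumptions. The technique is the general one: reject anything that is too big, malformed, or carries fields or values you did not expect.

```python
import json

ALLOWED_STATUSES = {"ok", "error"}  # hypothetical status values
MAX_RESPONSE_BYTES = 4096           # hypothetical size limit


def parse_agent_response(raw: bytes) -> dict:
    """Validate a response from an agent, treating every byte as untrusted input."""
    if len(raw) > MAX_RESPONSE_BYTES:
        raise ValueError("response too large")
    try:
        message = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("response is not valid UTF-8 JSON") from exc
    if not isinstance(message, dict):
        raise ValueError("response must be a JSON object")
    if set(message) != {"request_id", "status"}:
        raise ValueError("unexpected or missing fields")
    request_id = message["request_id"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(request_id, int) or isinstance(request_id, bool):
        raise ValueError("request_id must be an integer")
    status = message["status"]
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        raise ValueError("status is not an allowed value")
    return message


good = parse_agent_response(b'{"request_id": 7, "status": "ok"}')
```

Anything outside the expected shape, such as an extra field or a status the caller has never heard of, raises an error instead of being passed along.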

The hypervisor and the root OS provide network packet filters to ensure that the untrusted VMs cannot generate spoofed traffic, cannot receive traffic not addressed to them, cannot direct traffic to protected infrastructure endpoints, and cannot send or receive inappropriate broadcast traffic. Customer access to VMs is limited by packet filtering at edge load balancers and at the root OS. In particular, remote debugging, remote Terminal Services, or remote access to VM file shares is not permitted by default.
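The filtering rules in the paragraph above can be sketched as a couple of small Python functions. This is only a toy model under stated assumptions: the protected address range is invented, the Packet type is hypothetical, and the sketch drops all broadcast traffic where the real filters only block inappropriate broadcasts. It is meant to show the kind of checks involved, not how the hypervisor implements them.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical address range standing in for protected infrastructure endpoints.
PROTECTED_INFRASTRUCTURE = ipaddress.ip_network("10.255.0.0/16")
BROADCAST = ipaddress.ip_address("255.255.255.255")


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def allow_outbound(packet: Packet, vm_address: str) -> bool:
    """Decide whether a packet sent by the VM at vm_address may leave the node."""
    src = ipaddress.ip_address(packet.src)
    dst = ipaddress.ip_address(packet.dst)
    if src != ipaddress.ip_address(vm_address):
        return False  # spoofed source address
    if dst in PROTECTED_INFRASTRUCTURE:
        return False  # aimed at a protected infrastructure endpoint
    if dst == BROADCAST:
        return False  # simplification: this sketch drops all broadcast traffic
    return True


def allow_inbound(packet: Packet, vm_address: str) -> bool:
    """A VM only receives traffic that is addressed to it."""
    return ipaddress.ip_address(packet.dst) == ipaddress.ip_address(vm_address)


normal = allow_outbound(Packet("10.0.0.5", "10.0.0.9"), "10.0.0.5")
spoofed = allow_outbound(Packet("10.0.0.6", "10.0.0.9"), "10.0.0.5")
```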

VLANs are used to isolate the FCs and other devices. VLANs partition a network such that no communication is possible between VLANs without passing through a router, which prevents a compromised node from faking traffic from outside its VLAN except to other nodes on its VLAN. It also cannot eavesdrop on traffic that is not to or from its VLANs.

3.    Encryption of Data and Communications

Encryption of data in storage and in transit can be used by customers within Windows Azure to align with best practices for ensuring confidentiality and integrity of data. As noted previously, critical internal communications are protected using SSL encryption. At the customer’s option, the Windows Azure SDK extends the core .NET libraries to allow developers to integrate the .NET Cryptographic Service Providers (CSPs) within Windows Azure. Developers familiar with .NET CSPs can easily implement encryption, hashing, and key management functionality for stored or transmitted data.
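For a feel of what the hashing and key management side of this looks like in code, here is a small sketch. Note that it uses Python’s standard hashlib and hmac modules rather than the .NET CSPs mentioned above, simply to keep the example short and self-contained, and it covers integrity tagging only, not encryption. The function names, the passphrase, and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

TAG_LENGTH = 32  # size of an HMAC-SHA256 tag in bytes


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a passphrase with PBKDF2 (iteration count is illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 200_000)


def protect(data: bytes, key: bytes) -> bytes:
    """Prefix the data with an HMAC-SHA256 tag before it is stored or transmitted."""
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def verify(blob: bytes, key: bytes) -> bytes:
    """Check the tag on the way back and return the data only if it is unmodified."""
    tag, data = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return data


salt = os.urandom(16)
key = derive_key("demo passphrase", salt)
stored = protect(b"customer record", key)
restored = verify(stored, key)
```

If even one byte of the stored blob changes, verify raises an error instead of returning the data.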

4.    Deletion of Customer Data

Where appropriate, confidentiality should persist beyond the useful lifecycle of data. Windows Azure’s Storage subsystem makes customer data unavailable once delete operations are called. All storage operations including delete are designed to be instantly consistent. Successful execution of a delete operation removes all references to the associated data item and it cannot be accessed via the storage APIs. All copies of the deleted data item are then garbage collected. The physical bits are overwritten when the associated storage block is reused for storing other data, as is typical with standard computer hard drives.
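The sequence described above (references removed immediately, garbage collection later, physical bits overwritten on reuse) can be modeled with a toy in-memory store. This is purely a conceptual sketch; the ToyBlobStore class and its methods are hypothetical and say nothing about how Windows Azure Storage is actually built.

```python
class ToyBlobStore:
    """Toy model of delete semantics; not how Windows Azure Storage is implemented."""

    def __init__(self):
        self._index = {}    # name -> block id (the "references")
        self._blocks = {}   # block id -> bytes (the "physical bits")
        self._garbage = []  # unreferenced blocks waiting for garbage collection
        self._free = []     # collected blocks available for reuse
        self._next_id = 0

    def put(self, name, data):
        if name in self._index:
            self.delete(name)
        if self._free:
            block_id = self._free.pop()  # reuse: old bits get overwritten below
        else:
            block_id = self._next_id
            self._next_id += 1
        self._blocks[block_id] = data
        self._index[name] = block_id
        return block_id

    def get(self, name):
        # Raises KeyError as soon as the reference is gone.
        return self._blocks[self._index[name]]

    def delete(self, name):
        # The reference disappears immediately; the bits linger until reuse.
        self._garbage.append(self._index.pop(name))

    def collect_garbage(self):
        self._free.extend(self._garbage)
        self._garbage.clear()


store = ToyBlobStore()
first_block = store.put("invoice", b"secret")
store.delete("invoice")
store.collect_garbage()
second_block = store.put("photo", b"public")
```

In this model the deleted item is unreachable through the API the moment delete returns, and its block is overwritten when the next item reuses it.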

5.    Integrity of Customer Data

The primary mechanism of integrity protection for customer data lies within the Fabric VM design itself. Each VM is connected to three local Virtual Hard Drives (VHDs):

• The D: drive contains one of several versions of the Guest OS kept up with most current patches.

• The E: drive contains an image constructed by the FC based on the package provided by customer.

• The C: drive contains configuration information, paging files, and other storage.

The D: and E: virtual drives are effectively read-only because their ACLs are set to disallow write access from customer processes. Since the operating system may need to update those read-only volumes, they are implemented as VHDs with delta files. The initial VHDs for all role instances in an application generally start out identical. The delta drive for the D: drive is discarded any time Windows Azure patches the VHD containing the OS. The delta drive for the E: drive is discarded any time the VHD is updated with a new application image. This design strictly preserves the integrity of the underlying operating system and customer applications.
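The “VHD with delta file” idea is easier to see in a few lines of code. The sketch below is a toy model with hypothetical names (DeltaDisk, replace_base) and a dictionary standing in for disk sectors: writes never touch the base image, reads prefer the delta, and swapping in a new base throws the delta away, just as the delta drive is discarded when the OS is patched or a new application image arrives.

```python
class DeltaDisk:
    """Toy model of a read-only base image with a discardable delta on top."""

    def __init__(self, base):
        self._base = dict(base)  # sector number -> bytes; never modified by write()
        self._delta = {}         # changes made since the base was installed

    def read(self, sector):
        # Prefer the delta; fall back to the untouched base image.
        return self._delta.get(sector, self._base.get(sector))

    def write(self, sector, data):
        self._delta[sector] = data

    def replace_base(self, new_base):
        """Install a new base image (for example a patched OS) and discard the delta."""
        self._base = dict(new_base)
        self._delta.clear()


disk = DeltaDisk({0: b"os-v1"})
disk.write(0, b"temp-change")
before_patch = disk.read(0)
disk.replace_base({0: b"os-v2"})
after_patch = disk.read(0)
```

Whatever was written on top of the old image is gone once the base is replaced, which is why the underlying OS and application images stay intact.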

The configuration file, which specifies the connectivity requirements of all roles in the application, is stored on the read/write C: drive. The FC takes the subset of that configuration file appropriate for each role and places it in the C: drive for each role instance. If the customer updates the configuration file while the role instances are running, the fabric controller (FC), through the fabric agent (FA), contacts the guest agent (GA) running in the VM’s guest OS and instructs it to update the configuration file on the C: drive. It can then signal the customer’s application to re-read the configuration file. Only authorized customers accessing their Hosted Services via the Windows Azure Portal or SMAPI (as described earlier) can change the configuration file. So, by the inherent design of Windows Azure, the integrity of the customer configuration is protected, maintained, and persisted constantly during an application’s lifetime.

As for Windows Azure Storage, integrity is dictated by applications using the simple access control model described earlier. Each Storage Account has two storage account keys that are used to control access to all data in that Storage Account, and thus access to the storage keys provides full control over the associated data.

6.    Secure Service Operations

Microsoft deploys a combination of preventive, detective, and reactive controls to help protect against unauthorized developer and/or administrative activity. These include tight access controls on sensitive data and combinations of controls that greatly enhance independent detection of malicious activity. Additionally, Microsoft conducts background verification checks of certain operations personnel, and limits access to applications, systems, and network infrastructure in proportion to the level of background verification.

Each datacenter facility has a minimum of two sources of electrical power, including a power generation capability for extended off-grid operation.

Windows Azure runs in geographically distributed Microsoft facilities, sharing space and utilities with other Microsoft Online Services. Each facility is designed to run 24 x 7 and employs various measures to help protect operations from power failure, physical intrusion, and network outages. These data centers comply with industry standards for physical security and reliability, and they are managed, monitored, and administered by Microsoft operations personnel. They are designed for “lights out” operation.

The Windows Azure internal network is isolated by strong filtering of traffic to and from other networks. This provides a “backplane” for internal network traffic that is high-speed and at low risk from malicious activity generally. The configuration and administration of network devices such as switches, routers, and load balancers is performed only by authorized Microsoft operations personnel, and generally only at major changes (such as when the data center itself is reconfigured). The virtualization provided by the Windows Azure Fabric makes such changes practically invisible to customers. Furthermore, any hardware that does not implement adequate communications security features (such as SSL) is administered over a separate LAN that is isolated from nodes that are exposed to the Internet or to customer access.

Business and Regulatory Requirements

The importance of business and regulatory compliance has increased dramatically with the proliferation of global standards including ISO 27001, Safe Harbor and many others. In many cases, failure to comply with these standards can have a dramatic impact on organizations, up to and including catastrophic financial penalties and damage to reputation. Any of the previously discussed threats can have an impact on compliance, but there are also threats that are directly related to failure to adhere to recognized practices, provide representation of compliance to independent auditors, support e-discovery, and otherwise facilitate reasonable efforts by customers to verify alignment with regulatory, legal, and contractual requirements. Microsoft provides customers with the information they need to decide whether it is possible to comply with the laws and regulations to which they are subject within the context of Windows Azure and the tools to demonstrate that compliance when it is possible. Some of the ways Windows Azure assists customers with compliance are discussed below.

ISO 27001 Certification

Trusted third-party certification provides a well-established mechanism for demonstrating protection of customer data without giving excessive access to teams of independent auditors that may threaten the integrity of the overall platform. Windows Azure operates in the Microsoft Global Foundation Services (GFS) infrastructure, portions of which are ISO 27001-certified. ISO 27001 is recognized worldwide as one of the premier international information security management standards. Windows Azure is in the process of evaluating further industry certifications. In addition to the internationally recognized ISO 27001 standard, Microsoft Corporation is a signatory to Safe Harbor and is committed to fulfilling all of its obligations under the Safe Harbor Framework. While responsibility for compliance with laws, regulations, and industry requirements remains with Windows Azure customers, Microsoft remains committed to helping customers achieve compliance through the features described above.

One of the key challenges inherent to Windows Azure is balancing compliance requirements against one of the key economic drivers behind cloud services: segmenting customer data and processing across multiple systems, geographies, and regulatory jurisdictions. Windows Azure addresses this challenge in a very simple way: customers choose where their data is stored. Data in Windows Azure is stored in Microsoft datacenters around the world based on the geo-location properties specified by the customer using the Windows Azure Portal. This provides a convenient way to minimize compliance risk by actively selecting the geographic locations in which regulated data will reside.

Summary

Windows Azure provides many mechanisms for protecting customer access and data. Subscriptions manage access to Azure resources. Azure storage uses keys to protect access to the data stored in storage accounts. Windows Azure VMs run as hardened instances on the Windows Azure Hypervisor, with many virtual layers that protect access much better than a physical server does, and VMs are protected from each other by the hypervisor. Confidentiality of Azure customer data is accomplished through Identity Access Management protection and by running applications with the least amount of privilege to limit the damage an attack can do. Key communications are protected via SSL encryption. Deletion management of customer data prevents it from being accessible after it has been removed. The read-only structure of the D: and E: virtual drives protects their contents from intrusion. Tight controls on personnel and data centers, along with regulatory compliance, offer a level of physical protection not found in most companies. Data stored in Windows Azure is in most cases safer and more secure than it is within the walls of an on-premise system or database.

 

 

Helpful Links

  • Microsoft’s Global Foundation Services Security – Responsible for delivering the trustworthy, available operations environment that underlies Windows Azure

http://www.globalfoundationservices.com/security/

  • Windows Azure Trust Center – For concerns/questions on Azure security

http://www.windowsazure.com/en-us/support/trust-center/

  • Security Best Practices For Developing Windows Azure Applications

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=7253
