Archive for September, 2012


While putting a slide deck together for some customers I pulled info from a number of sources, added some of my own, and created a good flow. I then decided to take the deck and the supporting notes pages and turn them into a blog post. It is rather long, and it duplicates info found in other documents (see links at the end), but I like the way the deck blended it all together, so I thought you might want to see it in text form, simplified down to key points. Enjoy!

——————-

One of the common fears I hear from companies hesitant to move to the Cloud is that their data will no longer be 100% under their control, living within the walls of their own private data center.   I can understand those feelings.   And a few of these fears may be well-founded in some situations due to regulatory requirements. If you are legally required to keep your data on-premise, store it in a certain format, keep it from being transferred through certain geographic areas, or store it away from certain other data (for instance, you can’t store data for different companies in the same database), then storing your data in the Cloud is probably not for you.

However, that does not mean the Cloud as a whole is not for you.  With the advent of Virtual Networks in the new IaaS feature of Windows Azure, and the Azure Connect technology, you may be able to run your application in the Cloud while keeping your data on-premise. Or you can move your non-critical data to the Cloud and leave the important data on-premise.

Microsoft has invested a significant amount of planning and technology into securing a customer’s data and access within Azure. Here I will try my best to answer some common questions and concerns that customers raise about Azure, such as:

  • How can I ensure my data remains private?
  • Can Azure prevent non-authorized calls into my code?
  • Do I need to encrypt my data for Azure?
  • What about multi-tenant operations and protecting shared data?
  • Are calls between Azure and customer, and internal to Azure, secure?
  • How is my data backed up?
  • When data is deleted logically, can I ensure it is also deleted physically?
  • What about regulatory data requirements?

Access Security

Access control for Hosted Services and Storage Accounts is governed by the subscription. The ability to authenticate with the Live ID associated with the subscription grants full control to all of the Hosted Services and Storage Accounts within that subscription.  Administrators can create Co-Administrators who then have access to all the services in that subscription.

Customers upload developed applications and manage their Hosted Services and Storage Accounts through the Windows Azure Portal web site or programmatically through the Service Management API (SMAPI). Customers access the Windows Azure Portal through a web browser or access SMAPI through standalone command line tools, either programmatically or using Visual Studio.

SMAPI authentication is based on a user-generated public/private key pair and self-signed certificate registered through the Windows Azure Portal. The certificate is then used to authenticate subsequent access to SMAPI. SMAPI queues requests to the Windows Azure Fabric which then provisions, initializes, and manages the required application. Customers can monitor and manage their applications via the Portal or programmatically through SMAPI using the same authentication mechanism.
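To make this concrete, here is a minimal sketch (my own illustration, not Microsoft sample code; the subscription ID, certificate thumbprint, and API version value are placeholders) of calling SMAPI with a management certificate to list the Hosted Services in a subscription:

    using System;
    using System.IO;
    using System.Net;
    using System.Security.Cryptography.X509Certificates;

    class SmapiListHostedServices
    {
        static void Main()
        {
            string subscriptionId = "<your-subscription-id>";        // placeholder
            string thumbprint     = "<management-cert-thumbprint>";  // placeholder

            // Load the self-signed management certificate from the local store.
            var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
            store.Open(OpenFlags.ReadOnly);
            var cert = store.Certificates.Find(
                X509FindType.FindByThumbprint, thumbprint, false)[0];
            store.Close();

            // List the Hosted Services in the subscription.
            var request = (HttpWebRequest)WebRequest.Create(
                "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
            request.Headers.Add("x-ms-version", "2011-10-01");  // SMAPI version header
            request.ClientCertificates.Add(cert);                // this is what authenticates the call

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());           // raw XML list of hosted services
            }
        }
    }

Without the certificate attached, SMAPI simply rejects the request, which is the access-control point being made above.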

Access to Windows Azure storage is governed by a storage account key (SAK) that is associated with each Storage Account. A more sophisticated access control model can be achieved by creating a custom application “front end” to the storage, giving the application the storage key, and letting the application authenticate remote users and even authorize individual storage requests.
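As a rough sketch of that “front end” idea (the account name, key, container, and blob names below are placeholders, and the remote caller is assumed to have already been authenticated by the application), only the trusted application component ever holds the storage account key and issues the actual storage calls:

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class StorageFrontEnd
    {
        // Only this trusted component ever sees the storage account key (SAK).
        private static readonly CloudStorageAccount Account =
            new CloudStorageAccount(
                new StorageCredentialsAccountAndKey("<account-name>", "<account-key>"),
                true); // use HTTPS

        // Called only after the application has authenticated and authorized the remote user.
        public static string ReadBlob(string containerName, string blobName)
        {
            var client = Account.CreateCloudBlobClient();
            var blob = client.GetContainerReference(containerName).GetBlobReference(blobName);
            return blob.DownloadText();
        }
    }

The remote user never receives the key itself; the application decides which individual storage requests it is willing to make on the user’s behalf.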

Windows Azure VM Protection

Windows Azure creates a virtual machine (VM) for each role instance, then runs the role in one of those VMs. These VMs in turn run on a hypervisor that’s specifically designed for use in the Cloud (the Windows Azure Hypervisor).  One VM is special: it runs a hardened operating system called the root OS that hosts a fabric agent (FA). FAs are used in turn to manage guest agents (GA) within guest operating systems on customer VMs. FAs also manage storage nodes. The collection of Windows Azure hypervisor, root OS/FA, and customer VMs/GAs comprises a compute node.

After a compute node is booted, it starts the fabric agent (FA) and awaits connections and commands from the fabric controller (FC). The FC connects to the newly booted node using SSL, authenticating bi-directionally via SSL as described previously. FC communication with FAs is via one-way push, making it difficult to attack those higher in the chain of command because they cannot make direct requests to those components. Combined with the many mechanisms described above, these features help maintain the Fabric in a safe and clean state for customers.

All of these layers of VM management, plus the hardened root OS, make it very tough for anyone to gain unauthorized access.

Confidentiality of Windows Azure Customer Data

Confidentiality ensures that a customer’s data is only accessible by authorized entities. Windows Azure provides confidentiality via the following mechanisms. I will discuss each of them in subsequent sections.

  1. Identity and Access Management – Ensures that only properly authenticated entities are allowed access.
  2. Isolation – Minimizes interaction with data by keeping appropriate containers logically or physically separate.
  3. Encryption – Used internally within Windows Azure for protecting control channels and is provided optionally for customers who need rigorous data protection capabilities.
  4. Deletion of extraneous customer data – Data is removed and cleaned up when no longer needed.
  5. Integrity of customer data – The Fabric VM design and tightly controlled access to the virtual hard drives mean the only way to the data is through the Fabric.
  6. Service Operations – Data center personnel and processes, along with enhanced network security.

 

1.    Azure Identity Access Management

Credential and key management are critical components of the security design and implementation of Windows Azure. Azure uses them to ensure only authenticated users get access to Azure resources.

Running applications with “least privilege” is widely regarded as an information security best practice. To align with the principle of least privilege, customers are not granted administrative access to their VMs, and customer software in Windows Azure is restricted to running under a low-privilege account by default (in future versions, customers may select different privilege models at their option). This reduces the potential impact and increases the necessary sophistication of any attack, requiring privilege elevation in addition to other exploits. It also protects the customer’s service from attack by its own end users.

All communications between Windows Azure internal components are protected with SSL. In most cases, the SSL certificates are self-signed.

2.    Isolation of Key Components

A critical boundary is the isolation of the root VM from the guest VMs and the guest VMs from one another, managed by the hypervisor and the root OS. The hypervisor/root OS pairing leverages Microsoft’s decades of operating system security experience, as well as more recent learning from Microsoft’s Hyper-V, to provide strong isolation of guest VMs.

Because the fabric controller is the central orchestrator of much of the Windows Azure Fabric, significant controls are in place to mitigate threats to it, especially from potentially compromised FAs within customer applications. Communication from FC to FA is unidirectional – the FA implements an SSL-protected service that is accessed from the FC and replies to requests only. It cannot initiate connections to the FC or other privileged internal nodes. The FC parses all responses as though they were untrusted communications. In addition, the FCs and devices incapable of implementing SSL are on separate VLANs. This limits exposure of their authentication interfaces to a compromised node that hosts VMs.

The hypervisor and the root OS provide network packet filters to ensure that the untrusted VMs cannot generate spoofed traffic, cannot receive traffic not addressed to them, cannot direct traffic to protected infrastructure endpoints, and cannot send or receive inappropriate broadcast traffic. Customer access to VMs is limited by packet filtering at edge load balancers and at the root OS. In particular, remote debugging, remote Terminal Services, or remote access to VM file shares is not permitted by default.

VLANs are used to isolate the FCs and other devices. VLANs partition a network such that no communication is possible between VLANs without passing through a router, which prevents a compromised node from faking traffic from outside its VLAN except to other nodes on its VLAN. It also cannot eavesdrop on traffic that is not to or from its VLAN.

3.    Encryption of Data and Communications

Encryption of data in storage and in transit can be used by customers within Windows Azure to align with best practices for ensuring confidentiality and integrity of data. As noted previously, critical internal communications are protected using SSL encryption. At the customer’s option, the Windows Azure SDK extends the core .NET libraries to allow developers to integrate the .NET Cryptographic Service Providers (CSPs) within Windows Azure. Developers familiar with .NET CSPs can easily implement encryption, hashing, and key management functionality for stored or transmitted data.
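As a small sketch of that option (plain .NET crypto, not an Azure-specific API; how the key and IV are generated and stored is left to the application), data can be encrypted before it is ever written to blob or table storage:

    using System.IO;
    using System.Security.Cryptography;

    public static class BlobEncryption
    {
        // Encrypts a buffer with AES; the resulting ciphertext is what gets uploaded to storage.
        public static byte[] Encrypt(byte[] plaintext, byte[] key, byte[] iv)
        {
            using (var aes = new AesCryptoServiceProvider())
            using (var encryptor = aes.CreateEncryptor(key, iv))
            using (var ms = new MemoryStream())
            {
                using (var cs = new CryptoStream(ms, encryptor, CryptoStreamMode.Write))
                {
                    cs.Write(plaintext, 0, plaintext.Length);
                }
                return ms.ToArray();
            }
        }
    }

Decryption is the mirror image with CreateDecryptor, so the stored bytes are opaque to anyone who obtains them without the key.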

4.    Deletion of Customer Data

Where appropriate, confidentiality should persist beyond the useful lifecycle of data. Windows Azure’s Storage subsystem makes customer data unavailable once delete operations are called. All storage operations including delete are designed to be instantly consistent. Successful execution of a delete operation removes all references to the associated data item and it cannot be accessed via the storage APIs. All copies of the deleted data item are then garbage collected. The physical bits are overwritten when the associated storage block is reused for storing other data, as is typical with standard computer hard drives.

5.    Integrity of Customer Data

The primary mechanism of integrity protection for customer data lies within the Fabric VM design itself. Each VM is connected to three local Virtual Hard Drives (VHDs):

• The D: drive contains one of several versions of the Guest OS, kept up to date with the most current patches.

• The E: drive contains an image constructed by the FC based on the package provided by the customer.

• The C: drive contains configuration information, paging files, and other storage.

The D: and E: virtual drives are effectively read-only because their ACLs are set to disallow write access from customer processes. Since the operating system may need to update those read-only volumes, they are implemented as VHDs with delta files. The initial VHDs for all role instances in an application generally start out identical. The delta drive for the D: drive is discarded any time Windows Azure patches the VHD containing the OS. The delta drive for the E: drive is discarded any time the VHD is updated with a new application image. This design strictly preserves the integrity of the underlying operating system and customer applications.

The configuration file, stored on the read/write C: drive, specifies the connectivity requirements of all roles in the application. The FC takes the subset of that configuration file appropriate for each role and places it on the C: drive of each role instance. If the customer updates the configuration file while the role instances are running, the fabric controller (FC) – through the fabric agent (FA) – contacts the guest agent (GA) running in the VM’s guest OS and instructs it to update the configuration file on the C: drive. It can then signal the customer’s application to re-read the configuration file. Only authorized customers accessing their Hosted Services via the Windows Azure Portal or SMAPI (as described earlier) can change the configuration file. So, by the inherent design of Windows Azure, the integrity of the customer configuration is protected, maintained, and persisted constantly during an application’s lifetime.
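That “signal the application to re-read the configuration” step can be seen from the role’s side of the fence. Here is a hedged sketch (the setting name “LogLevel” is hypothetical) of a role subscribing to the RoleEnvironment.Changed event from the Azure SDK and picking up the new value once the GA has updated the file:

    using System.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Raised after a configuration change has been applied to the role instance.
            RoleEnvironment.Changed += (sender, e) =>
            {
                // Re-read a setting; "LogLevel" is just an example name.
                Trace.TraceInformation(
                    "LogLevel is now " + RoleEnvironment.GetConfigurationSettingValue("LogLevel"));
            };
            return base.OnStart();
        }
    }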

As for Windows Azure Storage, integrity is dictated by applications using the simple access control model described earlier. Each Storage Account has two storage account keys that are used to control access to all data in that Storage Account; thus access to the storage keys provides full control over the associated data.

6.    Secure Service Operations

Microsoft deploys combinations of preventive, detective, and reactive controls to help protect against unauthorized developer and/or administrative activity. These include tight access controls on sensitive data and combinations of controls that greatly enhance independent detection of malicious activity.  Additionally, Microsoft conducts background verification checks of certain operations personnel, and limits access to applications, systems, and network infrastructure in proportion to the level of background verification.

Each datacenter facility has a minimum of two sources of electrical power, including a power generation capability for extended off-grid operation.

Windows Azure runs in geographically distributed Microsoft facilities, sharing space and utilities with other Microsoft Online Services. Each facility is designed to run 24 x 7 and employs various measures to help protect operations from power failure, physical intrusion, and network outages. These data centers comply with industry standards for physical security and reliability and are managed, monitored, and administered by Microsoft operations personnel. They are designed for “lights out” operation.

The Windows Azure internal network is isolated by strong filtering of traffic to and from other networks. This provides a “backplane” for internal network traffic that is high-speed and at low risk from malicious activity generally. The configuration and administration of network devices such as switches, routers, and load balancers is performed only by authorized Microsoft operations personnel, and generally only at major changes (such as when the data center itself is reconfigured). The virtualization provided by the Windows Azure Fabric makes such changes practically invisible to customers. Furthermore, any hardware that does not implement adequate communications security features (such as SSL) is administered over a separate LAN that is isolated from nodes that are exposed to the Internet, or customer access.

Business and Regulatory Requirements

The importance of business and regulatory compliance has increased dramatically with the proliferation of global standards including ISO 27001, Safe Harbor and many others. In many cases, failure to comply with these standards can have a dramatic impact on organizations, up to and including catastrophic financial penalties and damage to reputation. Any of the previously discussed threats can have an impact on compliance, but there are also threats that are directly related to failure to adhere to recognized practices, provide representation of compliance to independent auditors, support e-discovery, and otherwise facilitate reasonable efforts by customers to verify alignment with regulatory, legal, and contractual requirements. Microsoft provides customers with the information they need to decide whether it is possible to comply with the laws and regulations to which they are subject within the context of Windows Azure and the tools to demonstrate that compliance when it is possible. Some of the ways Windows Azure assists customers with compliance are discussed below.

ISO 27001 Certification

Trusted third-party certification provides a well-established mechanism for demonstrating protection of customer data without giving excessive access to teams of independent auditors that may threaten the integrity of the overall platform. Windows Azure operates in the Microsoft Global Foundation Services (GFS) infrastructure, portions of which are ISO27001-certified. ISO27001 is recognized worldwide as one of the premier international information security management standards. Windows Azure is in the process of evaluating further industry certifications. In addition to the internationally recognized ISO27001 standard, Microsoft Corporation is a signatory to Safe Harbor and is committed to fulfill all of its obligations under the Safe Harbor Framework. While responsibility for compliance with laws, regulations, and industry requirements remains with Windows Azure customers, Microsoft remains committed to helping customers achieve compliance through the features described above.

One of the key challenges inherent to Windows Azure is balancing compliance requirements against one of the key economic drivers behind cloud services: segmenting customer data and processing across multiple systems, geographies, and regulatory jurisdictions. Windows Azure addresses this challenge in a very simple way: customers choose where their data is stored. Data in Windows Azure is stored in Microsoft datacenters around the world based on the geo-location properties specified by the customer using the Windows Azure Portal. This provides a convenient way to minimize compliance risk by actively selecting the geographic locations in which regulated data will reside.

Summary

Windows Azure provides many mechanisms for protecting customer access and data.  Subscriptions manage access to Azure resources. Azure storage uses keys to protect access to the data stored in those accounts.  Windows Azure VMs are special hardened instances running on the Windows Azure Hypervisor, with many virtual layers that protect access far better than a typical physical server; VMs are also protected from each other by the Hypervisor. Confidentiality of Azure customer data is accomplished through Identity and Access Management, and applications run with the least amount of privilege to limit the damage an attack can do. Key communications are protected with SSL encryption.  Deletion management of customer data prevents it from being accessible after it has been removed. The read-only structure of the D: and E: virtual drives protects their contents from intrusion.  Tight controls and regulatory compliance for Microsoft’s personnel and data centers offer a level of physical protection not found in most companies.  Data stored in Windows Azure is, in most cases, safer and more secure than it is within the walls of an on-premise system or database.

 

 

Helpful Links

  • Microsoft’s Global Foundation Services Security – Responsible for delivering the trustworthy, available operations environment that underlies Windows Azure

http://www.globalfoundationservices.com/security/

  • Windows Azure Trust Center – For concerns/questions on Azure security

http://www.windowsazure.com/en-us/support/trust-center/

  • Security Best Practices For Developing Windows Azure Applications

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=7253


Before doing this you need to configure AzureWatch to point to your Azure subscription.

Local Perfmon

The Control Panel client version of AzureWatch uses your local Perfmon (where the tool is running) to display counters.  So you can go to Perfmon and add all instances of the AzureWatch counters if you choose to view them there instead of in AzureWatch.

 

How to View an Azure Queue

Go to Raw Metrics, right click on Managed Queue Counters and add the queue to be viewed.

Once you assign the raw metric of the queue, bring up Aggregated Metrics for the role that will be writing to the queue. Right click and add new Aggregate. Use a value of 2 mins for queues (5 mins for perfmon counters) and ‘Use latest value’.

Click on Publish Changes and wait a few minutes.

Within AzureWatch click on Metrics View – Live to see the current number of items in the queue.
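If you want to sanity-check that number outside of AzureWatch, a quick hedged sketch with the storage client (the connection string and queue name are placeholders) reads the approximate message count directly from the queue:

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class QueueDepthCheck
    {
        static void Main()
        {
            // Placeholder connection string for the storage account that owns the queue.
            var account = CloudStorageAccount.Parse("<storage-connection-string>");
            var queue = account.CreateCloudQueueClient().GetQueueReference("myqueue"); // example name

            // The queue service only ever reports an approximate count.
            int count = queue.RetrieveApproximateMessageCount();
            Console.WriteLine("Approximate items in queue: " + count);
        }
    }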

Using Perfmon Counters

Simple formula to remember

  1. Create a Raw Metric
  2. Create an Aggregate
  3. Create a Rule

For the desired role click on Raw Metrics.

In the Raw Metrics Window right click on Managed Performance Counters and select Add New.

In the Performance Counter Properties dialog choose the category and the counter. When you hit okay it should now show up in the Raw Metrics box.

To create an aggregate, choose the raw metric you just created, calculate an average, and use a 5-minute interval for perfmon.  Click OK.

Click Rules under the instance you want to monitor, then right click and select New to bring up the Rule Edit box.

Add a Boolean expression to trigger an alert.  You can make this as simple or complex as you like. Here I am using 20% to make the perfmon counter trigger. It will send an email to my account (configured earlier in the Azure setup) and I can also choose to do autoscaling. Here I have arbitrarily decided to scale down (by one instance) since it’s faster than scaling up, for the demo’s sake.

Click OK and Publish Changes.

Start the app you are monitoring and view the counters either in Perfmon or in AzureWatch.

You should also have an email notification sent to the email address you configured at the start. The rules are evaluated in order, and the first one that matches will trigger. Here I had previously configured a rule to trigger when the ASP Requests Queued was >5.

Scaling action was successful
Rule mikesscaledownrequest triggered itops\Production\AspWebRole to perform scale action: ‘Scale down by’
From instance count of 2 to 1
Rule formula: MikeRequestsQueued<5
All known parameter values:
Average20CPUTime: 9.82; (metric count: 31);
Avg5UnresponsiveInstanceCount: 0; (metric count: 5);
MikeRequestsQueued: 0; (metric count: 7);

You can also view the perfmon counters in the Dashboard view which shows each role and its corresponding items being monitored.

Window Management

Moving the windows and displaying them was a bit of a challenge. For some reason I did not have a menu bar show up on my display.  When you start up, the Explorer window is present, and that’s what you need to view the other windows. If you close it without the menu there is no way to bring it back up except restarting the tool. Clicking on the items in the Explorer window brings up their corresponding windows.  If you grab and drag a window, a cross with arrows appears, allowing you to move it to the location you desire by dropping it on one of the arrows.

Here I am dragging the explorer window to drop it on one of the arrows.

 

Instance Scaling Limits

You can configure the monitored app to not scale past or below a certain # of instances.

From the Explorer window double click on one of the roles.

Enter a minimum or maximum number of instances; these limits apply regardless of the rules you have configured. Remember that the rules you define can scale up or down when their condition is true.

Got to sit in on a demo of AzureWatch today from Paraleap Technologies’ founder and CEO Igor Papirov. Great guy, and an awesome tool for monitoring your Azure applications. Here are some key features of the tool.

  • You DO have to enable DiagMon in your code
  • Unlike AzureOps or SCOM, you DON’T need to enable perfmon counters ahead of time via code or PowerShell scripts to have them show up in AzureWatch. It’s like W2K8 Server perfmon where you just select the counters you want to use for the app you want to monitor.
  • AzureWatch doesn’t require making any changes to your code or VM and just consumes the data that’s produced from Windows Azure Diagnostics.
  • With AzureWatch you do not need to install any agents anywhere to get perfmon data
  • AzureWatch currently runs its configuration tool only as a client application, though it is being moved to the Web. The monitoring service itself runs from their Cloud-based servers.
  • It does not manage logging or displaying any log/trace file entries. But it does log errors AzureWatch encounters when monitoring.
  • It uses the WADPerfmonCounters table for both writing and reading counter data.
  • Configurable alerts and thresholds for emails and auto-scaling
  • You can store the config settings on the AzureWatch servers so you can access them from any client platform
  • Configurable auto-scaling (up or down) using many different counters and scale units you configure
  • View monitoring using RSS feed capabilities, email, mobile app/phone, or online
  • Monitors Windows Azure SQL Database instances and Windows Azure SQL Federations
  • Provides historical reports/views of past data captured and can export it to Excel
  • View custom perfmon counters (defined in your .NET code)
  • Monitors Azure storage queues
  • You have the option to define custom aggregates on your raw data (average, total, max, min, etc)
  • Once you define raw metrics and aggregations you can then define boolean rules (simple or complex) to either send an alert or to configure auto scaling (up or down)
  • You cannot leverage a single representation of a rule across multiple role instances but you can cut and paste the rules easily in the designer to span more than one instance
  • Very inexpensive pricing – it costs about 1.5 cents per hour to run, pay-for-what-you-use
  • http://azurewatch.net to sign up for free trial

With the release of Infrastructure as a Service (IaaS) this summer there seems to be a sudden increase in interest in Windows Azure among IT Pros. Originally most of Azure’s features, like Web and worker roles, were focused primarily upon developers. This accidentally led to Windows Azure being erroneously viewed by the IT Pro community as a chance for developers to bleed over into the roles of the IT Pro.  It has also been viewed, unfairly, as a driver behind their jobs eventually going away. After all – if the Cloud handles all the deployment, installation, and patching of the core software, as well as provisioning of the hardware, what roles still exist for IT Pros in that environment?

Once you get a better understanding of the types of opportunities created by the Cloud, you can see that the theory of an obsolete IT Pro replaced by Windows Azure is not very realistic. With the introduction of IaaS features such as Azure VMs and Azure Web Sites, Windows Azure is now also an IT Pro platform. As an IT Pro it might help to think of Azure as an extension of your IT department.

Over the next few posts I will be discussing concise Windows Azure tips and best practices for IT Pros.  Let’s start with our first topic – Deployment.

Deployment

  • With Windows Azure, developers can go right to Azure and deploy their applications.  For your on-premise servers you would not allow developers to deploy directly to production, and IT Pros need to take this control back! To manage deployment and keep developers from deploying directly from Visual Studio (they can also deploy with configuration files), create two Azure accounts – one for development and one for production. Give developers the development account and let them do what they want with it. When they need to deploy to production, however, they go through your account and you do the deployment for them.
  • When you deploy, try to keep the Azure storage and code in the same location.  You can create and use affinity groups to help with that. So if you have a market in the Far East you can easily deploy it all into the Far East.  This is an advantage for deployment you get with the cloud.  You cannot pick a specific data center, but you can choose a region of the world; the affinity group guarantees a hosted service and storage will be in the same datacenter. Group application pieces into a single deployment package when they must be hosted in the same data center.    Note – you cannot use affinity groups with Windows Azure SQL Database.

Management Certificates

Closely related to the topic of deployment is the management of certificates in the Azure portal.  Each subscription should have its own separate Azure service management certificate, unique to that subscription.   To use a management or service certificate, it must be uploaded through the Windows Azure Platform Management Portal.  Windows Azure uses both management certificates and service certificates.

  • Management Certificates – Stored at the subscription level, these certificates are used to authenticate the Windows Azure ‘management’ tools:  SDK tools, the Windows Azure Tools for Visual Studio, and REST API calls.  These certificates are independent of any hosted service or deployment.
  • Service Certificates – Stored at the hosted service level, these certificates are used by your deployed services.

Typical IT policies define distinct roles for parties associated with application Development, Test, Integration, Deployment, and Production. These policies restrict roles from operations that exceed their defined responsibilities. Here are some suggestions on how to manage certificates for these roles.

Development – Share a certificate among all the developers to allow freedom of development.

Integration – Has its own management certificate.

Test – Certificate shared only with the Operations team.

Deployment – Certificate used for deployment roles and only distributed to parties responsible for application deployment.

Production – Certificate shared only with the Operations team.

I have been recently playing around with SCOM 2012 and the System Center Monitoring Pack for Windows Azure Applications (http://www.microsoft.com/en-us/download/details.aspx?id=11324).  My goal was to understand how to monitor and view Performance counters for my Azure service using SCOM.  I had no previous experience with SCOM so this was a new adventure for me.

Overall I found SCOM very powerful, but not as straightforward to use as I had hoped.  There are more intuitive tools, like AzureOps from OpStera, for monitoring Azure services and applications.  I had to create Run As accounts using Binary Authentication (for the certificate and private key) and Basic Authentication (for the certificate’s password). I then created a management pack, which serves as a container for other SCOM entities.  From there I derived a Monitoring pack from the Windows Azure Application template. This is where I added the Azure-specific values that uniquely identify to SCOM the Azure service I wanted to monitor.   Finally I created rules, one per performance counter I wanted to monitor.  Rule creation has a wizard (as did most SCOM tasks I tried), but a few of the fields were not straightforward to complete, such as role instance type.

Counters used for an Azure application are a subset of those you would use for a Windows Server 2008 application.  For my Azure application I decided to use a sampling rate of one minute (1-2 minutes is recommended) and a transfer rate of every 5 minutes. The transfer rate is how often the Diagnostics Monitor will move the counter data from local storage into Azure storage.   I used the following Perfmon counters, which are typical ones you would use in your Azure monitoring process.  The counters I monitored for a worker role are a subset of those I monitored for a Web role because the worker role does not include any IIS or ASP.NET functionality.
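As a hedged sketch of how those rates are wired up in code (Azure SDK 1.x diagnostics API, placed in the role’s OnStart; the three counters shown are just a sample of the ones described below):

    using System;
    using Microsoft.WindowsAzure.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

            foreach (var counter in new[]
            {
                @"\Processor(_Total)\% Processor Time",
                @"\Memory\Available MBytes",
                @"\ASP.NET Applications(__Total__)\Requests/Sec"
            })
            {
                config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
                {
                    CounterSpecifier = counter,
                    SampleRate = TimeSpan.FromMinutes(1)   // 1-2 minute sampling rate
                });
            }

            // How often the Diagnostics Monitor moves counter data into Azure storage.
            config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

            DiagnosticMonitor.Start(
                "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);
            return base.OnStart();
        }
    }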

Counters for Web Roles

The following counter is used with the Network Interface monitoring object.

Bytes Sent/sec – Helps you determine the amount of Azure bandwidth you are using.

The following counters are used with the ASP.NET Applications monitoring object.

Request Error Events Raised – If this value is high you may have an application problem.  Excessive time in error processing can degrade performance.

(__Total__)\Requests/Sec – If this value is high you can see how your application behaves under stress. If the value is low but other counters show a lot of activity (CPU or memory), there is probably a bottleneck or a memory leak.

(__Total__)\Requests Not Found – If there are a lot of requests not found, you may have a virus or something wrong with the configuration of your Web site.

(__Total__)\Requests Total – Displays the throughput of your application.  If this is low and CPU or memory are being used in large amounts, you may have a bottleneck or memory leak.

(__Total__)\Requests Timed Out – A good indicator of performance. A high value means your system cannot turn over requests fast enough to handle new ones.  For an Azure application this might mean creating more instances of your Web role until these timeouts disappear.

(__Total__)\Requests Not Authorized – A high value could mean a DoS attack. You can possibly throttle these requests to allow valid ones to come through.

Counters for Web and Worker Roles

For both worker and Web roles here are some counters to watch for your Azure service/application.

The following counter is used with the Processor monitoring object.

(_Total)\% Processor Time – One of the key counters to watch in your application. If the value is high along with the number of Connections Established, you may want to increase the number of cores in the VM for your hosted service.  If this value is high but the number of requests is low, your application may be taking more CPU than it should.

The following counters are used with the Memory monitoring object.

Available Mbytes – If this value is low you can increase the size of your Azure instance to make more memory available.

Committed Bytes – If this is constantly increasing, it makes no sense to increase the Azure instance size, since you most likely have a memory leak.

The following counters are used with the TCPv4 monitoring object.

Connections Established – Shows how many connections there are to your service. If this is high and the Processor Time counter is low, you may not be releasing connections properly.

Segments Sent/sec – If this value is high you may want to increase the Azure instance size.

Summary

In summary, using Perfmon counters is a valuable way to indirectly keep an eye on your application’s use of Azure resources. Performance counters are often more effective when used in conjunction with each other. For instance, if you see a lot of memory being used you might want to check CPU utilization. If CPU is also high, the application really is using the memory and you need to scale up. If CPU is low, then you probably have an issue with how the memory is being allocated or released.

You can use SCOM to track Perfmon if you know how to use it and your company has invested financially in a license.  Remember SCOM is a very rich and robust enterprise-scale tool with a ton of functionality. For instance, once you configure your hosted service as a monitoring pack, you can then view it in the Distributed Applications tab. This gives you consolidated and cascading summaries of the performance and availability of your Azure service.

If you don’t own or use SCOM, or if you merely want to keep it simple, then AzureOps is probably an easier option. It requires no installation or setup (it runs as a Web service) and offers simple Azure auto-scaling based upon Perfmon threshold values. (http://www.opstera.com/products/Azureops/)

A special thanks to Nuno Godhino for providing me information for this post.

For those of you struggling with the cost and strategy of backing up your on-premise server data, Windows Azure Online Backup (preview) can help.   It permits backing up server data to the Cloud to help protect against data loss.  It runs as a cloud-based backup process and gives Admins a low-cost way to remotely back up and recover their server data. Hopefully this is just another piece in the puzzle of the hybrid issues that most companies have to consider as they move to the Cloud.

http://blogs.msdn.com/b/windowsazure/archive/2012/09/07/windows-azure-online-backup-now-supports-system-center-2012-sp1.aspx

In my previous post, I discussed separate storage accounts and the locality of those accounts, as well as transfer intervals, sample intervals, and trace logging levels, as ways to optimize the use of Windows Azure Diagnostics. This post discusses six additional ways to optimize your Windows Azure Diagnostics experience.
  1. Data selection – Carefully select the minimal amount of diagnostic data that you need to monitor your application.  That data should contain only the information you need to identify the issue and troubleshoot your application.  Logging excess data increases the clutter when looking through log data while troubleshooting and costs more to store in Windows Azure.
  2. Purge Azure diagnostic tables – Periodically purge stale data that you will not need any more from the diagnostic tables to avoid paying storage costs for dormant bits. You can store it back on-premise if you feel you will need it sometime later for historical or auditing purposes.  There are tools to help with this, including the System Center Monitoring Pack for Windows Azure.
  3. Set Perfmon counters during role OnStart – Diagnostics are set per role instance. Due to scalability needs the number of role instances can increase or decrease. By putting the initialization of the Perfmon counters in the OnStart method (which is invoked when a role instance is instantiated) you can ensure your role instance will always start configured with the correct Perfmon counters.  If you don’t specifically set up the counters during the OnStart method, the configurations might be out of sync.  This is a common problem for customers who do not define the Perfmon counters in OnStart.
  4. Optimize performance counters – Sometimes diagnostics are like gift purchases during a pre-Christmas sale: you need only a few of them, but due to the lower prices you end up buying more than you need. The same goes for performance counters. Be sure the counters you gather are meaningful to your application and will be used for alerts or analysis.   Windows Azure provides a subset of the performance counters available for Windows Server 2008, IIS, and ASP.NET. Here are some of the categories of commonly used Perfmon counters for Windows Azure applications. For each of these categories there can be more than one actual counter to track:
    1. .NET CLR exceptions
    2. .NET CLR memory
    3. ASP.NET process and app restarts
    4. ASP.NET requests
    5. Memory and CPU
    6. TCPv4 connections
    7. Network Interface (Microsoft Virtual Machine Bus Network Adapter) bytes
  5. Manage max buffer size – When configuring Perfmon counters in your role’s OnStart method you can specify a buffer size using the PerformanceCounters.BufferQuotaInMB property of the DiagnosticMonitorConfiguration object. If you set this to a value that fills up before the buffer is transferred from local to Azure storage, you will lose the oldest events.  Make sure your buffer size has room to spare to prevent loss of diagnostic data.
  6. Consider the WAD config file – In some cases you may want to put the configuration of Perfmon counters and logs in a config file instead of in the role’s code. For instance, if you are using a VM that does not have a startup routine, or you need non-default diagnostic operations, you can use the WAD config file to manage that.  The settings in the config file are applied before the OnStart method gets called. (A sketch of such a file follows this list.)
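Here is a rough sketch of what such a file can look like (assuming the standard diagnostics.wadcfg schema; the quotas, counter, and intervals are example values only):

    <?xml version="1.0" encoding="utf-8"?>
    <DiagnosticMonitorConfiguration
        xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"
        configurationChangePollInterval="PT1M"
        overallQuotaInMB="4096">
      <!-- Transfer only Warning-and-above trace entries every 5 minutes -->
      <Logs bufferQuotaInMB="512"
            scheduledTransferPeriod="PT5M"
            scheduledTransferLogLevelFilter="Warning" />
      <!-- Sample the counter every minute, transfer to Azure storage every 5 minutes -->
      <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
        <PerformanceCounterConfiguration
            counterSpecifier="\Processor(_Total)\% Processor Time"
            sampleRate="PT1M" />
      </PerformanceCounters>
    </DiagnosticMonitorConfiguration>

Because this file is read when the role instance starts, the diagnostic configuration is in place even for roles that never run any diagnostic configuration code of their own.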

Windows Azure Diagnostics enables you to collect diagnostic data from an application running in Windows Azure. This data can be used for debugging and troubleshooting, measuring performance, monitoring resource usage, traffic analysis and capacity planning, and auditing. In this blog post, I will share some optimizations that I have learned while working with Windows Azure Diagnostics during my time at Aditi Technologies.

  1. Keep separate storage accounts – Keep your Azure diagnostics data in a different storage account than your application data. There is no additional cost to do this, and if for some reason you need to work with Microsoft support and have them muck through your diagnostics storage locations, you don’t have to allow them access to potentially sensitive application data.
  2. Locality of storage and application – Make sure to keep your storage account in the same affinity group (data center) as the application that is writing to it.  If for some reason you can’t do so, use a longer transfer interval so data is transferred less frequently but more of it is moved at once.
  3. Transfer interval – For most applications I have found that a transfer once every 5 minutes is a very useful rate.
  4. Sample interval – For most applications setting the sample rate to once per 1-2 minutes provides a good yet frugal sampling of data.  Remember that when your sampled data is moved to Azure storage you pay for all you store. So you want to store enough information to give you a true window into that performance counter, but not so much that you pay unnecessarily for data you won’t need.
  5. Trace logging level – While using the Verbose logging filter for tracing may give you lots of good information, it is also very chatty and your logs will grow quickly.  Since you pay for what you use in Azure, only use the Verbose trace level when you are actively working on a problem. Once it is solved, scale back to the Warning, Error, or Critical levels, which write far fewer messages to the logs (see the sketch after this list).
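A small hedged sketch of items 3 and 5 together (Azure SDK 1.x diagnostics API, set in the role’s OnStart) might look like this:

    using System;
    using Microsoft.WindowsAzure.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

            // Item 3: transfer buffered trace logs every 5 minutes.
            config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

            // Item 5: only move Warning-and-above entries unless actively troubleshooting.
            config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Warning;

            DiagnosticMonitor.Start(
                "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);
            return base.OnStart();
        }
    }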

Stay tuned for my next post, where I will write about six additional ways to optimize the use of Windows Azure Diagnostics.