I have recently been experimenting with SCOM 2012 and the System Center Monitoring Pack for Windows Azure Applications (http://www.microsoft.com/en-us/download/details.aspx?id=11324). My goal was to understand how to use SCOM to monitor and view performance counters for my Azure service. I had no previous experience with SCOM, so this was a new adventure for me.
Overall I found SCOM very powerful, but not as straightforward to use as I had hoped. There are more intuitive tools, such as AzureOps from OpStera, for monitoring Azure services and applications. First I had to create two Run As accounts: one using Binary Authentication (for the certificate and private key) and one using Basic Authentication (for the certificate’s password). I then created a management pack, which serves as a container for other SCOM entities. From there I derived a monitoring pack from the Windows Azure Application template. This is where I entered the Azure-specific values that uniquely identify to SCOM the Azure service I wanted to monitor. Finally, I created rules, one for each performance counter I wanted to monitor. Rule creation has a wizard too (as did most SCOM tasks I tried), but a few of the fields, such as the role instance type, were not as straightforward to complete.
Counters used for an Azure application are a subset of those you would use for a Windows Server 2008 application. For my Azure application I chose a sampling rate of one minute (one to two minutes is recommended) and a transfer interval of five minutes. The transfer interval is how often the Diagnostics Monitor moves the counter data from local storage into Azure storage. I used the following Perfmon counters, which are typical of those you would use when monitoring Azure. The counters I monitored for a worker role are a subset of those I monitored for a Web role, because a worker role does not include any IIS or ASP.NET functionality.
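The sketch below makes the sampling and transfer arithmetic concrete. It also shows that the worker-role counters are a subset of the Web-role counters. The counter paths use the standard Perfmon `\Object(Instance)\Counter` syntax. The instance names (for example `*` for all network interfaces) and the `DiagnosticsPlan` class are hypothetical illustrations and are not part of the Azure Diagnostics API.

```python
from dataclasses import dataclass

# Worker-role counters (no IIS or ASP.NET). Instance names are assumptions.
WORKER_ROLE_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\Memory\Committed Bytes",
    r"\TCPv4\Connections Established",
    r"\TCPv4\Segments Sent/sec",
]

# Web roles add network and ASP.NET counters on top of the worker-role set.
WEB_ROLE_COUNTERS = WORKER_ROLE_COUNTERS + [
    r"\Network Interface(*)\Bytes Sent/sec",
    r"\ASP.NET Applications(__Total__)\Request Error Events Raised",
    r"\ASP.NET Applications(__Total__)\Requests/Sec",
    r"\ASP.NET Applications(__Total__)\Requests Not Found",
    r"\ASP.NET Applications(__Total__)\Requests Total",
    r"\ASP.NET Applications(__Total__)\Requests Timed Out",
    r"\ASP.NET Applications(__Total__)\Requests Not Authorized",
]


@dataclass
class DiagnosticsPlan:
    """Hypothetical container for the settings described in the post."""
    sample_seconds: int    # how often each counter is sampled
    transfer_seconds: int  # how often buffered samples move to Azure storage
    counters: list

    def samples_per_transfer(self) -> int:
        # Samples of each counter buffered locally between two transfers.
        return self.transfer_seconds // self.sample_seconds


web_plan = DiagnosticsPlan(sample_seconds=60, transfer_seconds=300,
                           counters=WEB_ROLE_COUNTERS)
print(web_plan.samples_per_transfer())  # 5
```

With a one-minute sample and a five-minute transfer, each counter accumulates five readings locally before they are moved to storage.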
Counters for Web Roles
The following counter is used with the Network Interface monitoring object.
Bytes Sent/sec – Helps you determine the amount of Azure bandwidth you are using.
The following counters are used with the ASP.NET Applications monitoring object.
Request Error Events Raised – A high value may indicate an application problem. Excessive time spent in error processing can degrade performance.
(__Total__)\Requests/Sec – Shows how your application behaves under load. If this value is low while other counters (CPU or memory) show a lot of activity, there is probably a bottleneck or a memory leak.
(__Total__)\Requests Not Found – A high count may indicate a virus or a problem with your Web site’s configuration.
(__Total__)\Requests Total – The total number of requests since the application started. How quickly it grows reflects your application’s throughput. If throughput is low while CPU or memory is heavily used, you may have a bottleneck or memory leak.
(__Total__)\Requests Timed Out – A good indicator of performance. A high value means your system cannot process requests fast enough to handle new ones. For an Azure application this might mean adding instances of your Web role until these timeouts disappear.
(__Total__)\Requests Not Authorized – A high value could indicate a denial-of-service (DoS) attack. Throttling these requests may allow valid requests to get through.
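Requests Total is cumulative, so throughput and timeout rates come from the difference between two readings. The following is a minimal sketch under that assumption. It also turns an average Bytes Sent/sec reading into a rough monthly volume. The function names, the counter-reset handling, and the use of decimal gigabytes are illustrative assumptions, not how SCOM or Azure computes these figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rate_from_totals(prev_total: int, curr_total: int, elapsed_seconds: float) -> float:
    """Per-second rate from two readings of a cumulative counter such as Requests Total."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = curr_total - prev_total
    if delta < 0:
        # The counter went backwards, e.g. the application restarted; assume it
        # restarted from zero, so the current reading is the count since then.
        delta = curr_total
    return delta / elapsed_seconds


def timeout_fraction(prev_timed_out: int, curr_timed_out: int,
                     prev_total: int, curr_total: int) -> float:
    """Share of this interval's requests that timed out."""
    requests = curr_total - prev_total
    if requests <= 0:
        return 0.0
    return (curr_timed_out - prev_timed_out) / requests


def estimated_monthly_gb(avg_bytes_sent_per_sec: float, days: int = 30) -> float:
    """Rough outbound volume (decimal GB) from an average Bytes Sent/sec reading."""
    return avg_bytes_sent_per_sec * days * SECONDS_PER_DAY / 1e9


print(rate_from_totals(12000, 15000, 60))           # 50.0
print(timeout_fraction(10, 40, 12000, 15000))       # 0.01
print(estimated_monthly_gb(1_000_000))              # 2592.0
```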
Counters for Web and Worker Roles
For both worker and Web roles, here are some counters to watch for your Azure service or application.
The following counter is used with the Processor monitoring object.
(_Total)\% Processor Time – One of the key counters to watch in your application. If this value is high along with the number of Connections Established, you may want to increase the number of cores in the VM for your hosted service. If this value is high but the number of requests is low, your application may be using more CPU than it should.
The following counters are used with the Memory monitoring object.
Available Mbytes – If this value is low, you can increase the size of your Azure instance to make more memory available.
Committed Bytes – If this value increases constantly, it makes no sense to increase the Azure instance size, since you most likely have a memory leak.
The following counters are used with the TCPv4 monitoring object.
Connections Established – Shows how many connections to your service are currently open. If this value is high and the Processor Time counter is low, you may not be releasing connections properly.
Segments Sent/sec – If this value is high, you may want to increase the Azure instance size.
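The rules of thumb above can be sketched as a small rule evaluator. Every threshold below is a hypothetical placeholder, because the post gives no numbers. The "constantly increasing" test for Committed Bytes is modeled, as an assumption, as strictly increasing across at least three samples. This shows the reasoning only. It is not a SCOM rule definition.

```python
from typing import List, Optional

# Hypothetical thresholds: the post gives no numbers, so tune these for your service.
CPU_HIGH = 80.0           # \Processor(_Total)\% Processor Time
CPU_LOW = 20.0
CONNECTIONS_HIGH = 500    # \TCPv4\Connections Established
REQUESTS_LOW = 5.0        # requests/sec (Web roles only)
AVAILABLE_MB_LOW = 256.0  # \Memory\Available MBytes


def committed_bytes_rising(samples: List[int]) -> bool:
    """True if Committed Bytes grew at every step of the window (possible leak)."""
    return len(samples) >= 3 and all(b > a for a, b in zip(samples, samples[1:]))


def diagnose(cpu: float, connections: int, available_mb: float,
             committed: List[int], requests_per_sec: Optional[float] = None) -> List[str]:
    findings = []
    if cpu > CPU_HIGH and connections > CONNECTIONS_HIGH:
        findings.append("scale up: high CPU with many connections")
    if cpu > CPU_HIGH and requests_per_sec is not None and requests_per_sec < REQUESTS_LOW:
        findings.append("check code: high CPU for little traffic")
    if connections > CONNECTIONS_HIGH and cpu < CPU_LOW:
        findings.append("check connections: many open but CPU is idle")
    if committed_bytes_rising(committed):
        # A steadily growing commit charge points to a leak, so a bigger
        # instance would only postpone the problem.
        findings.append("suspect memory leak: committed bytes keep rising")
    elif available_mb < AVAILABLE_MB_LOW:
        findings.append("scale up: available memory is low")
    return findings


print(diagnose(cpu=92.0, connections=800, available_mb=1024.0, committed=[100, 120, 150]))
```

The leak check takes precedence over the low-memory check, following the point that a larger instance does not help when Committed Bytes keeps climbing.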
In summary, Perfmon counters are a valuable way to indirectly keep an eye on your application’s use of Azure resources. Performance counters are often most useful when read together. For instance, if you see a lot of memory being used, check CPU utilization as well. If CPU is also high, much of that memory is in active use by your applications and you need to scale up. If CPU is low, you probably have an issue with how memory is being allocated or released.
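The memory-then-CPU check described above can be written as a small decision function. In this sketch, memory in use is derived from Available MBytes and the instance's total memory. The thresholds and function names are hypothetical, chosen only to illustrate the reasoning.

```python
# Hypothetical thresholds chosen only for illustration.
MEMORY_USED_HIGH = 0.85   # fraction of the instance's memory in use
CPU_HIGH = 75.0           # % Processor Time


def memory_used_fraction(available_mb: float, total_mb: float) -> float:
    """Derive the fraction of memory in use from Available MBytes and instance size."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return 1.0 - available_mb / total_mb


def memory_pressure_advice(used_fraction: float, cpu_percent: float) -> str:
    if used_fraction < MEMORY_USED_HIGH:
        return "ok"
    if cpu_percent >= CPU_HIGH:
        # Memory and CPU are both busy: the load is real, so scale up.
        return "scale up"
    # Memory is nearly full but the CPU is mostly idle: look at how memory
    # is allocated and released.
    return "investigate allocation"


used = memory_used_fraction(available_mb=200, total_mb=2048)
print(memory_pressure_advice(used, cpu_percent=90.0))  # scale up
print(memory_pressure_advice(used, cpu_percent=15.0))  # investigate allocation
```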
You can use SCOM to track Perfmon counters if you know how to use it and your company has invested in a license. Remember that SCOM is a very rich and robust enterprise-scale tool with a ton of functionality. For instance, once you configure your hosted service as a monitoring pack, you can view it in the Distributed Applications tab, which gives you consolidated, cascading summaries of the performance and availability of your Azure service.
If you don’t own or use SCOM, or you merely want to keep things simple, AzureOps is probably an easier option. It requires no installation or setup (it runs as a Web service) and offers simple Azure auto-scaling based on Perfmon threshold values (http://www.opstera.com/products/Azureops/).
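The following is a minimal sketch of what threshold-based auto-scaling means in principle. It averages recent % Processor Time samples, adds or removes one instance when the average crosses a threshold, and clamps the result to bounds. The thresholds, the one-instance step, and the default bounds are hypothetical. This is not AzureOps’s actual algorithm.

```python
from statistics import mean
from typing import Sequence


def desired_instances(current: int, cpu_samples: Sequence[float],
                      scale_out_above: float = 75.0, scale_in_below: float = 25.0,
                      min_instances: int = 2, max_instances: int = 10) -> int:
    """Instance count a simple CPU-threshold rule would request (hypothetical defaults)."""
    if not cpu_samples:
        return current  # no data: leave the deployment alone
    avg = mean(cpu_samples)
    if avg > scale_out_above:
        target = current + 1
    elif avg < scale_in_below:
        target = current - 1
    else:
        target = current
    # Keep the result within the configured bounds.
    return max(min_instances, min(max_instances, target))


print(desired_instances(3, [80.0, 90.0, 85.0]))  # 4
```

Stepping one instance at a time keeps the rule easy to reason about. A production rule would typically also wait between scaling actions so new instances have time to absorb load.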
A special thanks to Nuno Godhino for providing me information for this post.