Azure has always offered out-of-the-box auto scaling for classic VMs (virtual machines in cloud services), but for ARM virtual machines no such offering exists.
Azure does offer scale sets for auto scaling ARM VMs, but scale sets may not fit your workflow, or you may have existing ARM virtual machine farms that you want to apply auto scaling to. I will demonstrate how we can use Azure VM diagnostics and Azure Automation to create auto scaling systems that can work with a variety of metrics and scale out (increase VM count) as well as scale up (increase VM size).
Let's say we have a web application deployed on 5 virtual machines, all sitting in an availability set. In quiet times we want to deallocate 3 of the VMs and leave only 2 running (we will always leave 2 servers running to ensure high availability). At times of high demand we want the 3 powered-off servers to be automatically powered on. We could detect these times of demand by monitoring a variety of metrics, but for this example I will monitor server CPU usage.
The first step is to enable diagnostics on all our VMs. This involves installing and configuring an extension on each VM that will collect advanced metrics and persist that data to an Azure storage table. There are two ways to enable VM diagnostics.
Using the Azure Portal
We can enable diagnostics from the Azure Portal by going to the diagnostics blade of our VM. Here we select which diagnostics we wish to capture (I recommend basic as a minimum) and we also specify a storage account where all the performance data will be persisted.
Using ARM Templates
A much better way to enable diagnostics is by using an ARM template to install and configure the diagnostics extension on our target VM(s).
Here is a simple ARM template which installs and configures the diagnostics extension on an existing single virtual machine.
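A minimal sketch of such a template is below. The parameter names are placeholders, and the WadCfg is trimmed to a single CPU counter for brevity; the real diagnostics configuration supports many more counters, logs and quotas.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "vmName": { "type": "string" },
    "storageAccountName": { "type": "string" },
    "storageAccountKey": { "type": "securestring" }
  },
  "resources": [
    {
      "type": "Microsoft.Compute/virtualMachines/extensions",
      "name": "[concat(parameters('vmName'), '/Microsoft.Insights.VMDiagnosticsSettings')]",
      "apiVersion": "2015-06-15",
      "location": "[resourceGroup().location]",
      "properties": {
        "publisher": "Microsoft.Azure.Diagnostics",
        "type": "IaaSDiagnostics",
        "typeHandlerVersion": "1.5",
        "autoUpgradeMinorVersion": true,
        "settings": {
          "StorageAccount": "[parameters('storageAccountName')]",
          "WadCfg": {
            "DiagnosticMonitorConfiguration": {
              "overallQuotaInMB": 4096,
              "PerformanceCounters": {
                "scheduledTransferPeriod": "PT1M",
                "PerformanceCounterConfiguration": [
                  {
                    "counterSpecifier": "\\Processor(_Total)\\% Processor Time",
                    "sampleRate": "PT15S",
                    "unit": "Percent"
                  }
                ]
              }
            }
          }
        },
        "protectedSettings": {
          "storageAccountName": "[parameters('storageAccountName')]",
          "storageAccountKey": "[parameters('storageAccountKey')]",
          "storageAccountEndPoint": "https://core.windows.net"
        }
      }
    }
  ]
}
```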
This article goes into more detail about the VM diagnostics extension and how to configure it using ARM.
Once that is done you will see a much richer set of diagnostics data available in the portal and if you care to look, the data is also being persisted to tables in the storage account you specified earlier.
Getting Performance Metrics Using Powershell
Now that we are capturing a rich amount of data about our VMs, Powershell commands like Get-AzureRmMetric will start working and give us a way to automatically retrieve performance data about our virtual machines. For example, here is a Powershell script that will return CPU usage data for every minute of the last 5 minutes for a given ARM virtual machine.
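A sketch of such a script is below, assuming you are already logged in (Login-AzureRmAccount) and the resource group and VM names are placeholders. Note that the metric parameter is -MetricNames in older versions of AzureRM.Insights and -MetricName in newer ones.

```powershell
$resourceGroup = "MyResourceGroup"   # placeholder
$vmName        = "MyVM"              # placeholder

$vm    = Get-AzureRmVM -ResourceGroupName $resourceGroup -Name $vmName
$end   = (Get-Date).ToUniversalTime()
$start = $end.AddMinutes(-5)

# One data point per minute for the last 5 minutes of CPU usage.
$metrics = Get-AzureRmMetric -ResourceId $vm.Id `
    -MetricNames "\Processor(_Total)\% Processor Time" `
    -TimeGrain ([TimeSpan]::FromMinutes(1)) `
    -StartTime $start -EndTime $end

# The shape of the result varies by module version: the per-minute points
# live under .MetricValues, .Values or .Data depending on AzureRM.Insights.
$metrics
```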
If you are wondering where “\Processor(_Total)\% Processor Time” came from then simply run Get-AzureRmMetricDefinition against the VM.
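For example (names are placeholders; the exact shape of the Name property varies slightly between AzureRM.Insights versions):

```powershell
# List every metric currently being published for this VM.
$vm = Get-AzureRmVM -ResourceGroupName "MyResourceGroup" -Name "MyVM"
Get-AzureRmMetricDefinition -ResourceId $vm.Id | Select-Object Name
```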
This lists all the metrics currently being captured for this VM and using these metric names we can get up to date diagnostics data on a huge range of performance metrics, not just CPU usage. We could even configure the diagnostics extension to capture custom metrics about our VM and monitor those.
We are going to use this metric data to drive our autoscaling solution.
Putting everything together, I want to write a Powershell script that will run in Azure Automation on a regular schedule (every 15 minutes, say). The script will:
- Retrieve up-to-date performance data for the last 5 minutes (in this case CPU usage, but it could be any other metric) for each of my VMs.
- Determine which VMs are currently powered on and whether each VM is currently operating in some pre-defined optimal performance range.
- If the VMs are operating above the optimal range, then one or more powered-off VMs will be powered on.
- If the VMs are operating below the optimal range, then one or more VMs will be powered off, always ensuring at least 2 VMs are kept running.
Steps 3 and 4 will require some logic to determine when and how to power on and power off machines and it can of course be customised for any particular scenario.
Here is the final script. It is designed to run in Azure Automation, so it takes as input the name of an automation connection (usually AzureRunAsConnection if you want to use the default connection created for you when you create your automation account in Azure).
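A sketch of such a runbook is below. The parameter names, thresholds and the scale-by-one policy are illustrative assumptions, and the metric parameter/result property names (-MetricNames, .MetricValues) vary between AzureRM.Insights versions.

```powershell
param(
    [Parameter(Mandatory = $true)][string]$ConnectionName,    # e.g. AzureRunAsConnection
    [Parameter(Mandatory = $true)][string]$ResourceGroupName, # group holding the VM farm
    [int]$MinRunningVMs = 2,          # never power off below this count
    [double]$CpuHighThreshold = 75.0, # above this average CPU = overworked
    [double]$CpuLowThreshold  = 25.0  # below this average CPU = underworked
)

# Authenticate with the Run As service principal.
$conn = Get-AutomationConnection -Name $ConnectionName
Add-AzureRmAccount -ServicePrincipal `
    -TenantId $conn.TenantId `
    -ApplicationId $conn.ApplicationId `
    -CertificateThumbprint $conn.CertificateThumbprint | Out-Null

$end   = (Get-Date).ToUniversalTime()
$start = $end.AddMinutes(-5)

# Split the farm into running and stopped VMs using the power state.
$running = @(); $stopped = @()
foreach ($vm in (Get-AzureRmVM -ResourceGroupName $ResourceGroupName)) {
    $state = (Get-AzureRmVM -ResourceGroupName $ResourceGroupName -Name $vm.Name -Status).Statuses |
             Where-Object { $_.Code -like "PowerState/*" } |
             Select-Object -ExpandProperty Code
    if ($state -eq "PowerState/running") { $running += $vm } else { $stopped += $vm }
}

# Count running VMs above / below the optimal CPU range over the last 5 minutes.
$above = 0; $below = 0
foreach ($vm in $running) {
    $metric = Get-AzureRmMetric -ResourceId $vm.Id `
        -MetricNames "\Processor(_Total)\% Processor Time" `
        -TimeGrain ([TimeSpan]::FromMinutes(1)) `
        -StartTime $start -EndTime $end
    $avg = ($metric.MetricValues | Measure-Object -Property Average -Average).Average
    if     ($avg -gt $CpuHighThreshold) { $above++ }
    elseif ($avg -lt $CpuLowThreshold)  { $below++ }
}

if ($above -gt $below -and $stopped.Count -gt 0) {
    # Scale out: power on one stopped VM.
    Write-Output "Starting $($stopped[0].Name)"
    Start-AzureRmVM -ResourceGroupName $ResourceGroupName -Name $stopped[0].Name
}
elseif ($below -gt $above -and $running.Count -gt $MinRunningVMs) {
    # Scale in: deallocate one running VM, keeping the minimum alive.
    Write-Output "Stopping $($running[-1].Name)"
    Stop-AzureRmVM -ResourceGroupName $ResourceGroupName -Name $running[-1].Name -Force
}
else {
    Write-Output "No scaling action required."
}
```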
The scaling logic I have shown is very simple. In the real world you may want to customise it to better suit your needs. For simplicity's sake I have coded it so that if more VMs are working above the optimal range than are working below it, the server group is scaled up, and if more VMs are working below the optimal range than are working above it, the server group is scaled down.
There are a couple of things to be aware of when trying to use the above script in Azure Automation. The script depends on AzureRM.Insights which is not (yet) imported into a default instance of Azure Automation. You will have to import it manually but before you do that you will have to update the other AzureRM modules in your automation account to their latest versions.
The best way to do that is by importing and running the Update-ModulesInAutomationToLatestVersion runbook. Once you have run that runbook then all the AzureRM modules will be at their latest versions and you can happily import AzureRM.Insights.
Scheduling the AutoScale Job
I would recommend running your autoscale job every 10-15 minutes, but it depends on your requirements. Automation scheduling doesn't allow you to run a runbook more than once an hour. The better way to schedule runbooks is by using webhooks and Azure Scheduler.
Simply create a webhook for the autoscale runbook and correctly specify all the parameters. Then create a scheduled job to execute every 15 minutes using the webhook URL.
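The scheduled job itself just needs to POST to the webhook. As a sketch (the URI below is a placeholder; use the one Azure shows you exactly once when you create the webhook, and point your Azure Scheduler HTTP job at the same URL on a 15-minute recurrence):

```powershell
# Fire the runbook via its webhook; any scheduler that can POST will do.
$webhookUri = "https://s1events.azure-automation.net/webhooks?token=<your-token>"
Invoke-RestMethod -Method Post -Uri $webhookUri
```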
That's it. Using Azure VM diagnostics and Azure Automation lets you create and customise any number of custom auto scale strategies. For example, I could choose to monitor memory usage, and if available memory becomes too low on a given VM, I could scale that VM up, i.e. increase its size.