Scaling of VM Roles and Worker Roles, why divide by total?

Category: azure management


BrianEh on Wed, 25 Jun 2014 19:29:10

I am trying to tweak the auto-scale settings, and it is my understanding that the CPU load is averaged across the total number of instances within the Role.

I get this in concept. 

However, what I am seeing is that if I have two instances in my availability group, only one of them running, and I set the CPU load slider too high, my new instance never turns on / provisions.

It is as if the calculation is always averaging among the total possible - therefore if one is at 100% CPU, then that equates to 50% CPU.  And if my slider is set to 60%, my second VM never turns on. (I left it running all night this way once to see).

What I think the slider should mean is that I never want a single instance to go beyond 60% utilization; if it does, then give me another instance.
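To make the two readings of the slider concrete, here is a minimal sketch comparing them for the case described above (one running instance at 100% CPU, two instances defined, slider at 60%). The function names are hypothetical and this is not an Azure API; it only illustrates the arithmetic of each interpretation.

```python
def average_over_total(cpu_running, total_instances):
    """Average CPU if instances that are not running count as 0% load."""
    return sum(cpu_running) / total_instances


def average_over_running(cpu_running):
    """Average CPU across only the instances that are currently running."""
    return sum(cpu_running) / len(cpu_running)


running = [100.0]   # one running instance at 100% CPU
total = 2           # two instances defined in the role (assumed from the post)
threshold = 60.0    # scale-out slider setting

over_total = average_over_total(running, total)
over_running = average_over_running(running)

# Under the first reading the threshold is never crossed; under the second it is.
print(over_total, over_total > threshold)      # 50.0 False
print(over_running, over_running > threshold)  # 100.0 True
```

Under the "total possible" reading the second instance would never be provisioned at a 60% slider setting, which matches the behavior suspected here; whether Azure actually averages this way is the open question.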

Am I following this properly?

Since I only have the scaling metric of CPU (since I don't have a Service Bus queue) I have to use it wisely and make sure that it reacts when appropriate - and in my case I might want it to scale out at 40% to ensure that there is no negative user experience.

And right now, I am not seeing that I can make that happen.

Brian Ehlert
Learn. Apply. Repeat.


Jambor yao on Thu, 26 Jun 2014 08:01:49


Azure's auto-scaling engine examines 60-minute CPU-utilization averages every 5 minutes. This means that every 5 minutes it has a chance to decide whether your CPU utilization is too high and scale you up. For how to scale effectively in Windows Azure, I suggest you read this thread: . Hope this helps; if not, please feel free to let me know.

Best Regards,


BrianEh on Thu, 26 Jun 2014 15:04:51

That may explain what I am experiencing, then, and why it takes some time before the scale-up happens when I stand up new instances.

But I still don't fully understand why the averaging is against the total and not the total running (at least that is what the documentation and prompts are leading me to believe).

But I don't think that is the case and I am a little confused based on what I am seeing.

  • I have my CPU load slider set at 20% - 30%.
  • I induced load in one role instance at 40% and let that continue to run.
  • After a long wait, my additional instances were provisioned.
  • I allowed it to run overnight.
  • This morning I see an average CPU load of 24% and all three instances are running.

What I expected to see was two instances running, since (37% + 3% + 4%) / 3 = 15%. But as I think about this while typing: (37% + 4%) / 2 = 20.5%, and thus higher than my bottom threshold of 20%.
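The arithmetic above can be checked with a short sketch. The decision rule here is an assumption for illustration only (scale in below the bottom threshold, scale out above the top threshold, otherwise hold); `scale_decision` is a hypothetical helper, not how Azure is documented to decide.

```python
def scale_decision(cpu_by_instance, low, high):
    """Return (average CPU, action) for an assumed simple threshold band."""
    avg = sum(cpu_by_instance) / len(cpu_by_instance)
    if avg > high:
        return avg, "scale out"
    if avg < low:
        return avg, "scale in"
    return avg, "hold"


low, high = 20.0, 30.0  # slider range from the list above

# Three instances: the average falls below the bottom threshold.
avg3, action3 = scale_decision([37.0, 3.0, 4.0], low, high)
# Two instances: the average lands just above the bottom threshold.
avg2, action2 = scale_decision([37.0, 4.0], low, high)

print(round(avg3, 1), action3)  # 14.7 scale in
print(round(avg2, 1), action2)  # 20.5 hold
```

With these figures, three instances would average below the 20% floor, while two instances sit just inside the band.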

What I am trying to achieve is that sweet spot of less than 50% CPU load, but yet scaling down efficiently enough to keep cost in line.

JIAN WU - MSFT on Fri, 11 Jul 2014 09:00:25


Autoscaling by CPU compares the thresholds with the average CPU usage of all instances of the role.

So if you set the bottom bar to 20% and have 2 instances running with CPU usage of 37% and 4%, the Azure fabric will autoscale it to 3 instances, because the average CPU is 20.5%, which is above the bottom bar.

Autoscaling based on CPU actually works as follows:

All instances are included when calculating the average percentage of CPU usage and the average is based on use over the previous hour. Depending on the number of instances that your application is using, it can take longer than the specified wait time for the scale action to occur if the wait time is set very low.

As a result, if you have an app that is at 0% load, and then start running a load test to make it go to 100% (and have a scale-up target of 80%) it will take at least 45 minutes before the scale action will start. However, this is not a typical real-world scenario. It’s more likely that your load is already high (say 75%). In this scenario, it would take much less time to trigger the scale action.
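The effect of the hourly average on reaction time can be sketched as follows. This is a simplified model, not the actual Azure engine: it assumes one sample every 5 minutes, a 60-minute window (12 samples), a step change in load, and no metric-collection latency. The function name and parameters are hypothetical.

```python
from collections import deque


def minutes_until_scale_out(start_load, test_load, target,
                            window_min=60, step_min=5):
    """Minutes until the rolling average first exceeds the target.

    Returns None if the average never exceeds the target.
    """
    n = window_min // step_min
    # The window starts full of samples at the original load level.
    window = deque([start_load] * n, maxlen=n)
    if sum(window) / n > target:
        return 0
    for k in range(1, n + 1):
        window.append(test_load)  # the oldest sample drops out automatically
        if sum(window) / n > target:
            return k * step_min
    return None


cold_start = minutes_until_scale_out(0.0, 100.0, 80.0)
warm_start = minutes_until_scale_out(75.0, 100.0, 80.0)

print(cold_start)  # 50
print(warm_start)  # 15
```

In this model, going from 0% to 100% takes 50 minutes to push the hourly average past 80%, consistent with the "at least 45 minutes" figure, while starting from 75% takes only 15 minutes.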

One of the reasons that we use an hourly average is that, with the current platform, it's impossible to get metrics from Virtual Machines or Cloud Services with less than a 15-minute latency. So, if we scaled based just on the last 5 or 10 minutes, we would never have data to scale on. You can see this in the screenshot below: it was taken around 5:30, but the most recent data point is at 5:15.

In the future, the Azure platform is looking at ways to speed up metric collection, but this is likely not coming for quite some time. As a result, the best we can do is a rolling average over a larger time window.

Let me know if you have any questions.

Best regards