How to guard your application from Azure outages

10/2/2014 5:13:51 PM

There have been a few outages in Azure land in the past month or so.  These have impacted some of Azure’s customers and raised a loud question: “How do I ensure that their outages don’t bring my applications down?”  The first step is to understand that simply hosting your application in a cloud does not make it highly available.  High availability (HA) is not a default behavior; it is a planned and configured behavior.  Let’s learn how to make an application highly available in Azure.

image

Availability sets, planned, and unplanned maintenance events

Planned: A planned maintenance event is one performed on the underlying fabric of the Azure system.  It doesn’t always impact your applications, but from time to time a restart of the VM may be required.  Generally these events are communicated ahead of time so that you can react to them if need be.  As we will see, with proper planning you should never be impacted.

Unplanned: An unplanned event is when something happens to the physical hardware that your application is running on, such as a disk failure or a server failure.  In this case the Azure system will relocate your VM or web site to hardware that is not having issues and attempt to bring it back up.  If you only have one instance of your application running, some downtime is to be expected.  However, we will see how even this sort of event can be planned for.

Availability sets:  An availability set is the first line of defense against the impact of planned and unplanned events.  You configure your application for redundancy by running more than one instance of it in the same availability set.  When you place VM1 and VM2 in the same availability set you get some immediate protection, because each VM is automatically placed in what is called a Fault Domain (FD) and an Update Domain (UD).  The FD protects you from hardware, power, and switching failures.  The UD protects you during update rollouts, because only one UD at a time is taken down and brought back up.  There are five UDs and two FDs per availability set.  Each new VM is added to the next FD and UD: VM1 goes to UD1 and FD1, VM2 goes to UD2 and FD2, VM3 goes to UD3 and FD1, and so on.  When you get to VM6 it rotates back to UD1.  Neither the FDs nor the UDs will protect you from operating system or application failures!
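The rotation described above can be sketched in a few lines of Python.  This is only a model of the round-robin pattern as this post describes it (five UDs, two FDs); the function names are made up for illustration and it does not call any Azure API.

```python
UPDATE_DOMAINS = 5  # per availability set, as stated in this post
FAULT_DOMAINS = 2   # per availability set, as stated in this post


def place_vm(vm_number):
    """Return (update_domain, fault_domain) for the Nth VM added (1-based).

    Assumes plain round-robin assignment, as described in the post.
    """
    ud = (vm_number - 1) % UPDATE_DOMAINS + 1
    fd = (vm_number - 1) % FAULT_DOMAINS + 1
    return ud, fd


def vms_left_running(vm_count, update_domain_down):
    """List the VM numbers still up while one update domain is rebooted."""
    return [n for n in range(1, vm_count + 1)
            if place_vm(n)[0] != update_domain_down]


if __name__ == "__main__":
    for n in range(1, 7):
        ud, fd = place_vm(n)
        print(f"VM{n} -> UD{ud}, FD{fd}")
    # With a single VM, taking UD1 down leaves nothing running.
    print(vms_left_running(1, update_domain_down=1))
    # With two VMs, VM2 keeps serving while UD1 is updated.
    print(vms_left_running(2, update_domain_down=1))
```

The point of the second function is the one the post makes: with only one VM in the set, an update to its UD takes the whole tier down, while a second VM lands in a different UD and stays up.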

Understanding availability sets in your application: With this understanding you can see why it is important to put each tier of your application into its own availability set, with at least two instances of that tier per availability set.  You would want your web tier in Availability Set 1, your cron jobs in Availability Set 2, your read layer’s denormalization service in Availability Set 3, and so on.  This way you will always have at least one instance of each application concern available.
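As a rough illustration of that rule, here is a small Python check over a deployment plan.  The plan format, the tier names, and the set names are all hypothetical; this is a planning aid sketched under those assumptions, not something Azure provides.

```python
def check_plan(plan):
    """Return warnings for a plan of tier -> (availability_set, instances).

    Flags tiers with fewer than two instances and tiers that share an
    availability set with another tier.
    """
    warnings = []
    owner_of_set = {}
    for tier, (availability_set, instances) in plan.items():
        if instances < 2:
            warnings.append(
                f"{tier}: only {instances} instance(s) in {availability_set}")
        if availability_set in owner_of_set:
            warnings.append(
                f"{tier}: shares {availability_set} "
                f"with {owner_of_set[availability_set]}")
        else:
            owner_of_set[availability_set] = tier
    return warnings


if __name__ == "__main__":
    # Hypothetical plan: names are for illustration only.
    plan = {
        "web": ("availability-set-1", 2),
        "cron": ("availability-set-2", 1),
        "denormalizer": ("availability-set-1", 2),
    }
    for warning in check_plan(plan):
        print(warning)
```

Running a check like this before deploying makes the two easy mistakes visible: a tier with a single instance, and two tiers lumped into one set.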

How to configure an availability set

Now that we understand what role an availability set plays for our application, let’s quickly see how to implement one.  Configuring an availability set adds only a few extra steps when creating your VMs.

I started by creating my first VM from the quick create screen.

image

Then click on the name of the new VM to see the details for that VM instance.

image

Then click the configure tab.

image

Now you can set the availability set.  If this is the first time you have done this, there won’t be any existing sets in the list, so select create availability set from the drop-down menu.

image

Enter the name of the availability set, then click save.

image

Once you click save, the VM will be taken down and reconfigured.  This can take a minute or so.  Then it will be started back up.

image

image

Once the VM comes back online you can see a new message stating that there is only one VM in the set, and that this isn’t enough to ensure the SLA for the availability set.

image

Now we can create the next VM(s).  This can’t currently be done via quick create, as there is no way to associate the new VMs with an existing cloud service, and in order to share the availability set the new VMs must be on the same cloud service.  For that reason, use the create from gallery option to get into the details of your new VM’s creation.

Notice in the Virtual machine configuration screen that we can set the existing cloud service and pick the new availability set.

image

Now we can see in both VM configurations that they are attached to the same availability set.

image
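The two rules from the walkthrough above (VMs in a set must share a cloud service, and one VM is not enough for the SLA) can be captured in a toy Python model.  The class and the names used with it are hypothetical and exist only to illustrate the rules; nothing here talks to Azure.

```python
class AvailabilitySet:
    """Toy model of the rules described in this post; not an Azure API."""

    def __init__(self, name, cloud_service):
        self.name = name
        self.cloud_service = cloud_service
        self.vms = []

    def add_vm(self, vm_name, cloud_service):
        # VMs can only share a set when they are on the same cloud service.
        if cloud_service != self.cloud_service:
            raise ValueError(
                f"{vm_name} is on {cloud_service}, "
                f"but {self.name} belongs to {self.cloud_service}")
        self.vms.append(vm_name)

    def meets_minimum(self):
        # A single VM triggers the portal warning described above.
        return len(self.vms) >= 2


if __name__ == "__main__":
    web_set = AvailabilitySet("web-set", "my-cloud-service")
    web_set.add_vm("vm1", "my-cloud-service")
    print(web_set.meets_minimum())
    web_set.add_vm("vm2", "my-cloud-service")
    print(web_set.meets_minimum())
```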

For more on understanding availability sets take a look at the MS documentation: http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-manage-availability/
