Question

David Burg on Tue, 10 Dec 2013 00:59:45


This is an analysis of disaster recovery (DR) for the different components used by a Windows Azure BizTalk Services (WABS) solution, organized by component. Please let me know where my analysis is incorrect and, if the issues I identify are indeed issues, how they can be mitigated.

 

Azure Storage

It is my understanding that Azure services rely on Windows Azure Storage Geo-Redundancy for Disaster Recovery. This is because Azure web and worker roles are stateless, while state is persisted in storage. Storage Geo-Redundancy is described in this release announcement blog post:

 

http://blogs.msdn.com/b/windowsazurestorage/archive/2011/09/15/introducing-geo-replication-for-windows-azure-storage.aspx

 

SQL Azure (tracking DB)

SQL Azure, which WABS also uses, is not Geo-Redundant by itself. As one option, the following MSDN article suggests regularly exporting the SQL database to a blob so that the export benefits from the blob's Geo-Redundancy:

 

http://msdn.microsoft.com/en-us/library/windowsazure/hh852669.aspx#adr3
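Periodic export bounds how much tracking data can be lost rather than eliminating the loss. A minimal sketch of that arithmetic, assuming each export captures the database state at the moment it starts and only becomes usable in the secondary region after it finishes and the blob has been geo-replicated. The class name, fields and the example durations are hypothetical placeholders, not measured values for SQL Azure or Azure Storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportSchedule:
    """Hypothetical model of a periodic SQL-to-blob export schedule (minutes)."""
    interval_min: float  # time between the starts of two consecutive exports
    duration_min: float  # time for one export to finish writing its blob
    geo_lag_min: float   # assumed delay before the blob is readable in the secondary region

    def worst_case_loss_min(self) -> float:
        # An export started at time s is usable in the secondary region only at
        # s + duration + geo_lag. A disaster just before the *next* export becomes
        # usable loses everything written since s.
        return self.interval_min + self.duration_min + self.geo_lag_min

    def loss_at(self, disaster_min: float) -> float:
        """Minutes of tracking data lost if the primary fails at disaster_min,
        assuming the first export starts at t=0."""
        ready = self.duration_min + self.geo_lag_min
        if disaster_min < ready:
            # No export is usable yet, so everything since t=0 is lost.
            return disaster_min
        # Index of the latest export whose blob is already usable in the secondary region.
        k = int((disaster_min - ready) // self.interval_min)
        return disaster_min - k * self.interval_min
```

With hypothetical values of an hourly export, 10 minutes per export and 15 minutes of replication lag, `ExportSchedule(60, 10, 15).worst_case_loss_min()` gives 85 minutes: the effective recovery point is noticeably worse than the export interval alone.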

 

Service Bus

Service Bus, which we implicitly need for the Relay part of the Adapter Service and may choose to use for Topics and Queues, is not Geo-Redundant by itself. The following MSDN article describes different methods to achieve Geo-Redundancy; Active Replication seems to be what we want, since we need to ensure that no messages are lost in case of a disaster:

 

http://msdn.microsoft.com/en-us/library/jj554355.aspx

 

There is likely an issue here: WABS supports neither duplicated relays for its Adapter Service nor Active Replication for Service Bus Queues/Topics used as a bridge source or destination. Is this concern correct?
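To make the concern concrete, here is a minimal sketch of Active Replication as I read the linked article: the sender sends every message, with the same message ID, to queues in two namespaces, and the receiver reads from both and discards duplicates by message ID. `InMemoryQueue` is a hypothetical stand-in for a queue in one namespace/region, not the Service Bus SDK; a WABS bridge would need equivalent logic on both its source and destination sides.

```python
from collections import OrderedDict, deque


class InMemoryQueue:
    """Hypothetical stand-in for a Service Bus queue in one namespace/region."""

    def __init__(self, name):
        self.name = name
        self.available = True  # flip to False to simulate a regional outage
        self._items = deque()

    def send(self, message_id, body):
        if not self.available:
            raise ConnectionError(f"{self.name} unavailable")
        self._items.append((message_id, body))

    def receive(self):
        if not self.available or not self._items:
            return None
        return self._items.popleft()


class ActiveReplicationSender:
    """Sends each message to both namespaces; fails only if both reject it."""

    def __init__(self, primary, secondary):
        self.queues = (primary, secondary)

    def send(self, message_id, body):
        delivered = 0
        for q in self.queues:
            try:
                q.send(message_id, body)
                delivered += 1
            except ConnectionError:
                pass  # the other namespace may still accept the message
        if delivered == 0:
            raise ConnectionError("message not accepted by either namespace")
        return delivered


class DeduplicatingReceiver:
    """Reads from both namespaces and drops messages whose ID was already seen."""

    def __init__(self, primary, secondary, memory=10_000):
        self.queues = (primary, secondary)
        self._seen = OrderedDict()  # bounded record of recently seen message IDs
        self._memory = memory

    def receive_all(self):
        out = []
        for q in self.queues:
            while (item := q.receive()) is not None:
                message_id, body = item
                if message_id in self._seen:
                    continue  # duplicate copy from the other namespace
                self._seen[message_id] = True
                if len(self._seen) > self._memory:
                    self._seen.popitem(last=False)  # forget the oldest ID
                out.append(body)
        return out
```

Note that the duplicate-detection window is bounded, so a copy arriving after its ID has been evicted would be delivered twice; the window has to exceed the longest expected outage.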

 

ACS

ACS is not Geo-Redundant by itself. While the Azure team performs daily backups, they do not offer an SLA for failover, and the documentation warns, alarmingly, that "the recovery time can be several days depending on the scenario."

 

http://msdn.microsoft.com/en-us/library/windowsazure/hh873027.aspx#RegionAccessControlService

 

It seems we would need to set up a duplicate ACS in the failover data center (region) and implement a regular programmatic export from the primary data center and import into the recovery data center. The good news is that the WABS restore function already lets you enter new ACS settings when restoring the BizTalk Service. However, I am concerned that this applies only to the service configuration, not to each contained bridge configuration and its source/destination ACS information.
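The custom export/import and the bridge concern can be sketched as two steps: plan what to change in the secondary ACS, then retarget bridge endpoints that still point at the primary namespace. This is a minimal sketch over a hypothetical data model (plain dictionaries standing in for an ACS export and for bridge configurations); it does not use any real ACS or WABS API, and the namespace names are placeholders.

```python
import copy


def plan_acs_sync(primary: dict, secondary: dict) -> dict:
    """Compare two hypothetical ACS exports, each mapping an entity name
    (e.g. a relying party or service identity) to its settings, and return
    what must be created, updated or deleted in the secondary."""
    return {
        "create": sorted(k for k in primary if k not in secondary),
        "update": sorted(k for k in primary if k in secondary and primary[k] != secondary[k]),
        "delete": sorted(k for k in secondary if k not in primary),
    }


def retarget_bridges(bridges: dict, old_ns: str, new_ns: str):
    """Return a copy of the hypothetical bridge configurations in which every
    source/destination endpoint using ACS namespace old_ns now uses new_ns,
    together with the sorted names of the bridges that changed."""
    updated = copy.deepcopy(bridges)  # leave the original configuration untouched
    changed = set()
    for name, cfg in updated.items():
        for endpoint in cfg.get("endpoints", []):
            if endpoint.get("acs_namespace") == old_ns:
                endpoint["acs_namespace"] = new_ns
                changed.add(name)
    return updated, sorted(changed)
```

If WABS restore only rewrites the service-level ACS settings, the second step is the part that would still have to be done per bridge, by whatever means WABS exposes for redeploying bridge configuration.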

 

WABS tenant information

Backup is described in this Azure article:

 

http://www.windowsazure.com/en-us/manage/services/biztalk-services/backup-restore/

 

Interestingly, it says: "The Tracking Database is not automatically backed up."

 

Ouch! To back up, you have to suspend WABS. That's not OK from a high-availability (HA) perspective. How can WABS be backed up without interrupting the service's availability?
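The HA cost of suspend-to-backup can be quantified: every backup adds planned downtime, so the backup schedule alone caps availability. A minimal sketch of that arithmetic, assuming the suspension is the only outage and using hypothetical durations (the actual suspend time depends on the WABS instance).

```python
MINUTES_PER_DAY = 24 * 60


def availability_with_backups(suspend_minutes: float, backups_per_day: float) -> float:
    """Upper bound on daily availability when each backup requires suspending
    the service for suspend_minutes (all other outages are ignored)."""
    downtime = suspend_minutes * backups_per_day
    if downtime > MINUTES_PER_DAY:
        raise ValueError("backup suspensions exceed one day")
    return 1 - downtime / MINUTES_PER_DAY
```

For example, a hypothetical 10-minute suspension once a day gives 1 - 10/1440, about 99.31%, before any unplanned outage is counted; backing up more often to improve the recovery point lowers this bound further.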

 

Summary

So the DR readiness story for WABS seems to be:

  • Azure Storage artifacts are Geo-Redundant by default.
  • The tracking SQL database should be exported to blob storage on a regular basis to benefit from Geo-Redundancy.
  • ACS configuration should be exported from the primary data center and imported into the secondary data center on a regular basis using custom code.
    • WABS support is probably incomplete here: it likely does not automatically update the bridge source and destination ACS information.
  • The Service Bus Relay for the Adapter Service, and any Queue or Topic we choose to use, must be configured as Geo-Redundant.
    • Likely not currently supported.
  • The WABS instance itself should be backed up on a regular basis.
    • Currently, the need to suspend the instance to take a backup is an issue.

Senior Soft. Dev. Eng. | Microsoft IT | Microsoft Corporation

