David Burg on Tue, 10 Dec 2013 00:59:45
This is an analysis of disaster recovery for the different components used in a WABS solution, organized by component. Please let me know where my analysis is incorrect and, if the issues I identify are indeed issues, how they can be mitigated.
It is my understanding that Azure services rely on Windows Azure Storage Geo-Redundancy for Disaster Recovery. This is because Azure web and worker roles are stateless, while state is preserved in storage. The storage Geo-Redundancy is detailed in this release announcement blog post:
SQL Azure (tracking DB)
SQL Azure, which WABS also uses for its tracking database, is not Geo-Redundant by itself. As one option, the following MSDN article suggests regularly exporting the SQL database to a blob so that it benefits from the blob's Geo-Redundancy:
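A regular export is only useful if the exports are easy to locate, old ones are cleaned up, and the resulting data-loss window is understood. Below is a minimal Python sketch under stated assumptions: the naming scheme (one `.bacpac` blob per export, named by its UTC start time) and all function names are hypothetical, no Azure API is called, and the worst-case figure assumes an export only becomes usable in the secondary region after it has finished and been asynchronously geo-replicated.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Hypothetical naming scheme: one .bacpac blob per export, named by UTC start time.
BLOB_PREFIX = "trackingdb-"
BLOB_SUFFIX = ".bacpac"
TS_FORMAT = "%Y%m%dT%H%M%SZ"


def export_blob_name(taken_at: datetime) -> str:
    """Blob name for an export started at the given UTC time."""
    return f"{BLOB_PREFIX}{taken_at.strftime(TS_FORMAT)}{BLOB_SUFFIX}"


def parse_export_time(name: str) -> datetime:
    """Recover the UTC start time encoded in an export blob name."""
    stamp = name[len(BLOB_PREFIX):-len(BLOB_SUFFIX)]
    return datetime.strptime(stamp, TS_FORMAT).replace(tzinfo=timezone.utc)


def blobs_to_prune(names: Iterable[str], now: datetime, retention: timedelta) -> List[str]:
    """Exports older than the retention window, oldest first.

    The newest export is never pruned, even when it is older than the window,
    so a broken export schedule cannot leave us with no backup at all.
    """
    exports = sorted(names, key=parse_export_time)
    return [n for n in exports[:-1] if now - parse_export_time(n) > retention]


def worst_case_data_loss(interval: timedelta, export_duration: timedelta,
                         geo_replication_lag: timedelta) -> timedelta:
    """Upper bound on lost tracking data if the primary region fails.

    Assumes the disaster strikes just before the next export has been
    geo-replicated, so the newest usable backup reflects the state at the
    start of the previous export.
    """
    return interval + export_duration + geo_replication_lag
```

With purely illustrative numbers (daily exports taking 30 minutes, plus an assumed 15 minutes of geo-replication lag), `worst_case_data_loss` gives 24 hours 45 minutes, which is the order of tracking data that could be lost.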
Service Bus
Service Bus, which we implicitly need for the Relay part of the Adapter Service and may choose to use for Topics and Queues, is not Geo-Redundant by itself. The following MSDN article describes different methods to achieve Geo-Redundancy; Active Replication seems to be what we want, since we need to ensure that no messages are lost in case of a disaster:
There is likely an issue here: WABS supports neither duplicated relays for its Adapter Service piece nor Active Replication for a Service Bus Queue/Topic used as a bridge source or destination. Is this concern correct?
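The Active Replication pattern mentioned above can be illustrated with in-memory stand-ins: the sender sends every message, with the same MessageId, to a queue in each data center, and the receiver reads from all of them and drops duplicates by MessageId. This is a minimal Python sketch only; `InMemoryQueue`, `send_active` and `DedupReceiver` are hypothetical names, not part of the Service Bus API. It also makes the concern concrete: a bridge would have to play the role of the duplicating sender or the deduplicating receiver.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Message:
    body: str
    # The same id travels with every copy, which is what makes dedup possible.
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class InMemoryQueue:
    """Hypothetical stand-in for a Service Bus queue in one data center."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._items: deque = deque()

    def send(self, msg: Message) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self._items.append(msg)

    def receive(self) -> Optional[Message]:
        if not self.available or not self._items:
            return None
        return self._items.popleft()


def send_active(msg: Message, queues: List[InMemoryQueue]) -> int:
    """Send the same message to every queue; succeed if at least one accepts it."""
    delivered = 0
    for q in queues:
        try:
            q.send(msg)
            delivered += 1
        except ConnectionError:
            pass  # the other data center still gets its copy
    if delivered == 0:
        raise ConnectionError("message was not delivered to any queue")
    return delivered


class DedupReceiver:
    """Reads from all queues and returns each MessageId only once."""

    def __init__(self, queues: List[InMemoryQueue]) -> None:
        self.queues = queues
        # Unbounded for brevity; a real receiver would expire old ids.
        self.seen: Set[str] = set()

    def drain(self) -> List[str]:
        out: List[str] = []
        progress = True
        while progress:
            progress = False
            for q in self.queues:
                msg = q.receive()
                if msg is None:
                    continue
                progress = True
                if msg.message_id in self.seen:
                    continue  # duplicate copy from the other data center
                self.seen.add(msg.message_id)
                out.append(msg.body)
        return out
```

If the primary queue goes down after some messages were sent, the receiver still gets every message from the secondary, and when the primary comes back its leftover copies are discarded as duplicates.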
ACS
ACS is not Geo-Redundant by itself. While the Azure team performs daily backups, they do not offer an SLA for failover, and the documentation contains the alarming statement that "the recovery time can be several days depending on the scenario."
It seems we would need to set up a duplicate ACS in the failover data center/region and implement a regular, programmatic export from the primary data center and import into the recovery data center. The good news is that the WABS restore function already lets you enter new ACS settings when restoring the BizTalk Service. However, I am concerned that this applies only to the service configuration, not to each contained bridge configuration and its source/destination ACS information.
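If the restore does not update the bridges, the custom export/import code would also have to re-point each bridge's source and destination ACS settings at the secondary namespace. A minimal Python sketch, assuming a hypothetical bridge configuration shape (a dict whose `source`/`destination` endpoints carry `acs_namespace`, `issuer_name` and `issuer_secret`); this is not the actual WABS bridge configuration format. Unmapped namespaces are reported rather than silently left pointing at the primary region.

```python
import copy
from typing import Dict, List, Tuple

# Hypothetical shapes: primary ACS namespace -> replacement settings in the secondary region.
AcsMap = Dict[str, Dict[str, str]]
Unmapped = List[Tuple[str, str, str]]  # (bridge name, endpoint role, namespace)


def repoint_bridges(bridges: List[dict], acs_map: AcsMap) -> Tuple[List[dict], Unmapped]:
    """Return updated copies of the bridge configs plus any ACS namespaces with no mapping."""
    updated: List[dict] = []
    unmapped: Unmapped = []
    for bridge in bridges:
        new_bridge = copy.deepcopy(bridge)  # never mutate the primary's configuration
        for role in ("source", "destination"):
            endpoint = new_bridge.get(role)
            if not endpoint or "acs_namespace" not in endpoint:
                continue  # this endpoint does not use ACS
            replacement = acs_map.get(endpoint["acs_namespace"])
            if replacement is None:
                unmapped.append((bridge["name"], role, endpoint["acs_namespace"]))
                continue
            endpoint.update(replacement)  # swap namespace and issuer credentials
        updated.append(new_bridge)
    return updated, unmapped
```

Failing loudly on unmapped namespaces is deliberate: a bridge that still authenticates against the failed region's ACS would only surface as an error after failover.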
WABS tenant information
Backup is described in this Azure article:
Interestingly, this says: "The Tracking Database is not automatically backed up."
Ouch! To back up, you have to suspend WABS. That's not OK from an HA perspective. How can WABS be backed up without interrupting the service's availability?
So the DR readiness story for WABS seems to be:
- Azure Storage artifacts are Geo-Redundant by default.
- The tracking SQL database should be exported to blob storage on a regular basis to benefit from Geo-Redundancy.
- ACS should be exported from the primary data center and imported into the secondary data center on a regular basis using custom code.
  - WABS support here is probably incomplete, as it likely does not automatically update the bridge source and destination ACS information.
- The Service Bus Relay for the Adapter Service, and any Queue or Topic we choose to use, must be configured as Geo-Redundant.
  - This is likely not supported currently.
- The WABS instance itself should be backed up on a regular basis.
  - Currently, the need to suspend the instance to take a backup is an issue.
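Since most items in this list depend on something running "on a regular basis", it helps to monitor that each of those jobs actually succeeded recently. A minimal Python sketch; the component names and age limits are hypothetical examples, and where the last-success timestamps come from (job logs, blob metadata, etc.) is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_components(last_success: Dict[str, Optional[datetime]],
                     max_age: Dict[str, timedelta],
                     now: datetime) -> List[str]:
    """Components whose last successful export/backup is missing or older than allowed.

    A component with no recorded success (None) is always reported.
    """
    stale = []
    for name, taken_at in last_success.items():
        if taken_at is None or now - taken_at > max_age[name]:
            stale.append(name)
    return sorted(stale)


# Hypothetical example: the tracking DB export is fresh, the ACS export is
# three days old, and the WABS instance has never been backed up.
now = datetime(2013, 12, 10, tzinfo=timezone.utc)
last = {
    "tracking-db-export": now - timedelta(hours=2),
    "acs-export": now - timedelta(days=3),
    "wabs-backup": None,
}
limits = {
    "tracking-db-export": timedelta(days=1),
    "acs-export": timedelta(days=1),
    "wabs-backup": timedelta(days=7),
}
print(stale_components(last, limits, now))  # → ['acs-export', 'wabs-backup']
```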
Senior Soft. Dev. Eng. | Microsoft IT | Microsoft Corporation