Azure storage: Disaster Recovery and failover capabilities

Microsoft recently announced a new feature that lets administrators carry out a controlled failover of geo-redundant Azure storage accounts. This feature gives more control over this type of storage account and allows greater flexibility in Disaster Recovery scenarios. This article explains how the feature works and the procedure to follow in order to use it.

Types of storage accounts

Azure offers different types of storage account with distinct replication characteristics, providing different levels of redundancy. If you want to keep the data in a storage account available even when an entire Azure region fails, you need a geo-redundant storage account. There are two types:

  • Geo-redundant storage (GRS): the data is replicated asynchronously to a second Azure region, hundreds of miles away from the primary one.
  • Read-access geo-redundant storage (RA-GRS): it follows the same replication principle as GRS, but the secondary endpoint can also be accessed to read the replicated data.

With these types of storage account, three copies of the data are kept in the primary Azure region, selected during the configuration phase, and three additional copies are replicated asynchronously to another Azure region, following the principle of Azure Paired Regions.

Figure 1 - Normal operation of a GRS/RA-GRS storage account

For more information about the different types of storage account and their redundancy, you can consult Microsoft's official documentation.
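
To make the difference between GRS and RA-GRS concrete, the following minimal Python sketch lists the blob endpoints a client can reach for each redundancy type. It assumes the public Azure cloud naming convention, where the readable secondary endpoint is the account name followed by "-secondary". The function name and the account name "mystorageaccount" are hypothetical and used only for illustration.

```python
def blob_endpoints(account_name: str, sku: str) -> dict:
    """Return the blob endpoints a client can reach for a storage account.

    Assumes the public Azure cloud DNS suffix (blob.core.windows.net).
    """
    endpoints = {"primary": f"https://{account_name}.blob.core.windows.net"}
    # Only RA-GRS exposes the secondary region for reads; with plain GRS
    # the replicated data exists but cannot be accessed before a failover.
    if sku == "Standard_RAGRS":
        endpoints["secondary"] = (
            f"https://{account_name}-secondary.blob.core.windows.net"
        )
    return endpoints


if __name__ == "__main__":
    # "mystorageaccount" is a hypothetical account name.
    for sku in ("Standard_GRS", "Standard_RAGRS"):
        print(sku, blob_endpoints("mystorageaccount", sku))
```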

Characteristics of the storage account failover process

Thanks to this new feature, the administrator can deliberately start the account failover process when deemed appropriate. The failover process updates the public DNS record of the storage account, so that requests are routed to the endpoint of the storage account in the secondary region.

Figure 2 – Failover process for a GRS/RA-GRS storage account

After the failover process, the storage account is configured as locally redundant storage (LRS), and it must be reconfigured to make it geo-redundant again.

An important aspect to keep in mind when you decide to fail over a storage account is that this operation can result in a loss of data, because replication between the two Azure regions is asynchronous. As a result, if the primary region becomes unavailable, some changes may not yet have been replicated to the secondary region. To check this condition, refer to the Last Sync Time property, which indicates the point in time up to which data is guaranteed to have been replicated to the secondary region.
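
The role of Last Sync Time can be illustrated with a small Python sketch. It assumes the application keeps its own log of write timestamps (something the platform does not provide for you) and flags every write that is not covered by the replication guarantee. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def writes_at_risk(last_sync_time, write_times):
    """Return the writes that may be lost if a failover starts now.

    Writes made before last_sync_time are guaranteed to be in the secondary
    region. Later writes may or may not have been replicated, so they are
    reported as at risk. A write at exactly last_sync_time is conservatively
    treated as at risk too.
    """
    return [t for t in write_times if t >= last_sync_time]


if __name__ == "__main__":
    # Hypothetical values: in practice last_sync comes from the storage
    # account's Last Sync Time property and the writes from application logs.
    last_sync = datetime(2019, 3, 1, 12, 0, tzinfo=timezone.utc)
    writes = [
        datetime(2019, 3, 1, 11, 55, tzinfo=timezone.utc),
        datetime(2019, 3, 1, 12, 5, tzinfo=timezone.utc),
    ]
    for t in writes_at_risk(last_sync, writes):
        print("possibly not replicated:", t.isoformat())
```

A write reported here is not necessarily lost; it is only outside the window for which replication is guaranteed.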

Storage account failover procedure from the Azure Portal

The following figures show the steps to fail over a storage account directly from the Azure Portal.

Figure 3 – Storage account failover process

Figure 4 – Storage account failover process confirmation

The failover of a storage account can be started not only from the Azure Portal, but also through PowerShell, the Azure CLI, or the API for the Azure Storage resources.
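
As an illustration of the Azure CLI route, the Python sketch below only builds the command lines involved, without running them: reading Last Sync Time, starting the failover, and converting the account back to geo-redundant storage afterwards (since it becomes LRS after the failover). It assumes a recent Azure CLI in which these subcommands and flags are available; check them with "az storage account failover --help" before relying on them. The helper names, account name and resource group are hypothetical.

```python
import shlex


def last_sync_time_command(account, resource_group):
    """Azure CLI command that prints the account's Last Sync Time."""
    return ["az", "storage", "account", "show",
            "--name", account, "--resource-group", resource_group,
            "--expand", "geoReplicationStats",
            "--query", "geoReplicationStats.lastSyncTime",
            "--output", "tsv"]


def failover_command(account, resource_group):
    """Azure CLI command that starts the account failover.

    --yes skips the interactive confirmation prompt, so use it with care.
    """
    return ["az", "storage", "account", "failover",
            "--name", account, "--resource-group", resource_group, "--yes"]


def restore_geo_redundancy_command(account, resource_group,
                                   sku="Standard_GRS"):
    """Azure CLI command that makes the account geo-redundant again."""
    return ["az", "storage", "account", "update",
            "--name", account, "--resource-group", resource_group,
            "--sku", sku]


if __name__ == "__main__":
    # Hypothetical account and resource group names.
    for build in (last_sync_time_command, failover_command,
                  restore_geo_redundancy_command):
        print(shlex.join(build("mystorageaccount", "my-resource-group")))
```

Each list can be passed to subprocess.run on a machine where the Azure CLI is installed and logged in. Building the commands separately from running them makes it easy to review exactly what will be executed before triggering an operation that can lose data.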

How to identify problems with the storage account

Microsoft recommends that applications using storage accounts be designed to handle possible write errors. The application should surface any write failures it encounters, so that you are alerted to a possible loss of access to storage in a given region. This allows you to take corrective action, such as the failover of the GRS/RA-GRS storage account.
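
One possible way to surface write failures is sketched below in Python. It counts consecutive failed writes and signals when a threshold is reached, which an operator could treat as a hint to investigate the primary region. The class name, the threshold of three, and the use of OSError as the failure type are all assumptions for illustration; a real application would catch the exception types raised by its storage client.

```python
class WriteFailureMonitor:
    """Track consecutive write failures against a storage account."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, succeeded):
        """Record one write outcome; return True when an alert is due."""
        if succeeded:
            # Any successful write resets the streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


def guarded_write(write, data, monitor, errors=(OSError,)):
    """Attempt one write through the callable `write`.

    Returns True if, after this attempt, the monitor indicates that
    someone should be alerted. The failed write itself is not retried.
    """
    try:
        write(data)
    except errors:
        return monitor.record(False)
    return monitor.record(True)
```

The alert is only a signal. The decision to fail over remains a deliberate one, because of the possible data loss involved.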

Natively, the platform provides, through Azure Service Health, detailed information about conditions that affect the operation of Azure services, including storage. Thanks to the full integration of Service Health with Azure Monitor, which hosts the Azure alerting engine, you can configure specific alerts for Azure-side issues that impact the resources in your subscription.

Figure 5 - Creating a health alert in Service Health

Figure 6 - Notification rule for storage issues

Notifications are delivered through Action Groups, after which you can evaluate whether a storage account failover is really needed.

Conclusions

Before the release of this feature, the failover of GRS/RA-GRS storage accounts had to be initiated by Microsoft staff in response to a storage fault affecting an entire Azure region. This feature gives the administrator the ability to initiate the failover, providing greater control over the storage account. At the moment this feature is in preview and available only for storage accounts created in certain Azure regions. As with other Azure functionality in preview, it is best to wait for the official release before using it for workloads in a production environment.