The Evolution of High Availability and Disaster Recovery in Modern Infrastructures: The Azure Local Case

High availability and disaster recovery solutions are playing an increasingly central role in modern infrastructure adoption strategies. Azure Local, Microsoft’s on-premises cloud-connected platform, exemplifies this transformation.

Starting with version 23H2, Azure Local moves away from the traditional Stretched Cluster model in favor of more flexible approaches designed to improve resilience and simplify management. New configurations such as Rack Aware Cluster, together with disaster recovery support through Azure Site Recovery, position Azure Local as a platform for organizations seeking robust, scalable solutions aligned with the Azure ecosystem. This article explores the key features introduced in version 23H2: the new high-availability options, the disaster recovery strategies, and the impact of the transition away from Stretched Clusters.

Azure Local, Version 23H2: An Arc-Enabled Evolution

Version 23H2 marks a significant leap forward, transforming Azure Local from a simple cloud-connected operating system into an Azure Arc-enabled solution with integrated features such as Arc Resource Bridge, Arc VMs, and AKS. This transformation expands the possibilities for managing and controlling distributed environments and provides a unified administrative experience. Moreover, multi-site management now extends beyond the operating system level, which makes the functionality of the earlier Stretched Clusters obsolete and introduces new approaches to resilience and reliability.

High Availability Options

Rack Aware Cluster: High Availability for Short Distances

The standout feature for short-distance scenarios is the Rack Aware Cluster, a configuration that enables:

  • Deploying the cluster across two racks or rooms within the same Layer-2 network (e.g., within a manufacturing plant or campus).
  • Functioning as a local availability zone, ensuring fault isolation and optimal workload placement.

Figure 1 – Rack Aware Cluster: Network Architecture

This configuration combines efficiency and ease of management in local environments. Because it uses a single storage pool, it reduces complexity and avoids the overhead caused by excessive data replication. The Rack Aware Cluster is particularly suited for edge locations and can scale up to 8 nodes (4 per rack). It is currently in private preview, with public availability expected by 2025.
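The sizing limits above can be expressed as a small planning check. The following Python sketch is only an illustration: the function name, the node names, and the rack labels are hypothetical, and the limits encoded are just those stated in this article (two racks, up to 8 nodes, 4 per rack). It is not part of any Azure Local tooling.

```python
from collections import Counter

# Limits as stated in this article; adjust if the product documentation differs.
RACK_COUNT = 2
MAX_NODES = 8
MAX_NODES_PER_RACK = 4


def validate_rack_layout(node_to_rack: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the layout fits the limits."""
    problems = []
    per_rack = Counter(node_to_rack.values())  # rack label -> number of nodes

    if len(per_rack) != RACK_COUNT:
        problems.append(f"expected {RACK_COUNT} racks, found {len(per_rack)}")
    if len(node_to_rack) > MAX_NODES:
        problems.append(f"{len(node_to_rack)} nodes exceeds the limit of {MAX_NODES}")
    for rack, count in sorted(per_rack.items()):
        if count > MAX_NODES_PER_RACK:
            problems.append(f"rack {rack!r} has {count} nodes, limit is {MAX_NODES_PER_RACK}")
    return problems
```

For example, a layout with two nodes in each of two racks returns an empty list, while five nodes placed in a single rack returns two problems (wrong rack count and too many nodes in that rack).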

Notably, Azure Local now introduces the concept of availability zones, mirroring the established Azure model to support reliability and operational continuity.

Disaster Recovery Options

Cloud Replication with Azure Site Recovery

For long-distance disaster recovery scenarios, Azure Local leverages Azure Site Recovery (ASR) to replicate on-premises virtual machines to the Azure cloud. This solution enables:

  • Replication: Transferring VM disks to an Azure storage account, safeguarding data from potential disasters.
  • Failover: Running replicated VMs directly in Azure during a disaster, ensuring operational continuity.
  • Re-protect: Replicating VMs back to the local cluster, maintaining a continuous protection cycle.
  • Failback: Bringing workloads back from the cloud to the on-premises system with minimal disruption.

These operations are managed centrally through the Azure portal, ensuring simplicity and efficiency for system administrators.
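The four operations form a cycle, which can be modeled as a small state machine. The Python sketch below is a conceptual model only and does not call Azure Site Recovery. The state names are hypothetical, and the assumption that a failback returns the VM to the replicating state is a simplification made for illustration.

```python
from enum import Enum


class VmState(Enum):
    UNPROTECTED = "unprotected"  # VM runs on-premises, no replication yet
    REPLICATING = "replicating"  # VM runs on-premises, disks replicate to Azure
    FAILED_OVER = "failed_over"  # VM runs in Azure
    REPROTECTED = "reprotected"  # VM runs in Azure, replicating back on-premises


# operation -> (state required before, state reached after)
TRANSITIONS = {
    "replicate": (VmState.UNPROTECTED, VmState.REPLICATING),
    "failover": (VmState.REPLICATING, VmState.FAILED_OVER),
    "reprotect": (VmState.FAILED_OVER, VmState.REPROTECTED),
    "failback": (VmState.REPROTECTED, VmState.REPLICATING),  # assumed simplification
}


def apply(state: VmState, operation: str) -> VmState:
    """Apply one operation, rejecting it if the VM is not in the required state."""
    required, result = TRANSITIONS[operation]
    if state is not required:
        raise ValueError(f"cannot {operation} from state {state.value}")
    return result
```

Applying "replicate", "failover", "reprotect", and "failback" in that order walks the full cycle, while calling "failback" on a VM that was never failed over raises a ValueError. The ordering constraint is the point of the model: re-protect has to happen before failback.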

Local Replication with Hyper-V Replica

For workloads that cannot be moved to the cloud, Azure Local supports Hyper-V Replica, a solution that replicates Arc VMs to a secondary site. This approach allows:

  • Ensuring operational continuity by replicating data to a remote location.
  • Recovering VMs as Hyper-V virtual machines at the secondary site; they revert to Arc VMs once restored on the primary cluster.

This feature, integrated into the Hyper-V role, represents an essential option for resilience in multi-site scenarios.
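As a rough summary of how the options described so far map to scenarios, the following Python sketch encodes the article's guidance as a simple decision helper. The function and parameter names are hypothetical, and the logic is deliberately simplified: in practice, high availability and disaster recovery options can be combined rather than chosen exclusively.

```python
def suggest_option(same_layer2_network: bool, cloud_allowed: bool) -> str:
    """Map a scenario to the option this article associates with it."""
    if same_layer2_network:
        # Two racks or rooms within one Layer-2 network: short-distance high availability.
        return "Rack Aware Cluster"
    if cloud_allowed:
        # Long-distance disaster recovery with Azure as the target.
        return "Azure Site Recovery"
    # Workloads that cannot move to the cloud replicate to a secondary site.
    return "Hyper-V Replica"
```

For instance, suggest_option(False, False) returns "Hyper-V Replica", matching the case of a multi-site deployment whose workloads must stay on-premises.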

The Transition from Stretched Clusters

Available in Azure Local version 22H2, Stretched Clusters used Storage Replica to provide resilience between two groups of nodes located at distinct sites. This configuration:

  • Required at least two nodes per site and replicated storage synchronously to ensure data integrity in the event of failures.
  • Supported live migration of VMs between sites, facilitating smooth transitions for planned maintenance.

However, this solution required manual operations to reverse the direction of storage replication, a process that could add complexity and affect performance. With version 23H2, Stretched Clusters are no longer supported. Clusters configured with version 22H2 can still be upgraded to the 23H2 operating system, which preserves compatibility, but this partial upgrade does not provide the new features of the latest version.

Customers still using this configuration should consider adopting the new high availability and disaster recovery options offered by Azure Local, which are designed to provide greater efficiency and reliability.

Conclusions

The new features in Azure Local version 23H2 reflect a significant evolution toward more flexible, modern management aligned with the Azure ecosystem. With solutions like Rack Aware Cluster and integration with Azure Site Recovery, organizations can enhance the resilience of their local environments and ensure scalable and integrated disaster recovery options. Furthermore, abandoning the Stretched Cluster model paves the way for more efficient and streamlined configurations, enabling customers to fully leverage the potential offered by Azure technologies.
