Maximize the performance of Azure Stack HCI: discover the best configurations for networking

Hyperconverged infrastructure (HCI) are increasingly popular as they allow you to simplify the management of the IT environment, reduce costs and scale easily when needed. Azure Stack HCI is the Microsoft solution that allows you to create a hyper-converged infrastructure for the execution of workloads in an on-premises environment and which provides a strategic connection to various Azure services to modernize your IT infrastructure. Properly configuring Azure Stack HCI networking is critical to ensuring security, application reliability and performance. In this article, the fundamentals of configuring Azure Stack HCI networking are explored, learning more about available networking options and best practices for networking design and configuration.

There are different network models that you can take as a reference to design, deploy and configure Azure Stack HCI. The following paragraphs show the main aspects to consider in order to direct the possible implementation choices at the network level.

Number of nodes that make up the Azure Stack HCI cluster

A single Azure Stack HCI cluster can consist of a single node and can scale up to 16 nodes.

If the cluster consists of a single server at the physical level it is recommended to provide the following network components, also shown in the image:

  • single TOR switch (L2 or L3) for north-south traffic;
  • two-four teamed network ports to handle management and computational traffic connected to the switch;

Furthermore, optionally it is possible to provide the following components:

  • two RDMA NIC, useful if you plan to add a second server to the cluster to scale your setup;
  • a BMC card for remote management of the environment.

Figure 1 – Network architecture for an Azure Stack HCI cluster consisting of a single server

If your Azure Stack HCI cluster consists of two or more nodes you need to investigate the following parameters.

Need for Top-Of-Rack switches (TOR) and its level of redundancy

For Azure Stack HCI clusters consisting of two or more nodes, in production environment, the presence of two TOR switches is strongly recommended, so that we can tolerate communication disruptions regarding north-south traffic, in case of failure or maintenance of the single physical switch.

If the Azure Stack HCI cluster is made up of two nodes, you can avoid providing a switch connectivity for storage traffic.

Two-node configuration without TOR switch for storage communication

In an Azure Stack HCI cluster that consists of only two nodes, to reduce switch costs, perhaps going to use switches already in possession, storage RDMA NICs can be connected in full-mesh mode.

In certain scenarios, which include for example branch office, or laboratories, the following network model can be adopted which provides for a single TOR switch. By applying this pattern, you get cluster-wide fault tolerance, and is suitable if interruptions in north-south connectivity can be tolerated when the single physical switch fails or requires maintenance.

Figure 2 – Network architecture for an Azure Stack HCI cluster consisting of two servers, without storage switches and with a single TOR switch

Although the SDN services L3 are fully supported for this scheme, routing services such as BGP will need to be configured on the firewall device that sits on top of the TOR switch, if this does not support L3 services.

If you want to obtain greater fault tolerance for all network components, the following architecture can be provided, which provides two redundant TOR switches:

Figure 3 – Network architecture for an Azure Stack HCI cluster consisting of two servers, without storage switches and redundant TOR switches

The SDN services L3 are fully supported by this scheme. Routing services such as BGP can be configured directly on TOR switches if they support L3 services. Features related to network security do not require additional configuration for the firewall device, since they are implemented at the virtual network adapter level.

At the physical level, it is recommended to provide the following network components for each server:

  • two-four teamed network ports, to handle management and computational traffic, connected to the TOR switches;
  • two RDMA NICs in a full-mesh configuration for east-west traffic for storage. Each cluster node must have a redundant connection to the other cluster node;
  • as optional, a BMC card for remote management of the environment.

In both cases the following connectivities are required:

Networks Management and computational Storage BMC
Network speed At least 1 GBps,

10 GBps recommended

At least 10 GBps Tbd
Type of interface RJ45, SFP+ or SFP28 SFP+ or SFP28 RJ45
Ports and aggregation Twofour ports in teaming Two standalone ports One port

Two or more node configuration using TOR switches also for storage communication

When you expect an Azure Stack HCI cluster composed of more than two nodes or if you don't want to preclude the possibility of being able to easily add more nodes to the cluster, it is also necessary to merge the traffic concerning the storage from the TOR switches. In these scenarios, a configuration can be envisaged where dedicated network cards are maintained for storage traffic (non-converged), as shown in the following picture:

Figure 4 – Network architecture for an Azure Stack HCI cluster consisting of two or more servers, redundant TOR switches also used for storage traffic and non-converged configuration

At the physical level, it is recommended to provide the following network components for each server:

  • two teamed NICs to handle management and computational traffic. Each NIC is connected to a different TOR switch;
  • two RDMA NICs in standalone configuration. Each NIC is connected to a different TOR switch. SMB multi-channel functionality ensures path aggregation and fault tolerance;
  • as optional, a BMC card for remote management of the environment.

These are the connections provided:

Networks Management and computational Storage BMC
Network speed At least 1 GBps,

10 GBps recommended

At least 10 GBps Tbd
Type of interface RJ45, SFP+ or SFP28 SFP+ or SFP28 RJ45
Ports and aggregation Two ports in teaming Two standalone ports One port

Another possibility to consider is a "fully-converged" configuration of the network cards, as shown in the following image:

Figure 5 – Network architecture for an Azure Stack HCI cluster consisting of two or more servers, redundant TOR switches also used for storage traffic and fully-converged configuration

The latter solution is preferable when:

  • bandwidth requirements for north-south traffic do not require dedicated cards;
  • the physical ports of the switches are a small number;
  • you want to keep the costs of the solution low.

At the physical level, it is recommended to provide the following network components for each server:

  • two teamed RDMA NICs for traffic management, computational and storage. Each NIC is connected to a different TOR switch. SMB multi-channel functionality ensures path aggregation and fault tolerance;
  • as optional, a BMC card for remote management of the environment.

These are the connections provided:

Networks Management, computational and storage BMC
Network speed At least 10 GBps Tbd
Type of interface SFP+ or SFP28 RJ45
Ports and aggregation Two ports in teaming One port

SDN L3 services are fully supported by both of the above models. Routing services such as BGP can be configured directly on TOR switches if they support L3 services. Features related to network security do not require additional configuration for the firewall device, since they are implemented at the virtual network adapter level.

Type of traffic that must pass through the TOR switches

To choose the most suitable TOR switches it is necessary to evaluate the network traffic that will flow from these network devices, which can be divided into:

  • management traffic;
  • computational traffic (generated by the workloads hosted by the cluster), which can be divided into two categories:
    • standard traffic;
    • SDN traffic;
  • storage traffic.

Microsoft has recently changed its approach to this. In fact,, TOR switches are no longer required to meet every network requirement regarding various features, regardless of the type of traffic for which the switch is used. This allows you to have physical switches supported according to the type of traffic they carry and allows you to choose from a greater number of network devices at a lower cost, but always of quality.

In this document lists the required industry standards for specific network switch roles used in Azure Stack HCI implementations. These standards help ensure reliable communication between nodes in Azure Stack HCI clusters. In this section instead, the switch models supported by the various vendors are shown, based on the type of traffic expected.

Conclusions

Properly configuring Azure Stack HCI networking is critical to ensuring that hyper-converged infrastructure runs smoothly, ensuring security, optimum performance and reliability. This article covered the basics of configuring Azure Stack HCI networking, analyzing the available network options. The advice is to always carefully plan the networking aspects of Azure Stack HCI, choosing the most appropriate network option for your business needs and following implementation best practices.

Please follow and like us: