Building modern IT architectures for Machine Learning

For most companies, continuously delivering artificial intelligence solutions and integrating them into their own applications and business workflows is a particularly complex undertaking. In the rapidly evolving artificial intelligence landscape, machine learning (ML) plays a fundamental role together with data science. To increase the chances of success of artificial intelligence projects, organizations therefore need modern and efficient IT architectures for machine learning. This article describes how these architectures can be built anywhere thanks to the integration between Kubernetes, Azure Arc and Azure Machine Learning.

Azure Machine Learning

Azure Machine Learning (AzureML) is a cloud service that you can use to accelerate and manage the life cycle of machine learning projects, bringing ML models into a secure and reliable production environment.

Kubernetes as a compute target for Azure Machine Learning

Azure Machine Learning recently introduced a new compute target: AzureML Kubernetes compute. You can use an existing Azure Kubernetes Service (AKS) cluster or an Azure Arc-enabled Kubernetes cluster as a compute target for Azure Machine Learning and use it to validate and deploy ML models.

Figure 1 - Overview of how to take Azure ML anywhere thanks to K8s and Azure Arc

AzureML Kubernetes compute supports two types of Kubernetes clusters:

  • AKS cluster (in the Azure environment). A managed Azure Kubernetes Service (AKS) cluster gives you a flexible, secure environment capable of meeting compliance requirements for ML workloads.
  • Arc-enabled Kubernetes cluster (in environments other than Azure). Azure Arc-enabled Kubernetes lets you manage Kubernetes clusters running outside Azure (on-premises or on other clouds) and use them to deploy ML models.

To enable and use a Kubernetes cluster to run AzureML workloads, follow these steps:

  1. Activate and configure an AKS cluster or an Arc-enabled Kubernetes cluster. Note that AKS can also be activated in an Azure Stack HCI environment.
  2. Deploy the AzureML extension on the cluster.
  3. Connect the Kubernetes cluster to the Azure ML workspace.
  4. Use the Kubernetes compute target from CLI v2, SDK v2 and the Studio UI.

Figure 2 - Steps to enable and use a K8s cluster for AzureML workloads
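
Steps 2 and 3 are typically carried out with the Azure CLI. The following is a minimal Python sketch that only assembles the two commands so they can be reviewed before being run; it does not call Azure. It assumes the `k8s-extension` and `ml` CLI extensions are installed, and that the cluster and the workspace live in the same resource group. All names (`azureml-extension`, `ml-rg`, `edge-cluster`, `ml-workspace`, `k8s-compute`) and the subscription ID are hypothetical placeholders. The extension is configured for training only; enabling inference requires additional settings that are not shown here.

```python
import shlex

# The two supported cluster kinds and the values the Azure CLI expects for each:
# (value for --cluster-type, ARM provider path used in the resource ID).
CLUSTER_KINDS = {
    "aks": ("managedClusters", "Microsoft.ContainerService/managedClusters"),
    "arc": ("connectedClusters", "Microsoft.Kubernetes/connectedClusters"),
}


def cluster_resource_id(kind, subscription, resource_group, cluster):
    """Build the ARM resource ID of an AKS or Arc-enabled cluster."""
    _, provider_path = CLUSTER_KINDS[kind]
    return (
        f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
        f"/providers/{provider_path}/{cluster}"
    )


def extension_command(kind, resource_group, cluster):
    """Step 2: deploy the AzureML extension on the cluster (training only)."""
    cluster_type, _ = CLUSTER_KINDS[kind]
    return [
        "az", "k8s-extension", "create",
        "--name", "azureml-extension",  # hypothetical extension instance name
        "--extension-type", "Microsoft.AzureML.Kubernetes",
        "--cluster-type", cluster_type,
        "--cluster-name", cluster,
        "--resource-group", resource_group,
        "--scope", "cluster",
        "--config", "enableTraining=True",
    ]


def attach_command(kind, subscription, resource_group, cluster, workspace, compute_name):
    """Step 3: attach the cluster to the Azure ML workspace as a compute target."""
    return [
        "az", "ml", "compute", "attach",
        "--resource-group", resource_group,
        "--workspace-name", workspace,
        "--type", "Kubernetes",
        "--name", compute_name,
        "--resource-id", cluster_resource_id(kind, subscription, resource_group, cluster),
        "--identity-type", "SystemAssigned",
    ]


if __name__ == "__main__":
    # Hypothetical values for an Arc-enabled cluster running outside Azure.
    sub = "00000000-0000-0000-0000-000000000000"
    print(shlex.join(extension_command("arc", "ml-rg", "edge-cluster")))
    print(shlex.join(attach_command("arc", sub, "ml-rg", "edge-cluster",
                                    "ml-workspace", "k8s-compute")))
```

The only difference between the AKS and the Arc-enabled case in this sketch is the cluster type and the provider segment of the resource ID, which is why both are kept in a single lookup table.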

Infrastructure management for ML workloads can be complex, and Microsoft recommends that the IT operations team handle it so that the data science team can focus on the efficiency of the ML models. With this in mind, the roles can be divided as follows:

  • The IT operations team is responsible for the first three steps above. In addition, it typically performs the following activities for the data science team:
    • Configures networking and security.
    • Creates and manages instance types for different ML workload scenarios, in order to use compute resources efficiently.
    • Troubleshoots the workloads running on the Kubernetes clusters.
  • The data science team, once the IT operations team has completed the activation activities, can find the list of compute targets and instance types available in the AzureML workspace. These compute resources can be used for training or inference workloads. The team selects the compute target using tools such as the AzureML CLI v2, the Python SDK v2 or the Studio UI.
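
To make this division of roles concrete, the sketch below builds the two artifacts involved: an instance type definition that the IT operations team would apply to the cluster, and a job specification in which the data science team selects the compute target and the instance type. It is an illustration under assumptions, not a definitive reference: the `amlarc.azureml.com/v1alpha1` API version and the field names reflect the AzureML extension as I understand it and should be checked against the installed version, while `gpu-small`, `k8s-compute`, the `pool: gpu` node label, the training command and the environment reference are hypothetical. JSON output is used so that only the standard library is needed.

```python
import json


def instance_type_manifest(name, node_selector, requests, limits):
    """Kubernetes custom resource describing an AzureML instance type.

    Assumption: the AzureML extension registers an InstanceType resource
    under the amlarc.azureml.com/v1alpha1 API version.
    """
    return {
        "apiVersion": "amlarc.azureml.com/v1alpha1",
        "kind": "InstanceType",
        "metadata": {"name": name},
        "spec": {
            # Restricts the workload to nodes carrying these labels.
            "nodeSelector": node_selector,
            # Standard Kubernetes resource requests and limits.
            "resources": {"requests": requests, "limits": limits},
        },
    }


def command_job_spec(compute_name, instance_type, command, environment):
    """Fields of a CLI v2 style command job that targets a Kubernetes compute."""
    return {
        "command": command,
        "environment": environment,
        "compute": f"azureml:{compute_name}",
        "resources": {"instance_type": instance_type},
    }


if __name__ == "__main__":
    # IT operations: a small GPU instance type (hypothetical sizes and labels).
    manifest = instance_type_manifest(
        name="gpu-small",
        node_selector={"pool": "gpu"},
        requests={"cpu": "700m", "memory": "1500Mi"},
        limits={"cpu": "1", "memory": "2Gi", "nvidia.com/gpu": 1},
    )
    print(json.dumps(manifest, indent=2))

    # Data science: a training job that picks the compute target and instance type.
    job = command_job_spec(
        compute_name="k8s-compute",
        instance_type="gpu-small",
        command="python train.py",             # hypothetical training script
        environment="azureml:my-training-env@latest",  # hypothetical environment
    )
    print(json.dumps(job, indent=2))
```

Keeping the resource sizing in the instance type, rather than in each job, is what lets the data science team choose a named profile without needing to know how the cluster nodes are laid out.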

Usage scenarios

The ability to use Kubernetes as a compute target for Azure Machine Learning, combined with the potential of Azure Arc, allows you to create, train and deploy ML models in any on-premises infrastructure or on different clouds.

This opens up several new usage scenarios that were not feasible using the cloud environment alone. The following table summarizes the usage scenarios made possible by Azure ML Kubernetes compute, specifying where the data resides, the motivation behind each usage model and how it is implemented at the infrastructure and Azure ML level.

Table 1 - New usage scenarios made possible by Azure ML Kubernetes compute

Conclusions

Gartner expects that by 2025, due to the rapid spread of AI initiatives, 70% of organizations will have operationalized IT architectures for artificial intelligence. Through the integration of different solutions, Microsoft offers a range of options for building flexible and cutting-edge architectures for machine learning, an integral part of artificial intelligence.
