Category Archives: Cloud & Datacenter Management (2024-2025)

AI from Cloud to Edge: Innovation Powered by Azure Local and Azure Arc

In the era of Artificial Intelligence, which is significantly transforming business models, the adoption of local and distributed infrastructures is crucial for managing specific and mission-critical workloads. In this context, Azure Local emerges as an innovative solution capable of bridging the gap between cloud and edge computing, delivering applications, data, and AI services exactly where they are needed. This article will explore real-world scenarios where Azure Local, combined with Azure Arc, enables real-time data processing “at the source” and the deployment of advanced AI solutions. We will also delve into the new Azure AI services designed for Azure Local, focusing on maximizing the potential of on-premises data.

Real-World Scenarios of Local and Distributed Infrastructure with Azure Local

In the following sections, we will examine concrete examples that demonstrate how Azure Local, in synergy with Azure Arc, effectively addresses the needs of distributed infrastructure, ensuring low latency, security, and operational continuity across various business and industrial contexts.

Figure 1 – Real-World Scenarios for Local and Distributed Infrastructure with Azure Local

Local AI Inferencing

In many situations, analyzing data in real time directly at the edge (e.g., within a retail store or an industrial facility) reduces both latency and bandwidth usage. Azure Local enables on-site data processing, eliminating the need to transfer all data to the cloud before performing critical analyses. Here are some examples:

  • Retail Loss Prevention: With AI integrated locally, suspicious behaviors and potential thefts can be identified in real time, allowing retailers to act immediately and reduce losses.
  • Smart Self-Checkout: Video surveillance and visual analysis facilitate automatic item recognition, improving customer experience and reducing wait times.
  • Pipeline Monitoring: In sectors like oil & gas, real-time video monitoring of infrastructure helps detect anomalies and leaks, reducing environmental risks and ensuring timely interventions.

Operational Continuity in Mission-Critical Environments

The ability to ensure business continuity during network or power outages is a crucial aspect. With Azure Local, robust systems can be implemented to preserve operations even when cloud connectivity is limited or unavailable. Examples include:

  • Factory and Warehouse Operations: Production and inventory management cannot stop; having a local solution ensures that production lines and management systems continue functioning despite network disruptions.
  • Stadiums and Event Venues: Services like security, ticketing, and lighting must remain operational to safeguard both spectator experience and safety.
  • Transport Hubs: Constant operation of ticketing systems, scheduling, and communications is essential for passenger flow and safety in large transit hubs.

Control Systems and Near Real-Time Processing

Some industrial, financial, and healthcare environments demand extremely low response times to avoid errors, ensure safety, or maximize performance. Azure Local, combined with Azure Arc, can meet these latency requirements:

  • Manufacturing Execution Systems (MES): Continuous synchronization and monitoring of production machinery optimize processes and minimize downtime.
  • Industrial Quality Assurance (QA): Immediate quality checks and verifications identify defects before they reach the final stage of production, increasing compliance and reducing waste.
  • Financial Infrastructures: Low-latency transaction processing and rapid risk assessment are critical for market competitiveness and stability.

Regulatory Compliance and DDIL Connectivity (Disconnected, Degraded, Intermittent, Limited)

For many organizations (governmental, military, or those operating critical infrastructures), data protection and secure management, even in the absence of reliable connectivity, are top priorities. Azure Local supports the need for on-premises data and control:

  • Government and Military Sectors: Data confidentiality is paramount, requiring local management to ensure continuous access even in compromised network scenarios.
  • Energy Infrastructures: The stability of distribution networks and control of pipelines and refineries require resilience under limited connectivity conditions, while adhering to stringent regulations.

Azure’s Adaptive Cloud Approach

Microsoft’s adaptive cloud approach, enabled by Azure Arc, helps organizations unify hybrid, multicloud, and edge infrastructures within Azure. With Azure Arc, the same cloud-native experiences and capabilities—such as security, updates, management, and scalability—can be extended anywhere, from on-premises data centers to distributed locations.

Figure 2 – Adaptive Cloud Approach

Azure Local, connected to the cloud through Azure Arc, enables:

  • Operating and scaling distributed infrastructure via the Azure portal and the same APIs.
  • Running fundamental compute, network, storage, and application services locally, choosing hardware from the preferred vendor.
  • Strengthening the security of apps and data with Azure technologies, protecting them against advanced threats.

A key feature is the presence of Azure Kubernetes Service (AKS), Microsoft’s managed Kubernetes solution. On Azure Local, AKS can be configured and updated automatically, providing everything needed (storage drivers, container images for Linux and Windows, etc.) to support containerized applications. Moreover, each cluster is automatically enabled with Azure Arc, allowing integration with services like Microsoft Defender for Containers, Azure Monitor, and GitOps for continuous delivery.

Figure 3 – Bring Azure Apps, Data, and AI Anywhere

New Azure AI Services with Azure Local and Azure Arc

On-Premises Data Search with Generative AI

In recent years, generative AI has made significant strides, driven by large language models (like GPT) capable of interpreting and generating natural language text. Public tools like ChatGPT work well for general knowledge queries but cannot answer questions about private business data they were not trained on. Retrieval Augmented Generation (RAG) bridges this gap: at query time, relevant passages are retrieved from proprietary data and added to the model’s prompt, so the model can ground its answer in that data without being retrained. This enables more advanced and customized use cases.

Within the Azure Local framework, Microsoft has announced a new service that brings generative AI and RAG directly to the edge, where the data resides. Within minutes, organizations can deploy (via an Azure Arc extension) everything needed to query their on-premises data, including:

  • Small and large language models (SLM/LLM) running locally, with support for both CPUs and GPUs.
  • An end-to-end data ingestion and RAG pipeline that keeps all information on-premises, with RBAC (Role-Based Access Control) ensuring secure access.
  • An integrated tool for prompt engineering and result evaluation to optimize model settings and performance.
  • APIs and interfaces aligned with Azure standards, facilitating integration into enterprise applications, plus a preconfigured UI for immediate service use.

This feature is now available in private preview for Azure Local customers, with Microsoft planning to expand availability to other Arc-enabled platforms in the near future.
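To make the retrieve-then-generate flow concrete, here is a minimal Python sketch of RAG retrieval with an RBAC filter, in the spirit of the on-premises pipeline described above. It is not the Edge RAG API: the `Chunk` type, the role sets, and the keyword-overlap scoring (a stand-in for the vector embeddings a real pipeline would use) are illustrative assumptions, and the final call to a local SLM/LLM is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A piece of an ingested document plus the roles allowed to read it (hypothetical ACL)."""
    text: str
    allowed_roles: frozenset[str]


def tokenize(text: str) -> set[str]:
    # Lowercase words with basic punctuation stripped; a real pipeline would use embeddings.
    words = (w.strip(".,;:?!").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, chunks: list[Chunk], user_roles: set[str], k: int = 2) -> list[Chunk]:
    q = tokenize(query)
    # RBAC first: chunks the user may not read never reach scoring or the prompt.
    visible = [c for c in chunks if c.allowed_roles & user_roles]
    ranked = sorted(visible, key=lambda c: len(q & tokenize(c.text)), reverse=True)
    # Keep the top-k chunks that share at least one term with the query.
    return [c for c in ranked[:k] if q & tokenize(c.text)]


def build_prompt(query: str, context: list[Chunk]) -> str:
    # The augmented prompt that would be sent to the locally hosted language model.
    lines = "\n".join(f"- {c.text}" for c in context)
    return f"Answer using only the context below.\nContext:\n{lines}\nQuestion: {query}"


docs = [
    Chunk("Line 3 torque limit is 45 Nm.", frozenset({"engineering"})),
    Chunk("Q3 payroll totals are confidential.", frozenset({"hr"})),
    Chunk("Line 3 maintenance is scheduled weekly.", frozenset({"engineering", "operations"})),
]
question = "What is the torque limit on line 3?"
print(build_prompt(question, retrieve(question, docs, {"operations"})))
```

With only the `operations` role, the torque-limit chunk is filtered out before ranking, so it never reaches the prompt even though it is the best textual match; filtering before retrieval rather than after generation is what keeps restricted data out of the model's context.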

“Edge RAG”: The Local Retrieval-Augmented Generation Ecosystem

This new service, known as “Edge RAG”, integrates seamlessly into the Azure ecosystem and supports various input components, such as:

  • Azure AI Search: Provides document search and indexing functionality, enabling quick identification of relevant content within large datasets.
  • Azure OpenAI: Offers advanced AI models (like GPT) capable of generating, understanding, and summarizing text in natural language.
  • Azure AI Studio: A platform for developing and managing AI assets (datasets, models, pipelines) centrally.

Together, these components power an integrated flow—from data ingestion to inference and result presentation via chat or other development interfaces. This enables the creation of chatbots, knowledge discovery tools, and other AI-driven solutions that leverage internal business data in a secure, customizable, and compliant environment.

Deploying Open-Source AI Models via Azure Arc

Another key feature of Azure AI is the availability of a catalog of AI models tested, validated, and guaranteed by Microsoft. These models are ready for deployment and provide consistent inference endpoints. This functionality is now extended to the edge, where Microsoft makes selected models available directly from the Azure portal:

  • Phi-3.5 Mini (language model with 3.8 billion parameters)
  • Mistral 7B (language model with 7.3 billion parameters)
  • MMDetection YOLO (object detection)
  • OpenAI Whisper Large (speech-to-text recognition)
  • Google T5 Base (automatic translation)

These models can be deployed in just a few steps on an Arc-enabled AKS cluster running on-premises. Most models require only a CPU, but Phi-3.5 and Mistral 7B also support GPUs for better performance in intensive inference scenarios.

Azure AI Offerings: From Cloud to Edge

Microsoft’s approach spans the full spectrum of AI capabilities, offering services and tools that can be delivered in the Azure cloud or extended to on-premises and edge environments via Azure Arc. The offering consists of four main pillars:

  • Application Development
    • Azure AI Studio: A development environment for AI applications (e.g., chatbots, virtual agents) with a complete set of APIs and interfaces for seamless AI integration.
  • AI Services
    • Azure AI Language and Model Services: Preconfigured services for NLP, computer vision, and other AI functionalities.
    • Solutions like Edge RAG, Video Indexer, and Managed AI Containers for local deployment of “ready-to-use” AI models.
  • Machine Learning & ML Ops
    • Azure Machine Learning Studio: A comprehensive platform for creating, training, optimizing, and managing machine learning models.
    • With Azure Arc, ML Ops capabilities can extend to the edge via extensions like the AML Arc Extension, enabling Azure ML tools on on-premises and edge infrastructures.
  • Infrastructure
    • Azure Global Infrastructure: Azure’s cloud foundation, including compute, storage, and networking resources.
    • Arc-Enabled Edge Infrastructure: Extends Azure capabilities to data centers or edge devices, managed as if they were cloud resources.

Conclusion

Microsoft’s strategy is built on delivering the best of the cloud “anywhere.” Azure Local epitomizes this vision: a solution that brings all the benefits of the cloud—agility, scalability, security—directly to local environments, meeting the needs for low latency, operational continuity, and regulatory compliance.

Thanks to Azure Arc, organizations can leverage Azure AI services such as advanced language models, Retrieval-Augmented Generation (RAG) pipelines, and ML Ops tools in a hybrid mode. Applications range from factory quality control to retail theft prevention, from critical government data centers to energy infrastructure monitoring.

In a world where data continues to grow exponentially and the need for on-site analysis becomes increasingly urgent, solutions like Azure Local represent the next step toward a new generation of distributed infrastructures. This is how Microsoft meets the challenge of uniting cloud potential with on-premises reality, creating opportunities for innovation and growth across all sectors.

The Evolution of High Availability and Disaster Recovery in Modern Infrastructures: The Azure Local Case

High availability and disaster recovery solutions are playing an increasingly central role in modern infrastructure adoption strategies. Azure Local, Microsoft’s on-premises cloud-connected platform, exemplifies this transformation.

Starting with version 23H2, Azure Local introduces a new generation of features, moving away from the traditional Stretched Cluster model to propose more modern and flexible approaches designed to optimize resilience and simplify management. Through new configurations such as Rack Aware Cluster and disaster recovery support via Azure Site Recovery, Azure Local positions itself as a strategic platform for organizations seeking robust, scalable solutions aligned with the Azure ecosystem. In this article, we will explore the key features introduced in Azure Local version 23H2, analyzing the new high-availability options, disaster recovery strategies, and the impact of transitioning from Stretched Clusters to a more advanced model.

Azure Local, Version 23H2: An Arc-Enabled Evolution

Version 23H2 marks a significant leap forward, transforming the platform from a simple cloud-connected operating system into an Azure Arc-enabled solution with integrated features such as Arc Resource Bridge, Arc VMs, and AKS. This expands the possibilities for managing and controlling distributed environments and provides a unified administrative experience. Moreover, multi-site management now extends beyond the operating system level, rendering the previous Stretched Cluster functionality obsolete and introducing new paradigms of resilience and reliability.

High Availability Options

Rack Aware Cluster: High Availability for Short Distances

The standout feature for short-distance scenarios is the Rack Aware Cluster, a configuration that enables:

  • Deploying the cluster across two racks or rooms within the same Layer-2 network (e.g., within a manufacturing plant or campus).
  • Functioning as a local availability zone, ensuring fault isolation and optimal workload placement.

Figure 1 – Rack Aware Cluster: Network Architecture

This configuration combines efficiency and ease of management in local environments. Because it uses a single storage pool, it reduces complexity and avoids the overhead of excessive data replication. The Rack Aware Cluster is particularly suited for edge locations and can scale up to 8 nodes (4 per rack). It is currently in private preview, with public availability expected in 2025.
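The sizing rules quoted above (two racks, at most 8 nodes, at most 4 per rack) can be expressed as a small pre-deployment check. This is a hypothetical helper, not part of any Azure Local tooling; it only encodes the limits stated in this article and assumes nothing about how nodes must be balanced between the racks.

```python
from __future__ import annotations

from collections import Counter

RACK_COUNT = 2    # a Rack Aware Cluster spans two racks or rooms
MAX_NODES = 8     # upper limit quoted for the preview
MAX_PER_RACK = 4


def validate_rack_aware(placement: dict[str, str]) -> list[str]:
    """Map of node name -> rack name in; list of problems out (empty list means valid)."""
    problems: list[str] = []
    per_rack = Counter(placement.values())
    if len(per_rack) != RACK_COUNT:
        problems.append(f"expected {RACK_COUNT} racks, found {len(per_rack)}")
    if len(placement) > MAX_NODES:
        problems.append(f"{len(placement)} nodes exceeds the maximum of {MAX_NODES}")
    for rack, count in sorted(per_rack.items()):
        if count > MAX_PER_RACK:
            problems.append(f"rack {rack} has {count} nodes (maximum {MAX_PER_RACK})")
    return problems


balanced = {f"node{i}": ("rackA" if i <= 4 else "rackB") for i in range(1, 9)}
print(validate_rack_aware(balanced))  # → []
```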

Notably, Azure Local now also introduces the concept of availability zones, closely aligning it with the established Azure model to ensure maximum reliability and operational continuity.

Disaster Recovery Options

Cloud Replication with Azure Site Recovery

For long-distance disaster recovery scenarios, Azure Local leverages Azure Site Recovery (ASR) to replicate on-premises virtual machines to the Azure cloud. This solution enables:

  • Replication: Transferring VM disks to an Azure storage account, safeguarding data from potential disasters.
  • Failover: Running replicated VMs directly in Azure during a disaster, ensuring operational continuity.
  • Re-protect: Replicating VMs back to the local cluster, maintaining a continuous protection cycle.
  • Failback: Bringing workloads back from the cloud to the on-premises system with minimal disruption.

These operations are managed centrally through the Azure portal, ensuring simplicity and efficiency for system administrators.
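The four operations form a cycle, which a small state machine captures. This Python sketch is an illustrative model only, not the Azure Site Recovery API: the state names are hypothetical, and the final re-protect step that restores on-premises-to-Azure replication after failback is an assumption about how the "continuous protection cycle" closes.

```python
from enum import Enum


class VmState(Enum):
    REPLICATING = "running on-premises, replicating to Azure"
    FAILED_OVER = "running in Azure"
    REVERSE_REPLICATING = "running in Azure, replicating back to the local cluster"
    FAILED_BACK = "running on-premises, not yet protected"


# (current state, operation) -> next state
TRANSITIONS = {
    (VmState.REPLICATING, "failover"): VmState.FAILED_OVER,
    (VmState.FAILED_OVER, "reprotect"): VmState.REVERSE_REPLICATING,
    (VmState.REVERSE_REPLICATING, "failback"): VmState.FAILED_BACK,
    (VmState.FAILED_BACK, "reprotect"): VmState.REPLICATING,  # assumed: closes the cycle
}


def apply(state: VmState, operation: str) -> VmState:
    """Return the next state, rejecting operations that make no sense in the current one."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} while {state.value}") from None


state = VmState.REPLICATING
for op in ("failover", "reprotect", "failback", "reprotect"):
    state = apply(state, op)
    print(f"{op:>9}: {state.value}")
```

Modeling the lifecycle this way makes invalid requests explicit: for example, a failback is rejected for a VM that never failed over.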

Local Replication with Hyper-V Replica

For workloads that cannot be moved to the cloud, Azure Local supports Hyper-V Replica, a solution that replicates Arc VMs to a secondary site. This approach allows:

  • Ensuring operational continuity by replicating data to a remote location.
  • Managing VM recovery as Hyper-V virtual machines at the secondary site and reverting to Arc VMs upon restoration on the primary cluster.

This feature, integrated into the Hyper-V role, represents an essential option for resilience in multi-site scenarios.

The Transition from Stretched Clusters

Supported up to version 22H2 (when the platform was still called Azure Stack HCI), Stretched Clusters used Storage Replica to ensure resilience between two node groups located in distinct sites. This configuration:

  • Required at least two nodes per site and replicated storage synchronously to ensure data integrity in the event of failures.
  • Supported live migration of VMs between sites, facilitating smooth transitions for planned maintenance.
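The guarantee behind synchronous replication is that a write is acknowledged only once both sites hold it, so losing one site cannot lose acknowledged data. The Python sketch below models just that rule; `Site` and `synchronous_write` are hypothetical, and real Storage Replica works at the block and log level, reconciling unacknowledged writes (which this simplification leaves on the primary) through its logs.

```python
from __future__ import annotations


class Site:
    """Hypothetical stand-in for one site's storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.blocks: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        if not self.online:
            raise OSError(f"site {self.name} unreachable")
        self.blocks[block] = data


def synchronous_write(primary: Site, secondary: Site, block: int, data: bytes) -> bool:
    """Return True (acknowledge) only after both sites have persisted the write."""
    primary.write(block, data)
    try:
        secondary.write(block, data)
    except OSError:
        # Not acknowledged: the application must not treat this write as durable.
        return False
    return True


site_a, site_b = Site("A"), Site("B")
print(synchronous_write(site_a, site_b, 0, b"order-1001"))  # → True
site_b.online = False
print(synchronous_write(site_a, site_b, 1, b"order-1002"))  # → False
```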

However, this solution required manual operations to reverse the direction of storage replication, a process that could introduce complexity and impact performance. With version 23H2, Stretched Clusters are no longer supported. Clusters configured with version 22H2 can still be partially upgraded to the 23H2 operating system, maintaining compatibility but without benefiting from the new features of the latest version.

For customers still using this configuration, it is advisable to consider adopting the new high availability and disaster recovery options offered by Azure Local, which guarantee greater efficiency and reliability.

Conclusions

The new features in Azure Local version 23H2 reflect a significant evolution toward more flexible, modern management aligned with the Azure ecosystem. With solutions like Rack Aware Cluster and integration with Azure Site Recovery, organizations can enhance the resilience of their local environments and ensure scalable and integrated disaster recovery options. Furthermore, abandoning the Stretched Cluster model paves the way for more efficient and streamlined configurations, enabling customers to fully leverage the potential offered by Azure technologies.

Ladies and Gentlemen, Welcome Azure Local!

Microsoft Ignite 2024 brought several exciting announcements, but one of the most significant was undoubtedly Azure Local. This is not merely a rebranding of Azure Stack HCI; it is a platform that redefines how we think about hybrid and on-premises infrastructures. Azure Local is designed to bring the essence of the cloud directly to local datacenters, offering a rich experience tightly integrated with Azure services. Its suite of new features and flexible approach position it to shape the future of local infrastructures. Below, we explore all the updates to this solution.

A Name that Reflects a Vision

The name Azure Local is straightforward and on point. It represents the idea of having core Azure services—compute, networking, storage, and applications—available directly in local datacenters. This vision materializes through a cloud-connected platform that offers flexibility, scalability, and operational control.

Hardware: Choice, Flexibility, and New Opportunities

One of the most intriguing features of Azure Local is its wide range of supported hardware. With over 100 validated platforms from major vendors such as Dell and Lenovo, businesses can select solutions that best meet their needs and budget. Support for GPUs such as the NVIDIA A2, A16, and L40 makes Azure Local well suited to demanding workloads like artificial intelligence and virtual desktops.

Cost-Effective Options for the Edge

For environments with lighter compute requirements or tighter budgets, Azure Local supports micro, tower, and rugged hardware. This is a great opportunity for companies operating in edge or industrial environments. The minimum requirements include a compatible machine with an additional SSD and a 1 Gbps Ethernet network, eliminating the need for expensive switches. These options open new possibilities for deployments in remote or hard-to-reach locations, ensuring performance and consistency even in challenging operating conditions.

Simplified Provisioning

Thanks to the FIDO Device Onboard (FDO) protocol, onboarding machines is automated, greatly simplifying the activation of new edge nodes or IoT devices. This approach eliminates the need for complex manual interventions, making infrastructure deployment faster and more efficient.

Identity Management: With or Without Active Directory

Azure Local introduces long-awaited flexibility in identity management. If you don’t want to use on-premises Active Directory, the new “Local Identity” feature is available. This solution uses local accounts and certificates while retaining advanced functionalities like live VM migration. Additionally, local secrets are safeguarded with Azure Key Vault, ensuring high security levels even without external identity systems.

Centralized Management and Monitoring

One of Azure Local’s key strengths is its integration with Azure Arc, which extends Azure services to on-premises and other cloud environments. Infrastructure management happens directly from the Azure portal, where you can configure clusters, networking, and storage. For those seeking operational consistency, Azure Local allows configurations to be defined using ARM (Azure Resource Manager) templates, ensuring scalable and repeatable management. Furthermore, the Infrastructure-as-Code approach simplifies deployment in distributed environments, ensuring consistency and reducing errors.

Simplified Updates

Azure Local software updates come in a single monthly package, including drivers, firmware, and software stacks. This method enables sequential updates of physical machines while ensuring workload continuity. The ability to automatically orchestrate updates in multi-node environments is a significant advantage for organizations needing to minimize downtime.
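Sequential (rolling) updating means only one machine is out of service at a time, with its workloads moved to the others first. This Python sketch shows the idea; `Node`, `rolling_update`, the version strings, and the round-robin drain are hypothetical simplifications, not the Azure Local update orchestrator, which also has to deal with health checks, reboots, and failures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    version: str
    vms: list[str] = field(default_factory=list)


def rolling_update(nodes: list[Node], target: str) -> list[str]:
    """Update nodes one at a time, live-migrating VMs off each node before it is updated."""
    log: list[str] = []
    for node in nodes:
        if node.version == target:
            continue  # already current
        peers = [n for n in nodes if n is not node]
        if not peers:
            raise RuntimeError("a single node cannot be updated without downtime")
        # Drain: spread this node's VMs round-robin across the remaining nodes.
        for i, vm in enumerate(node.vms):
            peers[i % len(peers)].vms.append(vm)
        log.append(f"drained {len(node.vms)} VM(s) from {node.name}")
        node.vms.clear()
        node.version = target  # the update itself (drivers, firmware, OS) happens here
        log.append(f"updated {node.name} to {target}")
    return log


cluster = [Node("node1", "2410", ["vm1", "vm2"]), Node("node2", "2410", ["vm3"])]
for line in rolling_update(cluster, "2411"):
    print(line)
```

At every step all VMs stay hosted on some node, which is what "ensuring workload continuity" amounts to in this model.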

Integrated Monitoring

Azure Local integrates natively with Azure Monitor, providing a unified view of all distributed resources. With over 50 standard metrics, preconfigured dashboards, and alert rules, businesses can monitor CPU, memory, storage, and network usage, setting up email notifications or automated actions in case of failures. Furthermore, data collection rules can be customized, and advanced dashboards can be created via Workbooks.

Figure 2 – Centralized visibility across all your locations
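A metric alert rule of this kind boils down to aggregating a metric over a time window and comparing the result with a threshold. The following Python sketch shows that logic with a hypothetical `MetricAlert` type; in Azure Monitor itself you configure the metric, aggregation, operator, threshold, and window on the rule rather than writing code.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MetricAlert:
    metric: str
    threshold: float
    window: int  # number of most recent samples to average

    def fires(self, samples: list[float]) -> bool:
        """Fire when the average of the last `window` samples exceeds the threshold."""
        if len(samples) < self.window:
            return False  # not enough data yet
        return mean(samples[-self.window:]) > self.threshold


cpu_alert = MetricAlert("cpu_percent", threshold=85.0, window=3)
cpu_samples = [40.0, 55.0, 90.0, 95.0, 92.0]
print(cpu_alert.fires(cpu_samples))  # → True (average of 90, 95, 92 is about 92.3)
```

Averaging over a window rather than alerting on a single sample avoids notifications for momentary spikes.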

New Features and Services

Azure Local doesn’t stop at enhancing infrastructure—it also introduces new features and services that expand its usability.

Figure 3 – Azure Apps, Data, and AI in Azure Local

Migration from VMware

For organizations looking to move away from VMware, Azure Local offers a migration solution (in preview) via Azure Migrate. This tool enables the transfer of VMDKs to Azure Local, eliminating dependence on Broadcom and its associated costs. The migration process uses the same portal and APIs as Azure, ensuring a seamless experience for those already familiar with Azure tools.

Figure 4 – Migrating from VMware to Azure Local

PaaS and AI Services

Azure Local enables the use of Azure PaaS services like Azure Virtual Desktop and SQL Managed Instance. Additionally, the new Azure IoT Operations service provides a unified platform for edge data collection and analysis. For companies interested in AI, Azure Local introduces local AI search capabilities (preview) that leverage advanced language models to analyze on-premises data. This innovation opens new opportunities for process automation and for extracting more value from data.

Figure 5 – Azure AI Services with Azure Local

Disconnected Operations

For customers who cannot connect to the cloud due to regulatory or other reasons, Azure Local offers a disconnected option (in preview). In this configuration, Azure services, including the portal and Azure Resource Manager, are hosted locally, ensuring a consistent experience even without connectivity.

Figure 6 – Disconnected operations

Advanced Security

Security is a cornerstone of Azure Local, with new features enhancing resource protection.

Network Security Groups (NSG)

This functionality lets you define granular access rules between resources, filtering traffic based on parameters such as source IP, port, and protocol. NSGs offer precise control over network traffic, reducing the risk of unauthorized access.

Figure 7 – Network Security Group in Azure Local
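NSG rules are typically evaluated in priority order, with the first matching rule deciding the outcome. The Python sketch below models that behavior, assuming Azure Local NSGs follow the same semantics as Azure NSGs (lower priority number is evaluated first). The `Rule` type and `evaluate` function are hypothetical; real NSGs also match destination address and port and include default rules, for which an implicit deny stands in here.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int      # lower number = evaluated first
    source: str        # CIDR prefix, e.g. "10.0.1.0/24"
    port: int | None   # None matches any port
    protocol: str      # "tcp", "udp", or "*" for any
    action: str        # "allow" or "deny"


def evaluate(rules: list[Rule], src_ip: str, port: int, protocol: str) -> str:
    addr = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if addr not in ipaddress.ip_network(rule.source):
            continue
        if rule.port is not None and rule.port != port:
            continue
        if rule.protocol not in ("*", protocol):
            continue
        return rule.action  # first match wins
    return "deny"  # nothing matched: implicit deny


rules = [
    Rule(4000, "0.0.0.0/0", None, "*", "deny"),
    Rule(100, "10.0.1.0/24", 443, "tcp", "allow"),  # app subnet may reach HTTPS
]
print(evaluate(rules, "10.0.1.5", 443, "tcp"))  # → allow
print(evaluate(rules, "10.0.2.5", 443, "tcp"))  # → deny
```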

Trusted Launch

Azure Local introduces Trusted Launch, which protects VMs from rootkits and bootkits through Secure Boot and BitLocker encryption. This feature also ensures secure VM migration within the cluster, preserving data integrity and enhancing infrastructure resilience. Azure’s attestation services will also provide continuous system integrity monitoring, offering advanced security and visibility.

Getting Started

Existing Customers

Existing Azure Stack HCI customers do not need to take any action: software updates will ensure a smooth transition to Azure Local, granting immediate access to new features.

New Installations

Azure Local is available in version 2411 for new deployments.

Virtual Sandbox

For those wanting to try Azure Local without dedicated hardware, Azure Arc Jumpstart offers a virtual sandbox environment, accessible via an Azure subscription. This option is ideal for testing features before deploying in production environments.

Conclusion

Microsoft Ignite 2024 highlighted a significant milestone in the hybrid infrastructure landscape with Azure Local. It’s not just an evolution of Azure Stack HCI but a platform that redefines how businesses leverage the cloud in their datacenters. With a focus on flexibility, integration, and security, Azure Local combines the best of the on-premises and cloud worlds, enabling organizations to adopt a truly connected and coherent hybrid strategy.

Its distinctive features, such as simplified provisioning, centralized management with Azure Arc, and support for disconnected scenarios, make it an ideal solution for addressing complex business needs.

Moreover, its attention to specific workloads like AI and virtual desktops, along with advanced security features like Trusted Launch and NSGs, strengthens Azure Local’s ability to adapt to diverse operational contexts.

Azure Local represents a significant step toward the future of hybrid infrastructures, delivering a seamless cloud experience directly to local datacenters. For both existing and new customers, this solution marks the beginning of a new era in IT resource management, bringing the cloud closer to business needs.

WSUS Retirement: Impact and Strategies for Server System Updates

With Microsoft’s announcement regarding the retirement of Windows Server Update Services (WSUS), a new chapter begins in managing updates for server systems. Almost 20 years after its release, WSUS will no longer be actively developed, creating uncertainty among IT administrators who have relied on this tool to distribute updates in enterprise environments. In this article, we will analyze the impact of this decision and possible migration strategies, with a particular focus on server systems.

What is WSUS and What Does Its Retirement Mean?

Windows Server Update Services (WSUS) has been the go-to tool for managing and distributing Microsoft product updates within enterprise networks for years. IT administrators can approve, schedule, and control the distribution of updates, deciding which devices receive them. WSUS also offers automation capabilities via PowerShell and integrates with Group Policy, making centralized management easier.

With the retirement announcement, Microsoft specified that WSUS will not be removed immediately, but it will no longer receive future developments or enhancements. The current functionality will be maintained, and Microsoft will continue to release updates through WSUS, but no new features will be introduced.

Implications for IT Administrators and Migration Strategies

The announcement has raised questions among IT administrators, especially regarding the continuity of support and the need to find alternative solutions. While WSUS will remain available in the Windows Server versions currently on the market (in-market versions), including the upcoming Windows Server 2025, it is crucial for administrators to start planning a transition to new solutions.

One important aspect to consider regarding the retirement of WSUS is its impact on Microsoft Configuration Manager. Although WSUS is being gradually retired, its deprecation will not directly impact the existing functionalities of Microsoft Configuration Manager, which will continue to use WSUS as a mechanism for managing and distributing updates. In other words, Configuration Manager will remain a viable option for organizations that rely on it to manage updates, with WSUS still serving as the distribution channel.

However, it is essential to note that even though WSUS will still be usable within Configuration Manager, Microsoft recommends planning a transition to cloud-based solutions such as Azure Update Manager to leverage new capabilities and improve the efficiency of update management in the long term. Migrating to the cloud is not only a natural evolution but also an opportunity to ensure more flexible and efficient server update management in line with modern business needs. This shift reflects the move towards a more cloud-oriented update management model, consistent with Microsoft’s strategy of simplifying Windows management through cloud-based solutions.

Azure Update Manager: A Worthy Replacement, But…

Azure Update Manager is a service that helps manage and govern updates for all machines, whether in Azure, on-premises, or on other cloud platforms connected via Azure Arc. From a single management console, it is possible to monitor update compliance for Windows and Linux servers, apply updates in real-time, or schedule them in defined maintenance windows.

With Azure Update Manager, you can:

  • Control and distribute security or critical updates to protect machines.
  • Enable periodic assessments to check for updates.
  • Use flexible patching options, such as scheduling updates in custom time windows.
  • Monitor update compliance for all machines, including hybrid or other cloud environments connected via Azure Arc.
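Scheduling updates in custom time windows comes down to checking whether a given moment falls inside a recurring window. Azure Update Manager handles this through its own schedule settings, so the Python function below is only a hypothetical illustration of a weekly window that may cross midnight; time zones and daylight saving time are ignored.

```python
from datetime import datetime, timedelta


def in_weekly_window(ts: datetime, weekday: int, start_hour: int, duration_hours: int) -> bool:
    """True if `ts` falls in a window that opens every `weekday` (0=Monday) at `start_hour`."""
    days_back = (ts.weekday() - weekday) % 7
    start = (ts - timedelta(days=days_back)).replace(hour=start_hour, minute=0, second=0, microsecond=0)
    if start > ts:
        start -= timedelta(days=7)  # this week's window has not opened yet; use last week's
    return start <= ts < start + timedelta(hours=duration_hours)


# Saturday 22:00 for 4 hours, so the window runs into Sunday morning (5 Jan 2025 is a Sunday).
print(in_weekly_window(datetime(2025, 1, 5, 1, 0), weekday=5, start_hour=22, duration_hours=4))  # → True
```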

Azure Update Manager offers several advantages, but there are some aspects to consider carefully.

Azure Update Manager respects the update source already configured on the machine, whether it is Windows Update for OS updates, Microsoft Update for product updates, or WSUS for a combination of both. In this context, WSUS can still be used in parallel with Azure Update Manager to provide additional capabilities, such as storing or caching updates locally.

The critical point concerns organizations with a large number of on-premises servers, where managing updates exclusively through Azure Update Manager requires further evaluation. The main concern is related to the bandwidth needed to download updates directly from the Internet to each server, which could saturate the network. Additionally, the micro-segmentation typical of server security policies makes it difficult to use peer-to-peer technologies such as Delivery Optimization.

For enterprises that want a long-term strategy that avoids this pain point, it is currently necessary to evaluate solutions such as Microsoft Connected Cache or options from other vendors.
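A quick calculation shows why bandwidth is the sticking point and why a local cache helps: without one, every server downloads the same content over the WAN; with one cache per site, each site downloads it only once. The Python sketch below uses hypothetical server counts and update sizes purely for illustration, and ignores real-world factors such as update payloads that vary from server to server.

```python
from __future__ import annotations


def wan_traffic_gb(servers_per_site: dict[str, int], update_gb: float, cache_per_site: bool) -> float:
    """WAN download volume for one update cycle."""
    if cache_per_site:
        # Each site's cache (e.g. WSUS or Microsoft Connected Cache) fetches the content once;
        # servers then pull it over the LAN.
        return sum(1 for count in servers_per_site.values() if count > 0) * update_gb
    # Otherwise every server downloads the full content from the Internet.
    return sum(servers_per_site.values()) * update_gb


sites = {"headquarters": 400, "datacenter-2": 100}  # hypothetical estate
print(wan_traffic_gb(sites, update_gb=1.5, cache_per_site=False))  # → 750.0
print(wan_traffic_gb(sites, update_gb=1.5, cache_per_site=True))   # → 3.0
```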

Another relevant aspect is the cost of Azure Update Manager for servers managed through Azure Arc. While the service is free for systems residing in Azure, Azure Arc-enabled servers cost around €4.48 per server per month. However, no charge applies when the servers are:

  • Enabled for Extended Security Updates (ESU).
  • Managed through Defender for Servers Plan 2.
  • Hosted on Azure Stack HCI, when these machines are enabled for Azure benefits and managed via Azure Arc.
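The billing rules above translate into a simple monthly estimate. This Python sketch is hypothetical: it hard-codes the approximate €4.48 figure quoted in this article (prices change, so check current Azure pricing) and models each exemption as a boolean flag on a made-up `Server` record.

```python
from __future__ import annotations

from dataclasses import dataclass

PRICE_EUR_PER_SERVER_MONTH = 4.48  # approximate figure quoted in the article


@dataclass(frozen=True)
class Server:
    name: str
    in_azure: bool = False             # Azure VMs: no charge
    esu_enabled: bool = False          # Extended Security Updates enabled
    defender_servers_p2: bool = False  # covered by Defender for Servers Plan 2
    hci_azure_benefits: bool = False   # on Azure Stack HCI with Azure benefits, managed via Arc


def is_billable(s: Server) -> bool:
    return not (s.in_azure or s.esu_enabled or s.defender_servers_p2 or s.hci_azure_benefits)


def monthly_cost_eur(servers: list[Server]) -> float:
    billable = sum(1 for s in servers if is_billable(s))
    return round(billable * PRICE_EUR_PER_SERVER_MONTH, 2)


fleet = (
    [Server(f"arc-{i}") for i in range(70)]
    + [Server(f"p2-{i}", defender_servers_p2=True) for i in range(20)]
    + [Server(f"esu-{i}", esu_enabled=True) for i in range(10)]
)
print(monthly_cost_eur(fleet))  # → 313.6 (70 billable servers)
```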

Conclusion

The retirement of WSUS will bring significant changes in the long term for IT administrators, especially in large environments with a high number of servers. While WSUS will continue to be available, companies should start considering long-term strategies to ensure efficient and secure update management. Azure Update Manager is a viable alternative but requires careful analysis of the economic and operational implications of this change.

For those interested in a more comprehensive approach in terms of security and centralized management, combining Azure Update Manager with Microsoft Defender for Cloud (Defender for Servers Plan 2) offers an interesting solution. This combination not only covers update management but also provides advanced server protection features, ensuring a higher level of security.

In conclusion, although WSUS will remain available for a few more years, Microsoft’s direction is clear: the future of update management is moving towards the cloud, and organizations must prepare to face this transition in a strategic and proactive manner.

Proactive Cloud Protection: Experiences and Strategies for Cloud Security

With the growing adoption of cloud platforms, organizations face new security challenges that require a structured and proactive approach. Field experience has shown how critical it is to implement effective Cloud Security Posture Management (CSPM) solutions to continuously monitor and protect cloud infrastructures. These tools enable the detection and resolution of risks before they can evolve into critical threats. In this article, I will share practical advice for tackling these challenges, exploring the importance of CSPM, key risks to consider, and how Microsoft Defender for Cloud (MDfC) stands out as a comprehensive solution for managing cloud security. Additionally, we will review the essential steps for effectively implementing a CSPM solution and best practices to maximize security.

Understanding CSPM and Its Importance

Cloud Security Posture Management (CSPM) refers to a suite of tools and practices that continuously monitor and protect cloud infrastructures. Through direct experience with various projects, I have observed how organizations increasingly rely on cloud platforms, often exposing themselves to misconfigurations, compliance violations, and vulnerabilities. CSPM acts as a continuous supervisor, detecting and mitigating risks before they become critical threats, providing constant oversight over cloud environments.

The main risks that a CSPM solution helps to address include:

  • Data Breaches: Misconfigurations can inadvertently expose sensitive data, making it vulnerable to external threats.
  • Compliance Violations: Non-compliance with regulations can result in legal penalties and financial losses.
  • Reputational Damage: A security breach can undermine customer trust, negatively impacting the company’s reputation.

Microsoft Defender for Cloud: A Comprehensive CSPM Solution

Microsoft Defender for Cloud (MDfC) is an advanced Cloud Security Posture Management (CSPM) solution that excels in protecting heterogeneous cloud environments. Working directly on various projects, I have seen how MDfC, operating as a Cloud Native Application Protection Platform (CNAPP), offers comprehensive protection throughout the application lifecycle, from development to deployment. Its scalability allows it to adapt to the evolving needs of organizations, supporting platforms like Azure, AWS, and GCP.

Figure 1 – Microsoft Cloud-Native Application Protection Platform (CNAPP)

MDfC stands out by managing various security areas in addition to CSPM:

  • Cloud Workload Protection Platform (CWPP): This feature provides real-time threat detection and response for virtual machines, containers, Kubernetes, databases, and more, helping to reduce the attack surface.
  • Multi-Pipeline DevOps Security: It offers a centralized console to manage security across all DevOps pipelines, preventing misconfigurations and ensuring vulnerabilities are detected early in the development process.
  • Cloud Infrastructure Entitlement Management (CIEM): It centralizes the management of permissions across cloud and hybrid infrastructures, preventing the misuse of privileges.

Additionally, Cloud Security Network Services (CSNS) solutions integrate with CWPP to protect cloud infrastructure in real-time. A CSNS solution may include a wide range of security tools, such as distributed denial-of-service (DDoS) protection and web application firewalls.

Implementing CSPM: Planning and Strategies

To implement a CSPM solution effectively, a detailed plan is essential to ensure alignment with business needs. Here are some practical suggestions:

  1. Assess Security Objectives: Organizations should start by evaluating their cloud environments, identifying critical resources, and understanding their exposure to risks. This requires a thorough analysis of the IT security landscape, including identifying any gaps in infrastructure and compliance requirements.
  2. Define Security Requirements: Once the cloud environment is understood, the next step is to establish security policies that protect high-value workloads and sensitive data. It’s crucial to outline risk management strategies that include preventive measures, such as audits and vulnerability scans, as well as reactive measures like breach response plans.
  3. Select the Appropriate CSPM Solution: MDfC offers various levels of CSPM services. Organizations can start with basic functionalities, such as compliance controls and vulnerability assessments, and then evolve toward advanced capabilities, including in-depth security analysis, threat management, and governance tools.

Figure 2 – CSPM Plans (Foundational vs. Defender CSPM)

Turning Strategy into Action

Once the planning phase is complete, it’s time to operationalize CSPM, translating strategic security objectives into concrete actions integrated into daily operations. Based on my experience, the key steps include:

  • Defining Roles and Responsibilities: Clearly assigning roles to team members is critical to ensuring accountability and effective management of CSPM tools. For example, security architects can focus on the overall strategy, while IT administrators handle the configuration and daily management of CSPM tools.
  • Establishing Solid Processes: Implementing workflows for regular security assessments, managing compliance, and resolving issues is crucial. Automation plays a key role at this stage, simplifying operations and reducing the risk of human error.
  • Continuous Monitoring and Improvement: Effective use of CSPM requires ongoing monitoring to identify new vulnerabilities and threats. Real-time monitoring tools, such as those provided by Defender for Cloud, enable organizations to respond swiftly to security incidents, ensuring a high level of protection.

Best Practices for Maximizing CSPM Effectiveness

To get the most out of CSPM, organizations should follow some best practices that I have found to be particularly effective:

  • Align with Industry Standards: Ensure that CSPM implementation complies with industry standards and best practices, such as the CIS Benchmarks and the NIST Cybersecurity Framework. This ensures that the security measures adopted meet the required levels of protection and compliance.
  • Shift-Left Security: Integrate security into every phase of IT operations, from application design and development to deployment and maintenance. This approach, known as “shift-left,” reduces the risk of vulnerabilities being introduced into systems from the earliest stages.
  • Automate Security Processes: Automating tasks such as compliance checks, threat detection, and issue resolution significantly improves the efficiency of security operations, freeing up resources to address more complex threats.
  • Cultivate a Security Awareness Culture: Security must be a shared responsibility, not limited to the IT department. All employees should be trained and aware of their role in maintaining organizational security. Regular training sessions and workshops help to promote this culture of awareness.

Best Practices Specific to Defender CSPM

To optimize the use of Microsoft Defender for Cloud (MDfC) as a CSPM solution, it is useful to follow these best practices:

  • Customize MDfC Settings: Tailor MDfC configurations to the organization’s specific needs and risk profile, implementing targeted security policies, custom threat detection rules, and compliance benchmarks.
  • Prioritize Alerts: Configure MDfC to categorize and prioritize alerts based on severity, resource sensitivity, and potential impact on business activities, ensuring a prompt response to critical threats.
  • Customize Dashboards: Adapt MDfC dashboards to highlight the most relevant security metrics, compliance status, and operational insights, facilitating monitoring and management of security.
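The alert prioritization practice above can be illustrated with a small Python sketch that ranks alerts by combining severity, resource sensitivity, and business impact into one score. The weights, the `resource_tier` and `business_impact` fields, and the sample alerts are hypothetical and do not reflect the Defender for Cloud alert schema; in practice, sensitivity might come from resource tags or data classification.

```python
# Hypothetical weights: tune them to the organization's risk profile.
SEVERITY_WEIGHT = {"High": 3, "Medium": 2, "Low": 1, "Informational": 0}
SENSITIVITY_WEIGHT = {"critical": 3, "standard": 2, "test": 1}


def priority_score(alert: dict) -> int:
    """Multiply severity, resource sensitivity and business impact (1-3)."""
    return (SEVERITY_WEIGHT[alert["severity"]]
            * SENSITIVITY_WEIGHT[alert["resource_tier"]]
            * alert["business_impact"])


alerts = [
    {"id": "a1", "severity": "Medium", "resource_tier": "critical", "business_impact": 3},
    {"id": "a2", "severity": "High", "resource_tier": "test", "business_impact": 1},
    {"id": "a3", "severity": "High", "resource_tier": "critical", "business_impact": 3},
    {"id": "a4", "severity": "Low", "resource_tier": "standard", "business_impact": 2},
]

ranked = sorted(alerts, key=priority_score, reverse=True)
print([a["id"] for a in ranked])  # ['a3', 'a1', 'a4', 'a2']
```

Note how a Medium alert on a critical resource (a1) outranks a High alert on a test resource (a2): severity alone is not enough to decide what to handle first.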

Conclusion

Cloud Security Posture Management (CSPM) solutions are essential to ensure security and compliance in evolving cloud environments. With advanced tools like Microsoft Defender for Cloud, organizations can monitor and protect their data and infrastructures, minimizing risks and maintaining a robust security posture. Implementing a CSPM solution properly requires strategic planning and continuous adaptation to new threats, but the benefits in terms of protection and resilience are significant. By following best practices and integrating security into every phase of IT operations, companies can ensure proactive and enduring protection while preserving customer trust and corporate reputation.

Windows Server 2025 vs. Azure Stack HCI: Who Wins the Virtualization Challenge?

Recently, the virtualization landscape has seen significant changes, pushing companies to evaluate new solutions for their IT environments. Specifically, the acquisition of VMware by Broadcom has raised concerns among many customers, leading them to explore alternatives for their virtualization infrastructures. In this context, Windows Server 2025 and Azure Stack HCI emerge as two key options offered by Microsoft. Both play a fundamental role in cloud and on-premises architectures, but they cater to very different needs and contexts. In this article, we will delve into the differences between these two platforms, highlighting their strengths and use cases to understand how they fit into the adoption of new virtualization and hybrid cloud solutions.

Background: The Evolution from Traditional Infrastructure to Hyper-Converged Infrastructure (HCI)

Before the widespread adoption of hyper-converged infrastructure (HCI), virtualization was often implemented through a three-tier infrastructure consisting of servers, switches, and a SAN (Storage Area Network). The SAN provided the shared storage that servers accessed via protocols such as iSCSI or Fibre Channel. This approach enabled, and still enables, the management of workloads across multiple hosts, ensuring redundancy and high availability through advanced failover and resilience mechanisms.

With the introduction of hyper-converged solutions such as Azure Stack HCI, Microsoft's solution for implementing a hyper-converged infrastructure, the architecture and management paradigm change radically: storage, networking, and computing are integrated into a single software-defined platform, eliminating the need for many dedicated hardware components.

Figure 1 – “Three Tier” Infrastructure vs Hyper-Converged Infrastructure (HCI)

This allows for greater simplicity in management, reduced costs associated with hardware, rack space, and cooling, and more flexibility in deployment.

Windows Server 2025: The Operating System for All Needs

Windows Server 2025 represents the latest evolution of Microsoft’s proven server operating system. This new version is designed to be a versatile, general-purpose platform, aimed at meeting the needs of businesses of any size. Windows Server 2025 continues to support a wide range of workloads, from traditional services like Active Directory and SQL Server to advanced virtualization scenarios with Hyper-V.

Some of the key innovations and features of Windows Server 2025 include:

  • Virtualization enhancements: Hyper-V has been further enhanced to support advanced features like GPU partitioning and optimized performance for virtual machines (VMs). This makes it ideal for companies heavily dependent on virtualization and needing to manage high-intensity workloads.
  • Storage Spaces Direct (S2D): This feature allows the creation of distributed storage clusters, transforming local disks into shared, highly available storage pools, with a strong focus on performance and resilience.
  • Hybrid cloud support: Although primarily designed for on-premises environments, Windows Server 2025 offers strong integration with Azure Arc, enabling hybrid and centralized management of both local and cloud resources. This feature opens up new scenarios, where on-premises resources can be managed directly from the Azure portal.
  • Flexible licensing: Windows Server continues to offer adaptable licensing models to meet business needs. In fact, Microsoft plans to sell Windows Server 2025 not only through traditional perpetual licenses but also through a pay-as-you-go subscription option.

Scalability and Performance with Windows Server 2025

One of the standout aspects of Windows Server 2025 is its focus on scalability. With support for up to 240 terabytes of memory and 2,048 virtual processors per virtual machine, this platform is designed to handle extremely intensive workloads, such as artificial intelligence and big data processing. Additionally, optimization for NVMe storage delivers a performance improvement of up to 70% compared to previous versions, positioning Windows Server 2025 as an excellent choice for businesses needing high-speed storage.

Another significant innovation is support for AD-less clustering, designed for edge scenarios where traditional Active Directory (AD) management might not be practical. This is particularly useful for companies operating in decentralized environments, such as remote industrial sites or branch offices.

Azure Stack HCI: The Hyper-Converged Virtualization Platform

Unlike Windows Server, Azure Stack HCI is not a general-purpose operating system. It is a platform specifically designed for virtualization and containerization environments. Azure Stack HCI combines compute, networking, and storage in a software-defined solution, offering simplified on-premises workload management with strong Azure cloud integration. It is an ideal solution for organizations seeking a scalable HCI infrastructure that can be managed through the Azure portal.

Key features of Azure Stack HCI include:

  • Focus on virtualization: Azure Stack HCI is optimized to run virtual machines and containers, without offering traditional server roles like Active Directory or file servers. This makes it a solution focused on specific workloads, such as managing virtualization and containerization environments through Hyper-V and Kubernetes.
  • Advanced cloud integration: Azure Stack HCI integrates seamlessly with Azure services, enabling the management of both on-premises and cloud resources through a single interface. This hybrid capability simplifies tasks such as provisioning, monitoring, and governance of resources in geographically distributed environments.
  • Security: Azure Stack HCI implements over 100 predefined security best practices.
  • Costs and licensing: Azure Stack HCI adopts a subscription-based licensing model, ensuring constant updates and security patches. While this approach makes costs more predictable, it may be less advantageous for smaller setups compared to the traditional Windows Server licensing model.

When to Choose Windows Server 2025?

Windows Server 2025 is a versatile and reliable choice for a wide range of IT scenarios, thanks to its general-purpose nature. However, to determine whether this platform is suitable for a specific organization, it is important to evaluate technical, economic, and operational requirements. Situations where Windows Server 2025 might be the ideal solution include:

  • On-premises-focused workloads: If an organization needs to keep most of its workloads on-premises with minimal cloud integration, Windows Server 2025 is usually the better fit.
  • Limited budget: If the company is not ready to invest in subscription-based solutions, the traditional Windows Server licensing model might be more cost-effective.
  • Hardware compatibility: If the goal is to reuse existing hardware, Windows Server offers greater flexibility in terms of compatibility.

When to Choose Azure Stack HCI?

Azure Stack HCI stands out for its strong cloud integration and ability to provide a modern hyper-converged infrastructure. While not a general-purpose operating system, its architecture makes it particularly suited to specific needs related to virtualization and containerized workloads. Situations where Azure Stack HCI emerges as the optimal choice include:

  • Hybrid cloud environments: If a company has already adopted a hybrid cloud strategy, Azure Stack HCI offers integrated management with the Azure portal, simplifying the control of resources in distributed environments.
  • Resilience and disaster recovery: Thanks to support for stretched clusters across multiple geographic locations, Azure Stack HCI provides greater resilience and advanced disaster recovery options.
  • Infrastructure modernization: If you are looking to modernize infrastructure by adopting hyper-converged technologies and close cloud integration, Azure Stack HCI is well suited to support the transition.
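The criteria in the two lists above can be condensed into a simple, unweighted checklist. The following Python sketch counts how many criteria favor each platform; the `Requirements` fields mirror the bullets, but the equal weighting and the tie rule are assumptions for illustration only, not a substitute for a proper technical and economic assessment.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Criteria favoring Windows Server 2025
    on_prem_focus: bool
    reuse_existing_hardware: bool
    # Shared criterion: can the budget absorb a subscription model?
    subscription_budget_ok: bool
    # Criteria favoring Azure Stack HCI
    hybrid_cloud_strategy: bool
    needs_stretched_cluster: bool
    modernize_to_hci: bool


def suggest_platform(r: Requirements) -> str:
    """Unweighted checklist: count the criteria that favor each platform."""
    ws_points = sum([r.on_prem_focus, r.reuse_existing_hardware,
                     not r.subscription_budget_ok])
    hci_points = sum([r.hybrid_cloud_strategy, r.needs_stretched_cluster,
                      r.modernize_to_hci, r.subscription_budget_ok])
    if ws_points > hci_points:
        return "Windows Server 2025"
    if hci_points > ws_points:
        return "Azure Stack HCI"
    return "Evaluate both in more detail"


branch_office = Requirements(on_prem_focus=True, reuse_existing_hardware=True,
                             subscription_budget_ok=False, hybrid_cloud_strategy=False,
                             needs_stretched_cluster=False, modernize_to_hci=False)
print(suggest_platform(branch_office))  # Windows Server 2025
```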

Conclusions

Windows Server 2025 and Azure Stack HCI are robust and powerful platforms, but designed for different needs. Windows Server 2025 is perfect for organizations needing a general-purpose platform with a strong on-premises presence and some hybrid cloud capabilities. Azure Stack HCI, on the other hand, is the ideal choice for companies looking to fully embrace hybrid cloud with simplified management and strong Azure integration.

The choice between the two will depend on the specific requirements of the organization, budget, and long-term goals. It’s not about deciding which is the “better” virtualization platform, but which better meets the company’s operational and strategic needs.

Azure Governance 2.0: Managing Hybrid and Multi-Cloud Environments with AI Support

Hybrid and multi-cloud IT environments are revolutionizing how companies manage their digital infrastructures, offering the flexibility to combine on-premises resources with cloud services. This approach allows for optimal workload management and unprecedented scalability, but it also brings challenges that organizations must address to maintain control and security over their IT environments. In this article, we will explore the challenges of hybrid and multi-cloud IT environments, examining best practices for effective governance and the growing role of artificial intelligence (AI) in simplifying the management of these complex infrastructures.

Challenges of Hybrid and Multi-Cloud IT Environments

The adoption of hybrid and multi-cloud IT environments offers companies flexibility and scalability, but it introduces a series of challenges that need to be addressed to ensure effective resource management.

High and Unnecessary Service Costs

One of the main difficulties lies in cost management. Companies often face unexpected or excessive expenses due to a poor understanding of cloud provider pricing models or the activation of unnecessary services. To avoid this, it’s crucial to implement cost management strategies, including constant resource monitoring and service optimization based on real business needs.

New Security Threats

The move to the cloud brings new security threats. Organizations must tackle risks such as unauthorized access, data breaches, and misconfigured cloud services. To mitigate these dangers, solid security strategies are essential, including data encryption, multi-factor authentication, and advanced identity management. Training IT personnel on security risks and protection best practices is another critical component.

Delegation of Service Activation and Risk of Losing Control

The ease of activating cloud services can lead to “shadow IT” scenarios, where departments activate services autonomously without centralized IT control. This can result in resource dispersion and overall loss of control over the infrastructure. Clear guidelines and a centralized approval process are needed to ensure each cloud service activation is carefully evaluated and monitored.

Compliance Challenges

Integrating cloud solutions poses complex compliance challenges, especially in highly regulated industries. Companies must understand and comply with the specific regulatory requirements of their sector, working with cloud service providers to ensure compliance. Regular audits and compliance assessments are essential to keep cloud systems aligned with current regulations.

Managing Complex Technologies with Reduced Staff

Managing complex, ever-evolving IT environments requires specific skills and a sufficient number of qualified personnel. However, many organizations face this challenge with small IT teams. In these cases, investing in staff training and adopting automation and AI technologies can help reduce the manual workload, improving operational efficiency and resource management.

Although hybrid and multi-cloud environments offer many benefits, it’s crucial for companies to proactively address these challenges by implementing robust governance strategies, advanced security solutions, and automation tools to ensure effective and efficient IT resource management.

Cloud Governance: Essential for Control

Cloud Governance is a set of processes and tools that maintain technological and financial control over IT environments. This includes cost management, security, and resource standardization. An emerging aspect of governance also involves energy consumption and sustainability: tools that monitor emissions and collect environmental impact data help companies be more conscious and responsible in their cloud strategy. Therefore, it is crucial to adopt Cloud Governance based on solid, time-tested frameworks.

Technologies and Best Practices for Governance with Microsoft Solutions

Effective cloud governance also requires advanced tools and established best practices. Microsoft provides an integrated ecosystem of solutions to manage, optimize, and protect cloud resources, ensuring security, cost control, and compliance.

  • Azure Cloud Adoption Framework: The Azure Cloud Adoption Framework (CAF) offers guidelines for planning, building, and managing cloud environments, with a dedicated section for governance. It helps companies structure security and compliance policies and optimize deployment processes, reducing risks.
  • Azure Policy: Azure Policy ensures resources comply with company rules by applying automated security and compliance policies at scale. Policies identify and correct non-compliant configurations, ensuring constant control and protection.
  • Resource Lock: Resource Locks prevent accidental modifications or deletions of critical resources, ensuring operational stability, particularly in production environments.
  • Resource Tagging: Tagging simplifies the organization and management of components, particularly cloud resource costs, allowing clear budget division between projects or departments.
  • Microsoft Defender for Cloud: Microsoft Defender for Cloud offers advanced multi-layered security, covering:
    • CSPM to monitor security and fix risky configurations.
    • CWP to protect workloads on Azure and multi-cloud environments.
    • DevSecOps by integrating security into development processes with Azure DevOps and GitHub.
  • Azure Cost Management and Azure Advisor: Azure Cost Management provides tools to monitor resource usage and optimize costs, while Azure Advisor offers suggestions related to reliability, security, performance, operational excellence, and cost reduction.
  • Azure Arc: Azure Arc extends Azure governance to on-premises and multi-cloud environments, allowing centralized and consistent management of all resources, regardless of their location, improving control and efficiency.
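To illustrate how tagging enables budget division, here is a minimal Python sketch that groups cost records by a `department` tag and collects anything without the tag in an "untagged" bucket. The record layout and the sample data are hypothetical stand-ins for a cost export, not the actual Azure Cost Management export format.

```python
from collections import defaultdict


def costs_by_tag(records: list[dict], tag_key: str = "department") -> dict[str, float]:
    """Sum costs per tag value; resources missing the tag go to 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        owner = record.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += record["cost"]
    return dict(totals)


# Hypothetical cost records, e.g. parsed from a monthly cost export.
records = [
    {"resource": "vm-web-01", "cost": 120.0, "tags": {"department": "sales"}},
    {"resource": "sql-01", "cost": 300.0, "tags": {"department": "finance"}},
    {"resource": "vm-web-02", "cost": 80.0, "tags": {"department": "sales"}},
    {"resource": "stg-logs", "cost": 25.5, "tags": {}},
]

print(costs_by_tag(records))
# {'sales': 200.0, 'finance': 300.0, 'untagged': 25.5}
```

A growing "untagged" total is itself a useful governance signal: it usually points to resources created outside the agreed tagging policy, which Azure Policy can help enforce.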

360° IT Governance Strategy

An effective IT governance strategy must go beyond the cloud to include on-premises and edge resources. This holistic approach ensures consistency, efficiency, and security across the entire IT ecosystem, preventing operational silos. In this context, Azure Arc plays a crucial role, extending Azure cloud management services to on-premises and edge environments. Additionally, it allows companies to apply uniform security and compliance policies to all resources, regardless of their location.

The Role of AI in IT Governance

AI is revolutionizing how organizations manage and govern IT environments, especially in hybrid and multi-cloud contexts. AI technologies help address the growing complexity and data volumes generated, offering tools that not only monitor but predict and proactively solve issues. This allows companies to make faster, more accurate decisions, improve security, optimize costs, and ensure regulatory compliance.

The following sections illustrate the main areas where AI has already introduced and will continue to bring significant innovations in Cloud Governance.

Predictive Monitoring and Analysis

AI can monitor IT resources distributed across various environments in real time, detecting anomalies or inefficiencies before they have a significant impact. Predictive analysis, one of AI’s key features, allows for the anticipation of failures or overloads, enabling proactive maintenance and more efficient resource lifecycle management. Through machine learning, these systems learn from historical data, continuously improving the accuracy of predictions and minimizing downtime.
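As a simplified illustration of anomaly detection on monitoring data, the following Python sketch flags samples that deviate strongly from a rolling baseline using a z-score. This is a plain statistical technique standing in for the machine-learning models described above, and the window size, threshold, and CPU samples are hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(values: list[float], window: int = 5,
                     threshold: float = 3.0) -> list[int]:
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat baseline: any change at all counts as an anomaly.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization samples (%) for one server.
cpu = [40, 42, 41, 39, 40, 41, 43, 40, 95, 42]
print(detect_anomalies(cpu))  # [8]
```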

Automating Operational Processes

AI plays a crucial role in automating repetitive and complex tasks that often require manual intervention by IT personnel. This includes automated management of cloud resources, infrastructure scalability, and service provisioning and de-provisioning. Intelligent automation reduces the risk of human errors and frees up resources for strategic tasks, improving the overall efficiency of the IT environment.
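The kind of scaling decision that automation takes over can be sketched as a simple rule: add an instance when average load is high, remove one when it is low, and always stay within fixed bounds. The thresholds and limits below are hypothetical, and this is a rule-based sketch; AI-driven approaches would typically learn or tune such thresholds from historical data rather than hard-code them.

```python
def desired_instances(current: int, avg_cpu: float,
                      scale_out_at: float = 75.0, scale_in_at: float = 25.0,
                      min_instances: int = 2, max_instances: int = 10) -> int:
    """Return the target instance count for the observed average CPU (%)."""
    if avg_cpu > scale_out_at:
        target = current + 1
    elif avg_cpu < scale_in_at:
        target = current - 1
    else:
        target = current
    # Clamp to the allowed range so automation can never over- or under-shoot.
    return max(min_instances, min(max_instances, target))


print(desired_instances(3, 90.0))  # 4
print(desired_instances(3, 10.0))  # 2
```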

Financial Optimization

In terms of cost management, AI provides advanced tools to monitor cloud resource usage and suggest optimizations. For example, solutions like Azure Cost Management use AI algorithms to analyze consumption patterns, identify underutilized or unused resources, and offer recommendations to reduce costs. Additionally, AI helps create detailed spending forecasts and suggest dynamic resource resizing based on actual demand, ensuring efficient IT budget use.
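Two of the ideas above, spotting underutilized resources and forecasting spend, can be shown in their simplest form with the Python sketch below. It uses a fixed CPU threshold and a linear run-rate projection; both are hypothetical simplifications and not how Azure Cost Management computes its recommendations or forecasts internally. The resource names and figures are illustrative.

```python
def underutilized(resources: list[dict], cpu_threshold: float = 5.0) -> list[str]:
    """Names of resources whose average CPU (%) is below the threshold."""
    return [r["name"] for r in resources if r["avg_cpu_pct"] < cpu_threshold]


def month_end_forecast(spend_to_date: float, day_of_month: int,
                       days_in_month: int) -> float:
    """Linear run-rate projection of spend to the end of the month."""
    return spend_to_date / day_of_month * days_in_month


# Hypothetical 30-day average utilization per VM.
vms = [
    {"name": "vm-app-01", "avg_cpu_pct": 3.2},
    {"name": "vm-db-01", "avg_cpu_pct": 61.0},
    {"name": "vm-test-07", "avg_cpu_pct": 0.8},
]

print(underutilized(vms))                # ['vm-app-01', 'vm-test-07']
print(month_end_forecast(1200, 10, 30))  # 3600.0
```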

Proactive Security and Incident Response

On the security front, AI offers advanced threat detection and response capabilities. Solutions like Microsoft Defender for Cloud use AI algorithms to analyze vast amounts of data, identify suspicious behaviors, and automatically respond to potential threats. This proactive approach allows companies to block malicious activities in real time, drastically reducing the risks of breaches and attacks. AI also facilitates the adoption of DevSecOps practices, integrating security in the early stages of application development and reducing vulnerabilities throughout the software lifecycle.

AI Integration for Innovation

Finally, AI not only optimizes operations but also acts as a catalyst for innovation. AI empowers companies to experiment with new solutions, test different scenarios, and optimize the performance of cloud applications. In DevOps and DevSecOps contexts, AI can speed up development cycles, improving efficiency and software quality while ensuring security is not compromised.

Conclusion

Hybrid and multi-cloud IT environments offer companies unprecedented opportunities for flexibility, scalability, and resource optimization. However, fully exploiting these advantages requires proactive management of challenges related to costs, security, compliance, and governance. Adopting advanced technologies such as artificial intelligence and specific management tools like Azure Policy and Microsoft Defender for Cloud is crucial to maintaining control over complex and distributed environments.

Additionally, a holistic IT governance approach, encompassing on-premises, edge, and cloud resources, is essential to avoid operational silos and ensure consistency, security, and efficiency. AI, with its capabilities for automation, predictive monitoring, and optimization, not only simplifies operational management but also fosters continuous innovation, enabling companies to improve performance, reduce costs, and strengthen security in an increasingly dynamic and complex IT ecosystem.

The Importance of GPUs in the Field of Artificial Intelligence and the Innovations Introduced in Windows Server 2025

The evolution of technologies related to artificial intelligence (AI) has led to an increasing demand for computing power, essential for managing the training and inferencing of machine learning and deep learning models. In this context, GPUs (Graphics Processing Units) have established themselves as fundamental components, thanks to their ability to perform large-scale parallel computations extremely efficiently. With the upcoming releases of Windows Server 2025 and Azure Stack HCI 24H2, Microsoft introduces significant innovations that enable companies to fully harness the potential of GPUs, in AI and beyond. These new features simplify hardware resource management and provide an optimized platform for developing and deploying AI solutions at scale. In this article, we will explore the importance of GPUs in the AI ecosystem and analyze how Windows Server 2025 and Azure Stack HCI 24H2 further enhance these capabilities, transforming how companies tackle the challenges and opportunities presented by AI.

Computing Power and GPU Optimization for Deep Learning on Virtual Infrastructures

Deep learning, an advanced branch of artificial intelligence that leverages deep artificial neural networks, requires a vast amount of computing power to function effectively. Training these models involves processing large volumes of data through multiple layers of interconnected nodes, each performing complex mathematical operations. While traditional CPUs are highly powerful in sequential data processing, they are not optimized to handle a large number of parallel operations, as required by deep learning models.

In this context, GPUs (Graphics Processing Units) are particularly well-suited due to their ability to execute thousands of operations simultaneously. This makes GPUs ideal for training deep learning models, especially complex ones like convolutional neural networks (CNNs), which are widely used in image recognition. For example, training a CNN on a large dataset could take weeks on a CPU, while with the help of a GPU, the time required can be drastically reduced to just days or even hours, depending on the model’s complexity and the dataset’s size.

With the imminent release of Windows Server 2025 and Azure Stack HCI 24H2, Microsoft will offer its customers the ability to allocate an entire GPU’s capacity to a virtual machine (VM), which can run both Linux and Windows Server operating systems within a fault-tolerant cluster, thanks to Discrete Device Assignment (DDA) technology. This means that critical AI workloads for businesses can be reliably executed on a VM within a cluster, ensuring that, in the event of an unexpected failure or planned migration, the VM can be restarted on another node in the cluster using the GPU available on that node.

Microsoft recommends working closely with OEM (Original Equipment Manufacturer) partners and GPU independent hardware vendors (IHVs) to plan, order, and configure the necessary systems to support the desired workloads with the right configurations and software. Additionally, if GPU acceleration via DDA is desired, it is advisable to consult OEM and IHV partners to obtain a list of GPUs compatible with DDA. To ensure the best possible performance, Microsoft also suggests creating a homogeneous GPU configuration across all servers in the cluster. A homogeneous configuration means installing the same GPU model and configuring the same number of partitions on all GPUs in the cluster's servers. For example, in a cluster of two servers, each with one or more GPUs, all GPUs should be of the same model, brand, and size, and the number of partitions on each GPU should be identical.
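The homogeneity recommendation lends itself to a quick pre-deployment check. The following Python sketch verifies that every GPU in every node shares the same vendor, model, memory size, and partition count. The `GpuConfig` class, the inventory format, and the vendor and model names are hypothetical placeholders; real inventory data would come from the OEM tooling or the hosts themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical description of one physical GPU in a node."""
    vendor: str
    model: str
    memory_gb: int
    partitions: int


def is_homogeneous(cluster: dict[str, list[GpuConfig]]) -> bool:
    """True if every node has GPUs and all GPUs share one identical config."""
    if not cluster or any(not gpus for gpus in cluster.values()):
        return False
    distinct = {gpu for gpus in cluster.values() for gpu in gpus}
    return len(distinct) == 1


gpu = GpuConfig("VendorX", "ModelA", 48, 4)
cluster = {"node1": [gpu], "node2": [gpu, gpu]}
print(is_homogeneous(cluster))  # True

cluster["node2"][1] = GpuConfig("VendorX", "ModelA", 48, 2)  # different partitioning
print(is_homogeneous(cluster))  # False
```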

Scalability and Flexibility of GPUs in AI Computing Architectures

In addition to their extraordinary computational speed, GPUs also offer significant advantages in terms of scalability, a crucial factor in modern AI computing architectures. Often, the datasets used to train AI models are so vast that they exceed the computational capabilities of a single processor. In these cases, GPUs allow the workload to be distributed across multiple computing units, ensuring high operational efficiency and enabling the simultaneous processing of enormous amounts of data.

Another critical aspect of GPUs is their flexibility in handling a variety of workloads, ranging from real-time inference, used for example in speech recognition applications, to the training of complex models that require weeks of intensive computation. This versatility makes GPUs an indispensable tool not only for advanced research centers but also for commercial applications that require high performance on a large scale.

GPU Partitioning: Maximizing Efficiency and Resource Utilization

One of the most significant innovations in the field of GPUs is GPU Partitioning: the ability to divide a single GPU into multiple virtual partitions, each of which can be dedicated to a different workload. This technique is crucial for optimizing GPU resources, as it maximizes operational efficiency while minimizing waste. In artificial intelligence, where computational requirements vary significantly depending on the models used, GPU Partitioning offers the flexibility to allocate portions of a GPU to different tasks, such as training machine learning models, real-time inference, or other parallel operations. This approach is particularly advantageous in data centers, as it allows multiple users or applications to share the same GPU without compromising overall system performance.

The introduction of GPU Partitioning not only improves the flexibility and scalability of computing infrastructures but also helps reduce operational costs by avoiding the need to purchase additional hardware when not strictly necessary. Additionally, this technology promotes a more balanced use of resources, preventing situations of GPU overload or underutilization, contributing to more sustainable and efficient management of AI-related operations.

With the release of Windows Server 2025 Datacenter, Microsoft has integrated and enhanced support for GPU Partitioning, allowing customers to divide a supported GPU into multiple partitions and assign them to different virtual machines (VMs) within a fault-tolerant cluster. This means that multiple VMs can share a single physical GPU, with each receiving an isolated portion of the GPU’s capabilities. For example, in the retail and manufacturing sectors, customers can perform inferences at the edge using GPU support to obtain rapid results from machine learning models, results that can be used before the data is sent to the cloud for further analysis or continuous improvement of ML models.

GPU Partitioning uses the Single Root I/O Virtualization (SR-IOV) interface, which provides a hardware-based security boundary and ensures predictable performance for each VM. Each VM can access only the GPU resources dedicated to it, and secure hardware partitioning prevents unauthorized access by other VMs.
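To make the sharing and isolation model concrete, here is a toy Python model of a single GPU split into a fixed number of partitions, each owned by at most one VM. It is a conceptual sketch only: the class and method names are hypothetical, and real partitioning is enforced in hardware through SR-IOV and configured with Hyper-V and Windows Admin Center tooling, not by application code like this.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedGpu:
    """Toy model of one physical GPU divided into a fixed number of partitions."""

    name: str
    partition_count: int
    # Partition index -> name of the VM that owns it; free partitions are absent.
    owners: dict[int, str] = field(default_factory=dict)

    def assign(self, vm: str) -> int:
        """Give the VM the first free partition and return its index."""
        for index in range(self.partition_count):
            if index not in self.owners:
                self.owners[index] = vm
                return index
        raise RuntimeError(f"no free partition on {self.name}")

    def release(self, vm: str) -> None:
        """Return every partition held by the VM to the free pool."""
        for index in [i for i, owner in self.owners.items() if owner == vm]:
            del self.owners[index]

    def can_access(self, vm: str, index: int) -> bool:
        """Isolation rule: a VM may only use a partition it owns."""
        return self.owners.get(index) == vm


gpu = PartitionedGpu("node1-gpu0", partition_count=4)
slot_a = gpu.assign("inference-vm")
slot_b = gpu.assign("training-vm")
print(slot_a, slot_b, gpu.can_access("inference-vm", slot_b))  # 0 1 False
```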

Another significant development is live migration support for VMs that use GPU Partitioning. Customers can balance critical workloads across cluster nodes and perform hardware maintenance or software updates without interrupting VM operations. During a planned migration or after an unplanned failure, the VMs can be moved or restarted on other nodes in the cluster, using the GPU partitions available on those nodes.
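The restart behavior can be pictured as a placement problem: each VM that used a GPU partition on the departing node needs a surviving node with a free partition. The Python sketch below is a simplified, hypothetical model of that decision (it steers VMs toward the nodes with the most free partitions); it does not reproduce how Failover Clustering actually selects a target node.

```python
from typing import Optional


def plan_gpu_restarts(
    vms: list[str], free_partitions: dict[str, int], source_node: str
) -> dict[str, Optional[str]]:
    """Map each VM leaving source_node to a node with a free GPU partition, or None."""
    # Only surviving nodes are candidates; copy the counts so the input is not mutated.
    remaining = {node: free for node, free in free_partitions.items() if node != source_node}
    plan: dict[str, Optional[str]] = {}
    for vm in vms:
        candidates = [node for node, free in remaining.items() if free > 0]
        if not candidates:
            plan[vm] = None  # no GPU partition left anywhere in the cluster
            continue
        # Prefer the node with the most free partitions; break ties by name for determinism.
        target = min(candidates, key=lambda node: (-remaining[node], node))
        remaining[target] -= 1
        plan[vm] = target
    return plan


print(plan_gpu_restarts(["vm1", "vm2", "vm3"], {"node1": 0, "node2": 2, "node3": 1}, "node1"))
# {'vm1': 'node2', 'vm2': 'node2', 'vm3': 'node3'}
```

A `None` entry marks a VM that cannot get GPU acceleration after the move until capacity is freed, which is one reason to keep spare partitions available across the cluster.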

Finally, Microsoft has made Windows Admin Center (WAC) available to configure, use, and manage VMs that leverage virtualized GPUs, in both standalone and failover cluster configurations. By centralizing the management of virtualized GPUs, WAC significantly reduces administrative complexity.

Innovations and Future Prospects

The future of GPUs in artificial intelligence looks extremely promising. With the increasing complexity of AI models and the growing demand for solutions capable of leveraging real-time AI, the parallel computing power offered by GPUs will become increasingly essential. In particular, their ability to perform a large number of simultaneous operations on vast datasets makes them an indispensable component in cloud solutions.

The significant GPU innovations arriving with the upcoming releases of Windows Server 2025 and Azure Stack HCI 24H2 are the result of ongoing, close collaboration between Microsoft and NVIDIA. Microsoft Azure handles some of the world’s largest workloads, pushing CPU and memory capabilities to the limit to process enormous volumes of data in distributed environments. With the expansion of AI and machine learning, GPUs have become a key component of cloud solutions as well, thanks to their extraordinary ability to perform large-scale parallel operations. Beyond GPU support, Windows Server 2025 also brings enhancements to storage, networking, and the scalability of computing infrastructures.

Conclusions

The importance of GPUs in the field of artificial intelligence is set to grow exponentially, thanks to their ability to process large volumes of data in parallel with efficiency and speed. The innovations introduced in Windows Server 2025 and Azure Stack HCI 24H2 represent a significant step toward optimizing computing infrastructures, providing companies with advanced tools to manage and fully exploit GPU resources. These developments not only enhance the computing power necessary for AI but also introduce greater flexibility and scalability, essential for addressing future challenges. With the adoption of technologies like GPU Partitioning and support for live VM migration, Microsoft demonstrates its leadership in providing solutions that not only improve performance but also enhance the reliability and sustainability of AI-related business operations. The future prospects see GPUs playing an increasingly crucial role, not only in data centers but also in edge and cloud applications, ensuring that technological innovation continues to drive the evolution of AI across all sectors.

Everything you need to know about the new OEM Licensing model for Azure Stack HCI

Microsoft recently introduced a new OEM licensing model for Azure Stack HCI, designed to simplify the licensing process and offer numerous benefits. This new model, available through major hardware vendors like HPE, Dell, and Lenovo, gives companies an additional option for managing their Azure Stack HCI licenses. In this article, we will review the current licensing options and examine the features of the new OEM license in detail, highlighting its technical aspects and benefits for users.

Existing Licensing Options

Before diving into the new OEM licensing option, it is essential to understand the currently available licensing models for Azure Stack HCI. For all details on the Azure Stack HCI cost model, you can consult this article.

Overview of the New OEM License

The new OEM licensing option for Azure Stack HCI is a prepaid license available through specific hardware vendors, such as HPE, Dell, and Lenovo. Intended for Azure Stack HCI hardware, including Premier Solutions, Integrated Systems, and Validated Nodes, this license comes pre-installed, is activated in Azure, and remains valid for the lifetime of the hardware.

The Azure Stack HCI OEM license includes three essential components:

  • Azure Stack HCI: The foundational hybrid cloud platform for running virtualized workloads.
  • Azure Kubernetes Service (AKS): The container orchestration service that simplifies the deployment and management of containerized applications.
  • Windows Server guest VMs and containers: The license includes Windows Server 2022 Datacenter for guest workloads, so Windows Server VMs on an Azure Stack HCI cluster can be activated with generic Automatic Virtual Machine Activation (AVMA) keys, via Windows Admin Center or PowerShell.

This license ensures access to the latest versions of Azure Stack HCI and AKS, allowing for the use of unlimited VMs and containers.

OEM License Features

The features of the Azure Stack HCI OEM license are as follows:

  • Inclusion of Azure Stack HCI and AKS: The license includes Azure Stack HCI and Azure Kubernetes Service (AKS) with unlimited virtual CPUs. This is a significant advantage over the Azure Hybrid Benefit, which limits AKS usage to the number of licensed physical cores.
  • Physical core licensing: Every physical core in the server must be licensed. The base license covers up to 16 cores, and additional components are available in two- and four-core increments for systems with more than 16 cores. For example, a 36-core system requires two 16-core licenses plus an additional four-core license. This license does not support a dynamic per-core model.
  • Prepaid and permanent license: This license requires no annual renewals or subscriptions. It is prepaid and remains valid for the lifetime of the hardware on which the Azure Stack HCI operating system is installed.
  • No support for mixed nodes: Currently, this license does not support mixed nodes within the same Azure Stack HCI system. For more information, refer to the documentation on mixed-node scenarios.
  • Non-transferable license: The license is tied to the original hardware on which the Azure Stack HCI operating system was pre-installed and cannot be transferred to different hardware or systems, so the license and its benefits remain specific to the initial hardware configuration.
  • Automatic activation: This pre-installed license has no product key or Certificate of Authenticity (COA). It is activated automatically once the device is registered in Azure. If a failure requires reinstallation, contact the OEM vendor.
  • No CAL requirements: This license does not require Device or User Client Access Licenses (CALs).
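The per-server core arithmetic from the physical core licensing example can be sketched in Python. The function assumes that every server needs at least one 16-core base license, that as many 16-core licenses as fit are used, and that the remaining cores are covered with four-core add-ons plus at most one two-core add-on. These packing rules are an assumption for illustration only; the combinations actually offered should be confirmed with the OEM vendor.

```python
def oem_license_components(physical_cores: int) -> dict[str, int]:
    """Split one server's physical core count into license components (assumed packing rules)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    # At least one 16-core base license, then as many further 16-core licenses as fit.
    sixteen = max(1, physical_cores // 16)
    remainder = max(0, physical_cores - sixteen * 16)
    # Cover what is left with four-core add-ons, topping up with one two-core add-on if that suffices.
    four, leftover = divmod(remainder, 4)
    two = 0
    if leftover:
        if leftover <= 2:
            two = 1
        else:
            four += 1
    return {"16-core": sixteen, "4-core": four, "2-core": two}


print(oem_license_components(36))  # {'16-core': 2, '4-core': 1, '2-core': 0}
```

The result for 36 cores matches the example in the list above; for a multi-node cluster, the components would be computed per server and summed.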

Technical Details

The new OEM license is pre-installed on the hardware and automatically activates in Azure. This process eliminates the need for physical licenses or additional activation steps. When users connect Azure Stack HCI nodes to Azure, the system recognizes the OEM license and automatically activates the associated benefits.

To verify if you have an active OEM license for Azure Stack HCI, you can follow these steps:

  1. Access the Azure portal.
  2. Search for your Azure Stack HCI cluster.
  3. Under the cluster, select Overview to check the billing status.
    • If you have an active OEM license for Azure Stack HCI, the billing status should be OEM License, and the OEM license status should be Activated.

Figure 1 – Azure Stack HCI Billing status

For support with the Azure Stack HCI OEM license, you must first contact your OEM vendor. If support is not available from the vendor, it is advisable to open an Azure support request through the Azure portal.

Advantages of the New OEM Licensing Mechanism

The new OEM licensing option offers several significant advantages for Azure Stack HCI users:

  • Simplified licensing: Users do not need to manage separate licenses or worry about additional documentation. The license is embedded in the hardware, simplifying the entire process and reducing administrative complexity.
  • Different and more predictable cost model: By prepaying the license, users avoid recurring monthly or annual costs, which can result in significant long-term savings. Users benefit from a one-time purchase that includes hardware, software, and full support, simplifying IT resource procurement and management.
  • Unlimited use of AKS: The inclusion of unlimited virtual CPUs for Azure Kubernetes Service (AKS) is a substantial advantage, particularly for organizations that make extensive use of Kubernetes for containerized applications.
  • Operational efficiency: The automatic activation feature ensures that users can quickly and easily start using their Azure Stack HCI infrastructure without additional configuration or licensing steps, improving operational efficiency. Moreover, a single license covers Azure Stack HCI, AKS, and Windows Server 2022 as guest VMs, offering an integrated solution that simplifies overall license management.

Conclusion

The new OEM licensing model for Azure Stack HCI represents a new opportunity for licensing hybrid infrastructures. Through direct integration with major hardware vendors like HPE, Dell, and Lenovo, this solution offers a prepaid and permanent license, simplifying the purchasing process and reducing administrative complexity. The benefits include unlimited use of Azure Kubernetes Service, a more predictable cost model, and automatic activation that allows users to start using their infrastructure quickly. Although this licensing model does not support mixed-node environments and is non-transferable, it makes Azure Stack HCI an even more attractive choice for companies seeking efficiency and flexibility in managing Microsoft hybrid solutions.

The New Azure Arc Solution for Efficient Management of Multicloud Environments

Companies are increasingly adopting a multicloud approach to leverage the specific advantages offered by various cloud service providers. This strategy helps avoid vendor lock-in, improve resilience, and optimize costs by utilizing the best offers available on the market. However, managing resources distributed across multiple cloud platforms presents significant challenges, especially regarding inventory management, reporting, analysis, consistent resource tagging, and provisioning. In this article, we will examine how the Azure Arc Multicloud Connector can help overcome these challenges, offering centralized and efficient management of cloud resources.

Challenges in Multicloud Management

Managing a multicloud environment involves numerous challenges that organizations must address to ensure effective and smooth operations. Key difficulties include:

  • Inventory Management: Keeping track of all resources distributed across various clouds.
  • Reporting and Analysis: Producing detailed reports on, and analyses of, cloud resources.
  • Consistent Resource Tagging: Applying tags uniformly to resources across all cloud platforms.
  • Provisioning and Management Tasks: Performing provisioning and other management operations consistently across multiple clouds.

What is the Azure Arc-Enabled Multicloud Connector?

The Azure Arc-enabled Multicloud Connector is a solution that connects non-Azure public cloud resources to Azure, providing a centralized source for managing and governing cloud resources. Currently, the supported public cloud is AWS. The connector relies only on API calls to collect and manage resources, so no appliance needs to be installed in AWS.

Figure 1 – Solution overview

NOTE: The Multicloud Connector can work alongside the AWS connector of Defender for Cloud. If desired, both connectors can be used for more comprehensive cloud resource management.

The following paragraphs describe the currently supported features: inventory and onboarding.

Inventory Features

The Inventory solution of the Multicloud Connector provides an up-to-date view of resources from other public clouds within Azure, offering a single reference point for all cloud resources. Once the Inventory solution is enabled, the metadata of the source cloud’s resources is included in their Azure representations, allowing Azure tags and policies to be applied. It also enables querying all cloud resources through Azure Resource Graph, for example to find all Azure and AWS resources with a specific tag.

The Inventory solution regularly scans the source cloud to keep the view of resources in Azure updated.

Representation of AWS Resources in Azure

After connecting the AWS cloud and enabling the Inventory solution, the Multicloud Connector creates a new resource group using the naming convention aws_IDAccountAws. The Azure representations of AWS resources are created in this group, using the AwsConnector namespace values described earlier. Azure tags and policies can be applied to these resources. The resources discovered in AWS and projected in Azure are placed in Azure regions using a standard mapping scheme, allowing consistent management of AWS resources within the Azure ecosystem.

Periodic Synchronization Options

The periodic synchronization time selected during the Inventory solution configuration determines how frequently the AWS account is scanned and synchronized with Azure. Enabling periodic synchronization ensures that changes to AWS resources are automatically reflected in Azure. For example, if a resource is deleted in AWS, the corresponding resource in Azure will also be deleted. Periodic synchronization can be disabled during solution configuration, but this may result in an outdated representation of AWS resources in Azure.
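Conceptually, each periodic scan is a reconciliation between what exists in AWS and what is already represented in Azure. The Python sketch below models that diff under stated assumptions: resources are keyed by an identifier (for example an ARN) with a metadata snapshot as the value, and the `SyncPlan` and `reconcile` names are hypothetical. It illustrates the create/update/delete semantics described above, not the connector’s actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    create: list[str]  # found in AWS, not yet represented in Azure
    update: list[str]  # represented in Azure, but the AWS metadata has changed
    delete: list[str]  # represented in Azure, but no longer present in AWS


def reconcile(aws_scan: dict[str, dict], azure_projection: dict[str, dict]) -> SyncPlan:
    """Diff one scan of the AWS account against the resources already represented in Azure."""
    aws_ids, azure_ids = set(aws_scan), set(azure_projection)
    return SyncPlan(
        create=sorted(aws_ids - azure_ids),
        update=sorted(i for i in aws_ids & azure_ids if aws_scan[i] != azure_projection[i]),
        delete=sorted(azure_ids - aws_ids),
    )


plan = reconcile(
    {"bucket-a": {"region": "eu-west-1"}, "vm-b": {"size": "large"}},
    {"vm-b": {"size": "small"}, "vm-c": {"size": "small"}},
)
print(plan)  # SyncPlan(create=['bucket-a'], update=['vm-b'], delete=['vm-c'])
```

With periodic synchronization disabled, no such diff runs after the initial setup, which is why the Azure view can drift from the real state of the AWS account.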

Querying for Resources in Azure Resource Graph

Azure Resource Graph is a service designed to extend Azure resource management by providing efficient and performant resource exploration capabilities. Large-scale queries across a set of subscriptions help manage the environment effectively. Queries can be executed using the Resource Graph Explorer in the Azure portal, with query examples for common scenarios available for consultation.
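As a sketch of the tag-based query mentioned in the Inventory section, the Python below builds a Resource Graph (KQL) query for every resource carrying a given tag and labels each row as Azure or AWS, then runs it with the Azure SDK for Python (`azure-identity` and `azure-mgmt-resourcegraph`). Two points are assumptions: that the AWS resources projected by the connector appear in the `resources` table with types under the `microsoft.awsconnector/` namespace, and the tag name and value used in the example, which are placeholders.

```python
def build_tag_query(tag_name: str, tag_value: str) -> str:
    """KQL for Azure Resource Graph: every resource tagged tag_name=tag_value, labelled by cloud."""
    if any(q in s for s in (tag_name, tag_value) for q in ("'", '"')):
        raise ValueError("quotes in tag names or values are not handled by this sketch")
    return (
        "resources\n"
        f"| where tags['{tag_name}'] =~ '{tag_value}'\n"
        # Assumption: AWS resources projected by the connector use the microsoft.awsconnector namespace.
        "| extend cloud = iff(type startswith 'microsoft.awsconnector/', 'AWS', 'Azure')\n"
        "| project name, type, cloud, resourceGroup, location"
    )


def run_query(query: str, subscription_ids: list[str]) -> list:
    """Execute the query against Azure Resource Graph and return the rows as dictionaries."""
    # Imported here so build_tag_query can be used without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resourcegraph import ResourceGraphClient
    from azure.mgmt.resourcegraph.models import QueryRequest, QueryRequestOptions

    client = ResourceGraphClient(credential=DefaultAzureCredential())
    request = QueryRequest(
        subscriptions=subscription_ids,
        query=query,
        options=QueryRequestOptions(result_format="objectArray"),
    )
    return client.resources(request).data


print(build_tag_query("environment", "production"))
# rows = run_query(build_tag_query("environment", "production"), ["<subscription-id>"])
```

The same query text can be pasted into Resource Graph Explorer in the Azure portal to check the results interactively before automating it.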

Arc Onboarding Features

The Arc Onboarding solution automatically discovers EC2 instances running in the AWS environment and installs the Azure Connected Machine agent on them, integrating them into Azure Arc. Currently, AWS EC2 instances are supported. This simplified experience makes Azure management services, such as Azure Monitor, available on these VMs, providing a centralized way to manage Azure and AWS resources together.

Representation of AWS Resources in Azure

After connecting the AWS cloud and enabling the Arc Onboarding solution, the Multicloud Connector creates a new resource group following the naming convention aws_IDAccountAws. When EC2 instances are connected to Azure Arc, their representations appear in this resource group. These resources are assigned to Azure regions using a standard mapping scheme. By default, all regions are scanned, but specific regions can be excluded during solution configuration.

Connectivity Method

When creating the Arc Onboarding solution, you can choose whether the Connected Machine agent connects to the Internet through a public endpoint or through a proxy server. If a proxy server is chosen, you must provide the URL of a proxy server that the EC2 instances can reach.

Periodic Synchronization Options

The periodic synchronization time selected during the Arc Onboarding solution configuration determines how frequently the AWS account is scanned and synchronized with Azure. Enabling periodic synchronization ensures that whenever a new EC2 instance that meets the prerequisites is detected, the Arc agent will be automatically installed. If preferred, periodic synchronization can be disabled during solution configuration. In this case, new EC2 instances will not be automatically integrated into Azure Arc, as Azure will not be able to scan for new instances.
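The connectivity choice and the periodic onboarding scan can be modelled together in a short Python sketch. Everything here is hypothetical: `ConnectivitySettings` only performs a basic sanity check that a proxy URL is present when a proxy is selected, and `meets_prerequisites` stands in for whatever checks the service really performs on an instance. The sketch illustrates the behavior described above, not the connector’s implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ConnectivitySettings:
    """Public endpoint by default; a proxy URL is mandatory when use_proxy is True."""

    use_proxy: bool = False
    proxy_url: Optional[str] = None

    def __post_init__(self) -> None:
        if self.use_proxy:
            parsed = urlparse(self.proxy_url or "")
            if parsed.scheme not in ("http", "https") or not parsed.hostname:
                raise ValueError("use_proxy=True requires a proxy URL such as http://proxy.example.com:8080")


def instances_to_onboard(
    discovered: Iterable[str],
    arc_connected: set[str],
    meets_prerequisites: Callable[[str], bool],
    periodic_sync: bool,
) -> list[str]:
    """EC2 instance IDs that a periodic scan would pass to the agent installation step."""
    if not periodic_sync:
        return []  # no scans run, so new instances are never picked up automatically
    return sorted(i for i in discovered if i not in arc_connected and meets_prerequisites(i))


settings = ConnectivitySettings(use_proxy=True, proxy_url="http://proxy.contoso.local:3128")
print(instances_to_onboard(["i-03", "i-01", "i-02"], {"i-01"}, lambda i: i != "i-03", True))  # ['i-02']
```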

Configuration and Operational Details

The initial configuration of the multicloud connector requires using the Azure portal to create the connector itself, specifying the resource group and AWS account to be integrated. Subsequently, it is necessary to download and apply the CloudFormation templates in AWS to configure the required IAM roles. Finally, it is important to configure the synchronization intervals to periodically update resource information, with a default interval of one hour.

Pricing

The Multicloud Connector is free but integrates with other Azure services that have their own pricing models. Any Azure service used with the Multicloud Connector, such as Azure Monitor, will be charged according to the specific service pricing. For more information, you can consult the official Azure cost page.

After connecting the AWS cloud, the Multicloud Connector queries the AWS resource APIs multiple times a day. These read-only API calls incur no costs in AWS but are logged in CloudTrail if a trail for read events has been enabled.

Conclusions

The Azure Arc Multicloud Connector represents an advanced and strategic solution for addressing the challenges of multicloud management. By centralizing the governance and inventory of cloud resources, companies can achieve a unified and consistent view of their distributed infrastructures. This tool not only improves operational efficiency through periodic synchronization and consistent resource tagging but also enables more secure management integrated with Azure services. Moreover, adopting the Azure Arc Multicloud Connector allows organizations to optimize costs and enhance resilience by leveraging the best offers from various cloud providers without the risk of vendor lock-in. Ultimately, this solution proves fundamental for companies aiming for efficient, innovative, and scalable multicloud management.