Monday, January 26, 2026

 

MICROSOFT


Maia 200: the AI chip with which Microsoft wants to usher in the “era of reasoning AI”

Microsoft announced on Monday (26) the launch of Maia 200, its next-generation artificial intelligence accelerator, developed to meet the demands of what the company called the “era of reasoning AI.” The new chip is designed to offer high performance in inference, with significant efficiency gains and cost reductions in large-scale AI workloads.

Maia 200 is already in operation in Microsoft's data centers in the Azure US Central region, with expansion planned for the Phoenix, Arizona area (US West 3) and other locations in the future. The first systems are being used to power new Microsoft Superintelligence models, accelerate Microsoft Foundry projects, and support Microsoft Copilot.

Because it operates some of the world's most demanding AI workloads, the company claims it can precisely align silicon design, model development, and application optimization, generating consistent gains in performance, energy efficiency, and scale across Azure.

In practice, the Maia 200 chip is capable of running the largest current AI models, with room for even larger models in the future, according to Microsoft.

The accelerator features native FP8 and FP4 tensor cores, more than 100 billion transistors, and a redesigned memory subsystem with 216 GB of HBM3e at up to 7 TB/s of bandwidth, plus 272 MB of on-die SRAM. Each chip delivers over 10 petaFLOPS at FP4 precision and about 5 petaFLOPS at FP8.
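To put those numbers in perspective, a quick roofline-style check shows what the quoted compute and bandwidth figures imply for inference. This is a back-of-envelope sketch in Python; only the 10 petaFLOPS and 7 TB/s figures come from the article, while the model size, batch size, and FLOPs-per-parameter rule of thumb are illustrative assumptions.

# Back-of-envelope roofline check using the figures quoted above.
# Model size and batch assumptions are hypothetical, not Microsoft data.

FP4_FLOPS = 10e15        # ~10 petaFLOPS at FP4 (quoted)
HBM_BW    = 7e12         # ~7 TB/s HBM3e bandwidth (quoted)

# Ridge point: arithmetic intensity needed to saturate compute before bandwidth.
ridge = FP4_FLOPS / HBM_BW
print(f"ridge point: {ridge:.0f} FLOPs/byte")       # ~1429 FLOPs/byte

# Hypothetical decode step: a 200B-parameter model in FP4 (0.5 byte/weight).
params = 200e9
bytes_per_step = params * 0.5                       # weights streamed once per token
flops_per_step = 2 * params                         # ~2 FLOPs per parameter per token

# At batch size 1 the intensity is ~4 FLOPs/byte, far below the ridge point,
# so per-token latency is set by memory bandwidth, not peak FLOPS.
latency_s = bytes_per_step / HBM_BW
print(f"bandwidth-bound decode: ~{latency_s*1e3:.1f} ms/token, "
      f"~{1/latency_s:.0f} tokens/s per chip")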

Microsoft claims the Maia 200 delivers three times the FP4 performance of Amazon's third-generation Trainium and FP8 performance above Google's seventh-generation TPU, in addition to offering 30% more performance per dollar than the most advanced hardware the company currently uses. The project's focus is on optimizing the so-called token economy, one of the main cost bottlenecks in operating generative models at scale.
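“Token economy” here boils down to the cost of each generated token. The sketch below shows the underlying arithmetic; the hourly cost and throughput values are hypothetical placeholders, and only the 30% performance-per-dollar figure comes from the article.

# Illustrative token-economics arithmetic. Hourly cost and throughput are
# hypothetical placeholders; only the 30% figure comes from the article.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_s: float) -> float:
    """Serving cost in USD per 1M generated tokens."""
    tokens_per_hour = tokens_per_s * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6

baseline = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_s=2000)
# 30% more performance per dollar means the same spend yields 1.3x the tokens.
maia = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_s=2000 * 1.3)

print(f"baseline: ${baseline:.2f}/M tokens, maia-like: ${maia:.2f}/M tokens")
# 1.3x performance per dollar cuts cost per token by ~23% (1 - 1/1.3).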

The new accelerator integrates into Azure's heterogeneous AI infrastructure and will run multiple models, including the latest versions of OpenAI's GPT-5.2, as well as applications such as Microsoft Foundry and Microsoft 365 Copilot. Microsoft's Superintelligence team will also use the Maia 200 in synthetic data generation and reinforcement learning pipelines, accelerating the improvement of proprietary models.

Several technology companies are now building their own chips for the same purpose, and the Maia 200 puts Microsoft in direct competition with Google's TPUs and Amazon's Trainium line.

From a systems perspective, the Maia 200 introduces a scalable two-tier network architecture based on standard Ethernet, with an integrated NIC and proprietary communication protocols between accelerators. Each chip offers 2.8 TB/s of dedicated bidirectional scale-out bandwidth and supports clusters of up to 6,144 accelerators, which, according to Microsoft, reduces energy consumption and total cost of ownership (TCO) in high-density inference environments.
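For a rough sense of what that interconnect budget means in practice, the sketch below estimates a classic ring all-reduce across a pod. Only the 2.8 TB/s per-chip bandwidth and the 6,144-accelerator ceiling come from the article; the pod size, tensor size, and link-efficiency factor are assumptions.

# Ring all-reduce time estimate across a Maia 200 cluster.
# Per-chip bandwidth (2.8 TB/s) and the 6,144-chip ceiling are quoted in the
# article; tensor size and link efficiency are assumptions.

def ring_allreduce_seconds(tensor_bytes: float, n_chips: int,
                           link_bw: float, efficiency: float = 0.7) -> float:
    """A classic ring all-reduce moves 2*(n-1)/n of the tensor over each link."""
    traffic = 2 * (n_chips - 1) / n_chips * tensor_bytes
    return traffic / (link_bw * efficiency)

# Hypothetical: synchronizing 1 GiB of activations across a 64-chip pod.
t = ring_allreduce_seconds(tensor_bytes=2**30, n_chips=64, link_bw=2.8e12)
print(f"~{t*1e3:.2f} ms per 1 GiB all-reduce on 64 chips")

# The two-tier Ethernet fabric caps out at 6,144 accelerators per cluster.
assert 64 <= 6144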

Microsoft also announced a preview of the Maia SDK, which integrates with PyTorch, the Triton compiler, and optimized kernel libraries, allowing portability between different accelerators and greater control for developers.
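Since the Maia SDK's portability story reportedly runs through the Triton compiler, a standard open-source Triton kernel gives a flavor of what developers would write. The kernel below is ordinary Triton as it exists today; whether it runs unchanged on Maia hardware is an assumption based on the SDK description, not something verified here.

# A standard Triton kernel. Portability claims like the Maia SDK's rest on
# Triton compiling kernels like this to different accelerator back ends;
# running this specific code on Maia hardware is assumed, not verified.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements            # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out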

Importance of Maia 200: AI inference has become a critical and expensive part of AI companies' operations. The launch of Maia 200 therefore focuses on reducing costs, increasing energy efficiency, and decreasing dependence on NVIDIA GPUs, as well as optimizing the execution of models such as Copilot within Azure data centers.

In addition to powering Copilot operations, the Maia 200 is also expected to support models from Microsoft's Superintelligence team. The company has also opened the chip's SDK to developers, academics, and AI labs.

mundophone
