Microsoft Maia 200 Delivers 10 PetaFLOPS and Takes Aim at Nvidia


Microsoft has unveiled the Maia 200, a custom AI inference accelerator built on TSMC's 3nm process that delivers over 10 petaFLOPS in 4-bit precision. The chip is already deployed in Azure data centers and will power Microsoft 365 Copilot, Azure Foundry, and OpenAI's GPT-5.2 models. Microsoft claims it is the most performant first-party silicon from any hyperscaler.

The Numbers That Matter

Maia 200 packs over 140 billion transistors with native FP8/FP4 tensor cores, 216 GB of HBM3e memory at 7 TB/s of bandwidth, and 272 MB of on-chip SRAM. Each chip delivers over 10 petaFLOPS in FP4 and over 5 petaFLOPS in FP8, all within a 750 W power envelope.
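
As a back-of-the-envelope check on compute density, those figures work out to roughly 13 teraFLOPS per watt in FP4 and on the order of 1,400 peak FLOPs per byte of HBM bandwidth. The sketch below simply restates that arithmetic; the inputs are Microsoft's published numbers, and the derived ratios are illustrative, not official figures.

```python
# Back-of-the-envelope ratios from Microsoft's published Maia 200 specs.
# Derived values are illustrative arithmetic, not official figures.

fp4_flops = 10e15      # >10 petaFLOPS in FP4 (peak)
fp8_flops = 5e15       # >5 petaFLOPS in FP8 (peak)
hbm_bw    = 7e12       # 7 TB/s HBM3e bandwidth (bytes/s)
power_w   = 750        # 750 W power envelope

print(f"FP4 efficiency: {fp4_flops / power_w / 1e12:.1f} TFLOPS/W")
print(f"FP8 efficiency: {fp8_flops / power_w / 1e12:.1f} TFLOPS/W")

# Peak FLOPs available per byte streamed from HBM -- a rough proxy for how
# compute-rich the chip is relative to its memory system.
print(f"FP4 FLOPs per HBM byte: {fp4_flops / hbm_bw:.0f}")
```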

Microsoft claims three times the FP4 performance of Amazon's third-generation Trainium and FP8 performance exceeding Google's seventh-generation TPU. The company also says Maia 200 delivers 30% better performance per dollar than the latest-generation hardware currently in its Azure fleet.
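
To make the cost claim concrete: a 30% gain in performance per dollar means a fixed inference workload costs about 1/1.3, or roughly 77%, of what it would on the current fleet, a cut of around 23%. The snippet below is just that arithmetic, using a hypothetical baseline price purely for illustration.

```python
# Illustrative only: what a 30% perf-per-dollar gain implies for serving cost.
# The $1.00 baseline is a hypothetical placeholder, not a real Azure price.

baseline_cost_per_m_tokens = 1.00    # hypothetical current-fleet cost
perf_per_dollar_gain = 1.30          # Microsoft's claimed 30% improvement

maia_cost = baseline_cost_per_m_tokens / perf_per_dollar_gain
savings_pct = (1 - maia_cost / baseline_cost_per_m_tokens) * 100
print(f"Implied cost per million tokens: ${maia_cost:.2f} ({savings_pct:.0f}% lower)")
```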

Already Deployed in Azure

Unlike many chip announcements, which describe hardware that has yet to ship, Maia 200 is already running in Microsoft's US Central data center region near Des Moines, Iowa, with US West 3 (Phoenix, Arizona) coming next. It is handling inference for Microsoft 365 Copilot, Azure Foundry, and OpenAI's GPT-5.2 models.

Microsoft's Superintelligence team is also using Maia 200 for synthetic data generation and reinforcement learning to train next-generation in-house models.

The Scale-Up Network

Maia 200 introduces a two-tier scale-up network built on standard Ethernet rather than proprietary interconnects. Each accelerator provides 2.8 TB/s of dedicated bidirectional bandwidth, and clusters scale to as many as 6,144 accelerators. Within each tray, four chips are fully connected by direct, non-switched links for inference efficiency.
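
Taken at face value, the stated topology implies 1,536 fully connected four-chip trays at maximum cluster size, with the Ethernet tier linking trays. The sketch below turns the published bandwidth figure into a rough lower bound on transfer time for a tensor-parallel style exchange; it is illustrative arithmetic under those assumptions, not a measured result.

```python
# Rough topology and transfer-time arithmetic from the published figures.
# Results are best-case bounds that ignore protocol overhead and congestion.

cluster_size   = 6144      # max accelerators per scale-up cluster
chips_per_tray = 4         # fully connected via direct, non-switched links
bw_bidir       = 2.8e12    # 2.8 TB/s bidirectional per accelerator (bytes/s)

trays = cluster_size // chips_per_tray
print(f"Trays at full scale: {trays}")

# Time to move a hypothetical 4 GiB activation / KV-cache shard off one chip,
# assuming the full bidirectional bandwidth is usable for the transfer.
shard_bytes = 4 * 2**30
print(f"Best-case transfer time for 4 GiB: {shard_bytes / bw_bidir * 1e3:.2f} ms")
```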

Developer SDK Preview

Microsoft has released a preview of the Maia SDK with PyTorch integration, a Triton compiler, optimized kernel libraries, and access to Maia's low-level programming language. This is aimed at researchers and developers who want to optimize models specifically for Maia 200 hardware.
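
Since the preview pairs PyTorch integration with a Triton compiler, a plausible workflow is writing standard Triton kernels and letting the Maia backend lower them. The sketch below is an ordinary Triton vector-add kernel of the kind such a compiler would consume; it uses only stock Triton APIs, and the assumption is that the Maia toolchain accepts it unchanged, with tensors living on whatever PyTorch device the SDK exposes.

```python
# A standard Triton kernel -- the kind of code the Maia SDK's Triton compiler
# is intended to lower. Nothing here is Maia-specific; the assumption is that
# the SDK's backend compiles stock Triton and that its PyTorch integration
# supplies the device the tensors live on.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Most model-level work would presumably go through the PyTorch integration and the optimized kernel libraries; hand-written Triton and the low-level language are for the audience Microsoft names, developers tuning specifically for Maia 200.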

The Bottom Line

Microsoft building its own AI inference chip is the clearest signal yet that hyperscalers are serious about reducing their dependence on Nvidia. The specs are genuinely impressive — 10 petaFLOPS in FP4 from a single chip is massive — and the fact that it is already deployed (not just announced) adds credibility. But the real question is whether the software ecosystem can match Nvidia's CUDA moat. Custom silicon means custom toolchains, and developers historically resist switching unless the performance gap is overwhelming. Microsoft's 30% cost advantage is significant but not game-changing. This is a long game, and Nvidia still has the ecosystem advantage.