Nvidia Debuts Nemotron 3 Super, Plans to Spend $26B on Open AI Models

Nvidia just released Nemotron 3 Super, a 120-billion-parameter open-weight AI model that it claims outperforms every other open model on the market. More significantly, Nvidia disclosed in an SEC filing that it plans to spend $26 billion over five years developing open AI models — a figure that dwarfs what most AI companies spend building closed ones. The GPU monopoly isn’t content being just the arms dealer anymore. It wants to own the ammunition too.
What Is Nemotron 3 Super?
Nemotron 3 Super is a 120B-parameter model built on a hybrid architecture that combines three different technologies: Mamba-2 (a state-space model for efficient sequence processing), standard Transformer attention layers, and a Latent Mixture of Experts (LatentMoE) system. Only 15 billion parameters are active during any given inference pass, which means the model runs much cheaper than its 120B size suggests.
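That "120B total, 15B active" economics comes from sparse Mixture-of-Experts routing: each token is sent to only a few experts, so most weights sit idle on any given forward pass. Here's a toy sketch of top-k MoE routing — the expert count, top-k value, and sizes below are made up for illustration, not Nemotron's actual configuration (note that 1 of 8 experts active gives the same 12.5% ratio as 15B of 120B):

```python
import numpy as np

# Toy sparse Mixture-of-Experts layer (illustrative only -- expert count,
# top_k, and dimensions are invented, not Nemotron 3 Super's real config).
rng = np.random.default_rng(0)

n_experts = 8          # hypothetical number of experts
top_k = 1              # experts activated per token
d_model = 16           # toy hidden size

# Each "expert" here is just a single weight matrix.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route a token vector to its top-k experts; the rest stay idle."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]      # indices of the active experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                      # softmax over the chosen experts
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))
    return out, chosen

x = rng.standard_normal(d_model)
y, active = moe_forward(x)

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"active fraction: {active_params / total_params:.2%}")  # 12.50%
```

The compute cost scales with the active fraction, not the total parameter count — which is why a 120B MoE model can price out closer to a 15B dense one at inference time.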
The model was created by distilling knowledge from Nvidia’s larger Llama-3.1-Nemotron-Ultra-253B model down into this more compact form. It’s designed for coding, math, reasoning, and instruction-following tasks.
The Benchmark Numbers
According to Nvidia’s own benchmarks, Nemotron 3 Super:
- Ranks #1 on DeepResearch Bench among open models
- Delivers 2.2x higher throughput than GPT-OSS (the closest open competitor)
- Achieves 7.5x higher throughput than Qwen3-235B-A22B on Nvidia Blackwell hardware
- Outperforms Llama 4 Maverick, DeepSeek-R1, and Qwen3 on reasoning tasks
These are Nvidia’s numbers, of course. Independent benchmarks will tell the real story. But the throughput claims are particularly notable — they suggest this model is designed to be fast and cheap to run, not just accurate.
The Blackwell Optimization
Here’s the strategic angle: Nemotron 3 Super is specifically optimized for Nvidia’s Blackwell GPU architecture. It uses NVFP4 (4-bit floating point) quantization, which Nvidia claims delivers 4x faster inference compared to standard FP8 on Blackwell hardware. Translation: this model runs best on Nvidia’s newest, most expensive GPUs.
Give away the model, sell more chips. It’s the razor-and-blades playbook, except the razor costs $30,000 and the blades are free.
The $26 Billion Question
The real headline isn’t the model — it’s the SEC filing. Nvidia disclosed plans to invest $26 billion over five years in developing open AI models, or $5.2 billion a year. For context, Anthropic has raised roughly $10 billion total, and OpenAI’s annual compute budget is estimated at $5-7 billion. Nvidia is planning to outspend both of them — on models it gives away for free.
Why? Because every developer who builds on Nemotron models needs Nvidia hardware to run them efficiently. The models are the marketing budget for the chips.
The License Fine Print
Nemotron 3 Super is released under the Nvidia Open Model License, not a standard open-source license. The key restriction: Nvidia reserves the right to revoke the license if the model is used in ways that violate its safety guidelines. It’s open-weight, not open-source — Nvidia keeps the training data and methodology proprietary, and maintains a kill switch on usage.
The Bottom Line
Nvidia is making a $26 billion bet that giving away AI models will sell more GPUs. Nemotron 3 Super is the first major product of that strategy — a genuinely competitive open model that happens to run best on Nvidia’s own hardware. It’s a smart play: commoditize the model layer to protect the chip layer. The question is whether the AI community will embrace Nvidia’s models or view them as a Trojan horse for hardware lock-in. Either way, when the GPU maker starts outspending the model makers, the competitive landscape just shifted.