NVIDIA Cosmos 3: The First Fully Open 'Omnimodel' for Physical AI

NVIDIA just released a single AI model that can see, hear, read, generate video — and tell a robot how to move. Cosmos 3 is the company's biggest bet yet that AI's next era happens in the physical world. Here's what it does and why it matters.

For years, NVIDIA sold the shovels for the AI gold rush — the chips that train chatbots. Now it wants to define what comes next. At its GTC keynote in Taipei, CEO Jensen Huang unveiled Cosmos 3, which NVIDIA calls the first fully open "omnimodel" for physical AI: a single model that can understand and generate text, images, video, sound — and the actual movements a robot makes.

It's a striking shift. Instead of a model that writes essays, Cosmos 3 is built to perceive and act in the real world. Huang framed it bluntly: "the big bang of physical AI is just around the corner." Here's what was announced and why it could matter as much as the chatbot boom did.

The News in Brief

  • What: Cosmos 3, NVIDIA's open foundation model for physical AI — "the first fully open omnimodel."
  • Modalities: natively understands and generates text, images, video, ambient sound, and actions.
  • Scale: trained on 20 trillion tokens of multimodal data.
  • Sizes: Cosmos 3 Super (32B) and Cosmos 3 Nano (8B).
  • Alongside it: the Vera Rubin platform ramping into full production, a new Vera CPU for AI agents, RTX Spark, and a memory partnership with SK hynix.

What Cosmos 3 Is

Cosmos 3 is a world foundation model — an AI trained to understand how the physical world looks, sounds, and behaves. Where a language model predicts the next word, a world model predicts what happens next in a scene: how objects move, how a room is laid out, what a robot arm will touch if it moves a certain way.
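To make "what a robot arm will touch if it moves a certain way" concrete, here is a minimal sketch. A world model like Cosmos 3 learns these predictions from data. This toy instead uses hand-written geometry (forward kinematics of a hypothetical two-link planar arm) to show the kind of question being answered: given joint angles, where does the hand end up, and does it reach the object? The link lengths, the object position and the contact radius are illustrative assumptions, not anything NVIDIA has published.

```python
import math

# Hypothetical two-link planar arm; lengths in metres are illustrative assumptions.
LINK_1 = 0.5
LINK_2 = 0.4

def hand_position(shoulder: float, elbow: float) -> tuple[float, float]:
    """Forward kinematics: joint angles (radians) -> end-effector (x, y)."""
    x = LINK_1 * math.cos(shoulder) + LINK_2 * math.cos(shoulder + elbow)
    y = LINK_1 * math.sin(shoulder) + LINK_2 * math.sin(shoulder + elbow)
    return x, y

def touches(shoulder: float, elbow: float, obj: tuple[float, float], radius: float = 0.05) -> bool:
    """Predict whether the hand ends up within `radius` of the object."""
    hx, hy = hand_position(shoulder, elbow)
    return math.hypot(hx - obj[0], hy - obj[1]) <= radius

cup = (0.9, 0.0)  # hypothetical object position
print(touches(0.0, 0.0, cup))          # True: arm fully extended along x, hand at (0.9, 0.0)
print(touches(math.pi / 2, 0.0, cup))  # False: arm pointing straight up, hand at (0.0, 0.9)
```

A learned world model answers the same question without hand-coded equations, and for messier scenes (deformable objects, clutter, other agents) where no simple formula exists.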

What makes Cosmos 3 stand out is that it doesn't just watch the world — it can generate actions. The model can output numerical action data such as joint angles, gripper positions, and trajectory points: the precise instructions a robot needs to complete a task. It can also understand and simulate a scene from any perspective, first-person or third-person, which is exactly what an autonomous machine needs to plan its next move.
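The announcement does not describe Cosmos 3's actual output schema. The following is therefore a hypothetical sketch of what "numerical action data" can look like to a robot controller: a short chunk of timestamped trajectory points, each holding joint angles and a gripper opening. The field names, the 7-joint arm, the gripper convention and the linear interpolation are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class ActionPoint:
    """One hypothetical trajectory point a model might emit."""
    t: float                    # seconds from the start of the action chunk
    joint_angles: list[float]   # radians, one per joint (7-joint arm assumed)
    gripper: float              # 0.0 = closed, 1.0 = fully open (assumed convention)

def interpolate(a: ActionPoint, b: ActionPoint, t: float) -> ActionPoint:
    """Linearly blend two trajectory points so a controller can run at a higher rate."""
    w = (t - a.t) / (b.t - a.t)
    joints = [ja + w * (jb - ja) for ja, jb in zip(a.joint_angles, b.joint_angles)]
    return ActionPoint(t, joints, a.gripper + w * (b.gripper - a.gripper))

start = ActionPoint(0.0, [0.0] * 7, 1.0)                             # gripper open
grasp = ActionPoint(1.0, [0.2, -0.4, 0.0, 1.0, 0.0, 0.5, 0.0], 0.0)  # gripper closed
mid = interpolate(start, grasp, 0.5)
print(mid.joint_angles)  # [0.1, -0.2, 0.0, 0.5, 0.0, 0.25, 0.0]
print(mid.gripper)       # 0.5
```

Emitting a few sparse points and letting the controller fill in between them is one common design. A model could also emit dense, high-rate commands directly. Which approach Cosmos 3 uses is not stated here.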

NVIDIA released Cosmos 3 in two sizes so it can run everywhere from a data center to an on-board robot computer:

Version          Parameters    Best for
Cosmos 3 Super   32 billion    Maximum capability: research, simulation, complex reasoning
Cosmos 3 Nano    8 billion     Efficient on-device and robotic hardware where speed and power matter
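A rough way to see why the two sizes in the table above target different hardware is weight memory: parameter count times bytes per parameter. The sketch below counts weights only (no activations, caches or runtime overhead) and assumes three common numeric formats (16-bit, 8-bit, 4-bit). The announcement does not say which precisions Cosmos 3 is distributed in.

```python
# Weights-only memory estimate; real deployments need extra headroom on top of this.
MODELS = {"Cosmos 3 Super": 32e9, "Cosmos 3 Nano": 8e9}        # parameter counts from the table
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}  # assumed precisions

def weight_gb(params: float, bytes_per_param: float) -> float:
    """Gigabytes (10**9 bytes) needed just to hold the weights."""
    return params * bytes_per_param / 1e9

for name, params in MODELS.items():
    sizes = ", ".join(f"{fmt}: {weight_gb(params, b):.0f} GB" for fmt, b in BYTES_PER_PARAM.items())
    print(f"{name}: {sizes}")
# Cosmos 3 Super: 16-bit: 64 GB, 8-bit: 32 GB, 4-bit: 16 GB
# Cosmos 3 Nano: 16-bit: 16 GB, 8-bit: 8 GB, 4-bit: 4 GB
```

At any given precision, Nano needs a quarter of Super's weight memory. That arithmetic is what makes the smaller model the natural fit for an on-board robot computer.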
[Figure: an AI omnimodel processing text, image, video, sound and action modalities in one system]

What "Omnimodel" Actually Means

You've heard "multimodal" — models that handle a few input types, like text plus images. NVIDIA is pushing further with omnimodel: one system that natively handles many modalities, both in and out. Cosmos 3 spans five: text, images, video, ambient sound, and physical actions.
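One common way to build a single model over many modalities is to turn every input into tokens and interleave them in one sequence, with markers that say which modality each span came from. The announcement does not detail Cosmos 3's internals, so treat this as a generic, hypothetical illustration. The modality tags, the per-modality ID blocks and the marker tokens are invented for the example, and the "local" token IDs stand in for whatever each modality's tokenizer would produce.

```python
from enum import Enum

class Modality(Enum):
    TEXT = 0
    IMAGE = 1
    VIDEO = 2
    AUDIO = 3
    ACTION = 4

# Hypothetical layout: each modality gets its own block of IDs in one shared vocabulary.
BLOCK = 10_000
MARKER_BASE = len(Modality) * BLOCK  # IDs reserved for "start of <modality>" markers

def to_shared_ids(modality: Modality, local_ids: list[int]) -> list[int]:
    """Shift a modality's local token IDs into the shared vocabulary, prefixed by a marker."""
    assert all(0 <= i < BLOCK for i in local_ids), "local ID outside the modality's block"
    return [MARKER_BASE + modality.value] + [modality.value * BLOCK + i for i in local_ids]

def interleave(segments: list[tuple[Modality, list[int]]]) -> list[int]:
    """Flatten (modality, tokens) segments into one sequence a single model can attend over."""
    stream: list[int] = []
    for modality, local_ids in segments:
        stream.extend(to_shared_ids(modality, local_ids))
    return stream

# An instruction, a camera frame, then an action chunk (all token IDs are made up)
seq = interleave([(Modality.TEXT, [5, 17]), (Modality.IMAGE, [3]), (Modality.ACTION, [42, 43])])
print(seq)  # [50000, 5, 17, 50001, 10003, 50004, 40042, 40043]
```

Because everything shares one sequence, the same network can read an image and emit action tokens in one pass. That is the structural basis for the coherence argument made later in this section.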

Capability          What it enables
Text                Understand instructions and describe scenes in language
Images & video      Perceive and generate visual scenes from any viewpoint
Ambient sound       Use audio cues to understand and simulate environments
Actions             Output robot motion data: joint angles, gripper positions, trajectories
World simulation    Predict what happens next, so machines can "think before they act"
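"Think before they act" can be read as planning with a world model: imagine the outcome of several candidate actions, then execute the one whose predicted outcome looks best. The sketch below uses a deliberately trivial stand-in for the learned model, a point robot that moves at a commanded velocity with some assumed drag. It searches with random shooting, a simple sampling-based planner. Both the dynamics and the planner are assumptions for illustration, not how Cosmos 3 is documented to plan.

```python
import random

def predict(pos: tuple[float, float], vel: tuple[float, float],
            steps: int = 10, dt: float = 0.1) -> tuple[float, float]:
    """Stand-in world model: roll a point robot forward from a commanded velocity with drag."""
    x, y = pos
    vx, vy = vel
    for _ in range(steps):
        x += vx * dt
        y += vy * dt
        vx *= 0.9  # assumed drag: the commanded velocity decays each step
        vy *= 0.9
    return x, y

def plan(pos: tuple[float, float], goal: tuple[float, float],
         candidates: int = 200, seed: int = 0) -> tuple[tuple[float, float], float]:
    """Random shooting: sample candidate actions, imagine each outcome, keep the best."""
    rng = random.Random(seed)
    best_action, best_dist = (0.0, 0.0), float("inf")
    for _ in range(candidates):
        action = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        px, py = predict(pos, action)  # "think": simulate before acting
        dist = ((px - goal[0]) ** 2 + (py - goal[1]) ** 2) ** 0.5
        if dist < best_dist:
            best_action, best_dist = action, dist
    return best_action, best_dist

action, miss = plan(pos=(0.0, 0.0), goal=(1.0, 0.5))
print(f"chosen velocity {action[0]:.2f}, {action[1]:.2f}; predicted miss {miss:.3f} m")
```

Only the chosen action would be sent to the real robot. The other 199 are tried out purely inside the model, which is the point of simulating first.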

The point of folding all of this into one model is coherence: a robot that sees, hears, and acts using the same underlying understanding is far more capable than one stitched together from separate, disconnected models.

The Big Bang of Physical AI

NVIDIA's pitch is that AI is leaving the chat window. Huang described physical AI as the frontier where agents "don't just read and write text, but perceive, reason and act in the real world." Cosmos 3 is the model layer for that vision — the brain that robots, autonomous machines, and industrial systems can run on.

"The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models." — Jensen Huang, NVIDIA CEO

It's a vision shared across the industry. Jeff Bezos just raised $12 billion for a startup building an "artificial general engineer" aimed squarely at the physical world — a sign that some of the biggest names in tech are converging on the same bet. (We covered that in our piece on Bezos's Prometheus.)

The Hardware Behind It

A model this ambitious needs serious silicon, and NVIDIA used the keynote to show it. The Vera Rubin platform — the successor to Grace Blackwell — is ramping into full production, with a supply chain NVIDIA says is twice the size of its predecessor. The company also introduced Vera, a CPU purpose-built for AI agents that it claims completes tasks up to 1.8x faster than x86 chips, and showed RTX Spark, a PC superchip pairing its AI platform with Windows.

To feed all of that, NVIDIA announced a multiyear memory partnership with SK hynix to co-develop next-generation memory for Vera Rubin supercomputers, Vera CPUs, RTX Spark PCs, and Jetson Thor robotics platforms. Samsung, SK hynix, and Micron were all named as HBM4 suppliers for Vera Rubin — a reminder that the AI build-out is reshaping the entire memory industry, too.

Why "Fully Open" Is a Big Deal

Perhaps the most strategic part of the announcement is the word open. By releasing Cosmos 3 openly — in both a powerful 32B "Super" version and an efficient 8B "Nano" version — NVIDIA invites every robotics startup, lab, and manufacturer to build on its model instead of starting from scratch. That seeds an entire ecosystem of physical-AI applications that, conveniently, all run best on NVIDIA hardware.

It's the same playbook that made NVIDIA indispensable in the chatbot era, now aimed at robots and machines. And it arrives just as the economics of AI are shifting under everyone's feet — smaller, cheaper, openly available models are increasingly doing the heavy lifting, a trend we unpacked in our look at the 2026 AI price war.

Frequently Asked Questions

What is NVIDIA Cosmos 3?

Cosmos 3 is NVIDIA's open foundation model for physical AI. NVIDIA calls it the first fully open "omnimodel" — a single model that can natively understand and generate text, images, video, ambient sound and actions. Trained on 20 trillion tokens of multimodal data, it is designed to perceive and simulate the physical world and to output the actions a robot needs to perform a task.

What is an omnimodel?

An omnimodel is an AI model that handles many input and output types — or modalities — in one system. Cosmos 3 works across text, images, video, ambient sound and physical actions natively, rather than bolting separate models together. Crucially, it can both understand these modalities and generate them, including numerical action data like joint angles and gripper positions for robots.

What sizes does Cosmos 3 come in?

Cosmos 3 is available in two sizes: Cosmos 3 Super, a 32-billion-parameter model for maximum capability, and Cosmos 3 Nano, an 8-billion-parameter model optimized to run efficiently on smaller, on-device or robotic hardware. Both are released openly so developers and researchers can build on them.

What is physical AI?

Physical AI refers to AI systems that perceive, reason about and act in the real world, rather than only processing text or images. NVIDIA CEO Jensen Huang says "the big bang of physical AI is just around the corner," driven by advances in multimodal reasoning, vision and world models. Physical AI powers robots, autonomous machines and other systems that interact with physical environments.

What else did NVIDIA announce alongside Cosmos 3?

NVIDIA said its Vera Rubin platform is ramping into full production with a supply chain twice the size of Grace Blackwell's, and unveiled Vera, a new CPU built for AI agents that NVIDIA says completes tasks up to 1.8x faster than x86 chips. It also showed RTX Spark, a PC superchip, and announced a multiyear memory partnership with SK hynix to co-develop memory for Vera Rubin, Vera CPUs, RTX Spark PCs and Jetson Thor robotics platforms.

Final Thoughts

Cosmos 3 is NVIDIA planting a flag in the ground for the next phase of AI. If the last three years were about teaching machines to talk, the next few look set to be about teaching them to act — to drive, build, assemble, and assist in the physical world. By making the model open and tying it to its hardware roadmap, NVIDIA is positioning itself to power that shift end to end.

Physical AI is still early, and turning impressive demos into safe, reliable robots in the real world is hard. But the direction is unmistakable, and the biggest players are all moving the same way. We'll keep covering it — from chips and models to the companies betting billions on it, like Jeff Bezos's Prometheus. Bookmark SaveDelete for clear, no-hype AI coverage.