NVIDIA Open-Sources DreamDojo: A Robot Brain Trained on 44,711 Hours of Human Video

NVIDIA has released DreamDojo, an open-source foundation world model for robotics that learns how machines should interact with the physical world by watching tens of thousands of hours of human video. The release includes model weights, code, training datasets, and evaluation benchmarks — a comprehensive open-source package that positions the chipmaker at the center of a rapidly accelerating robotics industry.
Learning From Humans, No Physics Engine Required
DreamDojo represents a shift in how robots learn about the physical world. Traditional robotics simulation relies on physics engines with hand-authored dynamics: painstakingly coded rules about how objects behave. DreamDojo replaces those rules with a learned model. It takes robot motor controls as input and generates predicted future states directly in pixels, drawing its physical intuition from human video rather than from coded equations.
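The interface described above, motor controls in and predicted frames out, can be pictured as an autoregressive rollout. The sketch below is only an illustration: `rollout`, `toy_predict_next`, and the list-of-floats "frame" are hypothetical names and stand-ins, not part of the DreamDojo release, and the toy predictor is a one-line rule where the real system would use a learned video model.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image: one brightness value per pixel
Action = float       # toy stand-in for a robot motor command


def rollout(
    predict_next: Callable[[Frame, Action], Frame],
    first_frame: Frame,
    actions: Sequence[Action],
) -> List[Frame]:
    """Feed each prediction back in as the next input, one action at a time."""
    frames = [first_frame]
    for action in actions:
        frames.append(predict_next(frames[-1], action))
    return frames


def toy_predict_next(frame: Frame, action: Action) -> Frame:
    # Hypothetical placeholder for a learned world model:
    # here an action simply brightens every pixel by the same amount.
    return [pixel + action for pixel in frame]


# Two actions produce two imagined frames after the starting frame.
imagined = rollout(toy_predict_next, [0.0, 1.0], [0.5, 0.5])
```

Because every predicted frame becomes the next input, errors can compound over long rollouts, which is why stability over time matters for this kind of model.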
"It's simulation 2.0. Time for robotics to take the bitter lesson pill," wrote NVIDIA's Dr. Jim Fan in the announcement. The "bitter lesson," a concept from AI researcher Rich Sutton, argues that general methods leveraging massive computation always win over approaches relying on human-engineered knowledge — and DreamDojo is that philosophy applied to robotics.
The Largest Robotics Dataset Ever Built
At the core of DreamDojo is DreamDojo-HV, what the research team calls the largest egocentric human video dataset ever assembled for world model pretraining. The numbers are staggering:
- 44,711 hours of human video footage
- 6,015 unique tasks captured
- Over 1 million trajectories
- 15x larger than any prior robotics dataset
- 2,000x more scene-diverse than existing alternatives
Because human videos don't come with robot-specific action labels, the team developed a technique called "continuous latent actions": a self-supervised method that infers what changed between video frames without needing to know the underlying hardware. This allows DreamDojo to treat any first-person video as though it came with motor commands attached, which in principle turns any large collection of egocentric human footage into potential training data.
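To make the idea of inferring an action from two frames concrete, here is a deliberately tiny sketch. It is not DreamDojo's method: the real system learns its encoder and decoder from data, whereas this toy uses hand-written rules on a one-dimensional "image". All function names are hypothetical. What it does share with the description above is the shape of the training signal: summarise the change between frames as a continuous number, then check whether that number is enough to reconstruct the next frame.

```python
from typing import List

Frame = List[float]  # toy 1-D "image": one brightness value per pixel


def centroid(frame: Frame) -> float:
    """Brightness-weighted mean pixel position (assumes a non-black frame)."""
    total = sum(frame)
    return sum(i * v for i, v in enumerate(frame)) / total


def encode_latent_action(cur: Frame, nxt: Frame) -> float:
    """Summarise what changed between two frames as one continuous number."""
    return centroid(nxt) - centroid(cur)


def decode_next_frame(cur: Frame, latent: float) -> Frame:
    """Predict the next frame from the current frame plus the latent action."""
    shift = round(latent)
    out = [0.0] * len(cur)
    for i, value in enumerate(cur):
        j = i + shift
        if 0 <= j < len(cur):
            out[j] = value
    return out


def reconstruction_error(cur: Frame, nxt: Frame) -> float:
    """Self-supervised signal: no action label is used, only the two frames."""
    predicted = decode_next_frame(cur, encode_latent_action(cur, nxt))
    return sum((p - t) ** 2 for p, t in zip(predicted, nxt))


# A bright blob moves two pixels to the right between frames.
frame_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
frame_b = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
latent = encode_latent_action(frame_a, frame_b)
error = reconstruction_error(frame_a, frame_b)
```

In this toy case the latent comes out as a shift of two pixels and the reconstruction is exact. A learned version would instead adjust its encoder and decoder to drive that reconstruction error down across many videos.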
From Human Video to Robot Hardware
After pretraining on human footage, DreamDojo undergoes a second phase of post-training on target robot data, adapting its broad understanding of physics and object manipulation to specific hardware platforms. The model has been tested with GR-1, G1, and AgiBot humanoid robots.
Through a distillation pipeline, DreamDojo achieves real-time inference at 10.81 frames per second, remaining stable for over a minute of continuous operation. This speed unlocks practical applications including live teleoperation via VR controllers, policy evaluation without physical deployment, and model-based planning that demonstrated a 17% improvement in real-world success rates on a fruit-packing task.
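Model-based planning of the kind mentioned above generally means trying out candidate actions inside the world model before committing to any on the real robot. The article does not say which planner was used, so the sketch below substitutes a generic technique, random shooting, with hypothetical names throughout and a one-number "state" standing in for predicted video.

```python
import random
from typing import Callable, List, Tuple

State = float   # toy stand-in for a predicted video frame
Action = float  # toy stand-in for a motor command


def plan_by_random_shooting(
    predict_next: Callable[[State, Action], State],
    score: Callable[[State], float],
    start: State,
    horizon: int,
    n_candidates: int,
    rng: random.Random,
) -> Tuple[List[Action], float]:
    """Sample action sequences, imagine each in the model, keep the best one."""
    best_plan: List[Action] = []
    best_score = float("-inf")
    for _ in range(n_candidates):
        plan = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        state = start
        for action in plan:
            state = predict_next(state, action)  # imagined, never executed
        candidate_score = score(state)
        if candidate_score > best_score:
            best_plan, best_score = plan, candidate_score
    return best_plan, best_score


def toy_dynamics(state: State, action: Action) -> State:
    # Hypothetical placeholder for the world model's prediction.
    return state + action


def closeness_to_goal(state: State) -> float:
    # Higher is better; 0.0 means the imagined end state hits the goal of 1.5.
    return -abs(state - 1.5)


plan, plan_score = plan_by_random_shooting(
    toy_dynamics, closeness_to_goal, start=0.0,
    horizon=3, n_candidates=200, rng=random.Random(0),
)
```

Only the winning plan would be sent to hardware. The same imagine-then-score loop also explains policy evaluation without physical deployment: a policy's actions are rolled forward in the model and judged there first.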
Two Models, Fully Open
NVIDIA released two variants: a 2-billion-parameter model and a 14-billion-parameter version, both pretrained on 256 H100 GPUs and built on the company's open-weight Cosmos-Predict2.5 platform. The project was developed in collaboration with researchers from UC Berkeley, Stanford, the Hong Kong University of Science and Technology, the University of Texas at Austin, and several other institutions.
The Strategic Play
The open-source release is a classic NVIDIA ecosystem play. By giving away the software for free — model weights, code, datasets, and benchmarks — NVIDIA ties the global robotics research community to its hardware and software stack. Every lab, startup, and corporation that builds on DreamDojo will likely do so on NVIDIA GPUs.
NVIDIA CEO Jensen Huang has repeatedly framed physical AI as a generational opportunity, declaring at CES 2026 that "the ChatGPT moment for robotics is here." With robotics startups raising a reported $26.5 billion in 2025 alone, the timing of this release is deliberate. Competitors like Google DeepMind (Genie 3) and 1X Technologies (1XWM) are developing similar world models, but NVIDIA's open-source approach and existing hardware dominance give it a significant structural advantage.
The Bottom Line
DreamDojo isn't just a research paper. It is NVIDIA's bid to become the default platform for the coming robotics revolution. By open-sourcing a robot brain trained on 44,711 hours of human video, NVIDIA is betting that the company that provides the foundation will capture the value as a sector that drew a reported $26.5 billion in startup funding in 2025 scales up. If the "ChatGPT moment for robotics" truly is here, NVIDIA just made sure it's running on its hardware.