World Models

This chapter provides a comprehensive review of world models: learned predictive models of environment dynamics that enable agents to plan, reason, and learn by "imagining" the outcomes of actions. We trace the evolution from Ha and Schmidhuber's foundational work, through PlaNet and the Dreamer series (Dreamer v1/v2/v3), to modern foundation-scale systems (Genie, Cosmos, UniSim, Sora). We cover the problem formulation (dynamics modeling, latent-space vs. pixel-space prediction, planning with world models); a detailed taxonomy organized by prediction space, generative paradigm, application domain, and level of structure; deep dives into classic latent-space models, video prediction (stochastic, autoregressive, diffusion-based, and object-centric approaches), and foundation world models; applications in robotics (TD-MPC, DayDreamer) and autonomous driving (GAIA-1, OccWorld); the intersection with reasoning and language (LLM-as-world-model, Dynalang, JEPA); and the integration with reinforcement learning (MuZero, EfficientZero, IRIS, MBPO). We discuss benchmarks and evaluation challenges, and conclude with open problems including long-horizon prediction, compositional world models, scaling laws, and the fundamental question of whether video generators are sufficient as world models.