World models are the next frontier of generative AI; Sweden's second act in tech; all the news from Google I/O; UBS deploys AI analysts; Anthropic's race to AGI; tech CEOs clone themselves with AI
The battle for superstar AI talent; inside the first Stargate AI data center; Europe's share of AI is worryingly small; doing the math on AI's energy footprint; AI is changing weather prediction
All eyes were on Google I/O this week as the company held an AI extravaganza with over 100 updates and new releases of models, products and services. One of the most viral models on display was Veo 3, which lets users generate video with audio, excelling at real-world physics and accurate lip syncing.
On stage, Demis Hassabis, the CEO and co-founder of Google DeepMind, explained that one of the reasons Veo maintains accuracy and consistency, and Gemini performs so well on reasoning tasks involving the physical environment, is that they rely on Genie 2, Google’s improved world model.
Here’s the relevant section of his keynote; I encourage you to watch the full clip:
Google isn’t the only company building world models. Tesla, Wayve, and NVIDIA have similar projects in the works, though they are limited to specific domains such as self-driving cars and robotics. More recently, Fei-Fei Li threw her hat in the ring with a startup called World Labs, which demonstrated a few examples of spatial intelligence at the end of last year.
Yesterday evening, however, while Dario Amodei of Anthropic was announcing a new family of Claude models, another event was taking place in parallel at the Computer History Museum in Mountain View, California.
During a keynote presentation, Eric Xing, president and university professor at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), unveiled PAN — an ambitious next-generation world model aiming to revolutionize machine reasoning and intelligence by simulating infinitely diverse realities, from simple physical interactions to complex multi-agent systems.
This is a short clip from his presentation:
Generative AI so far has been largely dominated by models that operate by predicting the next word from a given prompt. PAN takes this concept a significant step further, predicting the next world state. According to MBZUAI researchers, this shift from word-prediction to world-prediction enables an unprecedented range of reasoning capabilities, crucial for tasks that require nuanced interactions with physical reality, complex strategy, and long-term planning.
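To make the word-prediction versus world-prediction distinction concrete, here is a purely illustrative sketch of the two interfaces. PAN's actual design has not been published in detail, so the `WorldState` class and the toy transition logic below are my own hypothetical stand-ins, not MBZUAI's API: the point is only that a language model maps a token sequence to the next token, while a world model maps a state and an action to the next state, which can then be rolled forward.

```python
from dataclasses import dataclass
from typing import List

def predict_next_token(tokens: List[str]) -> str:
    """A language model's interface: token sequence in, next token out.
    Toy placeholder; a real LLM samples from a learned distribution."""
    return tokens[-1]

@dataclass
class WorldState:
    """Hypothetical container for a simulated world's state."""
    time: int
    observation: dict

def predict_next_state(state: WorldState, action: str) -> WorldState:
    """A world model's interface: (state, action) in, next state out.
    Toy transition; a real model would simulate physics, agents, etc."""
    return WorldState(
        time=state.time + 1,
        observation={**state.observation, "last_action": action},
    )

def rollout(state: WorldState, actions: List[str]) -> WorldState:
    """Rolling the world model forward enables planning over sequences
    of actions, which next-token prediction alone does not give you."""
    for action in actions:
        state = predict_next_state(state, action)
    return state
```

The asymmetry is the point of the sketch: `rollout` lets an agent evaluate candidate action sequences against a simulator before acting, which is the capability the word-to-world shift is meant to unlock.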
PAN integrates multi-modal inputs (language, video, spatial data, and embodied actions) to construct an intricate internal representation of the world. Unlike existing models that specialize in specific applications like autonomous driving or robotic manipulation, PAN generalizes across domains, maintaining consistent, interactive control over simulated realities.
A key innovation of PAN is its multi-level latent space reasoning. At its core, PAN employs hierarchical latent representations, where abstract reasoning and fine-grained sensory modalities coexist seamlessly. This architecture enables the simulation of scenarios from multiple perspectives and timeframes, from immediate, detailed interactions such as robotic manipulation to complex, long-term scenarios involving strategic decisions and interactions among numerous agents.
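The multi-level latent idea can be illustrated with a minimal two-timescale rollout. To be clear, this is not PAN's architecture (which has not been published); it is a generic sketch of hierarchical latent dynamics in which a fast, fine-grained latent updates every step while a slow, abstract latent updates only every `k` steps, so long-horizon structure and immediate detail coexist. All variable names and coefficients here are invented for illustration.

```python
import numpy as np

def two_level_rollout(z_abstract, z_fine, steps, k=4, seed=0):
    """Hypothetical two-timescale latent rollout.

    z_fine evolves every step, conditioned on z_abstract;
    z_abstract is refreshed once every k fine steps, summarizing
    the recent fine-grained dynamics. Coefficients are arbitrary."""
    rng = np.random.default_rng(seed)
    for t in range(steps):
        # Fast, detailed dynamics (e.g. per-frame sensory prediction),
        # steered by the slow abstract latent.
        z_fine = 0.9 * z_fine + 0.1 * z_abstract \
            + 0.01 * rng.standard_normal(z_fine.shape)
        if (t + 1) % k == 0:
            # Slow, abstract update (e.g. scene- or plan-level state).
            z_abstract = 0.95 * z_abstract + 0.05 * z_fine
    return z_abstract, z_fine
```

The design choice the sketch highlights is that the abstract latent changes an order of magnitude more slowly than the fine one, which is what lets a single model span both immediate manipulation and long-horizon, multi-agent scenarios.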
For instance, PAN's demonstrations include simulations of autonomous vehicles navigating dynamic environments, drones exploring various terrains, and robots performing tasks like setting tables or sorting objects. Crucially, PAN’s architecture facilitates dynamic interactions within these simulations, allowing real-time adjustments and actions based on evolving scenarios.
PAN's simulations transcend traditional bounds, extending from realistic physical worlds to surreal or hypothetical scenarios. For example, one demo I saw vividly depicted real-time world transitions, from a snowstorm into a rainforest and then to a volcanic landscape, illustrating PAN's robust handling of diverse environmental conditions.
Further, PAN excels in long-horizon simulations, maintaining quality and accuracy over extended periods, which supports strategic planning and complex decision-making. This sustained performance is particularly relevant in scenarios requiring extensive forward-looking reasoning, such as climate modeling, urban planning, or long-term autonomous operations.
Alongside PAN’s world simulations, MBZUAI demonstrated PAN-Agent, an AI agent with an integrated vision-language model (VLM) designed specifically for multimodal reasoning. PAN-Agent leverages PAN’s simulated environments to perform advanced reasoning tasks, including mathematics and coding, showcasing the potential of world models in refining decision-making processes across diverse and complex scenarios.
Future developments envision PAN-Agent mastering an even broader range of infinitely diverse environments, with capabilities expanding through continuous interaction and reinforcement within PAN’s simulations.
The implications of PAN’s capabilities extend beyond traditional fields such as robotics and autonomous vehicles: the world model could also prove valuable for strategic planning and disaster forecasting. The model’s ability to manage complex interactions and predict outcomes with remarkable accuracy signifies its potential to surpass human-level reasoning in many contexts, particularly in scenarios too complex for conventional human cognition.
MBZUAI’s PAN model, with its comprehensive multi-modal integration and simulative depth, therefore emerges as a powerful platform redefining the scope and ambition of artificial intelligence, laying the foundation for the next generation of general (and potentially super-) intelligent AI systems.
And now, here are the week’s news:
❤️Computer loves
Our top news picks for the week - your essential reading from the world of AI
Google I/O 2025
Big Technology: Demis Hassabis and Sergey Brin on AI Scaling, AGI Timeline, Robotics, Simulation Theory
TechCrunch: Google rolls out Project Mariner, its web-browsing AI agent
TechCrunch: Google updates the Gemini app with real-time AI video, Deep Research, and more
TechCrunch: Google unveils new AI features coming to Gmail, Docs, and Vids
TechCrunch: Imagen 4 is Google’s newest AI image generator
TechCrunch: Google debuts an AI-powered video tool called Flow
TechCrunch: Google’s NotebookLM is getting Video Overviews
TechCrunch: Veo 3 can generate videos — and soundtracks to go along with them
The Verge: Google made an AI coding tool specifically for UI design
The Verge: Google will let you ‘try on’ clothes with AI
WSJ: Warby Parker Partners With Google to Develop AI Glasses
Bloomberg: UAE’s AI University Aims to Become Stanford of the Gulf
Sifted: Stockholm’s second act: Why the world is watching Sweden’s AI founders
The New York Times: What if Making Cartoons Becomes 90% Cheaper?
Reuters: OpenAI, Google and xAI battle for superstar AI talent, shelling out millions
The Verge: Tech CEOs are using AI to replace themselves
WSJ: Meet Fidji Simo, the Instacart CEO Tasked With Getting OpenAI to Turn a Profit
Bloomberg: Inside the First Stargate AI Data Center
WSJ: The Tech Industry Is Huge—and Europe’s Share of It Is Very Small
Bloomberg: Why the AI future is unfolding faster than anyone was expecting
MIT Technology Review: We did the math on AI’s energy footprint. Here’s the story you haven’t heard.
Bloomberg: Anthropic Is Trying to Win the AI Race Without Losing Its Soul
TechCrunch: MIT disavows doctoral student paper on AI’s productivity benefits
Business Insider: The inside story of how Silicon Valley's hottest AI coding startup almost died