2025 is the year of open-endedness
With Ed Hughes, Staff Research Engineer at Google DeepMind, at RAAIS 2025.
If 2016 was the year AI shocked the world by mastering Go, 2025 is shaping up to be the year AI systems learn to innovate. This is the thesis of Ed Hughes, long-time researcher at Google DeepMind and one of the few voices charting a credible path toward AI systems capable of doing science.
In his closing talk at RAAIS this year, Hughes argued that we’re entering a new phase in the evolution of artificial intelligence: one where open-endedness becomes the central organizing principle. Not just solving problems, but defining them. Not just predicting the next token, but surfacing previously unknown unknowns. If he’s right, the next generation of AI systems won’t just be tools, they’ll be participants in the scientific process itself.
Open-endedness as the engine of innovation
The conceptual heart of Ed’s talk is a deceptively simple idea: to build AI that does science, you need to build AI that innovates. And to automate innovation, you need open-endedness: models that continuously generate novelty and ensure that the novelty is learnable.
This distinction matters. Static randomness (like staring at a noisy TV) is novel but useless. AlphaGo’s Move 37, by contrast, was novel and altered the theory of Go. It was innovation in the full sense: a step forward in the knowledge matrix from unknown unknown to known known.
Ed formalizes this through a recursive cycle of conjecture, experiment, analysis, and theory-building that mirrors the scientific method. If that sounds familiar, it should: it’s AlphaGo’s architecture, abstracted.
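A minimal sketch of that loop, stripped of any particular domain. The four callables (`propose`, `run_experiment`, `analyze`, `update_theory`) are placeholders for whatever a concrete system, be it tree search, an RL agent, or an LLM pipeline, would supply; this is an illustration of the abstraction, not DeepMind's code.

```python
# Sketch of the conjecture -> experiment -> analysis -> theory cycle.
# The four callables are hypothetical stand-ins for a concrete system.

def open_ended_loop(theory, propose, run_experiment, analyze, update_theory, steps=100):
    for _ in range(steps):
        conjecture = propose(theory)             # generate a novel, testable idea
        result = run_experiment(conjecture)      # gather evidence for or against it
        finding = analyze(conjecture, result)    # decide what the evidence implies
        theory = update_theory(theory, finding)  # fold the new knowledge back in
    return theory
```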
But AlphaGo is bound by the constraints of its game. The real prize lies in domain-general innovation. Ed’s vision is to take the open-ended loop of AlphaGo and scale it to the messy, multidimensional complexity of reality.
Simulators as scientific petri dishes
To get there, DeepMind turned to simulation. In XLand 2.0, a procedurally generated Unity world with a task space of 10⁴⁰ variations, the team trained an Adaptive Agent called AdA. This wasn’t just another reinforcement learner. AdA was trained to meta-learn, which means it forms hypotheses, runs experiments, analyzes outcomes, and refines theories in real time, across an ever-shifting task space.
In one task, AdA has to produce a black cube by manipulating various objects under hidden rules. After stumbling through its first attempt, it starts to identify causal rules and eventually solves the task with efficiency and grace. In other words, it discovers, adapts, and improves like a scientist. At 500M parameters, AdA was once among the largest RL agents ever built. Today, that number feels quaint, but the behavior remains profound.
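One way to picture that within-episode behavior is as hypothesis elimination over hidden causal rules. The sketch below is purely illustrative: the environment interface, the rule objects, and `adapt` itself are invented for the example, and AdA learns this behavior implicitly in a transformer's memory rather than running explicit code.

```python
import random

# Illustrative hypothesis-elimination loop over hidden causal rules.
# The environment and rule objects are hypothetical, not XLand's API.

def adapt(env, candidate_rules, trials=20):
    """Keep only the rules consistent with what experiments reveal."""
    beliefs = set(candidate_rules)
    for _ in range(trials):
        action = random.choice(env.available_actions())
        outcome = env.step(action)                                     # run an experiment
        beliefs = {r for r in beliefs if r.predicts(action, outcome)}  # discard refuted rules
        if len(beliefs) == 1:                                          # theory pinned down
            break
    return beliefs
```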
Still, simulation has limits. Even XLand’s enormous variety doesn’t approach the richness of the real world. So DeepMind introduced Genie, a generative world model trained on 200,000 hours of internet gameplay video. Genie can render playable environments from text prompts or images, letting kids play games based on their own drawings or run scientific-style experiments in 2D and 3D simulations.
In Genie 2, these worlds go further than Genie 1’s: reflections emerge, parallax motion appears, and consistency across time steps hints at deeper physical priors. With agents like SIMA layered on top, these world models become testbeds for behavior, turning the internet’s passive data into an active substrate for scientific discovery.
From language models to Popperian AI
So far, the story tracks the standard DeepMind arc: large-scale, multimodal systems in custom environments. But Hughes pivots into more controversial terrain with his proposal for AI²: AI systems that can perform AI research.
This is where language models come in. Hughes argues that LLMs, thanks to their training on massive corpora of human discourse, are already latent models of human notions of “interestingness.” They don’t just complete prompts, they filter ideas through a socially tuned aesthetic lens.
In experiments like the Donor Game, LLMs mutate and evolve strategies through gameplay. Claude 3.5 Sonnet forms cooperative cultures. GPT-4o becomes risk-averse. These behaviors, Hughes suggests, can be repurposed: LLMs as mutation operators, selection filters, and even code-generating experimenters.
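Abstractly, the role Hughes describes looks like an evolutionary loop in which the LLM supplies variation and game payoffs supply selection. The sketch below is a toy version under that reading; `llm` and `play_donor_game` are hypothetical helpers, not the experiment's actual code.

```python
# Toy evolutionary loop: an LLM as mutation operator, game payoff as fitness.
# `llm` and `play_donor_game` are hypothetical stand-ins for this sketch.

def evolve_strategies(llm, play_donor_game, population, generations=10, keep=4):
    for _ in range(generations):
        # Variation: the LLM proposes "interesting" mutations of existing strategies.
        offspring = [llm(f"Propose a novel variation of this donation strategy:\n{s}")
                     for s in population]
        # Selection: strategies that earn more in the game survive to the next round.
        pool = population + offspring
        pool.sort(key=play_donor_game, reverse=True)
        population = pool[:keep]
    return population
```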
Enter Popper. Hughes proposes a “Popperian AI” that uses LLMs to propose falsifiable theories, write experiments to test them, and recursively improve itself. This isn’t hypothetical: Sakana’s AI Scientist has already produced auto-generated workshop papers; AlphaEvolve discovered a matrix multiplication algorithm more efficient than the best known since 1969.
The most recent instantiation is the Darwin-Gödel Machine, a recursive system that interleaves self-modification with downstream evaluations. It achieves state-of-the-art results on software engineering benchmarks on an academic compute budget.
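In outline, that interleaving can be pictured as a propose-evaluate-archive loop. The sketch below is schematic rather than the published system: `propose_modification` (e.g. an LLM rewriting the agent's own code) and `evaluate_on_benchmark` are stand-ins.

```python
import random

# Schematic propose -> evaluate -> archive loop in the spirit described above.
# Both helper functions are hypothetical stand-ins, not the published system.

def self_improve(seed_agent, propose_modification, evaluate_on_benchmark, iterations=50):
    archive = [(seed_agent, evaluate_on_benchmark(seed_agent))]
    for _ in range(iterations):
        parent, _ = random.choice(archive)             # keep diverse lineages alive
        child = propose_modification(parent)           # self-modification step
        score = evaluate_on_benchmark(child)           # downstream evaluation on real tasks
        archive.append((child, score))                 # archive even imperfect variants
    return max(archive, key=lambda pair: pair[1])      # best agent found so far
```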
Safety, culture, and the edge of the unknown
There’s a looming question behind all this: is recursive self-improvement safe? Hughes takes a cultural evolutionary stance. Human societies already implement safe recursive intelligence through education, institutions, and norms. We’re not afraid of raising children, he notes, because we’ve wrapped superintelligence (read: our kids) in culture.
By this logic, open-ended AI systems embedded in human cultural processes may be safer than brittle tools isolated in labs. But that requires visibility, control, and aligned incentives, none of which are guaranteed today.
In a world where benchmarks are saturating and scientific discovery is slowing, Hughes offers a reframing: perhaps the bottleneck is no longer in answering questions, but in asking the right ones. This is the domain of open-endedness. And in 2025, it might just be the most important problem in AI.