ExoBrain weekly AI news
30th May 2025: The Darwin Gödel machine, China forges its own path to AGI, and Claude codes

Welcome to our weekly email newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…
Themes this week:
AI systems learning to rewrite their own code for self-improvement
China's rapid AI advances and strict control mechanisms
Claude 4's code generation skills
The Darwin Gödel machine
For decades, researchers have dreamt of AI that never stops improving. Schmidhuber's Gödel Machine offered a blueprint: AI that rewrites its own code when it can mathematically prove an improvement. Such proof was once a high bar, but the latest generation of AI research abandons proof for pragmatism. Instead of waiting for mathematical certainty, it harnesses evolution's blind search: generate variations, test empirically, keep what works. Towards this goal, models are beginning to teach themselves, not just through massive datasets or human feedback, but by listening to their own internal signals, reasoning beyond words, and discovering algorithms that improve the very systems that created them. Several research papers published this week presage the likely direction of AI models in the coming year, and tell a story of how AI is learning to bootstrap its own intelligence.
Traditional AI development has been resource-intensive. Past LLMs depended on millions of labelled examples and computational budgets only tech giants could afford. But the landscape is changing. The INTUITOR system from UC Berkeley demonstrates that language models can achieve competitive reasoning performance using nothing but their own confidence as a reward signal. On mathematical benchmarks, INTUITOR matches traditional methods while achieving a 65% improvement on coding tasks. It's evidence that models possess richer internal signals than we've recognised. Research from Tsinghua's LeapLab reveals current reinforcement learning doesn't actually teach models new reasoning patterns - it simply makes existing patterns more likely to be used. This might sound discouraging, but it could mean the opposite. Are we still barely tapping into these models' true potential?
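The core of INTUITOR's trick can be sketched in a few lines: score each generated answer by how peaked (confident) its next-token distributions are, and use that self-certainty as the reward, with no external labels at all. The snippet below is our own toy illustration of such a confidence signal, not the paper's implementation; the function names and example logits are invented for clarity.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def self_certainty(step_logits):
    """Average confidence across generation steps: a peaked
    next-token distribution (low entropy) scores high, a flat
    one scores low. INTUITOR-style methods use a signal like
    this as the RL reward instead of human-labelled feedback."""
    scores = []
    for logits in step_logits:
        p = softmax(logits)
        entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
        scores.append(-entropy)  # higher = more confident
    return sum(scores) / len(scores)

# A confident (peaked) trajectory outscores an unsure (flat) one,
# so reinforcing high-scoring trajectories needs no external judge.
confident = [[5.0, 0.1, 0.1], [4.0, 0.2, 0.1]]
unsure = [[1.0, 1.0, 1.0], [0.9, 1.1, 1.0]]
print(self_certainty(confident) > self_certainty(unsure))
```

In the real system this score replaces the verifier or human preference signal inside a standard policy-gradient loop; the point of the sketch is simply that the reward is computable from the model's own outputs.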
The implications of internal learning go further than just replacing external methods. Recent work on Hybrid Reasoning Policy Optimization (HRPO) reveals that models can perform reasoning not just through explicit token generation, but within their latent representations - the continuous hidden states that exist between layers. Traditional reasoning approaches force models to "think out loud" through chains of words. But HRPO demonstrates that by gradually blending in hidden states, models achieve superior performance on both knowledge-intensive and mathematical tasks. On challenging benchmarks like MATH, smaller models using this reasoning match or exceed much larger traditional models. These models spontaneously develop cross-lingual reasoning patterns, fluidly integrating concepts across languages within their hidden representations. They produce more compact yet accurate responses, requiring fewer tokens because richer context is encoded in the continuous space. This is evidence that genuine reasoning can occur in the model's inner world, beyond the tokens we observe.
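The mechanism is simple to state: instead of feeding back only the embedding of the sampled token at each step, blend it with the previous hidden state, and let training decide how much latent signal to admit. The toy function below is our own simplification of that gating (real HRPO learns per-dimension gates during RL and anneals the latent share up gradually):

```python
def hybrid_input(token_embedding, hidden_state, gate):
    """HRPO-style hybrid feedback: interpolate elementwise between
    the sampled token's embedding (gate=0, ordinary 'thinking out
    loud') and the previous hidden state (gate=1, fully latent
    reasoning). A scalar gate is used here for illustration."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [gate * h + (1.0 - gate) * e
            for e, h in zip(token_embedding, hidden_state)]

emb = [1.0, 0.0, 0.0]   # embedding of the sampled token
hid = [0.2, 0.5, 0.3]   # hidden state carried from the last step
print(hybrid_input(emb, hid, 0.0))  # → [1.0, 0.0, 0.0], pure token path
```

Starting training at gate 0 and raising it gradually is what lets the model keep its language abilities while learning to carry richer context in the continuous space.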
While INTUITOR shows models can learn from internal confidence and HRPO demonstrates reasoning in latent space, Google's AlphaEvolve from a few weeks ago takes the next leap: combining these capabilities with evolutionary search to create genuinely new knowledge. AlphaEvolve pairs Gemini models with automated evaluators in an evolutionary framework, improving upon the most promising ideas over successive generations. This is starting to unlock a form of self-improvement. By discovering better matrix multiplication strategies, it reduced the training time of Gemini, its own underlying model, by 1% - a significant saving at that scale.
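The evolutionary framework described above reduces to a short loop: propose variations of the current best candidate, evaluate them empirically, keep whatever scores highest. In AlphaEvolve the mutate step is an LLM editing code and the score is an automated evaluator; in this sketch both are stand-in functions of our own, demonstrated on a toy optimisation problem.

```python
import random

def evolve(seed, mutate, score, generations=30, children=8, rng=None):
    """Skeleton of an evolutionary search loop in the AlphaEvolve
    mould: no proofs, just generate-test-select over successive
    generations, keeping the best-scoring candidate found so far."""
    rng = rng or random.Random(0)
    best, best_score = seed, score(seed)
    for _ in range(generations):
        for _ in range(children):
            candidate = mutate(best, rng)
            s = score(candidate)
            if s > best_score:          # empirical test, not proof
                best, best_score = candidate, s
    return best, best_score

# Toy problem: climb towards x = 3 by random mutation.
best, s = evolve(
    seed=0.0,
    mutate=lambda x, rng: x + rng.gauss(0, 0.5),
    score=lambda x: -(x - 3.0) ** 2,
)
print(best, s)
```

The real system keeps a population rather than a single champion and evaluates candidates in parallel, but the generate-test-select core is the same.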
But what of a pure bootstrap? The Darwin Gödel Machine (DGM), developed by researchers at UBC and Sakana AI in a paper published yesterday, takes self-improvement to its logical conclusion: AI that directly rewrites its own code to improve performance. Starting from a baseline 20% success rate on SWE-bench (real-world GitHub issues), the DGM autonomously improved itself to achieve 50% - without any human intervention. It discovered improvements like adding patch validation steps, better file viewing tools, and generating multiple solutions to select the best.
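The DGM's outer loop differs from simple hill-climbing in one important way: it keeps an archive of every agent variant, not just the current best, so any ancestor can branch again later. Below is our own stand-in sketch of that loop; in the real DGM the "agents" are foundation-model coding agents that edit their own source, and the benchmark is SWE-bench, whereas here agents are just numbers with a score.

```python
import random

def dgm_loop(initial_agent, self_modify, benchmark, steps=50, rng=None):
    """Sketch of the Darwin Gödel Machine's open-ended outer loop:
    pick a parent from the archive of all past variants, let it
    propose a modification of itself, and archive the child if it
    still runs - even if it scores worse, since a weaker ancestor
    may be the stepping stone to a stronger descendant."""
    rng = rng or random.Random(0)
    archive = [initial_agent]
    for _ in range(steps):
        parent = rng.choice(archive)      # any branch can grow further
        child = self_modify(parent, rng)  # agent rewrites its own code
        if child is not None:             # keep it if it works at all
            archive.append(child)
    return max(archive, key=benchmark)

# Toy run: scores clamped to [0, 1], starting from a "20%" agent.
best = dgm_loop(
    initial_agent=0.2,
    self_modify=lambda a, rng: min(1.0, max(0.0, a + rng.gauss(0, 0.05))),
    benchmark=lambda a: a,
)
print(best)
```

Keeping the whole archive is what makes the search open-ended: the 20%-to-50% climb on SWE-bench passed through intermediate variants that were not themselves the best at the time they were created.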
Today’s AI agents can be frustratingly unpredictable and limited. But we’re on an evolutionary tree and the path to breakthrough agents must run through these less-performant “ancestors”. Failed experiments are the essential stepping stones. A Darwinian process is underway, an expanding space of possible minds, and each new branch is potentially superior.
Takeaways: Experimental AI systems can now learn from internal signals, discover better algorithms, and rewrite their own code. The combination of reasoning models with evolutionary search is about to unlock genuine self-improvement. As these systems enhance the very infrastructure and algorithms that create them, we may be approaching a future where AI progress accelerates itself.
China forges its own path to AGI
China's AI capabilities continue advancing rapidly across models, silicon and infrastructure, with DeepSeek's latest R1-0528 update claiming performance parity with the latest generations from OpenAI, Google and Anthropic. The updated model, released this week (rumoured to have been destined for the R2 label but, given its modest performance bump, kept as a point release to R1), offers a reported reduction in hallucinations whilst adding enhanced creative writing and improved code generation.
On the hardware front, the shift to domestic silicon is accelerating. With Trump administration restrictions on Nvidia's H20 chip and current supplies projected to last only until early next year, China's tech giants - Alibaba, Tencent, and Baidu - are racing to adopt local alternatives. They're testing Huawei's Ascend chips despite US warnings of potential penalties for global use, whilst pursuing a hybrid strategy: using remaining Nvidia chips for AI training while shifting to domestic processors for inference tasks. Moving from Nvidia's CUDA to Huawei's CANN framework won't be easy, however, and could cause development delays. Huawei is expanding production but cannot yet meet demand, prompting companies to explore alternatives like Cambricon and Hygon processors, with some investing in building their own chips. Meanwhile, progress continues elsewhere. Lisuan Technology successfully booted its 6nm GPU targeting RTX 4060 performance, with mass production scheduled for 2026. Huawei's next-generation Ascend 910D aims to surpass Nvidia's H100, accepting that achieving competitive performance requires more chips and significantly higher power consumption - inefficiencies China views as the price of sovereignty.
These advances come as Nvidia CEO Jensen Huang criticised US export controls at Computex. "Export control was a failure," Huang stated, revealing Nvidia's China market share plummeted from 95% to 50% whilst achieving little strategic benefit. The restrictions resulted in "multiple billions of dollars" in inventory write-offs. Nvidia may release a compliant Blackwell chip, stripped of features like high-bandwidth memory, as early as July, though Huang admitted navigating restrictions proves "quite complicated."
China's coordinated push extends beyond individual breakthroughs. With over 250 AI data centres operational or announced, plus orbital computing networks already launching satellites, the infrastructure foundation is also being built out. The state directs banks to fund strategic sectors regardless of profitability, ensuring AI companies access capital even if they never generate returns - a socialist approach to technology development that prioritises output over profit. But China's top-down AI strategy, as far as it can be known externally, still has elements of caution. Ding Xuexiang, Xi's trusted lieutenant overseeing AI development, revealed the Party's dual approach at Davos: "We need to invest in AI, but we can't go all out without knowing what the brakes are. We have to develop the brakes at the same time." The Party fears AGI could become a tool for hostile actors to undermine CCP authority. Every major tech company in China already has teams who can "pull the plug" if algorithms generate sensitive content - a control mechanism certain to extend to AGI development.
Takeaways: China is in the AI race, but it is simultaneously building up its control mechanisms. The Party wants AGI's power but fears its potential to erode control, creating a unique development path where every breakthrough comes paired with a kill switch. As tech giants urgently pivot to domestic chips and models match US capabilities, China demonstrates that authoritarian AI development can advance rapidly despite technical hurdles, pursuing somewhat different priorities than Silicon Valley's increasingly unfettered race to AGI.
Claude codes

This chart captures Claude's improved coding capabilities. Before 23 May, syntax error rates fluctuated between 0.15% and 0.25% across different programming languages and application scenarios generated on the Lovable AI app-creation platform. After the Claude 4 rollout, error rates dropped and have remained consistently low. Feedback on the new Claude has ranged from disappointment to excitement, but if it's code you're interested in, generation 4 is a step up.
Weekly news roundup
This week showcases major enterprise AI adoption with Barclays' massive Copilot deployment, growing concerns about AI's impact on entry-level jobs, and continued strong demand for AI infrastructure despite market volatility.
AI business news
Barclays Bank signs 100k license Copilot deal with Microsoft (Demonstrates massive enterprise-scale AI adoption in financial services, signalling AI's mainstream acceptance in highly regulated industries.)
Box's stock jumps 17% as it beats expectations and raises full-year forecast on agentic AI boost (Shows market validation for companies successfully integrating agentic AI capabilities into their products.)
Perplexity offers training wheels for building AI agents (Democratises AI agent development, making advanced AI capabilities accessible to non-technical users.)
Q&A with Mary Meeker on the AI revolution (Insights from influential tech analyst provide strategic perspective on AI's transformative potential.)
Black Forest Labs' Kontext AI models can edit pics as well as generate them (Advances in multimodal AI show progress beyond simple generation to sophisticated editing capabilities.)
AI governance news
AI may already be shrinking entry-level jobs in tech, new research suggests (Critical data on AI's immediate labour market impacts, particularly relevant for workforce planning.)
AI Safety Institute to be renamed Center for AI Safety and Leadership (Signals shift in government approach from safety-focused to leadership-oriented AI policy.)
Google AI Overviews Says It's Still 2024 (Highlights ongoing reliability challenges in deployed AI systems, even from major tech companies.)
Anthropic boss: AI will take half of entry level jobs in the UK (Frank assessment from leading AI company about significant job displacement potential.)
The people who think AI might become conscious (Explores emerging philosophical debates about AI consciousness that could impact future regulation.)
AI research news
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models (Advances in making language models better at reasoning through reinforcement learning techniques.)
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents (Improved benchmarking for AI coding agents addresses critical evaluation challenges.)
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows (New framework for assessing AI's capability in complex scientific research applications.)
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations (Breakthrough in processing structured data, crucial for enterprise AI applications.)
BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs (New benchmark specifically designed for financial AI applications addresses industry-specific needs.)
AI hardware news
Nvidia delivers another earnings and revenue beat on rampant data center growth (Continued explosive demand for AI infrastructure shows no signs of slowing.)
Why export restrictions aren't the only thing to pay attention to in Nvidia's earnings (Geopolitical factors increasingly shape AI hardware market dynamics.)
Dell warns 'nonlinear' demand for AI servers may 'persist' (Market volatility in AI infrastructure suggests unpredictable adoption patterns.)
German consortium in talks to build AI data centre, Telekom says (European efforts to build sovereign AI infrastructure capabilities accelerate.)
Taiwan semiconductor boom runs on exploited migrant labor (Ethical concerns emerge about labour practices in critical AI supply chains.)