ExoBrain weekly AI news
9th May 2025: o4-mini learns new tricks, robots take the physical Turing test, and the em dash conspiracy

Welcome to our weekly email newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…
Themes this week:
OpenAI's new reinforcement learning service for o4-mini and next generation agents
Nvidia's Jim Fan and his vision for AI passing the “Physical Turing Test”
Em dashes as tell-tale signals of AI-generated content
o4-mini goes back to school
OpenAI has quietly made Reinforcement Fine-Tuning (RFT) available for its capable o4-mini model, and whilst this might seem a rather obscure news item, it’s easily the biggest thing to happen in AI this week. Other groups offer equivalent services, but availability on OpenAI’s slick developer platform, combined with o4-mini’s cutting-edge capability, makes this a big step towards the next generation of AI agents.
Reinforcement learning, or “RL”, is not new, but it is nonetheless everywhere in AI at the moment, as we covered in our piece on “the age of experience” a few weeks ago. It has enabled powerful new models such as DeepSeek R1 and o3, and delivered some of the first breakthrough agents such as the original Deep Research. The fine-tuning version of RL differs from traditional fine-tuning, which usually just shows an AI examples to copy. Instead, it uses a ‘grader’ to score responses during training. The AI learns by trying different approaches, gradually understanding the principles of what makes a good response for complex tasks, especially those involving specialised thought processes or the use of tools. It’s about teaching the AI how to achieve a goal, rather than just what the final output looks like, or a rigid set of stepwise instructions. This is vital for building specialist agents and getting them to work more reliably on real-world tasks (beyond just performing well on maths or coding benchmarks).
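To make the grader idea concrete, here’s a minimal conceptual sketch in Python. Everything in it is illustrative: the scoring criteria, the DummyModel stand-in, and the update step are our assumptions about the pattern, not OpenAI’s actual service or API.

```python
# Conceptual sketch of reinforcement fine-tuning (RFT): sample candidate
# responses, score each with a grader, and use the scores as rewards.
# The grader criteria and DummyModel are illustrative assumptions,
# not OpenAI's actual service or API.

import random

def grade(response: str) -> float:
    """Toy grader returning a reward in [0, 1]. A production grader might
    check citations, tool-call validity, or a regulatory rubric instead."""
    score = 0.0
    if "because" in response.lower():   # reward explicit reasoning
        score += 0.5
    if len(response.split()) <= 30:     # reward concision
        score += 0.5
    return score

class DummyModel:
    """Stand-in for the policy being tuned; real RFT updates model weights."""
    def sample(self, prompt: str) -> str:
        endings = ["because the regulation applies.", "it just does."]
        return f"{prompt} -> {random.choice(endings)}"

    def update(self, candidates: list[str], rewards: list[float]) -> None:
        # A real implementation would apply a policy-gradient-style update;
        # here we just report which candidate the grader preferred.
        best_reward, best_candidate = max(zip(rewards, candidates))
        print(f"reinforce (reward={best_reward:.1f}): {best_candidate}")

model = DummyModel()
candidates = [model.sample("Is this trade reportable?") for _ in range(4)]
model.update(candidates, [grade(c) for c in candidates])
```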
As an example, a sophisticated RL fine-tune could train o4-mini to act as a compliance assistant in financial services, teaching it how to interpret specialist information from regulatory documents correctly, or how to use complicated custom-built calculation tools. AccordanceAI worked with OpenAI to use this technique to improve their tax analysis agent’s performance on TaxBench by 40%. OpenAI’s Deep Research agent, which analyses web information to produce detailed reports, showcases the power of similar techniques. Deep Research learns through end-to-end training on complex web browsing and writing tasks, benefiting from high-quality data and a strong base model, o3, to develop flexible research strategies. More examples here.
This kind of RL is a powerful technique but not a universal solution to all AI challenges. Success requires a model with good world knowledge, tasks where performance can be clearly measured and rewarded, and careful design of the reward mechanism. RL is more about refining and directing an AI’s existing abilities for specific applications than about teaching it entirely new forms of general reasoning. OpenAI’s RFT setup for o4-mini aligns with these needs, hence its huge potential. The service is designed to be relatively accessible, with companies able to start with small datasets and fine-tune for a few hours (costing around $100 per training hour). This makes it feasible to experiment and iteratively develop AI tools that are precisely tuned to an organisation’s unique requirements.
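For those wanting to try it, starting a job through OpenAI’s Python SDK looks roughly like the sketch below. Treat it as a hedged outline: the file ID is a placeholder, and the grader schema in the `method` field is our best-guess reading of the fine-tuning API, so check the current RFT documentation before relying on the field names.

```python
# Hedged sketch of launching a reinforcement fine-tuning job with the
# OpenAI Python SDK. The file ID is a placeholder and the grader schema
# is an assumption; verify field names against the current RFT docs.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",      # RFT is offered on o4-mini
    training_file="file-abc123",     # placeholder: JSONL of graded tasks
    method={
        "type": "reinforcement",
        "reinforcement": {
            # Assumed grader config: compares the model's answer against
            # a reference field in each training example.
            "grader": {
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.correct_answer}}",
                "operation": "eq",
            },
        },
    },
)
print(job.id, job.status)
```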
Takeaways: OpenAI’s new RL fine-tuning service for o4-mini places powerful AI customisation tools into more hands. It enables businesses to develop AI that understands and executes specific tasks with greater precision, especially in specialist areas or when using tools. Last week we covered Claude’s struggles with using new tools made available through its integration features. Fine-tuning will give us the means to teach AI new and specific tricks. While reinforcement learning has its limits, this practical application offers a clear path to building more effective agents for the infinitely varied real world.
The physical Turing test

Jim Fan of Nvidia’s embodied AI group gave a fascinating talk at Sequoia Capital’s AI Ascent gathering this week, sharing numerous examples of AI-powered, human-like motion. Fan's “Physical Turing Test” is a challenging vision for embodied AI: coming home to an immaculate living room and a candlelit dinner, with no way to tell whether a human or a machine had cleaned up and prepared the gourmet meal. Fan describes this as "deceptively simple, insanely hard" and the "next North Star of AI", a dream that keeps him working late nights in the lab. There’s no rest for his robots either, which work tirelessly inside Nvidia’s digital twin, compressing a decade of learning into every few hours.
Takeaways: AI learning in simulated environments is another powerful way to make training faster and more effective. Such virtual training grounds allow developers to stress-test agents under tough conditions, ensuring that when they are deployed for high-criticality uses in healthcare or financial services, they are battle-hardened and ready to go.
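To illustrate why simulation compresses training time, here’s a minimal sketch using the open-source gymnasium library (our choice for illustration; Nvidia’s robots train in its own simulation stack). Even this toy loop runs thousands of control steps in seconds, far faster than any physical robot could act.

```python
# Minimal sketch of simulation-based training with the open-source
# gymnasium library (illustrative; not Nvidia's stack). A random policy
# stands in for the learner; the point is that a simulator can step
# through whole episodes far faster than real time.

import gymnasium as gym

env = gym.make("CartPole-v1")
episodes, total_steps = 1_000, 0

for _ in range(episodes):
    obs, info = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        total_steps += 1

env.close()
print(f"simulated {total_steps} control steps across {episodes} episodes")
```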
An em dash conspiracy
Like “delve” before it, the humble em dash, used to break up a sentence—like so—is the latest hallmark of AI-generated text, particularly on platforms like Reddit. Recently analysed data showed em dash usage in some business subreddits, such as r/Entrepreneur, quadrupling in just seven months. This has triggered what some call the “em dash conspiracy”. Part of the suspicion stems from a practical observation: many users confess they wouldn’t even know how to type an em dash easily. Surely, it’s the work of ChatGPT?
Takeaways: With study after study demonstrating that current AI detectors are unreliable, inconsistent, and sometimes biased, this could be a useful detection signal. More importantly, its obscurity and increasing prevalence are indicative of the lack of care taken when AI-generated content is simply pushed out with no human refinement. Don’t be caught out: make sure there’s a “never use ‘—’ (em dashes)” instruction in your prompts or custom instructions!
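If you want to audit your own copy before hitting publish, a crude heuristic (our illustration, not a validated detector) is simply to measure em dash frequency per thousand words:

```python
# Rough heuristic sketch: em dash frequency per 1,000 words.
# A high rate is only a weak signal, not proof of AI authorship.

def em_dash_rate(text: str) -> float:
    words = max(len(text.split()), 1)
    return text.count("\u2014") / words * 1000  # U+2014 is the em dash

sample = "The plan worked\u2014mostly\u2014because everyone delivered on time."
print(f"{em_dash_rate(sample):.1f} em dashes per 1,000 words")
```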
Weekly news roundup
This week saw major developments in AI infrastructure challenges, significant business acquisitions, and growing regulatory scrutiny, while research continues to advance in reasoning and multimodal capabilities.
AI business news
OpenAI agrees to buy Windsurf for about $3 billion, Bloomberg News reports (Major acquisition showing OpenAI's continued expansion and consolidation in the AI industry)
In depth: 'World's first AI law firm' Garfield Law targets high street practices (Demonstrates how AI is disrupting traditional professional services)
Google shares slump as Apple exec calls AI the new search (Indicates shifting dynamics in tech industry leadership around AI)
Instacart CEO Fidji Simo to join OpenAI as head of applications (Shows top talent migration towards AI-focused companies)
RelevanceAI and Stack AI get millions in funding to bring AI agents into the workforce (Highlights growing investment in practical AI workplace applications)
AI governance news
OpenAI CEO Sam Altman's Senate testimony shows industry shift on regulation (Signals evolving attitudes towards AI oversight among industry leaders)
The US Treasury asked Benchmark whether its Manus AI funding is covered by restrictions on investments in tech destined for "countries of concern" (Illustrates growing scrutiny of international AI investments)
Google rolls out AI tools to protect Chrome users against scams (Shows practical application of AI in cybersecurity)
The people refusing to use AI (Highlights growing resistance movement to AI adoption)
US Department of Labor closes investigation into Scale AI (Important development for AI training data industry)
AI research news
Absolute Zero: Reinforced Self-play Reasoning with Zero Data (Breakthrough in zero-shot learning capabilities)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models (Comprehensive overview of multimodal AI progress)
AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine (Advances in document processing efficiency)
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models (Important development in model efficiency)
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play (Advancement in voice interaction technology)
AI hardware news
How AI Demand Is Draining Local Water Supplies (Critical environmental impact of AI infrastructure)
xAI to pull half the gas turbines powering Colossus DC (Shows scale of energy requirements for AI operations)
Trump administration to rescind and replace Biden-era global AI chip export curbs (Major policy shift affecting global AI chip market)
What's the carbon footprint of using ChatGPT? (Important environmental impact analysis of AI usage)
Trump's attacks on green energy are big trouble for data centres and AI (Political implications for AI infrastructure sustainability)