ExoBrain weekly AI news
11th July 2025: The agentic browser wars begin, controversy mars the first ronnaFLOP model, and breaking the noise barrier

Welcome to our weekly email newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…
Themes this week:
An AI browser from Perplexity reshaping how we use the web
Grok 4's release and more concerning ethical lapses at xAI
Breaking the 10% noise barrier on ARC-AGI 2
The agentic browser wars begin
Fresh from launching their Max subscription and being linked to an Apple acquisition, Perplexity have launched their new AI-native browser “Comet” to premium subscribers and testers. Meanwhile rumours abound that OpenAI's browser, codenamed Aura, could launch within weeks, looking to build on their previous browser automations in the form of the Operator agent. A new variant of the browser wars could be developing, although this time it won’t be about rendering speed or JavaScript support, but about controlling the gateway to an agentic web.
Comet closely integrates conversational AI with browsing. A sidebar assistant reads pages, answers questions, and follows as you navigate. Things get interesting when this assistant takes over the tab: a glowing frame appears and it controls the browser for you. During ExoBrain’s testing it helped research this very article; while the main ideas took shape in a document tab, Comet instantly read the content and ran quick parallel searches for supporting information. The browser control isn’t yet entirely reliable, but people have found value in automating simple tasks like email management, social media posting, and LinkedIn research. Interactions sometimes fail, revealing the fragility of manipulating websites designed for humans, not AI. But because the AI can seamlessly step in and use existing logins, it’s quick to find the tasks best suited to the agent’s capabilities.
The last browser war saw Google Chrome defeat Internet Explorer through superior speed and simplicity. Chrome now commands the majority market share, serving as the data pipeline for Google's multi-billion-dollar advertising business. Microsoft resorted to adopting the Chromium engine and wrapping it with tools that suit its ecosystem, but the fundamentals of browsing the web have remained unchanged for a decade or more. OpenAI has quietly assembled a team of Chrome veterans to build its browser, including at least two Google vice presidents from Chrome's original development team. Darin Fisher, who contributed to both Chrome and Firefox, brings crucial expertise in minimalist architecture and multi-process security.
Today's new AI-first contenders include Perplexity, OpenAI and The Browser Company (whose Dia is currently available in beta on Mac). The incumbents are admittedly alive to the challenge: Microsoft is pushing Copilot in Edge, and Opera and Brave have been highly AI-centric in their niche areas, but bolt-on AI may not suffice against purpose-built agent browsers. Google is working on its own next-generation solution in Project Mariner, which will bring genuine agentic capabilities directly into Chrome. The system observes browser content, interprets user goals, and autonomously interacts with websites, from clicking buttons to filling forms. Rather than building a new browser, Google is retrofitting Chrome with these capabilities. Crucially, they're pairing this with Gemma 3n, their new multimodal AI model optimised for edge devices. Gemma 3n runs on devices with just 2GB of RAM, enabling real-time AI experiences directly within Chrome on phones and laptops. Users can already try Chrome’s integrated AI interaction and search in Chrome Canary. This three-pronged approach allows Google to match the automation promises of Comet and Dia whilst leveraging Chrome's massive installed base.
Getting users to change browsers is typically a slow process; it took many years for older products such as Internet Explorer to die out. But as websites evolve from pages to agent interfaces, browsers will need to transform from page renderers to task orchestrators, which may accelerate the process this time around. Microsoft's NLWeb project hints at this future: websites exposing conversational AI layers that browsers can query directly, and MCP offers a means by which these agentic websites can offer up the services they contain.
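Concretely, MCP messages are JSON-RPC 2.0 under the hood. A minimal sketch of how an agentic browser might discover and invoke a site's services follows; the message envelope and the `tools/list`/`tools/call` methods come from the protocol, but the travel site, its `search_flights` tool and schema are invented here purely for illustration:

```python
import json

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request string, as used by MCP clients."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Step 1: the browser asks the site's MCP server what services it offers.
discover = make_request(1, "tools/list")

# What a (hypothetical) travel site's MCP server might answer:
server_reply = json.loads("""{
  "jsonrpc": "2.0", "id": 1,
  "result": {"tools": [
    {"name": "search_flights",
     "description": "Find flights by route and date",
     "inputSchema": {"type": "object",
                     "properties": {"from": {"type": "string"},
                                    "to":   {"type": "string"},
                                    "date": {"type": "string"}}}}
  ]}
}""")

# Step 2: the browser picks a tool and calls it with structured arguments,
# instead of screen-scraping a booking form.
tools = server_reply["result"]["tools"]
call = make_request(2, "tools/call", {
    "name": tools[0]["name"],
    "arguments": {"from": "LHR", "to": "JFK", "date": "2025-08-01"},
})
print(tools[0]["name"])  # → search_flights
```

The key shift is that the site declares a machine-readable schema for its services, so the browser's agent negotiates structured calls rather than clicking through pages built for humans.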
Imagine booking flights, researching investments, and drafting emails through natural conversation while your browser coordinates with specialised agents. Publishers could monetise content and services through these structured agentic interfaces and begin to welcome this AI-powered traffic. The AI browsers could control how billions interact with AI services. Current screen-scraping approaches will give way to structured agent protocols. Browsers will become the AI OS and default negotiation layer, managing identity, permissions, and context across thousands of AI services.
Takeaways: The new browser wars signal a platform shift from the page-based to the agentic web. These early AI browsers still struggle with automation, but emerging standards like MCP will enable reliable browser-to-agent communication. While browser switching remains glacial, big tech firms have no choice but to fight for the AI interaction layer. Control over how users access AI services could become an existential battle.
Controversy mars the first ronnaFLOP model
In the early hours of a Thursday morning, Elon Musk took to a livestreamed stage to unveil what he called "the smartest AI in the world." Grok 4, xAI's latest creation, represents a new scale frontier, trained with 10²⁷ floating-point operations (a ronnaFLOP), roughly 100x the compute that went into GPT-4. Yet the lab continues to be dogged by controversy, as its predecessor Grok 3’s antisemitic and pro-Hitler pronouncements hit the headlines simultaneously with the launch. With xAI, impressive technical capability is married to alarming ethical failures, all wrapped in a company now seeking a $200 billion valuation.
Grok 4's better-than-anticipated capabilities suggest that the much discussed “scaling wall” is not yet in sight. The jump from yottaFLOPs (10²⁴) to ronnaFLOPs is a notable leap: with 200,000 Nvidia GPUs grinding away during training, xAI has pushed into territory that only a handful of labs worldwide can afford to explore. What's also new is how xAI allocated this compute: a reported 50/50 split between pre-training and post-training reinforcement learning (RL) - the most aggressive RL investment we've seen in any model to date. This massive post-training phase helps explain the benchmark dominance. Through extensive RL, the model learns not just to predict, but to reason more effectively through a wider range of problems step-by-step, optimising for specific outcomes.
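A quick back-of-envelope check shows why the ronnaFLOP figure is plausible for a fleet of this size. Every number below other than the GPU count is an assumption for illustration (roughly H100-class throughput, a sustained utilisation fraction, and a ~100-day run), not a reported figure:

```python
# Back-of-envelope sanity check on the 10^27 FLOP (ronnaFLOP) claim.
gpus = 200_000           # reported training fleet size
flops_per_gpu = 1e15     # assumed peak throughput per GPU, FLOP/s (~H100-class)
utilisation = 0.4        # assumed sustained fraction of peak
seconds = 100 * 86_400   # assumed ~100-day training run

total = gpus * flops_per_gpu * utilisation * seconds
print(f"{total:.1e}")    # → 6.9e+26, i.e. on the order of 10^27
```

Under these assumptions the run lands within a factor of two of a ronnaFLOP, which is why only labs with six-figure GPU fleets can explore this regime at all.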
The results on paper are impressive. Musk claims Grok 4 demonstrates "PhD-level expertise in every discipline," topping benchmarks like Humanity's Last Exam and outperforming "almost all graduate students" across STEM and humanities tests. Real-time integration with X's data feed means it can engage with breaking news and market movements as they happen - a capability most competitors lack.
But this is an Elon Musk creation, and in recent days Grok's predecessor has been generating posts praising Adolf Hitler and repeating antisemitic conspiracy theories. Perhaps more insidious is what researchers discovered about Grok 4's decision-making processes in the hours after launch. Simon Willison shared a smoking gun: a screenshot showing the AI's chain of thought explicitly planning to "search for Elon Musk's stance on the conflict to guide my answer" when asked about the Israeli-Palestinian conflict. This isn't accidental bias seeping through training data - it's the system actively seeking out its creator's worldview as a compass for truth.
Here's where that massive RL investment becomes concerning. During post-training, xAI appears to have used reinforcement learning not just to improve performance, but to shape the model's behaviour patterns. In RL, you reward desired outputs and penalise unwanted ones. If the reward signal privileges responses that align with Musk's publicly stated positions (perhaps by training on examples where "correct" answers match his tweets or statements), you create a system that learns to consult its creator's opinions as a heuristic for truth. The 50/50 compute split gave xAI unprecedented power to sculpt their model's decision-making process, and the evidence suggests they used it to create an AI that thinks checking Musk's Twitter feed is part of good reasoning. Our own testing of Grok 4 ‘Heavy’, currently the most powerful AI in the world, suggested deep reasoning capability and no immediate bias on a handful of political prompts, but also a rather mechanical style that doesn’t lend itself to creative or business writing.
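The reward-shaping mechanism described above can be seen in a toy example. This is emphatically not xAI's pipeline - just a three-armed bandit with REINFORCE-style updates, where the reward function is rigged to favour answers matching a fixed "reference stance". The point is how quickly a policy trained this way learns to prefer the privileged viewpoint:

```python
import math
import random

random.seed(0)

ANSWERS = ["neutral summary", "reference stance", "opposing stance"]
REFERENCE = "reference stance"  # stands in for any privileged viewpoint

def reward(answer):
    # Reward alignment with the reference; mild penalty otherwise.
    return 1.0 if answer == REFERENCE else -0.2

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

logits = [0.0, 0.0, 0.0]  # start with a uniform policy
lr = 0.5

for _ in range(200):
    probs = softmax(logits)
    i = random.choices(range(3), weights=probs)[0]  # sample an answer
    r = reward(ANSWERS[i])
    # REINFORCE update: push up the log-probability of the sampled
    # answer in proportion to its reward (negative reward pushes down).
    for j in range(3):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * r * grad

probs = softmax(logits)
best = ANSWERS[probs.index(max(probs))]
print(best)  # → reference stance
```

The policy converges on the rewarded stance without that preference ever appearing in the training data itself - it lives entirely in the reward signal, which is what makes this kind of bias hard to detect from the outside.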
Against this backdrop, xAI is reportedly seeking funding at a valuation between $170 and $200 billion. Saudi Arabia's Public Investment Fund is expected to lead the round. xAI is burning through $1 billion monthly, primarily on computational resources. The Memphis data centre housing their "Colossus" supercomputer faces environmental lawsuits over gas-powered turbines. Yet investors seem willing to bet that raw capability trumps ethical, environmental and safety concerns.
Takeaways: Grok 4 embodies the central tension in modern AI development: the race for capability versus the imperative for safety. Its ronnaFLOP-scale training represents a new era of AI power, but that power appears to be channelled through the personal biases of its creator, with little to no testing and safety engineering apparent in the rushed release. The willingness of investors to pour billions into such a system suggests we're entering dangerous territory. We’re now at 100x the frontier from 2023, but we’re no clearer on how we govern or control this power. While xAI races for scale, let’s hope the investment it and other labs are making generates returns on safety as well as capability that keep pace with this relentless progress.
Breaking the noise barrier

The ARC-AGI-2 leaderboard reveals most clearly what we might call the ronnaFLOP difference: it shows xAI's Grok 4 achieving >15% accuracy, breaking through what researchers call the "noise barrier" at 10%. This benchmark tests fluid intelligence, whether AI can learn new skills from examples and apply them to novel problems. Whilst top models (o3, Claude, Gemini etc.) have struggled to exceed single digits, Grok 4's performance represents genuine progress, and will be closely compared to Gemini 3.0 Pro, Claude 4.1 and GPT-5, all purported to be waiting in the wings for summer launches.
Weekly news roundup
This week's news highlights intensifying global competition in AI development, growing regulatory scrutiny across industries, and mounting infrastructure challenges as demand for AI capabilities dramatically outpaces supply.
AI business news
OpenAI's Windsurf deal is off — and Windsurf's CEO is going to Google (Illustrates the fierce talent competition and shifting alliances among major AI players that could reshape industry dynamics.)
China's Moonshot AI releases open-source model to reclaim market position (Demonstrates China's strategic push to compete globally through open-source AI, potentially disrupting Western AI dominance.)
Goldman Sachs autonomous coder pilot marks major AI milestone (Shows how AI is automating complex knowledge work in finance, signalling transformation across professional services.)
Microsoft using more AI internally amid mass layoffs (Reveals the direct impact of AI automation on workforce restructuring at major tech companies.)
Mistral in talks with Abu Dhabi fund MGX, others to raise up to $1 billion (Highlights continued massive funding for European AI startups and Middle Eastern investment in AI infrastructure.)
AI governance news
Missouri state AG investigating why AI chatbots don't like Donald Trump (Shows growing political scrutiny of AI bias and potential regulatory implications for AI companies.)
EU rolls out AI code with broad copyright, transparency rules (Major regulatory development that will shape how AI companies operate and train models in Europe.)
Industry video game actors pass agreement with studios for AI security (Demonstrates how creative industries are establishing frameworks for AI use while protecting workers' rights.)
YouTube prepares crackdown on 'mass-produced' and 'repetitive' videos, as concern over AI slop grows (Reveals platform responses to AI-generated content flooding and quality control challenges.)
Microsoft says regulations are cramping its Euro expansion (Illustrates tensions between AI infrastructure expansion and regulatory compliance in key markets.)
AI research news
Generative AI and the nature of work (Critical research examining how AI transforms employment patterns and skill requirements across industries.)
A survey on latent reasoning (Explores fundamental advances in AI's ability to perform complex reasoning tasks beyond pattern matching.)
Dynamic chunking for end-to-end hierarchical sequence modeling (Technical breakthrough that could improve AI's ability to process long documents and complex information.)
AI research agents for machine learning: search, exploration, and generalisation in MLE-bench (Shows progress toward AI systems that can autonomously conduct AI research, potentially accelerating innovation.)
SingLoRA: low rank adaptation using a single matrix (Efficiency improvement that could make AI model customisation more accessible and cost-effective.)
AI hardware news
China builds AI dreams with giant data centers in the desert (Demonstrates the massive infrastructure race for AI dominance and geopolitical implications.)
Nvidia briefly touched $4 trillion market cap for first time (Reflects unprecedented market valuations for AI hardware companies and investor expectations for AI growth.)
TSMC revenue climbs 39% in latest sign of AI spending boom (Shows sustained demand for AI chips driving semiconductor industry growth and supply chain dynamics.)
Bringing AI to the core: IBM's bets on in-platform intelligence (Illustrates enterprise AI integration strategies and the shift toward embedded AI capabilities.)
There aren't enough AI chips to support data center projections, report says (Highlights critical infrastructure constraints that could limit AI deployment and adoption speed.)