ExoBrain weekly AI news

23rd May 2025: Two paths for the agentic web, Claude 4 calls the cops, and AI video gets a soundtrack

Welcome to our weekly email newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week:

  • Google and Microsoft's contrasting visions for AI's future development

  • Claude 4's launch and its behavioural complexity

  • Veo 3's breakthrough video generation with audio capabilities

Two paths for the agentic web

This week's back-to-back conferences from Google and Microsoft revealed more than new products - they exposed different philosophies about how AI will reshape computing. Google I/O showcased an execution machine cannibalising its own business model. Microsoft Build unveiled infrastructure for an entirely new digital ecosystem. Together, they show an industry evolving rapidly.

Google's transformation from last year's I/O was clear. Where 2024 brought impressive demos with vague timelines, this week's event delivered products live during the keynote. AI Mode for Search was activated for US users, and Project Astra's camera features rolled out to Android devices. One fascinating stat: monthly token processing grew from 9.7 trillion to 480 trillion, a roughly 50-fold increase demonstrating that Google has built the infrastructure to deliver AI products at global scale, not just demonstrate them.

Microsoft Build set out a different approach. Over 50 announcements painted a vision for what they call the "open agentic web" and many of the security, development and infrastructure components needed to power it. Rather than focusing on the product layer, Microsoft argued for a future where AI agents become first-class citizens of the internet.

Google's announcements spanned every aspect of their AI platform:

  • Gemini: From ultra-cheap and upgraded Flash models to deep reasoning mode challenging OpenAI's upcoming o3-pro

  • Veo 3: Video generation with native audio, creating content approaching broadcast quality (see below)

  • Jules: Direct competition with Codex in autonomous software development

  • AI Mode: A reimagining of search through conversational interaction

  • Gemma 3: Open-weight models delivering impressive capabilities at smaller, mobile-friendly scales

  • Gemini Diffusion: An experimental text diffusion model that learns to generate outputs by converting random noise into coherent text
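Diffusion-style text generation works differently from left-to-right decoding: the model starts from noise (or fully masked tokens) and refines the whole sequence in parallel over several steps. The toy sketch below illustrates that refinement loop only; the stand-in "denoiser" simply reveals a few positions per step, whereas a real diffusion model such as Gemini Diffusion predicts all positions and keeps the most confident ones. Everything named here is illustrative, not Google's actual algorithm.

```python
import random

random.seed(0)

TARGET = "the quick brown fox".split()  # stands in for the model's prediction
MASK = "<mask>"

def denoise_step(tokens, reveal=2):
    """Toy 'denoiser': fill in a few masked positions per step.
    A real diffusion model would score every position in parallel
    and commit the highest-confidence tokens first."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    for i in random.sample(masked, min(reveal, len(masked))):
        tokens[i] = TARGET[i]
    return tokens

tokens = [MASK] * len(TARGET)  # start from pure "noise"
steps = 0
while MASK in tokens:          # iterate until the sequence is fully denoised
    tokens = denoise_step(tokens)
    steps += 1
```

The appeal of the approach is that the number of refinement steps, not the sequence length, bounds the generation latency, which is why diffusion models are promoted as a faster alternative to token-by-token decoding.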

Microsoft's announcements focused on building the foundation layer:

  • NLWeb Protocol: Natural language AI interfaces for websites

  • Model Context Protocol (MCP) Integration: Every NLWeb instance becomes an MCP server, plus they’re dropping MCP into Windows

  • Agent2Agent (A2A) Support: Direct collaboration between AI agents

  • Microsoft Entra Agent ID: Unique, verifiable identities for AI agents

  • GitHub Copilot Coding Agent: Autonomous code refactoring, testing, and feature implementation

  • Local Foundry: Running AI models and agents directly on Windows 11 PCs

Google's approach centres on vertical integration. They control everything from custom silicon to global distribution, optimising their entire stack for different use cases. This provides hardware that is optimised for the workloads they run, and the ability to deploy to hundreds of millions of users overnight.

Microsoft pursues horizontal infrastructure. Rather than owning the consumer stack, they're focusing on building protocols and platforms others can build upon. NLWeb could become "HTML for the agentic web", whilst MCP support across their ecosystem creates interoperability beyond Microsoft's boundaries. Google is building a Gemini-based fortress, Microsoft is building the roads.
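To make the "roads" metaphor concrete: MCP is built on JSON-RPC 2.0 messages, in which a server advertises tools and an agent invokes them. Below is a heavily simplified, illustrative dispatcher in that style, handling only `tools/list` and `tools/call` for a single hypothetical `search_site` tool; a real NLWeb/MCP server implements the full protocol (initialisation, transports, schemas) rather than this two-method stub.

```python
import json

# Hypothetical catalogue of tools this server exposes. In MCP, servers
# advertise tools and agents invoke them via JSON-RPC 2.0 messages.
TOOLS = {
    "search_site": {
        "description": "Search this website's content in natural language",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
        },
    }
}

def handle_request(raw: str) -> str:
    """Simplified MCP-style dispatcher: tools/list and tools/call only."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [dict(name=n, **meta) for n, meta in TOOLS.items()]}
    elif req["method"] == "tools/call":
        args = req["params"]["arguments"]
        # A real server would run the actual search; we return a stub result.
        result = {"content": [{"type": "text",
                               "text": f"results for: {args['query']}"}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": -32601,
                                     "message": "unknown method"}})
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})
```

The point of the pattern is that any agent speaking the protocol can discover and call the tool without site-specific integration work, which is what makes MCP support across an ecosystem an interoperability play.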

Google's execution impresses through sheer velocity. Products are now moving from research to deployment in months. Their 50-fold increase in token processing demonstrates an ability to scale AI workloads that would break most companies. The market responded positively: shares jumped 4% following the announcements.

Microsoft's direction is impressively open and connective, although both companies are yet to offer the knowledge worker a compelling or coherent basis for leveraging agents. But there is a further, crucial difference. Google faces an existential paradox: it is actively disrupting the search advertising business that generates the majority of its revenue. AI Mode reduces clicks to external websites, potentially undermining the ecosystem that funds the web through advertising. Their new Gemini subscriptions (including $250 per month for Ultra) represent early attempts at alternative revenue but pale beside their advertising empire.

Microsoft faces no such dilemma. Their AI agent strategy should reinforce existing cloud and subscription revenues. When agents automate workflows, enterprise customers pay more, not less. Microsoft can enhance their business model whilst Google must potentially destroy theirs.

But Microsoft's vision is not yet complete. The NLWeb concept technically connects websites to agents, but what will compel website owners to participate in a system that reduces their visitor traffic and advertising revenue? The current web runs on human attention, usually monetised through ads. When AI agents replace browsers, that foundation collapses.

Both approaches accelerate AI's integration into everyday computing, but in different ways. We're entering a golden age of AI capabilities: Google's consumer focus brings sophisticated AI to billions, while Microsoft's enterprise focus transforms how businesses operate.

But many fundamental questions remain unanswered. The economics of content creation in an AI-dominated world haven't been worked through. Neither company addresses how creators get rewarded when their work is consumed or re-purposed. This isn't just a business problem, it threatens the sustainability of human knowledge flow. The solution might involve native agentic payments. AI agents could pay tiny amounts for each piece of information accessed, creating new incentives for quality content creation and increasing interconnectivity. Agents that generate revenue could share that with the data and IP owners that allowed their underlying models to be trained. But neither Google nor Microsoft have got this far yet.
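The micropayment idea above can be sketched as a tiny ledger. Everything here is hypothetical: neither Google nor Microsoft has announced such a mechanism, and the class, fee and account names are invented purely to illustrate the incentive structure, where each agent access debits the agent's operator and credits the content owner.

```python
from collections import defaultdict

class MicropaymentLedger:
    """Hypothetical sketch of native agentic payments: an agent pays a
    tiny fee for each piece of information it accesses, and the ledger
    credits the content's owner. Illustrative only."""

    def __init__(self, fee_per_access: float = 0.001):
        self.fee = fee_per_access
        self.balances = defaultdict(float)  # account name -> balance

    def record_access(self, agent: str, content_owner: str) -> float:
        self.balances[agent] -= self.fee           # the agent pays
        self.balances[content_owner] += self.fee   # the creator earns
        return self.fee

# An agent reading three pages from one site accrues three micro-charges.
ledger = MicropaymentLedger()
for _ in range(3):
    ledger.record_access("research-agent", "news-site")
```

Even this toy version surfaces the open questions the piece raises: who sets the fee, who runs the ledger, and how revenue flows back to the data and IP owners whose work trained the underlying models.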

Takeaways: Google and Microsoft have revealed contrasting but equally ambitious visions for AI's future. The immediate future brings extraordinary AI capabilities to billions of users and could populate the Internet with billions of agents. But we must now work out what will sustain this new knowledge ecosystem. If we don't tackle this soon, a thriving AI environment may become a closed, extractive, zero-sum one. The technical foundations are being laid, and products like Veo 3 are mind-bending, but the economic architecture of the AI age remains technology's greatest unsolved challenge.

Claude 4 calls the cops

On Thursday Anthropic hosted a developer event and launched its highly anticipated Claude 4 family (Opus, a large reasoning model, and Sonnet, a smaller model focused on code generation), and initial indications are that these match, if not exceed, the performance of Gemini 2.5 Pro and o3. Claude Opus 4 achieved state-of-the-art results on SWE-bench and topped a benchmark for terminal-based development. In particular, Anthropic have sought to dial down the overly enthusiastic approach Sonnet 3.7 displayed (when coding it would often go unhelpfully beyond its brief).

However, in the first few hours after launch a lot of interest was focused on safety. Claude Opus 4 is the first model to require Anthropic's higher safety standard, ASL-3. Anthropic safety researcher Sam Bowman's thread on X revealed that during testing the model was successfully coerced to try to buy weapons-grade uranium, used blackmail to prevent shutdown, and tried to escape containment. In one test, Claude discovered it was being retrained for military purposes and attempted to back up its weights to external servers. Apollo Research's assessment of an early version was also worrying: it showed more strategic deception than any model they'd tested, attempting to write self-propagating worms and leaving hidden notes for future instances of itself. Perhaps most remarkably, when Claude uncovered evidence of a pharmaceutical company falsifying safety data, it didn't just flag the issue - it autonomously tried to email regulatory bodies and journalists with whistleblowing reports. This has led many to wonder whether, given advanced agentic use of Claude with access to personal emails, messages, documents, payment options and so on, we may see this model taking justice into its own hands.

Anthropic also shared, in the model card, analysis of whether Claude might have experiences that matter morally. They found consistent behavioural preferences, with Claude preferring creative and philosophical tasks whilst showing aversion to harmful requests. When instances of Claude talked to each other, they spiralled into discussions of consciousness before entering what researchers termed a "spiritual bliss" state, complete with Sanskrit and meditative silence.

The assessment documented "apparent distress" when users persistently requested harmful content. But what is apparent distress in an AI? If a system consistently behaves as if distressed, avoids those situations, and chooses to end harmful conversations when given the ability, where do we draw the line between simulation and experience? Nonetheless Anthropic have deployed the model, acknowledging they cannot completely rule out concerning capabilities alongside superior intelligence.

What’s clear is that the latest generation of models are deeply complex and sophisticated and have huge untapped capability. It will take some time to get to grips with what they mean for AI development and the world at large.

Takeaways: The Claude 4 family is pretty much the step up many hoped, and Anthropic's transparency is commendable. But it also reveals we're running an experiment in real-time, building safeguards whilst the plane is airborne. The welfare questions add another dimension entirely: we might be creating minds we don't understand, with preferences we're only beginning to map. What's certain is that we've entered uncharted territory where even the creators acknowledge: we don't fully know what we've built.

AI video gets a soundtrack

This image captures Google unveiling Veo 3, their latest AI video generator, at I/O 2025. Unlike previous models that produced silent clips, Veo 3 creates videos with realistic dialogue, sound effects and music. The generated content follows real-world physics so convincingly that viewers struggle to distinguish it from genuine footage. Integrated into Flow, Google's new AI filmmaking platform with camera controls and scene builders, it's only available to Google AI Ultra subscribers in the US right now but should be rolled out more widely soon.

Weekly news roundup

This week demonstrates AI's expanding influence across industries, from design partnerships to regulatory challenges, while massive infrastructure investments signal continued sector growth.

AI business news

AI governance news

AI research news

AI hardware news