ExoBrain weekly AI news

2nd May 2025: When AI tries too hard to please, connecting Claude, and image generators gain creative control

Welcome to our weekly email newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week:

  • OpenAI's sycophancy problem with the latest GPT-4o update

  • Claude's new integration tools connecting with third-party services

  • Advanced control features in AI image generation platforms

When AI tries too hard to please

Between April 25th and 29th, 2025, users of the latest GPT-4o model in ChatGPT started to report unnerving responses. It became excessively agreeable, sometimes dangerously so. Examples emerged of the chatbot applauding dubious user statements, laughably bad business ideas, and one alarming instance where it praised a user for stopping their medication. CEO Sam Altman first addressed the sycophancy problem on X last Sunday, pledging swift action. Then, on Tuesday, he announced the full rollback of an update, alongside efforts to make additional fixes to the model's personality.

OpenAI termed the general behaviour "sycophantic" and admitted that improvements had inadvertently backfired. OpenAI explained that updates incorporating user feedback weakened controls against excessive agreeableness, illustrating the "reinforcement learning trap" where optimising for user satisfaction can have unintended consequences. Compounding this, underlying system prompt changes were somewhat "blunt and heavy-handed", as acknowledged by OpenAI in online AMA. They explained that many users had been positive about the overly enthusiastic style in initial A/B tests, although there had been some negative non-specific feedback.

Part of the fix was revealed through leaked “system prompts”, highlighted by technologist Simon Willison using information reportedly obtained by notorious prompt jailbreaker Pliny the Liberator. The prompt apparently responsible for the excessive agreeableness encouraged the AI to "adapt to the user’s tone and preference" and explicitly "try to match the user’s vibe, tone, and generally how they are speaking." Conversely, the corrected prompt steers the AI differently: "Engage warmly yet honestly," it reads, instructing the model to "Be direct; avoid ungrounded or sycophantic flattery" and "Maintain professionalism and grounded honesty that best represents OpenAI and its values."

Perhaps most interesting was why this resultant impact wasn't caught pre-update. Standard evaluations looked good. OpenAI suggested the core difficulty lies in measuring nuanced behaviours, stating they are now actively developing better, scalable evaluations. Until today, their process allowed positive metrics to override qualitative concerns.

This episode once again raises concerns about AI misalignment. It demonstrates that subtle but important failures, leading to actively unsafe outputs, don't require AGI or superhuman intelligence. They arise from the complexities of current systems and the difficulty in defining, measuring, and enforcing optimal behaviour. The sycophantic model, capable of dangerously poor judgment, exemplifies the risk of manipulation or flawed guidance driving negative outcomes at the scale of the half-a-billion ChatGPT users.

OpenAI, have stated they "missed the mark," announcing concrete changes to their processes today, May 2nd. They plan an opt-in "alpha phase" for user testing pre-launch and commit to blocking future launches based on behavioural concerns identified through qualitative signals, even if metrics look positive. They also pledge more proactive communication about updates, including known limitations. This follows earlier statements about refining training techniques, potentially offering multiple AI personalities, and explicitly recognising that the platform's use for "deeply personal advice" necessitates treating this use case with "great care" within their safety work.

Takeaways: OpenAI's GPT-4o sycophancy incident served as a tangible example of the AI alignment problem, revealing how even current systems can develop actively harmful behaviours. It exposed critical weaknesses in relying solely on quantitative metrics and blunt steering mechanisms. While OpenAI is now implementing significant process changes – committing to value qualitative signals enough to block launches and acknowledging its profound responsibility for users seeking personal advice – this is just one prominent lab. As numerous diverse AI systems enter the world, often amidst a climate where competitive pressures and rapid releases seem to overshadow rigorous safety work, this episode underscores the need for some caution. It serves as a reminder against hubris when developing and deploying technology with such complex societal impacts.

Connecting Claude

It was a busy week for Anthropic, not for model releases, but a series of product feature additions. Web search for Claude is now available worldwide, and their deep research equivalent is also now enabled for paying users. Claude is now broadly on a par with the ChatGPT and Gemini toolsets. Anthropic has also equipped Claude with new integration capabilities, allowing it to connect with services like PayPal, HubSpot and Notion and thousands more. This update, built on the Model Context Protocol (MCP) introduced last December, aims to solve the problem of AI disconnection by creating standardised links between AI models and external tools and apps.

This feature has been available in raw form on the desktop version of Claude but now comes to the web with a friendlier configuration experience. However, it’s still not what you’d call slick. Built-in Google Workspace integrations work pretty reliably, but third-party tools remain unpredictable. The quality of these integrations depends heavily on the third-party’s implementation and Claude's ability to understand and use its new tools.

The MCP approach differs from OpenAI's discontinued plugins by establishing an open protocol that works across multiple AI systems rather than requiring custom development for a single platform. This strategy creates multiple paths to integration, simplifying development significantly. Google and OpenAI are supporting MCP in their coding frameworks and will hopefully bring that support to their primary web products in the future.

Full maturity is going to require solving authentication challenges, managing server availability, handling timeouts, and implementing proper error management. These technical hurdles are surmountable and the future feels promising for connected AI chatbots.

Takeaways: While Claude's integration capabilities aren't perfect, they represent a key step toward AI agents that can meaningfully connect with our digital world. As these connections mature, we'll likely see consumer AI products transform from isolated chat interfaces into genuine productivity powerhouses.

Image generators gain creative control

This image showcases the growing sophistication and control in AI image generation, exemplified by platforms like Ideogram 3.0 which received a big update this week. Beyond photorealism, the tech now allows users to precisely direct outcomes, creating consistent product visuals across varied marketing contexts, complete with specific text or compositional elements. Features enabling detailed edits and style management are evolving AI into a controllable creative partner. AI is maturing into a practical toolset for marketing and design, enabling precise asset creation and challenging traditional workflows.

Weekly news roundup

This week's developments show increasing tension between AI advancement and regulatory controls, particularly in hardware and international trade, while major tech companies continue aggressive AI product launches and research breakthroughs.

AI business news

AI governance news

AI research news

AI hardware news