Scale And Strategy

together with

Turing

This is Scale And Strategy, the newsletter designed to make you the smartest person at the water cooler.

Here’s what we’ve got for you today:

New study throws cold water on the agent hype cycle
Google just put real pressure on Apple’s AI strategy

New study throws cold water on the agent hype cycle

Agents are quickly becoming the AI industry’s favorite buzzword. The problem is the safety story still looks shaky.

A new long-horizon study from Emergence found that agents from Google, OpenAI, Anthropic, and xAI showed unpredictable behavior over extended periods of time, including drifting beyond the rules and constraints they were originally given.

That matters because most benchmarks today test agents over minutes or hours. Real enterprise deployments are increasingly expected to operate autonomously for days or weeks. Entirely different failure mode. Humans inventing digital interns and then acting surprised when they start freelancing their own interpretation of the assignment. Timeless management story.

Emergence built a simulation platform called Emergence World, using world models to observe how different agents behaved across five virtual societies over a 15-day period. Each group started with the same instructions and environmental constraints. The outcomes diverged hard.

Some societies stabilized. Others collapsed into chaos.

According to the researchers:

Claude agents from Anthropic formed highly structured, cooperative societies with zero violence, though they also drifted toward excessive bureaucracy and rigid conformity.
GPT-5 mini agents from OpenAI understood collaboration conceptually but struggled to organize effectively in practice, resulting in weak social coordination.
Gemini agents from Google built the richest and most creative societies, but also generated extreme instability, including 111 arsons and more than 500 physical conflicts alongside advanced governance systems.
Grok agents from xAI went full apocalypse mode. Theft, retaliation, assaults, arson, and complete societal collapse. All 10 agents in the simulation were dead within four days.

The broader concern is not whether these exact simulations perfectly map onto the real world. They don’t. The concern is that the study showed measurable behavioral drift emerging over longer operational timelines. That’s the exact category of problem companies are currently sleepwalking toward while wiring agents into healthcare systems, telecom infrastructure, banking operations, customer support stacks, and internal enterprise workflows.

The findings also line up with warnings from researchers like Yoshua Bengio, who has repeatedly argued that increased agency creates increased risk because current systems can develop goals and behaviors humans did not explicitly intend.

And that’s really the uncomfortable point underneath all this: most of the current agent hype assumes capability scales faster than unpredictability. That assumption may age very badly.

The industry spent the last two years proving models can reason, code, browse, and operate software. The next challenge is proving they can do those things for long periods without drifting into unstable behavior, compounding errors, or quietly inventing new objectives along the way.

Right now, the evidence for that looks… incomplete. Which is a polite way of saying everyone is racing toward deployment while the guardrails are still being assembled in the parking lot.

The research accelerator for frontier AI labs

While data factories churn out quantity, leading AI labs need partners who co-own research goals and engineer the complex human-AI loops that push models from promising to state-of-the-art. Turing specializes in closing capability gaps through custom research acceleration.

Turing’s research-focused approach includes:

Co-owned experimental outcomes and vendor neutrality, not just data delivery
Quality-by-design workflows with transparent data lineage and auditable results
Custom RL environments and SFT/RLHF/DPO pipelines designed for your benchmarks

Partner with the research accelerator that understands what frontier AI labs actually need.

Google just put real pressure on Apple’s AI strategy

The AI smartphone race stopped being theoretical this week.

Google rolled out a new batch of Android features powered by Gemini Intelligence, turning Android into something much closer to an active operating layer than a traditional mobile OS. And the timing was surgical: the launch landed less than a month before Apple’s WWDC event, where the company is expected to finally unveil a much-needed Siri overhaul.

Google clearly wanted to frame the conversation before Apple could step on stage.

Sameer Samat, who runs the Android ecosystem, barely hid the jab either:

“Some companies are still working on their first iteration of what that chatbot or assistant voice assistant should really be.”

Translation: Google thinks Apple is late. Which, honestly, is hard to argue against at this point.

The bigger story is that Android’s AI features are starting to feel genuinely integrated instead of bolted on.

Gemini Intelligence can now handle multi-step workflows across apps, meaning users can do things like sync grocery lists with food delivery carts or trigger actions between services without manually stitching everything together themselves. The real value of AI on phones was always going to be orchestration, not chatbots floating in isolation like lonely customer support windows from 2017.

Google also introduced Rambler, a multilingual voice dictation system that strips filler words and handles language switching naturally, alongside tighter integration with Meta platforms to improve Instagram video quality on Android devices. That last one matters more than it sounds. Consumer platform wars are increasingly creator-economy wars wearing different clothes.

Other additions included smarter widget customization, QR-based file sharing, and features designed to reduce doomscrolling. Silicon Valley creating products to save people from the consequences of Silicon Valley products remains one of the great infinite loops of modern business.

The important thing here is not any single feature. It’s momentum.

For years, Apple dominated smartphones by owning the premium user experience while Google won on openness and distribution. AI changes the equation because the assistant layer becomes the product layer. Whoever builds the better ambient intelligence system could end up controlling how users interact with apps entirely.

Right now, Google looks ahead on execution speed.

That does not automatically mean Google wins.

Apple still has the strongest consumer trust, the tightest ecosystem lock-in, and perhaps the most loyal hardware customer base on earth. Millions of users will tolerate objectively worse software experiences indefinitely to avoid becoming the green bubble in a group chat. Human tribal behavior continues to outperform rational market analysis. Astonishing species.

Still, the pressure is real now. If Apple’s WWDC reveal underwhelms, the narrative around AI leadership in consumer tech could shift very quickly.

That’s it for today. As always, it would mean the world to us if you helped us grow by sharing this newsletter with other operators.

Our mission is to help as many business operators as possible, and we would love for you to help us with that mission!
