Scale & Strategy
Together with Vanta
This is Scale & Strategy, the newsletter that hits harder than a surprise WiFi signal on a long flight. Here's what we've got for you today:
- Study: AI is less Terminator, more hot mess
- OpenAI turns Codex into a multi-agent tool
Study: AI is less Terminator, more hot mess
Human-level AI is still the industry's favorite talking point, but if there's one thing today's models undeniably share with humans, it's the ability to screw up.
New research from Anthropic published Tuesday suggests that AI failures aren’t usually the result of deliberate wrongdoing. Instead, models are more likely to fall apart when tasks get harder and more complex, slipping into what researchers call incoherence: random errors, confusion, and hallucinations rather than systematic bias or intentional misalignment.
One of the biggest safety fears is that models might eventually act against their training. Anthropic notes there are two ways this can happen: honest mistakes or intentional malice.
Many AI safety advocates focus on the second scenario, where a superintelligent system might coherently pursue goals we never intended. But Anthropic’s findings suggest the more immediate danger looks less like an evil mastermind and more like an industrial accident.
“This suggests that future AI failures may look more like industrial accidents than coherent pursuit of a goal we did not train them to pursue,” the company wrote.
The research also raises questions about whether simply scaling models solves the problem. Larger systems become more coherent on easier tasks, but on complex ones, incoherence often stays the same or even worsens. Scaling does tend to reduce bias, but it doesn’t eliminate confusion when things get difficult.
Anthropic's takeaway: this doesn't remove AI risk; it just changes what that risk looks like, especially for the hardest problems.
Bottom line: AI doomers may not get their Terminator storyline yet, but hallucinations and incoherent failures are still dangerous. Whether a system causes harm on purpose or because it got confused, the outcome is the same. A self-driving car doesn't need malicious intent to crash; it just needs to hallucinate that a school zone is a highway.
Vanta uses AI and automation to get you compliant fast, simplify your audit process, and unblock deals — so you can prove to customers that you take security seriously.
Make Vanta your compliance co-pilot and:
- Get SOC 2 ready without pulling engineers off product
- Automate evidence collection and streamline audit processes
- Unblock enterprise deals with security credibility that speeds up sales
- Access expert support at every stage, from startup to scale-up
Get compliant fast with Vanta – trusted by top startups like Cursor, Linear, and Replit.
OpenAI turns Codex into a multi-agent tool
OpenAI’s Codex has quickly become a favorite among developers, and now the company is giving it a major upgrade with a new app and a more powerful workflow.
On Monday, OpenAI launched the Codex app for macOS, introducing an interface designed specifically for managing multiple AI coding agents in parallel. This isn't just a new wrapper; it's a shift toward treating Codex as a full command center rather than a single assistant.
The updated experience looks more like ChatGPT, but optimized for software development. Agents run in separate threads, organized by project, and the app includes built-in worktree support so multiple agents can work on the same repository without stepping on each other’s changes.
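The app manages those worktrees for you, but the underlying mechanism is plain git. If you want the mental model, it's roughly this (the paths and branch names here are our own illustration, not OpenAI's):

```bash
# A worktree is an extra checkout of the same repo in its own directory,
# each on its own branch -- so two agents never clobber each other's files.
git worktree add -b feature/agent-a ../repo-agent-a   # workspace for agent A
git worktree add -b feature/agent-b ../repo-agent-b   # workspace for agent B

git worktree list                     # see every active checkout
git worktree remove ../repo-agent-a   # clean up once the branch is merged
```

Each agent gets its own directory and branch, and merging the results back is ordinary git.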
“The Codex app is now a dedicated command center for managing agents,” said Alexander Embiricos, Codex Product Lead, during a press briefing.
One of the biggest additions is an Agent Skills interface, which lets teams apply consistent instructions, workflows, and coding preferences across agents. OpenAI also introduced Automations, allowing agents to run scheduled tasks in the background.
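OpenAI didn't walk through the file format in the briefing, but if Codex follows the open Agent Skills convention (a SKILL.md file with YAML frontmatter), a team skill would look roughly like this sketch (the skill name and rules below are hypothetical):

```markdown
---
name: review-conventions
description: Apply the team's commit and code-review conventions to every task.
---

When preparing changes:
1. Use Conventional Commit prefixes (feat:, fix:, chore:).
2. Keep diffs small; split anything over ~400 lines.
3. Run the linter and paste test output into the PR description.
```

The frontmatter tells the agent when the skill applies; the body holds the instructions it loads once it does.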
Other Codex updates coming across platforms include:
- Two personality modes: the familiar terse version and a new, more conversational and empathetic option
- Expanded access for a limited time, including availability to ChatGPT Free and Go users
- Double rate limits for paid subscribers during the promotional period
To address security concerns, OpenAI says the app uses native, open-source, configurable sandboxing similar to the Codex CLI, and restricts agents to editing only the files they’re assigned.
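For a sense of what "configurable" means here, the Codex CLI that the app is modeled on exposes sandboxing through its config file. A minimal sketch based on recent CLI docs (key names may shift between releases, so treat this as illustrative):

```toml
# ~/.codex/config.toml -- sandbox settings for the Codex CLI
sandbox_mode = "workspace-write"   # agent can edit files only inside its workspace

[sandbox_workspace_write]
network_access = false             # keep the agent offline while it edits
```

A stricter "read-only" mode exists for when you only want the agent to inspect code, and the mode can also be set per run via the CLI's `--sandbox` flag.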
“These models have gotten much better… and for low-stakes things, we are comfortable relying on the model in ways we’re not used to, with some guardrails,” said CEO Sam Altman.
Our team received early access, and CEO Faris Kojok has been testing Codex over the past several days. “This isn’t just a Claude Code competitor,” he said. “It’s OpenAI’s bet that the future of AI-assisted development isn’t about pairing with one agent, it’s about managing a team of them.”
He noted that multi-agent execution genuinely worked, automations ran on schedule, and custom skills were easy to set up. That said, he found onboarding less seamless than OpenAI suggested. The key difference, in his view, is that Codex is built around delegation: assigning tasks to multiple agents and reviewing the results, rather than working side-by-side with a single assistant.
That's it for today. As always, it would mean the world to us if you helped us grow by sharing this newsletter with other operators.
Our mission is to help as many business operators as possible, and we would love for you to help us with that mission!