The weekend before Évian
— Anthropic catches three crosswinds at once.
Forty-eight hours before Sam Altman, Dario Amodei and Demis Hassabis sit down at Évian-les-Bains for the Jun 15–17 G7 working lunch on AI, three uncomfortable Anthropic data points landed inside the same Saturday morning. (1) The Information published Anthropic Blindsides Its Business Partners on Jun 13 — Stephanie Palazzolo and Amir Efrati reporting that Anthropic asked Figma, Canva and other firms to be "partners" on the April 17 Claude Design launch weeks beforehand, without warning them the product would compete head-on with their core surfaces; Mike Krieger resigned from Figma's board on Apr 14, Figma stock dropped ~7% on launch day, and Canva is the only firm still claiming co-development. (2) The Agents' Last Exam leaderboard — UC Berkeley RDI's Dawn Song with Snorkel AI, 300+ experts across 55 industries, 1,500+ verified real-world tasks — went live this week with GPT-5.5 on the Codex harness top of the chart at 24.0% pass rate, Cursor with Composer 2.5 second, and Fable 5 on Claude Code third at 22.0%. The hardest "Last-Exam" tier averages 2.6% across all frontier agents; nobody is close to passing. (3) The Register's Jun 11 read on the IDC FERS March-2026 survey pegs Claude at 19% extensive enterprise use vs OpenAI at 42% and Google at 38% — the first third-party data point on the gap between Anthropic's revenue lead ($30B+ run-rate, per The Information) and its seat at the enterprise table. Underneath, the harness war eased off the throttle: Anthropic cut a single chore-only v2.1.177 at 01:25 UTC on the Saturday after Friday's three-in-a-day burst, and OpenAI's Codex CLI added two more pre-releases (alpha.17, alpha.18) on the v0.140.0 tag — seventeen alphas across four days, the stable cut still pinned to v0.139.0. 
The capital plane kept laying cloud-shaped foundations: OpenAI confirmed on Jun 11 it will acquire Ona (the cloud business Gitpod rebranded into) to give Codex a secure long-horizon execution layer; and the Jun 10 Visa Payments Forum saw Visa plug its global payment network into ChatGPT so an agent can shop and pay at any Visa-accepting merchant under user-defined spending limits. The throughline going into Évian: the policy seat and the capital seat both still favour Anthropic, but the partner, benchmark and enterprise-adoption seats just got visibly harder.
The lead — three Anthropic crosswinds inside a single Saturday morning
The Information publishes "Anthropic Blindsides Its Business Partners" — Figma, Canva and other firms asked to be Claude Design launch "partners" weeks ahead, without warning the product would compete with their core surfaces
Jun 13
The cleanest public read yet on the cost of doing business with the lab whose surface keeps expanding, and the first Information-sourced story that frames Anthropic's partner stratum as a fault line going into Évian. The Jun 13 piece, bylined Stephanie Palazzolo and Amir Efrati, reports that weeks before the April 17, 2026 Claude Design launch (an AI-native visual creation tool that turns prompts into prototypes, slides and marketing assets on Claude Opus 4.7), Anthropic asked Figma, Canva and other design-tool firms to be named launch partners, without telling them the product would compete directly with their core surfaces. The blow landed publicly when Mike Krieger, Anthropic's chief product officer, resigned from Figma's board on Apr 14, three days before launch; Figma's stock dropped ~7.28% to $18.84 on announcement day, and Adobe and Wix tracked down with it. Canva remains the only firm claiming to have co-developed Claude Design: its Design Engine and Visual Suite take handoff from Claude Design output.
Two reads. (1) The framing, "asked them to be partners without telling them they were competitors", is the structural tell on what an infrastructure-class AI lab looks like to its partner ecosystem two years in: the same surface that underwrites Figma's Code to Canvas ships head-to-head with Figma's editor on a six-month cycle. Two months out from a confidential S-1, the pattern is now a paper trail for the bankers walking this story into rooms.
(2) The Information byline matters independently. Palazzolo and Efrati hold the tightest source list on Anthropic's business model (the $30B+ revenue-vs-OpenAI scoop ran under the same masthead); a they-blindsided-the-partners read from those two reporters lands differently than the same claim from a tier-2 outlet.
Évian opens forty-eight hours later with the policy lunch. Dario Amodei walks into that seat arguing for FAA-style external review of frontier labs, while his own firm's most public product-launch playbook is now in print as a case study in not warning the rest of the stack.
UC Berkeley RDI + Snorkel AI launch Agents' Last Exam — GPT-5.5 on the Codex harness tops the public leaderboard at 24.0%, Cursor / Composer 2.5 second, Claude Code / Fable 5 third at 22.0%, hardest tier averages 2.6%
Jun 11–12
The first cross-lab agent benchmark this cycle that holds the verifiable-task bar (not multiple-choice, not synthetic puzzles) and that publicly inverts the post-Fable-5 model-bench narrative. Agents' Last Exam (ALE) comes from a paper co-led by Dawn Song at UC Berkeley RDI, with contributors from Snorkel AI's Open Benchmarks Grants program and 300+ industry experts across 55 non-physical industry sub-domains anchored on O*NET / SOC 2018. It fields the agent on a real machine for a real workflow and scores the artifacts it leaves behind against verifiable success criteria. 1,500+ tasks total. The launch leaderboard, Jun 12: Codex + GPT-5.5 first at 24.0% full pass; Cursor + Composer 2.5 second; Claude Code + Fable 5 third at 22.0%. On the hardest "Last-Exam" tier the average full pass rate across all frontier agents is 2.6%.
Two reads. (1) The narrative inversion is the headline: for a year the Anthropic frontier model has carried the published-benchmark crown (Fable 5 at 95% SWE-Bench Verified, 80.3% SWE-Bench Pro per vendor report). ALE is the first widely-cited benchmark this quarter where the harness matters as much as the model, and OpenAI's Codex-plus-GPT-5.5 stack just beat Anthropic's Claude-Code-plus-Fable-5 stack days after Fable 5 shipped. The dunk lands two days before the G7 lunch.
(2) The 2.6% hardest-tier number is the deeper signal. The competitive story among frontier labs is now within the noise band of "everyone fails the hard work"; the useful axis going into Q3 is not which model wins another SWE-Bench point but which harness composes long-horizon tools well enough to climb the ALE long tail. That's a fight Codex won the first round of in public.
The Register reads IDC's March-2026 FERS survey — Claude at 19% extensive enterprise use vs OpenAI 42% / Google 38%, with 25% actively evaluating Claude — the first third-party gap data behind Anthropic's revenue lead
Jun 11
The structural counterweight to Anthropic's revenue-lead story, and the data point the rest of the cycle will be re-litigated against. The Register's Jun 11 piece reads IDC's FERS (Future Enterprise Resiliency & Spending) March-2026 survey of 1,000+ end-user organisations: 19% reported extensive use of Claude models, vs ~42% for OpenAI and ~38% for Google. The evaluation funnel sits at 25% for Claude: a real pipeline, but not converted yet.
Two reads. (1) The disconnect with the Anthropic-leads-on-revenue story ($30B+ run rate per The Information's May reporting) resolves the way most enterprise lifts do: Anthropic is winning the token-volume race off a smaller distinct-deployments base, with a few very large customers running enormous workloads, while OpenAI and Google have a broader installed-base footprint with smaller per-account spend. The policy seat at Évian and the extensive-use seat in IDC's cross-tab are not the same seat, and the enterprise-buying surface is still keyed to the second.
(2) The 25% evaluation number is the under-the-radar tell on the next twelve months. If Anthropic converts a third of evaluators to extensive users by IDC's FERS end-of-year cut, the gap narrows structurally; if Fable 5's same-month launch plus the Information's blindside story plus the ALE upset stall those evaluations, 19% becomes the FY26 baseline the next S-1-era earnings calls get measured against. The enterprise plane and the policy plane just diverged.
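The conversion scenario in the second read can be worked through as rough arithmetic. The sketch below uses only the survey figures quoted above and rests on two illustrative assumptions that do not come from the IDC survey: that organisations evaluating Claude are not already counted among its extensive users, and that the OpenAI and Google shares stay flat. The function names are hypothetical.

```python
# Survey figures as quoted in the text (percent of respondents).
CLAUDE_EXTENSIVE = 19.0
CLAUDE_EVALUATING = 25.0
OPENAI_EXTENSIVE = 42.0
GOOGLE_EXTENSIVE = 38.0


def projected_extensive(extensive: float, evaluating: float, conversion: float) -> float:
    """Project extensive use if a fraction of evaluators convert.

    Assumes evaluators are disjoint from current extensive users
    (an illustrative assumption, not something the survey states).
    """
    if not 0.0 <= conversion <= 1.0:
        raise ValueError("conversion must be between 0 and 1")
    return extensive + evaluating * conversion


def gap_to(rival: float, projected: float) -> float:
    """Percentage-point gap left to a rival whose share is assumed flat."""
    return rival - projected


if __name__ == "__main__":
    projected = projected_extensive(CLAUDE_EXTENSIVE, CLAUDE_EVALUATING, 1 / 3)
    print(f"projected extensive use: {projected:.1f}%")
    print(f"gap to OpenAI: {gap_to(OPENAI_EXTENSIVE, projected):.1f} points")
    print(f"gap to Google: {gap_to(GOOGLE_EXTENSIVE, projected):.1f} points")
```

Under those assumptions a one-third conversion lifts Claude to roughly 27% extensive use, which shrinks the distance to both rivals without eliminating it.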
The harness war eases off the throttle for the weekend
Update — anthropics/claude-code v2.1.177 ships Sat 01:25 UTC as a chore-only release (CHANGELOG.md + feed.xml), the three-in-a-day Friday burst gives way to a single housekeeping cut
Jun 13
The first non-substantive Claude Code release in five days, and the clearest tell that Friday's three-in-a-day burst was Fable-5-anniversary scaffolding rather than a new normal. v2.1.177 landed at 01:25 UTC Saturday morning, signed by @ashwin-ant, single commit ca9f604 with the message "chore: Update CHANGELOG.md and feed.xml": no shipped behaviour change, a pure metadata roll.
Two reads. (1) The signal is the absence: after the v2.1.174 → v2.1.176 cluster cleared three Fable-5-shaped levers in twenty-one hours (enterprise model allow-list, Bedrock credential caching, language-aware session titles), the Saturday cadence dropped to housekeeping. The harness team is staffing the weekend with a single maintainer cutting the docs feed, not an on-call rotation shipping behaviour. The seventeen-releases-in-eight-days streak that started on the Jun 5 Fable-5 RC has compressed into two days idle. Pair with item 05's Codex side: the harness war is deliberately leaving Saturday quiet ahead of the Évian week, which suggests both Anthropic and OpenAI are holding their next substantive cut for Monday, the day the G7 lunch opens.
(2) The chore-only release pattern is itself a Fortune-1000 deployment tell: enterprise admins running availableModels-locked installs under managed updates need the CHANGELOG.md regen to flow through even when no behaviour changes, because the docs feed is the audit trail their compliance stack consumes. The chore release exists because the enterprise stack now expects one.
Update — openai/codex rust-v0.140.0-alpha.17 ships Jun 13 01:20 UTC, alpha.18 at 17:30 UTC — two more pre-releases on top of the Friday quad, seventeen alphas across four days, stable still pinned to v0.139.0
Jun 13
The Codex CLI alpha train slowed but did not stop into the weekend. openai/codex shipped rust-v0.140.0-alpha.17 at 01:20 UTC Saturday and alpha.18 at 17:30 UTC, two more pre-releases on top of the four that landed Jun 12 (alpha.13 → alpha.16). Running count: seventeen alphas across Jun 9 → Jun 13; stable channel still pinned to rust-v0.139.0 from Jun 9.
Two reads. (1) The cadence halved from Friday's four-in-a-day to Saturday's two-in-a-day, mirroring Claude Code's drop from three-in-a-day to one chore release over the same window. Both harness teams kept the on-call shift live for weekend hotfixes but neither shipped a substantive feature pre-release on Saturday: the deliberate quiet-weekend pattern that has historically preceded a Monday-morning model-event-aligned cut. The stable v0.140.0 tag is still waiting, eighteen pre-releases deep.
(2) The seventeen-alphas-in-four-days shape is the same competitive read as item 04: when the model surface is at parity (item 02's ALE numbers say so explicitly), neither harness can afford to ship a substantive feature a week late. Both teams are now buffering features inside pre-release tags until the model-event ship window, then collapsing the buffer into a stable cut. The pre-release queue is the competitive moat.
Capital lays cloud-shaped foundations under the agent layer
OpenAI confirms acquisition of Ona — the cloud business Gitpod rebranded into — giving Codex secure, pre-configured cloud sandboxes for long-horizon agent runs that persist beyond a single session
Jun 11
The first sandbox-layer acquisition of the post-Fable-5 cycle and the OpenAI side of the same trade Anthropic made when it bought Stainless in May: both labs are buying the infrastructure that lets a coding agent run for days instead of minutes, not the agent itself. On Jun 11, OpenAI announced it would acquire Ona, the German company Gitpod rebranded into in 2025, which provides pre-configured cloud environments stocked with the tools, access controls and audit trails an AI agent needs to operate independently inside a customer's own cloud. The press framing from OpenAI: more than 5M weekly Codex users (up 4× on the year); an enterprise client list that includes a major US bank, European pharma firms and Asian sovereign wealth funds; and Ona's productive use among enterprise customers up 13× in 2026. Financial terms undisclosed; transaction subject to customary closing conditions.
Two reads. (1) The acquisition lines up with the v0.140.0 Codex CLI alpha train (item 05): the seventeen alphas under that tag have been adding the harness scaffolding to run Codex inside a long-lived sandbox, and Ona becomes the production cloud surface those alphas ship into when the stable cut lands. The Codex-runs-while-your-laptop-is-closed product pitch needs both halves at once.
(2) The customer's-own-cloud framing is the under-the-radar enterprise tell. Anthropic's Cloudflare Environments for Claude Managed Agents in March took the same shape on the Cloudflare runtime; OpenAI buying Ona gives Codex a parallel offering that doesn't depend on Cloudflare being the runtime. Two harnesses, two sandbox stacks, same competitive surface: the "where does the agent actually run" question now has a real answer on both sides.
Visa plugs its global payment network into ChatGPT — Intelligent Commerce Connect lets an agent shop and pay at any Visa-accepting merchant under user-defined spending limits, with tokenised credentials and real-time fraud monitoring
Jun 10
The first time a global payment network has opened an agent-initiated checkout rail at planetary scale, and the cleanest pairing yet between the agentic-commerce pitch every lab has made for a year and a real-money settlement surface. At the Jun 10 Visa Payments Forum in San Francisco, Visa announced a strategic collaboration with OpenAI: Visa will provide its global network, tokenisation capabilities and security infrastructure to support agent-initiated transactions inside ChatGPT, with the agent processing a user prompt, evaluating merchant catalogues and completing checkout on the user's behalf using Visa's rails at any participating merchant. The product wrapper is Intelligent Commerce Connect, which Visa describes as a network, protocol and "token-vault-agnostic on-ramp" for AI agent builders and merchants. Spending limits, merchant-category restrictions and approval requirements all live in user-defined parameters; tokenised credentials and real-time fraud monitoring apply throughout.
Two reads. (1) The universal-merchant framing (any Visa-accepting merchant, not a marketplace partner list) is the structural signal. The agent-commerce stack has been stuck at the "Stripe-style marketplace partner integrations" stage for eighteen months (PayPal + ChatGPT, Shopify Instant Checkout, the various agent-wallet pilots); pulling the card-network layer up to the agent-initiated transaction shape skips that step and makes agents merchant-agnostic. The "checkout-as-a-service" pitch every agent vendor has made now has a tokenisation backbone that doesn't require merchant-by-merchant signup.
(2) The under-the-radar regulatory tell is the spending-limit + approval parameter layer. Visa's network has carried user-authentication-shaped controls for a decade (3-D Secure 2.0, dynamic limits); the same controls now apply to a non-human actor, which is the compliance-by-construction shape the EU AI Act high-risk audit trail requires from Aug 2 2026. Visa is building the "who authorised this transaction" audit trail for agents before the enforcement date.
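To make the user-defined parameter layer concrete, here is a minimal sketch of how spending limits, merchant-category restrictions and approval requirements might gate an agent-initiated purchase. It is not Visa's or OpenAI's API: every class, field and category name below is hypothetical, and the order of the checks is an assumption chosen for illustration. Each decision carries a reason string, the kind of record a "who authorised this" audit trail could store.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    NEEDS_USER_APPROVAL = "needs_user_approval"
    DECLINE = "decline"


@dataclass(frozen=True)
class SpendingPolicy:
    """User-defined parameters; all field names are hypothetical."""
    per_transaction_limit: Decimal
    approval_threshold: Decimal  # at or above this amount, ask the user first
    blocked_categories: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class AgentPurchase:
    merchant: str
    category: str
    amount: Decimal


def evaluate(policy: SpendingPolicy, purchase: AgentPurchase) -> tuple:
    """Return (Decision, reason) for a purchase an agent wants to make."""
    if purchase.category in policy.blocked_categories:
        return Decision.DECLINE, f"category '{purchase.category}' is blocked"
    if purchase.amount > policy.per_transaction_limit:
        return Decision.DECLINE, "amount exceeds per-transaction limit"
    if purchase.amount >= policy.approval_threshold:
        return Decision.NEEDS_USER_APPROVAL, "amount requires explicit user approval"
    return Decision.APPROVE, "within user-defined limits"


if __name__ == "__main__":
    policy = SpendingPolicy(
        per_transaction_limit=Decimal("200"),
        approval_threshold=Decimal("50"),
        blocked_categories=frozenset({"gambling"}),
    )
    for purchase in (
        AgentPurchase("corner-grocer", "grocery", Decimal("20")),
        AgentPurchase("corner-grocer", "grocery", Decimal("75")),
        AgentPurchase("lucky-spin", "gambling", Decimal("5")),
    ):
        decision, reason = evaluate(policy, purchase)
        print(purchase.merchant, decision.value, "-", reason)
```

Checking the category block before the amount means a blocked merchant type is declined regardless of price, which keeps the strictest user restriction from being bypassed by a small transaction.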
The framework cadence keeps shipping under the bench upset
agno v2.6.14 — Learnings CRUD on AgentOS, Gemini thread-safety and JSON-provider fixes; v2.6.13 wired sub-agent event streaming, AgentOS registry auto-population and workflows HITL socket support
Jun 10–12
The two cuts that pushed Agno's AgentOS primitive past "agent runtime" and into "agent operating system", and the Python-agents-framework counterpart to the Strands and Mastra majors from yesterday's brief. v2.6.13 on Jun 10 21:02 UTC wired sub-agent event streaming through the AgentOS event bus, added registry auto-population so a new AgentOS host picks up its workforce on cold-boot, and shipped HITL socket support for Workflows; v2.6.14 on Jun 12 16:38 UTC added a Learnings CRUD endpoint pair on the AgentOS API and fixed Gemini thread-safety under concurrent requests plus a JSON-provider edge case.
Two reads. (1) Learnings CRUD is the under-the-radar primitive: the AgentOS now treats "what the agent learned this session" as a first-class CRUD-able resource alongside memory and tools. Pair with Mastra's trusted system actor and Strands's checkpointing-in-event-loop: the open-source agent frameworks converged this week on "every step has identity, every step is a save point, every learning is a resource".
(2) Gemini thread-safety under concurrent requests is the multi-tenant-deployment tell: Agno's AgentOS now runs more than one Gemini agent in the same process safely, closing the bug surface that has been quietly blocking Gemini-first AgentOS rollouts since v2.6.10.
openai/openai-agents-python v0.17.5 — sandbox error retryability is now exposed at the public API, and tool-end hook results are typed as objects (the first behaviour change on the Python agents SDK in seventeen days)
Jun 11
The first Python agents-sdk cut since v0.17.4 on May 26 and the cleanest sandbox-layer plumbing signal alongside the Ona acquisition (item 06). v0.17.5 exposes retryability as a first-class field on sandbox errors so a calling agent can decide whether to retry in place or escalate to the user, and types tool-end hook results as structured objects rather than free-form strings.
Two reads. (1) Sandbox-error retryability is the primitive that lets a long-horizon Codex run inside Ona handle a transient cloud failure without surfacing a human interrupt, the exact shape the "Codex runs while your laptop is closed" product pitch needs. The acquisition and the SDK change line up to within forty-eight hours.
(2) Typed tool-end hook results are the trace-surface tell. Until v0.17.5, an agents-sdk caller couldn't reliably inspect a tool's structured return inside a hook; this change collapses a class of fragile string-parsing patterns and propagates downstream into Weave, Langfuse and Logfire traces.
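The retry-in-place-or-escalate decision described above can be sketched in a few lines. This is not the openai-agents-python API: `SandboxError`, its `retryable` attribute, `EscalateToUser` and `run_with_retry` are hypothetical stand-ins defined here purely to illustrate the control flow a retryability flag enables.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SandboxError(Exception):
    """Hypothetical stand-in for a sandbox failure; not the SDK's own class."""

    def __init__(self, message: str, retryable: bool) -> None:
        super().__init__(message)
        self.retryable = retryable


class EscalateToUser(Exception):
    """Raised when a failure should be surfaced to a human instead of retried."""


def run_with_retry(step: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.0) -> T:
    """Run `step`, retrying only failures that are flagged as retryable."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except SandboxError as err:
            if not err.retryable:
                # Permanent failure: retrying cannot help, so hand it to the user.
                raise EscalateToUser(str(err)) from err
            if attempt == max_attempts:
                raise EscalateToUser(f"gave up after {attempt} attempts: {err}") from err
            # Exponential backoff between attempts (no wait when base_delay is 0).
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_step() -> str:
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise SandboxError("transient cloud failure", retryable=True)
        return "done"

    print(run_with_retry(flaky_step), "after", attempts["count"], "attempts")
```

Without a retryability flag the caller has to guess from an error message whether another attempt is worthwhile; with it, only permanent failures and exhausted retries interrupt a human.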
All-Hands-AI/OpenHands 1.8.0 — LLM profiles, sandbox grouping-strategy selection, sub-agent delegation, and the first generic-ACP-agent UI ship together
Jun 10
The biggest single OpenHands cut since the 1.7 line and the first time an open-source autonomous-coding agent has shipped a generic ACP (Agent Communication Protocol) UI shell, effectively the rendering surface for any ACP-compatible agent backend, not only OpenHands itself. 1.8.0 on Jun 10 16:58 UTC packs four first-time primitives at once: named LLM profiles (a single OpenHands install can now switch between a fast-cheap and a frontier-expensive profile per task); sandbox grouping strategy as a user-selectable knob (the agent picks how to bucket file-system, process and network isolation per sub-task); sub-agent delegation (a primary OpenHands run can hand a sub-goal to a child agent and wait); and the generic ACP UI.
Two reads. (1) The ACP UI cleanly mirrors what Anthropic's auth.md and WorkOS's agent-identity stack are doing on the auth side: open-source agents agreeing on a shared protocol for the surface a human reviews. OpenHands shipping the first concrete UI implementation of that protocol moves ACP from spec-on-paper to running-in-production.
(2) LLM profiles + sub-agent delegation together are the cost-discipline primitive, the open-source answer to Cursor's Composer-2.5-for-fast-path, frontier-for-core price-discrimination lever from the prior brief. The "frontier model for the agent core, small tuned model for tooling" pattern that closed labs are running is now first-class in an open-source autonomous-coding agent.
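The cost-discipline pattern in the second read, a frontier profile for core reasoning and a cheap profile for tooling, can be illustrated with a small routing sketch. None of this is OpenHands configuration or code: the profile names, model identifiers, cost figures and task kinds are placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMProfile:
    name: str
    model: str                 # placeholder identifier, not a real model name
    est_cost_per_task_usd: float


# Two named profiles, mirroring the fast-cheap vs frontier-expensive split.
PROFILES = {
    "fast": LLMProfile("fast", "small-tuned-model", 0.05),
    "frontier": LLMProfile("frontier", "frontier-model", 2.00),
}

# Hypothetical task kinds that justify the expensive profile.
FRONTIER_TASK_KINDS = frozenset({"plan", "refactor", "debug"})


def pick_profile(task_kind: str) -> LLMProfile:
    """Route core reasoning to the frontier profile and everything else to the fast one."""
    key = "frontier" if task_kind in FRONTIER_TASK_KINDS else "fast"
    return PROFILES[key]


def plan_delegation(subtask_kinds: list) -> list:
    """Pair each delegated sub-task with the profile name its child agent would use."""
    return [(kind, pick_profile(kind).name) for kind in subtask_kinds]


def estimated_cost(subtask_kinds: list) -> float:
    """Sum the placeholder per-task cost estimates for a delegation plan."""
    return sum(pick_profile(kind).est_cost_per_task_usd for kind in subtask_kinds)


if __name__ == "__main__":
    subtasks = ["plan", "lint", "format", "debug"]
    for kind, profile in plan_delegation(subtasks):
        print(f"{kind:8s} -> {profile}")
    print(f"estimated cost: ${estimated_cost(subtasks):.2f}")
```

Tying the profile to the sub-task rather than to the whole run is what lets delegation save money: the parent can stay on the frontier profile while its children do routine work cheaply.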
