MAI-Thinking-1, Grok V9 and Kumo —
the model layer comes home.
Yesterday's brief tracked the runtime, the rulebook and the observability watcher. Today the layer that sits above all of those moves: the model layer comes home. Microsoft buried the actual headline of Build 2026 under the Scout story — MAI-Thinking-1, its first frontier-grade in-house reasoning model, a 35B-active sparse-MoE with a 256K context window, trained from scratch on commercially licensed data without third-party distillation, scoring 97.0% on AIME 2025 and 94.5% on AIME 2026 and matching Anthropic's Claude Opus 4.6 on SWE-Bench Pro — plus MAI-Code-1-Flash, a 5B-param coding model that rolls out today to every GitHub Copilot plan and beats Claude Haiku 4.5 by 16 points on SWE-Bench Pro (51.2% vs 35.2%). xAI says Grok V9-Medium — 1.5 trillion parameters, roughly 3× the prior generation — has finished pre-training on Cursor developer-workflow data and ships in 2–3 weeks; Grok Build, xAI's first dedicated coding agent, is already in beta. NVIDIA agrees to buy Kumo AI for $400M+, picking up ex-Pinterest CTO Vanja Josifovski, Stanford's Jure Leskovec and ex-LinkedIn AI head Hema Raghavan — plugging the structured-data gap LLMs have never closed at customers like DoorDash, Reddit, Databricks and Snowflake. The substrate picks sides in the same week: Pinterest signs a $4B AWS commitment through 2031 — its largest infrastructure deal ever — with Trainium as primary AI accelerator for the LLMs and VLMs behind the Taste Graph and 600M-MAU visual search. Mistral's 44MW Bruyères-le-Châtel facility — 13,800 NVIDIA GB300s on an $830M seven-bank debt facility — targets an end-of-June opening. 
The IDE-agent tooling beneath all of it matures in 72 hours: Cursor ships Organizations for Enterprise, a new Premium seat (5× the included usage at 3× the cost) and moves Bugbot to usage-based billing; FastMCP 3.4 ("the remote release") lands fastmcp-remote, a single-purpose stdio→HTTPS bridge with OAuth on for free; Google Antigravity Agent and its SDK go public preview through the Gemini API, and Gemini 3.5 Flash GAs in Gemini Enterprise with native MCP. Apoha exits stealth at $36M to build Liquid State Intelligence — a third molecular data class alongside sequence and structure, with Boehringer Ingelheim and Somru BioScience as named partners. Brian Chesky lines up early funding for a design-first AI lab while staying Airbnb CEO. And the clock keeps ticking: the EU AI Act's August 2 cutover — high-risk-deployer and frontier-model obligations binding, penalties up to €35M or 7% of global turnover — is now 58 days away. The substrate is no longer the variable. The model is.
The lead — the model layer comes home
Microsoft MAI-Thinking-1 + MAI-Code-1-Flash — first in-house frontier reasoning model, no third-party distillation
Jun 2
The Build 2026 story Microsoft pulled below Scout — and the one that materially changes the Copilot stack. On June 2, Microsoft AI unveiled MAI-Thinking-1: a mid-sized sparse Mixture-of-Experts reasoning model, 35B active parameters, 256K context window, trained from scratch on commercially licensed data without distillation from any third-party model. Microsoft claims 97.0% on AIME 2025, 94.5% on AIME 2026, and parity with Anthropic's Claude Opus 4.6 on SWE-Bench Pro coding. It ships in private preview through Microsoft Foundry. The companion MAI-Code-1-Flash is a 5B-parameter coding model rolling out today across every GitHub Copilot plan (Pro, Pro+, Business, Enterprise) and selectable in the VS Code model picker; Microsoft says it outperforms Claude Haiku 4.5 on all four core coding benchmarks tested, with a 16-point lead on SWE-Bench Pro (51.2% vs 35.2%) at roughly 60% fewer output tokens. Two reads. (1) Yesterday's Polaris ship was Microsoft pulling Copilot's default off OpenAI; today's MAI-1 line is the supply-side version of the same trade — Microsoft now has a frontier-class reasoning model and a Haiku-tier coding model it trained itself, with no upstream lab in the dependency graph. (2) Read with the Microsoft Foundry Agent Service GA at Build (the operate-layer tier) and Scout's per-agent Entra identity (yesterday's lead): the in-house model + the hosted agent runtime + the governed identity are now Microsoft's full vertical — the first non-OpenAI lab with all three tiers in one badge.
xAI finishes pre-training Grok V9-Medium — 1.5T parameters, trained on Cursor workflow data, public release in 2–3 weeks
late May–Jun
xAI's coding-model bet, made loud. Elon Musk confirmed this week that Grok V9-Medium has finished pre-training at 1.5 trillion parameters — roughly 3× the prior V8 generation — with fine-tuning under way now and the RL phase queued; public release is signalled for mid-June, i.e. inside the next 2–3 weeks. The training-data choice is the consequential bit: xAI says V9-Medium was trained not just on public GitHub corpora but on Cursor developer-workflow data — real-engineer debugging, refactoring and production-codebase traces — the kind of trajectory data Cursor's 2026 training-data licensing deals were explicitly built to monetise. Underneath the model, Grok Build — xAI's first dedicated coding agent, a terminal CLI + agent runtime with parallel subagents, git worktrees, headless mode and Agent Client Protocol support — is already in beta for SuperGrok Heavy subscribers and live on the xAI API at $1 / $2 per million input / output tokens at 100+ tokens/second. Two reads. (1) Coding is now where the lab race actually clears price: Microsoft's MAI-Code-1-Flash (item 01) prints 16-point SWE-Bench Pro gains at 5B, Anthropic's Claude Code keeps shipping under Opus 4.8, and xAI's answer is to triple the parameter count and train on the IDE's own telemetry. (2) Cursor's IDE telemetry as training corpus is the first concrete repeat of the OpenAI / Reddit data licence playbook, but for engineering trajectories — the data layer that, until this quarter, sat unmonetised on every agent IDE's logs.
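To make the quoted Grok Build API pricing concrete, here is a minimal sketch of the per-request arithmetic. The rates ($1 / $2 per million input / output tokens) and the 100 tokens/second floor come from the paragraph above; the function names and the example token counts are hypothetical, and the sketch ignores any caching discounts or rate tiers the real price list may have.

```python
# Rates as quoted in the brief; everything else here is illustrative.
INPUT_USD_PER_MTOK = 1.0
OUTPUT_USD_PER_MTOK = 2.0
MIN_TOKENS_PER_SECOND = 100  # "100+ tokens/second", treated as a lower bound


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def max_generation_seconds(output_tokens: int) -> float:
    """Upper bound on generation time if throughput never drops below the floor."""
    return output_tokens / MIN_TOKENS_PER_SECOND


# Hypothetical agent turn: a large repository context and a modest patch.
cost = request_cost_usd(input_tokens=200_000, output_tokens=20_000)
seconds = max_generation_seconds(20_000)
print(f"${cost:.2f}, at most {seconds:.0f} s of generation")
```

At these rates a context-heavy turn is dominated by input tokens, which is why token efficiency matters more than the headline output price.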
NVIDIA buys Kumo AI for $400M+ — picking up Josifovski, Leskovec and Raghavan to plug the structured-data gap
Jun 3–4
The acquisition that completes the model-layer story. NVIDIA agreed to buy Kumo AI — a five-year-old Mountain View startup — for a reported $400M+, per The Information / PYMNTS. Kumo builds predictive large models that operate directly on structured enterprise data (customer records, payment data, warehouse tables) — the workloads general-purpose LLMs handle poorly because the underlying tokens are columns, not prose. Its technical stack pairs graph machine learning with synthetic data generated inside simulated business environments, training models that predict churn, payment defaults and demand directly against an enterprise data warehouse. The customer list is the part that matters: DoorDash, Reddit, Databricks and Snowflake. All three co-founders — former Pinterest CTO Vanja Josifovski, Stanford professor Jure Leskovec, and former LinkedIn AI head Hema Raghavan — join NVIDIA. Two reads. (1) NVIDIA's enterprise-AI push has so far been a hardware-and-framework story (DGX Spark, Vera, NemoClaw, BlueField). Kumo is the first time NVIDIA has paid frontier-lab money for an enterprise model, not a runtime — a tacit acknowledgement that an agent platform without a native structured-data model leaks revenue to whoever ships one. (2) Pair with Microsoft MAI-1 (item 01) and xAI V9 (item 02): three different platform owners, three different in-house model strategies, one shared conclusion in a single week — the rented-model era is over.
The substrate picks sides
Pinterest commits $4B to AWS through 2031 — Trainium as primary AI accelerator for the Taste Graph
Jun 4
The largest single infrastructure commitment in Pinterest's history, and the cleanest enterprise vote yet for Trainium as an NVIDIA GPU alternative. On June 4, Pinterest and AWS announced a six-year strategic agreement worth $4 billion, running through 2031, that names AWS the Preferred Cloud Services Provider and commits Pinterest to using AWS Trainium chips as the primary accelerator for training and serving the LLMs and vision-language models behind the Taste Graph and personalised visual search for 600M monthly active users. AWS Graviton already runs roughly one-third of Pinterest's platform compute and will expand; Pinterest is also migrating to an EKS-based container architecture. Two reads. (1) For Amazon, a marquee customer publicly buying Trainium for frontier vision-language model training — not inference, not cost-down — is the validation moment the silicon needed: every cloud-procurement conversation post-June 4 has a Pinterest case to point at. (2) For Pinterest, the deal locks in unit economics for the agent-driven discovery roadmap before the EU AI Act August 2 cutover (item 11) re-prices the cost of any model-platform switch. Read with yesterday's Vector Core Compute launch and the OpenAI-AWS Trainium Bedrock AgentCore commitment: the post-OpenAI compute supply chain is being assembled in public.
Mistral's 44MW Bruyères-le-Châtel facility — 13,800 GB300s — heads for an end-of-June opening
end-Jun
Europe's sovereign-compute story stops being a slide deck. Mistral's 44MW training facility in Bruyères-le-Châtel — roughly 30km south of Paris, owned and operated by Eclairion, financed by an $830M debt facility from a seven-bank consortium (Bpifrance, BNP Paribas, Crédit Agricole CIB, HSBC, La Banque Postale, MUFG, Natixis CIB) — is on schedule to come online by end of June 2026. The build: 13,800 NVIDIA GB300 Grace-Blackwell units, each rated at up to 20 PFLOPS FP4, for a peak aggregate of ~276 EFLOPS FP4. A separate 10MW inference site at Les Ulis is queued for Q3, and a $1.4B Swedish build lifts Mistral's target to 200MW of European compute by end of 2027. Two reads. (1) Pair with Mistral's EU industrial roster — Airbus, BMW, ASML — and the Vibe rebrand: the lab now has the model, the agent product, and (in 25 days) the self-owned compute. Sovereignty as a sales pitch clears its first concrete delivery milestone. (2) Pair with EU CADA (yesterday) and the AI Act August 2 cutover (item 11): every European regulated buyer evaluating an LLM platform after July 1 can now answer "where does the training compute live?" without pointing across the Atlantic.
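The aggregate compute figure follows directly from the unit count and the per-unit rating, and a short check makes the unit conversion explicit. The inputs are the ones quoted above; the helper name is illustrative. Because the per-unit rating is an "up to" figure, the result is a peak upper bound, not sustained throughput.

```python
# Figures quoted in the brief; the helper name is hypothetical.
GPU_COUNT = 13_800
PEAK_PFLOPS_PER_UNIT_FP4 = 20  # "up to" rating, so this bounds the total from above


def aggregate_eflops(units: int, pflops_each: float) -> float:
    """Peak aggregate throughput in EFLOPS (1 EFLOPS = 1,000 PFLOPS)."""
    return units * pflops_each / 1_000


peak_eflops = aggregate_eflops(GPU_COUNT, PEAK_PFLOPS_PER_UNIT_FP4)
print(f"{peak_eflops:.0f} EFLOPS FP4 (peak)")
```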
The agent in the IDE matures — and remote MCP grows up
Cursor ships Organizations, a Premium seat (5× usage at 3× cost), and moves Bugbot to usage-based billing
Jun 3
The week's most consequential IDE-vendor governance move. On June 3, Cursor pushed three changes that together rewrite the procurement story. (1) Organizations: an Enterprise-only layer letting an admin manage multiple Teams from one pane with separate security, budget, governance and feature controls, plus Groups for flexible cross-team access, spend caps and permissions. (2) A new Premium seat at $96/seat/month annual (vs the $32 Standard) — 5× the included usage at 3× the cost, with Cursor estimating the Composer pool covers a full month of heavy-agent usage for 99% of users. (3) Bugbot moves to pure usage-based billing with configurable review effort and custom review logic; existing customers can opt in early, the default cutover hits at the next renewal after June 8. Two reads. (1) Organizations is the line Cursor needed to credibly pitch a CIO whose comparison set is Copilot Enterprise + Claude Code + Devin Desktop — multi-team rollout was the missing seat-management primitive. (2) Bugbot's switch ratifies what GitHub Copilot's AI-Credits move started: the seat-fee era for agentic dev tools is closing inside a single quarter, and the next compare-the-vendor conversation is about token efficiency, not licence price.
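The "5× usage at 3× cost" framing is easier to compare once it is reduced to an implied price per unit of included usage. The sketch below uses the two seat prices quoted above and treats the Standard seat's included usage as one unit; that normalisation and the function name are assumptions for illustration, and the calculation says nothing about overage rates, which the brief does not give.

```python
# Seat prices quoted in the brief (USD per seat per month, annual billing).
STANDARD_PRICE = 32.0
PREMIUM_PRICE = 96.0
# Included usage, normalised so that one Standard seat = 1 unit (an assumption).
STANDARD_USAGE = 1.0
PREMIUM_USAGE = 5.0


def implied_unit_price(seat_price: float, included_usage: float) -> float:
    """Seat price divided by the usage bundled into it."""
    return seat_price / included_usage


price_multiple = PREMIUM_PRICE / STANDARD_PRICE
standard_unit = implied_unit_price(STANDARD_PRICE, STANDARD_USAGE)
premium_unit = implied_unit_price(PREMIUM_PRICE, PREMIUM_USAGE)
unit_discount = 1 - premium_unit / standard_unit

print(f"price multiple: {price_multiple:.0f}x")
print(f"implied unit price: {standard_unit:.2f} vs {premium_unit:.2f}")
print(f"discount per unit of included usage: {unit_discount:.0%}")
```

On these numbers the Premium seat only pays off for a user who would otherwise consume more than three Standard seats' worth of included usage.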
FastMCP 3.4 "the remote release" — fastmcp-remote bridges stdio MCP hosts to HTTPS, OAuth on by default
Jun 2
The cleanest single-purpose MCP infra ship of the quarter, from the team that became the de-facto Pythonic MCP standard after PrefectHQ took stewardship. On June 2, FastMCP 3.4.0 landed fastmcp-remote: a standalone bridge that takes a single HTTPS MCP server URL and exposes it locally as stdio — letting Claude Code, Codex, Cursor and other stdio-only hosts talk to remote MCP servers without each vendor rolling its own transport. OAuth is on automatically for HTTPS endpoints, with explicit bearer-token and custom-header escape hatches. The proxy layer itself gets harder: as of 3.4, initialize is part of the forwarded handshake, so a missing backend, wrong URL (server root vs /mcp), denied upstream auth, or non-MCP upstream now fails the downstream initialize instead of producing a "connected" proxy whose capability fetches silently come back empty. FastMCP-issued tokens can also outlive short-lived upstream tokens, keeping sessions alive across the long idle periods remote agent clients are prone to. Two reads. (1) fastmcp-remote is the missing stdio→HTTPS primitive every agent harness has been hand-rolling — and now there's a canonical one with OAuth wired in. (2) Pair with the MCP 2026-07-28 RC stateless lock-in (covered earlier): the protocol's transport story is converging on remote-HTTP with TLS as the substrate, and FastMCP 3.4 is the client-side answer.
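The difference between a proxy that answers initialize locally and one that forwards it can be shown with a toy model. The sketch below is plain Python and does not use or reproduce FastMCP's code or API: every class and method name is hypothetical, and the "upstream" is an in-memory stand-in, not a real MCP server. It only illustrates why forwarding the handshake moves the failure to the point where the client can see it.

```python
class UpstreamError(Exception):
    """Raised when the stand-in upstream cannot complete initialize."""


class FakeUpstream:
    """In-memory stand-in for a remote server (not a real MCP endpoint)."""

    def __init__(self, reachable: bool, tools: list[str]):
        self.reachable = reachable
        self.tools = tools

    def initialize(self) -> dict:
        if not self.reachable:
            raise UpstreamError("upstream did not answer initialize")
        return {"capabilities": {"tools": self.tools}}


class LocalAnswerProxy:
    """Answers initialize itself, so a dead upstream still looks connected."""

    def __init__(self, upstream: FakeUpstream):
        self.upstream = upstream

    def initialize(self) -> dict:
        return {"capabilities": {}}  # never contacts the upstream

    def list_tools(self) -> list[str]:
        try:
            return self.upstream.initialize()["capabilities"]["tools"]
        except UpstreamError:
            return []  # the failure is swallowed and looks like "no tools"


class ForwardingProxy:
    """Forwards initialize, so upstream failures surface during the handshake."""

    def __init__(self, upstream: FakeUpstream):
        self.upstream = upstream

    def initialize(self) -> dict:
        return self.upstream.initialize()  # raises if the upstream is unusable


dead = FakeUpstream(reachable=False, tools=[])
print(LocalAnswerProxy(dead).initialize())  # succeeds despite the dead upstream
print(LocalAnswerProxy(dead).list_tools())  # empty list, no error
try:
    ForwardingProxy(dead).initialize()
except UpstreamError as exc:
    print(f"handshake failed early: {exc}")
```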
Google Antigravity Agent + SDK go public preview through the Gemini API — Gemini 3.5 Flash GAs in Enterprise with native MCP
rolling
Google's hosted agent runtime finally gets an API surface — and Gemini Enterprise gets MCP. The Antigravity Agent, powered by Gemini 3.5 Flash and using the same harness as the Antigravity IDE, is now in public preview through the Interactions API in Google AI Studio and the Gemini API: a single request triggers an autonomous loop of reasoning, code execution, tool calls and file management, with multimodal inputs and built-in tools including google_search and url_context. The companion Google Antigravity SDK ships in preview, giving developers programmatic access to the same agent harness optimised for Gemini models, hostable on infra of their choice. Separately, Gemini 3.5 Flash goes GA in Gemini Enterprise — the feature-toggle disappears and it becomes the default after June 8, 2026 — and Gemini Enterprise lights up the ability for end users to connect custom MCP servers to their private data (off by default, gated on an org-admin opt-in). Two reads. (1) The Antigravity-via-API path is Google's Bedrock-AgentCore-style move: the agent is no longer only an IDE product, it's a hosted primitive any vendor can call. (2) MCP support landing in Gemini Enterprise closes the loop — Anthropic, OpenAI and now Google all natively speak MCP from their enterprise tier. Three frontier labs, one tool-call protocol; the MCP 2026-07-28 RC ships into a fully unified client surface.
New ventures, new modalities
Apoha exits stealth at $36M — "Liquid State Intelligence" as a third molecular data class
Jun 3
The week's most genuinely novel data-layer raise, and an explicit bet that the next agent-grade frontier isn't text. Apoha — a London / San Francisco deep-tech company founded in 2021 by Oxford physicist Shamit Shrivastava and Goldman Sachs alumna Anshika Srivastava — emerged from stealth on June 3 at the Frontier Technologies Stage at SXSW London with a cumulative $36M across its 2024 seed and a latest unlettered round, led by Singular with Draper Associates joining and seed backers Redalpine, Seedcamp, Wilbe and Nucleus following on; an Innovate UK grant rounds it out. The product is the interesting part: VIBE® (Variations in Inter-facial Behaviour Under Excitation) suspends a pinhead-sized sample in liquid, applies controlled physical stresses and captures the wave patterns the molecule produces. Apoha calls the readout Liquid State Intelligence and positions it as a third molecular-science data class alongside sequence and structure. Active partnerships: Boehringer Ingelheim and Somru BioScience. Two reads. (1) Every funded AI primitive of the past 18 months has been a token model or an agent harness around one; Apoha is a reminder that the next defensible data layer might be a new modality, not a new architecture. (2) Pair with NVIDIA's Kumo acquisition (item 03): two different platform owners both spending real money to close gaps where LLMs aren't the right primitive — the "text-only AI" frame is closing.
Brian Chesky lines up early funding for a design-first AI lab — staying Airbnb CEO
Jun 4
The big-tech-CEO-starts-an-AI-lab pattern picks up another principal. Bloomberg, Fortune and TechCrunch reported on June 4 that Brian Chesky is in early-stage funding talks for a new AI lab focused on AI models and user interaction / design. Chesky will remain CEO of Airbnb and is not taking the CEO role at the new lab; The Information also confirmed he's in talks to back a new AI venture. The thesis maps to Chesky's repeated public stance that AI for travel and e-commerce demands a rich UI rather than a text chat box — explicitly contrasting with OpenAI and Anthropic surfaces. Two reads. (1) Read with the past quarter's Big-Tech-CEO spinouts and lab launches (Recursive Superintelligence, Mistral's industrial roster, ex-Contextual hires at Google DeepMind): the "AI lab founded by an operator with distribution" pattern is hardening into a category. (2) Pair with NVIDIA-Kumo (item 03) and Apoha (item 09): three different parts of this week's news all point at the same conclusion — the next defensible AI products are the ones with a non-text-chat interaction model, and the capital is already moving there.
The clock and the edge
EU AI Act — August 2 cutover: 58 days to high-risk-deployer and GPAI obligations binding
Aug 2
Yesterday's brief covered the EU Commission's new CADA + Tech Sovereignty Package, but the clock that's about to tick on every regulated EU deployment is older and harder. On August 2, 2026 — two years after the AI Act's entry into force, and now 58 days away — the bulk of the regulation becomes binding. The package: Articles 9–17 provider requirements for high-risk systems, Article 26 deployer requirements (including Fundamental Rights Impact Assessments (FRIAs), incident reporting, log retention and human oversight guarantees), and Article 50 transparency obligations on which the AI Office is publishing guidance into this window; GPAI (general-purpose AI model) rules — the rulebook for frontier models that may carry systemic risk — apply from the same date. Each Member State must stand up at least one AI regulatory sandbox by August 2. Penalty range: up to €35M or 7% of worldwide annual turnover. The November 2025 Commission proposal to delay parts of this to late 2027 has not been enacted — counsel firms (Holland & Knight, Baker McKenzie, K&L Gates) are uniformly telling US clients to plan for the August date. Two reads. (1) The European buyer's procurement-cycle math is now set: any agent platform shipping in EU regulated workloads after Q3 needs deployer-side incident reporting and FRIA tooling in the product, not in services. (2) Pair with Mistral's end-of-June compute milestone (item 05) and yesterday's CADA: in the same quarter Brussels gives European labs both the operational substrate and the regulatory shape of the demand.
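Both numbers in this item, the day count and the penalty ceiling, reduce to simple arithmetic that a compliance dashboard might encode. The sketch below makes two assumptions that are not stated in the brief: the publication date of June 5, 2026 is inferred from the "58 days" figure, and the penalty is treated as "whichever is higher" of the two amounts, which is how the Act frames the top tier for undertakings other than SMEs. Function names are illustrative.

```python
from datetime import date

CUTOVER = date(2026, 8, 2)
ASSUMED_PUBLICATION_DATE = date(2026, 6, 5)  # inferred from "58 days away"

FIXED_CAP_EUR = 35_000_000.0
TURNOVER_SHARE = 0.07


def days_until_cutover(today: date) -> int:
    """Calendar days from `today` to the August 2, 2026 cutover."""
    return (CUTOVER - today).days


def penalty_ceiling_eur(worldwide_annual_turnover_eur: float) -> float:
    """Top-tier ceiling, assuming 'whichever is higher' of the two limits."""
    return max(FIXED_CAP_EUR, TURNOVER_SHARE * worldwide_annual_turnover_eur)


print(days_until_cutover(ASSUMED_PUBLICATION_DATE), "days to cutover")
# Hypothetical turnovers: below and above the 500M EUR crossover point.
for turnover in (100_000_000.0, 1_000_000_000.0):
    print(f"turnover {turnover:,.0f} EUR -> ceiling {penalty_ceiling_eur(turnover):,.0f} EUR")
```

The crossover sits at €500M of turnover: below it the fixed €35M figure is the binding ceiling, above it the 7% share is.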
Perplexity + Intel show a hybrid local-cloud inference orchestrator at Computex — Personal Computer agent ships in July
Jun 2
The week's cleanest answer to the "where does the agent run?" question. Aravind Srinivas joined Intel CEO Lip-Bu Tan during Intel's Computex 2026 keynote on June 2 to demo what Perplexity is calling the first hybrid local-cloud inference orchestrator: software that decides — in real time, mid-task — which parts of an agent workload run on a local model and which get routed to a frontier cloud model. The on-stage demo processed confidential deal materials through Perplexity's Personal Computer agent on Intel Core Ultra Series 3 silicon; the local model classified each fragment for sensitivity, kept what it should keep, and asked the user before sending the rest. Perplexity says the orchestrator will be chip-agnostic (NVIDIA RTX Spark also a stated target) and lands inside Perplexity Computer in July. Two reads. (1) Pair with NVIDIA's Vera CPU early-adopter list (yesterday) and Pinterest's Trainium commit (item 04): the substrate war is spreading from the data center to the laptop, and the Computex week's quiet consensus is that the same agent needs to talk credibly to both silicon classes. (2) Pair with the EU AI Act August 2 cutover (item 11): "sensitive material stays on device" is exactly the architecture EU deployers will need an answer for once Article 26 obligations bind — and Perplexity is the first major lab to ship a UX-level primitive for it.
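The routing behaviour described in the demo (classify each fragment locally, keep sensitive material on device, ask before sending the rest) can be sketched as a small decision loop. This is an illustration of the pattern only, not Perplexity's implementation: the keyword classifier stands in for a local model, and every name, marker word and fragment below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a local model's sensitivity classifier.
SENSITIVE_MARKERS = ("confidential", "term sheet", "valuation")


def looks_sensitive(fragment: str) -> bool:
    lowered = fragment.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


@dataclass
class RoutingPlan:
    local: list[str] = field(default_factory=list)
    cloud: list[str] = field(default_factory=list)


def route(
    fragments: list[str],
    classify: Callable[[str], bool],
    ask_user: Callable[[str], bool],
) -> RoutingPlan:
    """Keep sensitive fragments local; send the rest only with user consent."""
    plan = RoutingPlan()
    for fragment in fragments:
        if classify(fragment):
            plan.local.append(fragment)  # never leaves the device
        elif ask_user(fragment):
            plan.cloud.append(fragment)  # user approved the upload
        else:
            plan.local.append(fragment)  # declined, so it stays local
    return plan


fragments = [
    "Confidential: target valuation range",
    "Public press release summary",
    "Meeting agenda for Tuesday",
]
# For the example, the user approves everything except the agenda.
plan = route(fragments, looks_sensitive, ask_user=lambda f: "agenda" not in f)
print("local:", plan.local)
print("cloud:", plan.cloud)
```

Defaulting to local whenever the user declines keeps the failure mode conservative: nothing reaches the cloud model without both a non-sensitive classification and explicit consent.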
