Mellum2 and Aion go open — the layer below the frontier turns commodity — Spotlight · 2026-06-08

Yesterday's brief was the front of the pack — Codex 5.2, Cursor SDK, Supabase's $10.5B round, the FAccT'26 audit and the Claude Code GitHub Action CVE all in one 72-hour compression at the back of the window. Today's brief rewinds to the front of the same window — June 1–2, the Computex/Build week that just ended — and finds the layer below the frontier model gone commodity in three different ways at once. (1) The model layer goes open. On June 2, JetBrains open-sources Mellum2: a 12B-parameter Mixture-of-Experts coding model that activates only 2.5B per token, 64 experts × 8 active, Apache 2.0, trained from scratch on ~10.6T tokens — shipped not as a chatbot but as the focal model for routing, RAG, sub-agents and high-throughput coding inside any harness, with sub-second inference targets. The first frontier-grade coding model JetBrains has put on Hugging Face. The same day, Microsoft ships Aion 1.0 Instruct in Edge Insider: a CPU-only on-device SLM that runs without an NPU or dedicated GPU, smaller and faster than the current Windows OS SLM — open weights to follow on Hugging Face in July 2026 — pairing it with the 14B Aion 1.0 Plan reasoner already in-box on capable Windows machines. (2) The action plane gets paid. On June 1, Zoom launches ZoomMate at $20/user/month — "the first AI teammate," an agentic work surface that turns a meeting into completed workflows in Salesforce, Jira, Slack and ServiceNow, with records in Workday, events on Google Calendar and Outlook, and finished decks/docs/spreadsheets. NA online + direct on day one; EMEA and APAC to follow. (3) The governance layer hardens. On June 2, Harness acquires Codecov from Sentry — coverage intelligence is embedded directly into Harness's Software Delivery Knowledge Graph, so a release pipeline can refuse to ship unreviewed AI-generated code in the same gating step that already checks deployment, security and incident telemetry; Codecov stays free for OSS. 
The back half of the window is the OSS churn: Claude Code mainline clears v2.1.165 → v2.1.168 in 48 hours (fallback models, broader deny-rule globs, cross-session message security); microsoft/agent-framework python-1.8.0 adds MCP skills discovery, progressive tool exposure, background agents and file access; lobehub rebrands itself as a "Chief Agent Operator"; rtk-ai/rtk ships a Rust CLI proxy claiming 60–90% token-reduction on common dev commands; farion1231/cc-switch hits v3.16.1 as a single keyring for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI and Hermes Agent; and affaan-m/ECC ships an "operator system" with 64 specialised agents, 261 skills and a Rust control plane. The picture: the cliffhanger has flipped. The frontier-model headline is now a known quantity for two weeks running — the news this week is that the layer underneath it is simultaneously open, on-device, paid for, and operated.

The lead — open coding models go local

JetBrains open-sources Mellum2 — a 12B Mixture-of-Experts coding model with 2.5B active per token, Apache 2.0, 64 experts × 8 active, ~10.6T training tokens

Jun 2

The most legible "open the coding model layer" move the JetBrains AI team has shipped to date, and the cleanest play yet for any team that wants a frontier-grade coding model under its own roof. On June 2, JetBrains released Mellum2 on Hugging Face under Apache 2.0: a 12B-parameter Mixture-of-Experts model with 64 experts, only 8 active per token, 2.5B active parameters at inference — trained from scratch on ~10.6T tokens of natural language and code. The framing in the JetBrains AI blog matters: this is not the next-generation general chatbot, it's a focal model — fast, well-scoped, optimised for sub-second inference inside larger AI systems for routing, RAG, summarisation, sub-agents and high-throughput coding. Unlike Mellum-1, which only did code completion, Mellum2 functions as a full coding assistant — it generates and edits code, calls external tools, executes multi-step agentic workflows, holds long conversations, and supports explicit reasoning. vLLM's v0.22.1 patch on June 5 ships day-one Mellum2 support, so the OSS inference stack is already wired in. Two reads. (1) The strategic tell is the focal-model framing — JetBrains is signalling that the next generation of agent harnesses won't run a single frontier model end-to-end; they'll route between a frontier "planner" (Claude Opus / GPT-5.2) and a fleet of cheap, fast focal models doing the 90% of work that doesn't need a planner. Mellum2 is the first credible open-weights candidate for that role from a vendor that actually ships an IDE. (2) Pair with item 02: the same week Microsoft opens its on-device SLM and JetBrains opens its in-IDE focal model is the week the under-the-frontier model layer stopped being a closed-source moat.

JetBrains AI Blog — Mellum2 goes open source ↗ Hugging Face — Introducing Mellum2 ↗ MarkTechPost — Mellum2, a 12B MoE for multi-model pipelines ↗ The New Stack — JetBrains opens Mellum2 to go where Claude Code can't ↗ vLLM v0.22.1 — Mellum2 support ↗
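For readers who want the "64 experts × 8 active" line in concrete terms, here is a minimal sketch of top-k expert routing. It is not Mellum2's actual router: the random logits stand in for a learned gating layer, and `route_token` is a hypothetical name. Only the 64 / 8 / 12B / 2.5B figures come from the announcement.

```python
import math
import random

NUM_EXPERTS = 64  # from the announcement
TOP_K = 8         # experts active per token, from the announcement


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}


rng = random.Random(0)
# Assumption: random scores stand in for the output of a learned router.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

expert_fraction = TOP_K / NUM_EXPERTS  # share of experts consulted per token
param_fraction = 2.5e9 / 12e9          # share of parameters active per token

print(sorted(weights))  # indices of the 8 experts chosen for this token
print(f"{expert_fraction:.1%} of experts, {param_fraction:.1%} of parameters active per token")
```

The two fractions differ (12.5% of experts, about 20.8% of parameters). A plausible reading, not confirmed by the source, is that the non-expert parameters such as attention and embeddings run for every token.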

Microsoft Aion 1.0 Instruct ships in Edge Insider — a CPU-only on-device SLM, open weights on Hugging Face in July, no NPU or dedicated GPU required

Jun 2

Microsoft's first-party answer to the same question Mellum2 (item 01) answers from the IDE side, framed from the OS side. On June 2 at Build 2026, the Windows developer blog introduced Aion 1.0 Instruct: Microsoft's next-generation small language model, "smaller, faster and more efficient" than the current Windows OS SLM, built from the ground up for on-device workloads. The disclosures that matter. (1) Runs on CPU — no NPU, no dedicated GPU required — which means it works on far more PCs than any prior Windows AI model, including most existing laptops in the installed base. (2) Powers everyday text intelligence (summarisation, rewrite, intents, accessibility) and extends beyond Windows APIs into the Edge browser through the new on-device APIs Microsoft also announced at Build. (3) Open weights — preview today in Edge Insider channels, full open-source drop on Hugging Face in July 2026. It pairs with the 14B Aion 1.0 Plan reasoner already shipping in-box on capable Windows machines (see yesterday's brief). Two reads. (1) Pair with item 01: Mellum2 is the focal-model layer for the IDE; Aion 1.0 Instruct is the focal-model layer for the OS. In the same week, two of the vendors with the strongest opinions about where the agent runs both shipped small, fast, open models for the work the frontier planner shouldn't be doing. (2) The CPU-only constraint is the strategic read — Microsoft is signalling that its on-device AI bet is not gated on Copilot+ / NPU-class hardware. Aion 1.0 Instruct is what an agent on a 2022 ThinkPad can call locally before it ever reaches for a cloud frontier model.

Microsoft Edge Dev Blog — On-device AI in Edge: new models & APIs ↗ AI Weekly — Microsoft unveils Aion 1.0 on-device AI at Build 2026 ↗ byteiota — Windows Aion 1.0: on-device AI inside Windows itself ↗ Windows Forum — Aion 1.0 Instruct, Plan and new APIs ↗

The action plane gets paid

Zoom launches ZoomMate — "the first AI teammate" at $20/seat, agentic execution in Salesforce, Jira, Slack, ServiceNow, Workday and Outlook/Google Calendar

Jun 1

The cleanest "the meeting is the agent surface" launch from a non-Microsoft, non-Google vendor this quarter — and the first one that prints a real seat price for it. On June 1, Zoom launched ZoomMate: "the first AI teammate built to turn conversations into completed work." Built on the system-of-action vision Zoom trailed in March, ZoomMate is an agentic work surface that pairs live conversational context with three capabilities at once. (1) Search across Zoom, the web, and connected third-party systems — pulling from Salesforce, ServiceNow, Workday and enterprise file stores. (2) Orchestrate & execute: agents monitor ongoing projects, identify next steps from meeting context, schedule events on Google Calendar or Outlook, route requests to the right systems, update records, create follow-up tasks, draft customer communications, and trigger onboarding or support workflows. Real execution in Salesforce, Jira, Slack, ServiceNow and more, not just suggestions. (3) Complete: turns meeting and enterprise context into finished decks, documents, spreadsheets and project plans. Pricing: $20/user/month with AI credits included, available today to NA online + direct customers; EMEA and APAC later this year. Two reads. (1) ZoomMate is the first "meeting → workflow" agent priced like an enterprise seat rather than a Copilot add-on — which is the actual moment the agent leaves the chatbox and starts showing up in budget review. (2) Pair with the AI Agent Index from yesterday: Zoom is now selling an agent with first-class write access to Salesforce, Slack and ServiceNow at $20/seat — the audit question is no longer abstract. Whether ZoomMate ships disclosures in line with the index's eight fields is the read the second half of 2026 will turn on.

Zoom — Zoom launches ZoomMate ↗ StockTitan — ZoomMate at $20/month ↗ TechRepublic — Zoom introduces ZoomMate, an AI teammate for post-meeting work ↗ MarTech Series — ZoomMate ↗

Software delivery gets governed

Harness acquires Codecov from Sentry — coverage intelligence becomes part of the Software Delivery Knowledge Graph, AI-era release governance

Jun 2

The cleanest "governance-by-acquisition" move yet in the AI-coding wave. On June 2, Harness announced it acquired Codecov from Sentry. Terms undisclosed. The product framing is the news. (1) Codecov's coverage intelligence is embedded directly into Harness's delivery pipeline, becoming a new layer in the Software Delivery Knowledge Graph alongside deployment signals, security reachability analysis, incident telemetry and change history — i.e. a release gate can refuse to ship a change that AI authored if test coverage on the affected lines is below threshold, in the same gating step that already checks the other risk surfaces. (2) Codecov stays free for OSS: Harness explicitly committed to preserving the existing Codecov experience and continuing investment in the open-source side. The pitch on the press wire: "speed without visibility can introduce new quality and release risk" — code-coverage data tied to the specific repositories and changes flowing through the pipeline. Two reads. (1) Pair with yesterday's Salt Code launch and the Claude Code GitHub Action CVE: the audit layer is now consolidating from three angles at once — policy enforcement (Salt Code, MCP-shaped), provenance / coverage (Harness × Codecov), and per-system safety disclosure (the FAccT'26 index). Coverage was the last missing leg. (2) The deal is a quiet vote on the long-term shape of the AI-coding market: it says the buyer's question is no longer "which agent should I use" but "what is the AI-aware delivery pipeline" — and Harness wants to be the answer to that, not the answer to the first.

PR Newswire — Harness acquires Codecov from Sentry ↗ Harness Press — Acquires Codecov ↗ Codecov — A new chapter for Codecov ↗ DevOps.com — Harness acquires Codecov to identify untested code ↗
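The release-gate idea in this item (refuse to ship an AI-authored change when coverage on the affected lines is below threshold) can be sketched in a few lines. This is an illustration only, not the Harness or Codecov API: `Change`, `patch_coverage`, `release_gate` and the 80% default threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    changed_lines: frozenset  # line numbers touched by the change
    covered_lines: frozenset  # line numbers executed by the test suite
    ai_authored: bool         # assumption: the pipeline can tell who authored the change


def patch_coverage(change):
    """Fraction of the changed lines that the tests actually execute."""
    if not change.changed_lines:
        return 1.0  # nothing changed, nothing to cover
    hit = change.changed_lines & change.covered_lines
    return len(hit) / len(change.changed_lines)


def release_gate(change, threshold=0.8):
    """Block AI-authored changes whose patch coverage is under the threshold."""
    cov = patch_coverage(change)
    if change.ai_authored and cov < threshold:
        return False, f"blocked: patch coverage {cov:.0%} < {threshold:.0%}"
    return True, f"allowed: patch coverage {cov:.0%}"


risky = Change(frozenset({10, 11, 12, 13}), frozenset({10}), ai_authored=True)
safe = Change(frozenset({10, 11, 12, 13}), frozenset({10, 11, 12, 13, 14}), ai_authored=True)

print(release_gate(risky))  # (False, 'blocked: patch coverage 25% < 80%')
print(release_gate(safe))   # (True, 'allowed: patch coverage 100%')
```

The point of the sketch is that the gate scores only the lines the change touches, not the repository-wide coverage number, which is what lets it sit in the same step as the other per-change risk checks.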

The agent harnesses keep churning

Update — Claude Code v2.1.165 → v2.1.168: fallback models, broader deny-rule globs, cross-session message security, agents filtering

Jun 5–7

The Claude Code mainline cleared four point releases in ~48 hours across June 5–7 — v2.1.165, v2.1.166, v2.1.167, v2.1.168 — the densest churn since the v2.1.150s sprint last month. Per Anthropic's public release notes, the four versions together ship: fallback models (a primary model can now degrade to a configured secondary inside a single Claude Code session without user intervention); broader deny-rule glob support (the same glob syntax permission rules already use for allow-lists, extended to deny-lists); stronger cross-session message security; more reliable thinking-mode controls; improved retries, update messaging and an agents filter in the sub-agent view; and a wide pass of fixes across the terminal, auth, session and UI surfaces. Two reads. (1) The fallback-models change is the headline — it's the first time Claude Code has built native support for "use a cheaper model for the next turn if this one is rate-limited or failing," exactly the routing pattern Mellum2 and Aion 1.0 are being shipped to fit into (items 01–02). (2) The deny-rule glob change is the quieter governance read: paired with yesterday's Salt Code / CVE / FAccT'26 wave, the harness-side answer to "the agent's authority is the attack surface" is to make the deny language as expressive as the allow language. That arrived this week without fanfare.

GitHub — anthropics/claude-code releases ↗ Releasebot — Anthropic release notes (June 2026) ↗
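A rough sketch of the two headline behaviours, deny-rule globs and fallback models, using only the Python standard library. Nothing here is Claude Code's configuration syntax or implementation: the rule lists, `is_permitted`, `call_with_fallback` and `ModelUnavailable` are hypothetical, and "deny wins over allow" is an assumption made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical rule sets; the patterns use ordinary shell-style globs.
ALLOW = ["git *", "npm test*"]
DENY = ["git push *", "rm *"]


def is_permitted(command, allow=ALLOW, deny=DENY):
    """A command runs only if no deny glob matches and some allow glob does."""
    if any(fnmatchcase(command, pattern) for pattern in deny):
        return False  # assumption: deny rules take precedence over allow rules
    return any(fnmatchcase(command, pattern) for pattern in allow)


class ModelUnavailable(Exception):
    """Raised by a model callable that is rate-limited or failing."""


def call_with_fallback(prompt, models):
    """Try each (name, callable) in order and return the first success."""
    last_error = None
    for name, call in models:
        try:
            return name, call(prompt)
        except ModelUnavailable as err:
            last_error = err  # remember the failure and degrade to the next model
    raise RuntimeError("all configured models failed") from last_error


def primary(prompt):
    raise ModelUnavailable("rate limited")  # simulate a failing primary model


def secondary(prompt):
    return f"echo: {prompt}"  # stand-in for a cheaper secondary model


print(is_permitted("git status"))            # True
print(is_permitted("git push origin main"))  # False
print(call_with_fallback("hi", [("primary", primary), ("secondary", secondary)]))
# ('secondary', 'echo: hi')
```

Sharing one glob syntax across both lists is what makes the deny side as expressive as the allow side: "git *" can be allowed while "git push *" is carved out of it.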

Update — microsoft/agent-framework python-1.8.0: MCP skills discovery, progressive tool exposure, background agents, file access

Jun 4

The 1.6 → 1.7 → 1.8 arc on microsoft/agent-framework in three weeks is the cleanest signal that the .NET-and-Python side of the agent stack is now shipping at the same cadence as the Anthropic / OpenAI / Cursor harnesses. On June 4, python-1.8.0 landed on PyPI and GitHub with four notable additions. (1) MCP skills discovery: the framework can now enumerate skills exposed by connected MCP servers and surface them to the agent as a routable list, instead of requiring the developer to pre-wire each skill by hand — the exact primitive microsoft/apm (yesterday's item 09) needs to be a real package registry. (2) Progressive tool exposure: the agent doesn't see the full tool catalogue on turn one; tools are revealed in staged sets as the conversation establishes context, an answer to the well-documented "100-tool prompt overflow" problem. (3) Background agents ship as a first-class abstraction. (4) File access: new AgentFileStore and FileAccessProvider primitives standardise how agents persist and read files across sessions. Two reads. (1) Pair with item 01: the harness shipping MCP skills discovery the same week JetBrains opens Mellum2 as a focal model is exactly the shape of the new agent stack — small open models, declared skills, progressive tool exposure. (2) The release cadence itself is the moat. Microsoft Agent Framework has now shipped five Python minor releases in seven weeks. That's enough velocity to credibly host the .NET enterprise side of the agent stack.

GitHub — microsoft/agent-framework releases ↗ PyPI — agent-framework 1.8.0 ↗ Microsoft Agent Framework dev blog ↗
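Progressive tool exposure is easier to picture with a toy example. The sketch below is not the agent-framework API: the stage names, tool names, keyword triggers and `visible_tools` are all hypothetical, and a real implementation would presumably use something richer than keyword matching to decide when context has been established.

```python
# Hypothetical tool catalogue, grouped into stages.
TOOL_STAGES = {
    "core": ["search_docs", "read_file"],
    "git": ["git_diff", "git_commit"],
    "deploy": ["run_pipeline", "rollback"],
}

# Hypothetical triggers: a stage unlocks once the conversation mentions its topic.
TRIGGERS = {
    "git": ("commit", "branch", "diff"),
    "deploy": ("deploy", "release", "rollback"),
}


def visible_tools(conversation):
    """Return the tools the agent may see, given the conversation so far."""
    text = " ".join(conversation).lower()
    tools = list(TOOL_STAGES["core"])  # the core set is visible from turn one
    for stage, keywords in TRIGGERS.items():
        if any(word in text for word in keywords):
            tools.extend(TOOL_STAGES[stage])
    return tools


turn_one = ["What does this module do?"]
later = turn_one + ["Show me the diff before we release"]

print(visible_tools(turn_one))  # ['search_docs', 'read_file']
print(visible_tools(later))
# ['search_docs', 'read_file', 'git_diff', 'git_commit', 'run_pipeline', 'rollback']
```

The prompt on turn one carries two tool definitions instead of six; scaled up to a catalogue of a hundred tools, that is the overflow problem the staged approach is meant to address.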

The OSS agent layer keeps moving

lobehub/lobehub — "Chief Agent Operator," organising agents into 7×24 operations by hiring, scheduling and reporting on an entire AI team

trending

The cleanest "the agent is now a workforce" framing shipped this week, and a meaningful rebrand from one of the most-starred OSS chat front-ends. The project formerly known as LobeChat rebrands as LobeHub with explicit positioning as a "Chief Agent Operator" — a control panel that handles hiring (provisioning an agent), scheduling (when each agent runs and on what triggers) and reporting (what each agent did, in operations terms) across an entire AI team running 24×7. The pitch is not "yet another agent harness" — it's the operator layer above the harness, modelled on how a real ops team manages a real on-call rotation. Two reads. (1) Pair with item 03 (ZoomMate): the meeting-as-action-surface and the ops-console-as-action-surface are now converging from two different angles on the same job — "the agent should be visible in operational language, not chat language." (2) The naming choice ("Chief Agent Operator") is itself the strategic tell. The first vendor to credibly own the operator title for the multi-agent era will be the first vendor enterprise teams default-buy when an internal AI deployment crosses the four- or five-agent threshold.

GitHub — lobehub/lobehub ↗

rtk-ai/rtk — a Rust CLI proxy claiming 60–90% LLM token reduction on common dev commands, single binary, zero dependencies

trending

The single most "the bill is the product" OSS project to land this week, and the one whose claim deserves independent benchmarking before any large team deploys it. rtk-ai/rtk ships a Rust CLI proxy that sits between an agent and the model, intercepting common developer commands (build / test / lint / git / file-read shapes) and rewriting them into a compressed form claimed to reduce LLM token consumption by 60–90% on those commands — single binary, zero external dependencies, drop-in for any agent that shells out. Two reads. (1) The category is real even if the specific number isn't yet verified by third parties — every team running Claude Code or Codex at scale is now bleeding tokens to the same repetitive shell shapes (the "list me the repo, read this file, run the tests" turn that costs 10–40K tokens per cycle). A token-reduction proxy is the obvious productisation. (2) Pair with item 01 (Mellum2 as focal model) and item 02 (Aion 1.0 Instruct as on-device SLM): the under-the-frontier economics are now under attack from three angles — open small models, on-device CPU inference, and proxy-side token compression. Treat the rtk 60–90% number as a claim to verify before citing in budget — but the trend it's surfing is very real.

GitHub — rtk-ai/rtk ↗
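To make the proxy idea concrete, here is a toy version of output compression for a test run. It is not rtk's method, which this brief has not inspected: `compress_test_output` is hypothetical, the sample output is synthetic, and whitespace-split words are only a crude stand-in for model tokens.

```python
def compress_test_output(raw):
    """Keep failing lines verbatim and collapse passing lines into a count."""
    lines = raw.splitlines()
    failed = [line for line in lines if "FAILED" in line]
    passed = sum(1 for line in lines if "PASSED" in line)
    return "\n".join(failed + [f"{passed} passed, {len(failed)} failed"])


def rough_tokens(text):
    """Crude proxy: count whitespace-separated words, not real model tokens."""
    return len(text.split())


# Synthetic test-runner output: 48 passing cases and 2 failures.
raw = "\n".join(
    [f"tests/test_mod.py::test_case_{i} PASSED" for i in range(48)]
    + ["tests/test_mod.py::test_edge FAILED", "tests/test_mod.py::test_io FAILED"]
)

compact = compress_test_output(raw)
saving = 1 - rough_tokens(compact) / rough_tokens(raw)

print(compact)
print(f"rough saving on this synthetic sample: {saving:.0%}")  # 92%
```

The figure printed here reflects only this hand-built sample and says nothing about rtk's 60–90% claim. It does show why the category is plausible: most of what a passing test run prints is information the model does not need.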

farion1231/cc-switch — a single desktop app to manage Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI and Hermes Agent (50+ provider presets, MCP and Skills unified)

v3.16.1 · Jun 1

The cleanest "every coding agent on one keyring" project to ship this window, and the shape the multi-CLI era clearly wants. farion1231/cc-switch — built on Tauri 2 (Rust backend) with a React 18 front-end, available on Windows / macOS / Linux — centralises configuration for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI, Hermes Agent and the other CLIs a senior developer juggles, with 50+ built-in provider presets, one-click provider import, instant switching, unified MCP and Skills management, system-tray quick-switching, cloud sync, usage tracking and a SQLite-backed config store designed to prevent the manual-config-file corruption that has been the open wound of the multi-CLI workflow for eighteen months. Trending hard on GitHub, with v3.16.1 shipped on June 1. Two reads. (1) Pair with yesterday's fleet-console items (superset / agent-deck / herdr): those are the runtime multiplexers — many agents running in parallel; cc-switch is the configuration multiplexer — one set of API keys, MCP wiring and skills, used by whichever agent the developer reaches for. The configuration plane and the execution plane are both being productised at OS level. (2) The explicit OpenClaw / Hermes Agent presets are the clearest single piece of evidence this week that the open-source agent CLIs are now first-class targets alongside Claude Code and Codex in the developer-tooling ecosystem.

GitHub — farion1231/cc-switch ↗ cc-switch v3.16.1 release ↗

affaan-m/ECC — a harness-native operator system: 64 specialised agents, 261 skills, 84 legacy command shims, AgentShield security scanning, cross-harness from day one

v2.0.0-rc.1

The most ambitious single-maintainer OSS attempt this week at the question "what is the operator layer across coding agents." affaan-m/ECC describes itself as "the harness-native operator system for agentic work" — explicitly built from real-world multi-harness engineering workflows accumulated over ten months of intensive daily use building real products. The numbers are the framing: 64 specialised agents, 261 skills, 84 legacy command shims, with native cross-harness support for Claude Code, Cursor, OpenCode, Codex and other AI harnesses. Memory optimisation across sessions, "skills and instincts" framing for continuous learning, integrated AgentShield security scanning, and a research-first development workflow are all first-class. v2.0.0-rc.1 landed in April with a desktop GUI dashboard and a Rust control-plane prototype; weekly updates through the window. Two reads. (1) Pair with item 07 (lobehub as Chief Agent Operator): two completely independent attempts at the same job — "operate a fleet of agents in production" — both crossing production-readiness this week. The naming converges on the same word: operator. The agent-operator surface is now a category the OSS world is trying to win in public. (2) The framing as "harness-native" is the strategic tell — ECC is not pitched as an alternative to Claude Code or Cursor, it's pitched as the layer above them. The same shape microsoft/apm productised on the package side, ECC productises on the operator side.

GitHub — affaan-m/ECC ↗

Compiled 2026-06-08 from the JetBrains AI blog's Mellum2 goes open source post (June 2), the Hugging Face launch post, MarkTechPost and The New Stack on the 12B MoE / 2.5B-active / Apache 2.0 / 10.6T-token disclosures, with vLLM v0.22.1 for day-one Mellum2 inference; the Microsoft Edge Dev Blog's on-device-AI post, AI Weekly, byteiota and theWinCentral on Aion 1.0 Instruct's CPU-only constraint and the July open-weights drop; Zoom's own ZoomMate launch post with StockTitan, TechRepublic and MarTech Series on the $20-a-seat agentic-work-surface frame; PR Newswire and Harness Press on Harness × Codecov with the Codecov "new chapter" post and DevOps.com on the AI-era delivery-governance read; GitHub and Releasebot on the Claude Code v2.1.165 → v2.1.168 sprint and the Microsoft Agent Framework python-1.8.0 notes (PyPI / dev blog); and the GitHub repos for lobehub/lobehub, rtk-ai/rtk, farion1231/cc-switch and affaan-m/ECC on the OSS agent-operator layer. Window of Jun 1 – Jun 8. Numbers, version tags and named partners are as reported by the primary sources at compile time; trending-repo star counts are noted as "trending" rather than printed where the value would mislead. Hand-curated; corrections → jay@jfound.net.
