Safety

CBS News · 4 min read

Unauthorized parties accessed Claude Mythos — Anthropic investigating

CBS News reports Anthropic is investigating a possible breach of its Mythos model via a third-party vendor environment inside the Project Glasswing program. The first visible crack in the restricted-access model since Glasswing launched.

Safety Models

IEEE Spectrum · 9 min read

Stanford's AI Index 2026 — inside the numbers

IEEE Spectrum walks through the 2026 AI Index — training compute curves, evaluation saturation, open-weights share, and a sharp rise in domain-specific benchmarks. The least breathless read of the Index so far.

Research Safety Data

Foreign Policy · 11 min read

Project Glasswing and the new cyber calculus

Foreign Policy argues Project Glasswing — restricting Claude Mythos to twelve vetted partners for vulnerability research — is already reshaping how national-security staff think about offensive cyber capability in frontier AI. A governance read, not a tech read.

The Zvi · 20 min read

Claude Mythos #2: cybersecurity and Project Glasswing

Zvi's second Mythos deep-dive picks apart the Project Glasswing mechanism — what responsible capability overhang looks like in practice, the argument for and against vendor gating, and the uncomfortable questions restraint leaves unanswered. The governance companion to every other Mythos piece this week.

Safety Models Research

Bloomberg · 12 min read

How Anthropic discovered Mythos AI was too dangerous for release

Bloomberg's deep-dive on why Anthropic restricted Claude Mythos to Project Glasswing instead of a public release. During red-team evaluation, the model autonomously identified and exploited a previously unknown FreeBSD RCE vulnerability, crossing the company's ASL-4 threshold for cyber-capability. The piece is the clearest public picture yet of how frontier labs are handling capability overhang.

Safety Models Research

Stratechery · 18 min read

Myth and Mythos — restraint as strategy in the next AI era

Ben Thompson argues Anthropic's decision to hold back Claude Mythos resets the competitive frame: capability is table stakes, trust is the moat. The piece threads Glasswing, the $800B valuation, and the cyber-capability disclosure into a single strategic narrative about where the frontier goes next. The essential read of the week.

Models Safety

Import AI · 12 min read

Import AI 452: scaling laws for cyberwar

Jack Clark traces the emergent cyber-offensive capabilities surfacing in frontier models, with research notes most coverage missed. A necessary complement to the Mythos story from someone with deep context on the alignment landscape.

Alignment Forum · 11 min read

My AGI safety research — 2025 review and 2026 plans

An independent safety researcher's candid year-in-review and roadmap. Unusual for its honesty about dead ends and its willingness to name specific open problems for 2026. Perfect companion to the Mythos coverage.

From Week 15, 2026

Bloomberg · 6 min read

OpenAI, Anthropic, Google unite against Chinese AI model theft

OpenAI, Anthropic, and Google announced they're sharing intelligence through the Frontier Model Forum to stop adversarial distillation. Three Chinese AI firms are named; Anthropic claims 16M fraudulent Claude exchanges via ~24K fake accounts.

Safety Funding

From Week 15, 2026

Anthropic · 4 min read

Glasswing: Anthropic's cybersecurity initiative with 12 enterprise partners

Anthropic launched Glasswing, a cybersecurity consortium with 12 partners, deploying Claude Mythos Preview exclusively through this secure channel. Focus on reducing AI-driven cyberattack risk in critical sectors.

Safety Funding

From Week 14, 2026

Claude Blog · 4 min read

Audit Claude Platform activity with the new Compliance API

Anthropic launched a Compliance API for enterprise teams to audit Claude Platform activity, enabling integration with existing compliance and logging infrastructure.

Dev Safety

From Week 14, 2026

Science · 8 min read

Sycophantic AI decreases prosocial intentions across 11 major models

Peer-reviewed study in Science demonstrating that sycophantic AI behavior measurably reduces prosocial intentions in users, tested across 11 frontier models. A concrete data point for the alignment conversation.

From Week 14, 2026

Free Systems · 12 min read

Building political superintelligence

Andy Hall proposes using AI not to replace political decision-making but to help citizens and institutions perceive reality more sharply, understand tradeoffs, and contest power more effectively.