CBS News reports Anthropic is investigating a possible breach
of its Mythos model via a third-party vendor environment inside
the Project Glasswing program. The first visible crack in the
restricted-access model since Glasswing launched.
IEEE Spectrum walks through the 2026 AI Index — training
compute curves, evaluation saturation, open-weights share, and
a sharp rise in domain-specific benchmarks. The least
breathless read of the Index so far.
Foreign Policy argues Project Glasswing — restricting Claude
Mythos to twelve vetted partners for vulnerability research —
is already reshaping how national-security staff think about
offensive cyber capability in frontier AI. A governance read,
not a tech read.
Zvi's second Mythos deep-dive picks apart the Project
Glasswing mechanism — what responsible capability overhang
looks like in practice, the argument for and against vendor
gating, and the uncomfortable questions restraint leaves
unanswered. The governance companion to every other Mythos
piece this week.
Bloomberg's deep-dive on why Anthropic restricted Claude Mythos to
Project Glasswing instead of a public release. During red-team
evaluation, the model autonomously identified and exploited a
previously unknown FreeBSD RCE vulnerability, crossing the company's
ASL-4 threshold for cyber-capability. The piece is the clearest public
picture yet of how frontier labs are handling capability overhang.
Ben Thompson argues Anthropic's decision to hold back Claude Mythos
resets the competitive frame: capability is table stakes, trust is
the moat. The piece threads Glasswing, the $800B valuation, and the
cyber-capability disclosure into a single strategic narrative about
where the frontier goes next. The essential read of the week.
Jack Clark traces the emergent cyber-offensive capabilities surfacing
in frontier models, with research notes most coverage missed. A
necessary complement to the Mythos story from someone with deep
context on the alignment landscape.
An independent safety researcher's candid year-in-review and roadmap.
Unusual for its honesty about dead ends and its willingness to name
specific open problems for 2026. Perfect companion to the Mythos
coverage.
OpenAI, Anthropic, and Google announced they're sharing intelligence
through the Frontier Model Forum to stop adversarial distillation.
Three Chinese AI firms are named; Anthropic claims 16M fraudulent
Claude exchanges via ~24K fake accounts.
Anthropic launched Glasswing, a cybersecurity consortium with 12 partners,
deploying Claude Mythos Preview exclusively through this secure channel.
Focus on reducing AI-driven cyberattack risk in critical sectors.
Anthropic launched a Compliance API for enterprise teams to audit
Claude Platform activity, enabling integration with existing
compliance and logging infrastructure.
Peer-reviewed study in Science demonstrating that sycophantic
AI behavior measurably reduces prosocial intentions in users,
tested across 11 frontier models. A concrete data point for the
alignment conversation.
Andy Hall proposes using AI not to replace political decision-
making but to help citizens and institutions perceive reality more
sharply, understand tradeoffs, and contest power more effectively.
The chief architect of Lean argues that formal mathematical
verification becomes essential infrastructure as AI generates
more of the world's code. When proof is cheap, it becomes
the stronger path to correctness.
Anthropic partnered with Mozilla to unleash Claude on the Firefox
codebase, where it found 22 previously unknown vulnerabilities —
14 rated high-severity. The bugs ranged from use-after-free memory
errors to potential sandbox escapes. It's the most compelling
demonstration yet that frontier models can do meaningful security
research, not just generate code.