Models

LLMs, model releases, fine-tuning

44 links across all digests

From Week 17, 2026

Bloomberg · 6 min read

DeepSeek V4 Pro and Flash arrive — open-weights frontier at a fraction of the price

DeepSeek released V4 today as two preview models: V4-Pro (1.6T total parameters, 49B active) and V4-Flash (284B total, 13B active), both 1M-context MoE models under an MIT license. Pro is now the largest open-weights model in circulation, larger than Kimi K2.6 and GLM-5.1, and its $1.74-per-million-input-token pricing undercuts every Western frontier model. Self-reported benchmarks put it three to six months behind GPT-5.4 and Gemini-3.1-Pro, but at this price gap the spread barely matters for most production workloads.

Models OSS Infra

From Week 17, 2026

TechCrunch · 5 min read

OpenAI releases GPT-5.5, betting on the AI super app

GPT-5.5 lands with stronger coding, longer-horizon reasoning, and noticeably better agentic behavior — rolling out to Plus, Pro, Business and Enterprise this week. Framed as a step toward the ChatGPT super app strategy.

Models Agents

From Week 17, 2026

CBS News · 4 min read

Unauthorized parties accessed Claude Mythos — Anthropic investigating

CBS News reports that Anthropic is investigating a possible breach of its Mythos model via a third-party vendor environment inside the Project Glasswing program. It is the first visible crack in the restricted-access arrangement since Glasswing launched.

Safety Models

From Week 17, 2026

Product Compass · 18 min read

The ultimate guide to Claude Opus 4.7

The most thorough walkthrough yet of Opus 4.7's Adaptive Thinking, the new effort levels, and the 1M-context workflow changes. Covers when to reach for which effort level, how Adaptive Thinking actually allocates compute, and ten concrete workflows that are newly viable. The kind of practitioner read you come back to twice while rewiring your agents.

Dev Workflow Models

From Week 17, 2026

DataCamp · 10 min read

Claude Opus 4.7 benchmark: memory and effort levels tested

A Streamlit benchmark harness that measures how Opus 4.7's memory and effort levels trade off on real tasks. Practical numbers for anyone tuning latency-vs-quality knobs in an app.

Dev Models Research

From Week 17, 2026

MIT Technology Review · 10 min read

The current state of AI, told through charts

MIT Technology Review distills the macro picture of AI in spring 2026 into a single chart pack — compute trends, model spend, public opinion, and the gap between expert and lay confidence. The clearest at-a-glance reference for the trajectory of the field right now, and the one you'll keep sending to non-AI colleagues asking what's going on.

Research Models Infra

From Week 17, 2026

Understanding AI · 8 min read

Meta is back in the LLM game after a year-long break

A clear read on why Meta's Muse Spark launch represents a real re-entry into frontier LLMs, not a rebrand of existing work. Distribution across WhatsApp, Instagram, and Messenger is the quiet moat.

Models Funding

From Week 17, 2026

Simon Willison · 6 min read

DeepSeek V4 — almost on the frontier, a fraction of the price

The clearest practitioner teardown of DeepSeek V4-Pro and V4-Flash available the morning they shipped: pricing comparison table, quantization notes, MoE activation math, and the pelican-on-a-bicycle test. The pricing table alone reframes every model-selection conversation you'll have next week: Flash at $0.14 input undercuts GPT-5.4 Nano, and Pro at $1.74 input undercuts every other frontier model outright.

Models OSS Infra

From Week 17, 2026

The Zvi · 20 min read

Claude Mythos #2: cybersecurity and Project Glasswing

Zvi's second Mythos deep-dive picks apart the Project Glasswing mechanism — what responsible capability overhang looks like in practice, the argument for and against vendor gating, and the uncomfortable questions restraint leaves unanswered. The governance companion to every other Mythos piece this week.

Safety Models Research

From Week 17, 2026

Bismarck Analysis · 12 min read

AI 2026: Mistral will rise as compute is unleashed

A contrarian argument from a geopolitical-strategy shop: Mistral is the structurally best-placed lab to benefit from the coming compute surplus, because European sovereignty buyers will pay a premium to not depend on US hyperscalers.

Models Infra Funding

From Week 16, 2026

Bloomberg · 12 min read

How Anthropic discovered Mythos AI was too dangerous for release

Bloomberg's deep-dive on why Anthropic restricted Claude Mythos to Project Glasswing instead of a public release. During red-team evaluation, the model autonomously identified and exploited a previously unknown FreeBSD RCE vulnerability, crossing the company's ASL-4 threshold for cyber-capability. The piece is the clearest public picture yet of how frontier labs are handling capability overhang.

Safety Models Research

From Week 16, 2026

Anthropic · 4 min read

Claude Opus 4.7 ships with stronger coding, better vision, and effort controls

Claude Opus 4.7 launched at the same pricing as 4.6 with meaningful gains on software engineering benchmarks, improved vision, and new "effort" controls that let developers tune reasoning depth per call.

Models Dev

From Week 16, 2026

Stratechery · 18 min read

Myth and Mythos — restraint as strategy in the next AI era

Ben Thompson argues Anthropic's decision to hold back Claude Mythos resets the competitive frame: capability is table stakes, trust is the moat. The piece threads Glasswing, the $800B valuation, and the cyber-capability disclosure into a single strategic narrative about where the frontier goes next. The essential read of the week.

Models Safety

From Week 16, 2026

MIT Technology Review · 10 min read

Why opinion on AI is so divided

MIT Tech Review unpacks the growing gap between expert enthusiasm (56% excited) and public skepticism (10% excited) using Stanford's 2026 AI Index data. A clear-eyed look at why a technical story and a social story have diverged.

Research Models

From Week 16, 2026

MIT Technology Review · 8 min read

Want to understand the current state of AI? Check out these charts

Data-rich companion piece breaking down benchmark compression, the narrowed US-China gap (now under 3%), and adoption curves across industries. Useful visual reference for any AI briefing.

Research Models