Research

Papers, benchmarks, evals, academic work

37 links across all digests

From Week 17, 2026

Bloomberg · 4 min read

Jeff Bezos closes in on $10B raise for new AI lab

Bezos is closing a $10B round for a new AI lab focused on "understanding the physical world." The reported valuation puts it squarely in frontier-lab territory before it has shipped anything.

Funding Research

From Week 17, 2026

DataCamp · 10 min read

Claude Opus 4.7 benchmark: memory and effort levels tested

A Streamlit benchmark harness that measures how Opus 4.7's memory and effort settings trade off against each other on real tasks. Practical numbers for anyone tuning the latency-versus-quality knobs in an app.

Dev Models Research

From Week 17, 2026

MIT Technology Review · 10 min read

The current state of AI, told through charts

MIT Technology Review distills the macro picture of AI in spring 2026 into a single chart pack — compute trends, model spend, public opinion, and the gap between expert and lay confidence. The clearest at-a-glance reference for the trajectory of the field right now, and the one you'll keep sending to non-AI colleagues asking what's going on.

Research Models Infra

From Week 17, 2026

Crunchbase News · 7 min read

Q1 2026 shatters venture funding records — AI pushes startup investment to $300B

Crunchbase's Q1 2026 data shows $300B in global startup funding — the bulk of it AI — making it the largest single quarter on record. The macro context for every "too much capital chasing too few ideas" conversation.

Funding Research

From Week 17, 2026

IEEE Spectrum · 9 min read

Stanford's AI Index 2026 — inside the numbers

IEEE Spectrum walks through the 2026 AI Index — training compute curves, evaluation saturation, open-weights share, and a sharp rise in domain-specific benchmarks. The least breathless read of the Index so far.

Research Safety Data

From Week 17, 2026

Foreign Policy · 11 min read

Project Glasswing and the new cyber calculus

Foreign Policy argues Project Glasswing — restricting Claude Mythos to twelve vetted partners for vulnerability research — is already reshaping how national-security staff think about offensive cyber capability in frontier AI. A governance read, not a tech read.

Safety Research

From Week 17, 2026

The Zvi · 20 min read

Claude Mythos #2: cybersecurity and Project Glasswing

Zvi's second Mythos deep-dive picks apart the Project Glasswing mechanism: what responsible handling of a capability overhang looks like in practice, the arguments for and against vendor gating, and the uncomfortable questions that restraint leaves unanswered. The governance companion to every other Mythos piece this week.

Safety Models Research

From Week 17, 2026

Latent Space · 14 min read

Scaling without slop

Swyx's take on what "slop" actually means for AI engineering — and why the next year of agent work will be decided by who can scale quality-control without drowning in noise. The kind of piece that quietly becomes a reference in team Slacks.

Agents Workflow Research

From Week 16, 2026

Bloomberg · 12 min read

How Anthropic discovered Mythos AI was too dangerous for release

Bloomberg's deep-dive on why Anthropic restricted Claude Mythos to Project Glasswing instead of releasing it publicly. During red-team evaluation, the model autonomously identified and exploited a previously unknown FreeBSD RCE vulnerability, crossing the company's ASL-4 threshold for cyber capability. The piece is the clearest public picture yet of how frontier labs are handling capability overhang.

Safety Models Research

From Week 16, 2026

MIT Technology Review · 10 min read

Why opinion on AI is so divided

MIT Tech Review uses Stanford's 2026 AI Index data to unpack the growing gap between expert enthusiasm (56% of experts say they are excited) and public skepticism (only 10% of the public say the same). A clear-eyed look at why the technical story and the social story have diverged.

Research Models

From Week 16, 2026

MIT Technology Review · 8 min read

Want to understand the current state of AI? Check out these charts

A data-rich companion piece breaking down benchmark compression, the narrowed US-China gap (now under 3%), and adoption curves across industries. A useful visual reference for any AI briefing.

Research Models

From Week 16, 2026

Import AI · 12 min read

Import AI 452: scaling laws for cyberwar

Jack Clark traces the emergent cyber-offensive capabilities surfacing in frontier models, with research notes most coverage missed. A necessary complement to the Mythos story from someone with deep context on the alignment landscape.

Safety Research

From Week 16, 2026

Alignment Forum · 11 min read

My AGI safety research — 2025 review and 2026 plans

An independent safety researcher's candid year-in-review and roadmap, unusual for its honesty about dead ends and its willingness to name specific open problems for 2026. A good companion to the Mythos coverage.

Safety Research

From Week 15, 2026

Understanding AI · 12 min read

Why AI reasoning models are hitting diminishing returns

The industry is shifting from scaling language models toward deployment and real-world reasoning. Raw capability gains are plateauing as infrastructure becomes the binding constraint. What comes after the era of ever-bigger models?

Models Research

From Week 15, 2026

HUMAI · 6 min read

Healthcare AI market projected to hit $45B by 2026

Healthcare AI adoption is accelerating, with the market projected to reach $45B. Financial services, retail, and healthcare show the strongest ROI, but industry consolidation is separating the winners from the vaporware.

Models Research