Latest Briefings
Featured Analysis

Marcel Dances Alone. The Crowd at Lincoln Center Lost Its Mind.

Robert Gaudette has never been to Paris, has no film training, and couldn't get a script greenlit in 25 years. His eight-minute AI film just won the Runway Grand Prix.

Frontier Models

Benchmark Matrix

Source: Artificial Analysis · Updated 2 hours ago · Methodology

Scores are Artificial Analysis's benchmark results in their native scales (0–1 accuracy for the six axes; the composite is on AA's own internal scale). AA periodically re-baselines its benchmark set.

| # | Model | Developer | GPQA (0–1.0) | HLE (0–1.0) | SciCode (0–1.0) | IFBench (0–1.0) | τ² (0–1.0) | TermBenchH (0–1.0) | Intelligence Index (context) |
|---|---|---|---|---|---|---|---|---|---|
| 01 | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 0.926 | 0.533 | 0.602 | 0.635 | 0.985 | 0.629 | 59.9 |
| 02 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 0.920 | 0.457 | 0.535 | 0.622 | 0.944 | 0.583 | 55.7 |
| 03 | GPT-5.5 (xhigh) | OpenAI | 0.935 | 0.443 | 0.561 | 0.759 | 0.939 | 0.606 | 54.8 |
| 04 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 0.914 | 0.396 | 0.545 | 0.586 | 0.886 | 0.515 | 53.5 |
| 05 | GPT-5.5 (high) | OpenAI | 0.932 | 0.430 | 0.559 | 0.716 | 0.930 | 0.599 | 53.1 |
| 06 | GLM-5.2 (max) | Z AI | 0.895 | 0.401 | 0.505 | 0.733 | 0.991 | 0.508 | 51.1 |
| 07 | GPT-5.5 (medium) | OpenAI | 0.926 | 0.406 | 0.535 | 0.710 | 0.918 | 0.576 | 50.4 |
| 08 | Gemini 3.5 Flash (high) | Google | 0.922 | 0.410 | 0.531 | 0.763 | 0.953 | 0.409 | 50.2 |

Bars are scaled to a fixed per-column reference (0–1 for accuracy benchmarks), not to the strongest model in the table. The Intelligence Index column shows AA's overall score for context only — it is partly derived from the axes to its left, so it is not bar-rendered.
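The difference between the two bar-scaling conventions in the note above can be sketched in a few lines of Python. This is an illustration only, not the site's rendering code: the function names `bar_fraction_fixed` and `bar_fraction_relative` are hypothetical, and the 0–1 reference range is taken from the note. The sample values are the HLE column from the table.

```python
def bar_fraction_fixed(score, ref_min=0.0, ref_max=1.0):
    """Bar length as a fraction of full width, measured against a
    fixed per-column reference range (assumed 0-1 for accuracy axes)."""
    frac = (score - ref_min) / (ref_max - ref_min)
    return min(1.0, max(0.0, frac))  # clamp to the drawable range


def bar_fraction_relative(score, column_scores):
    """The alternative the note rules out: scale each bar to the
    strongest model in the column, so the leader always fills the bar."""
    return score / max(column_scores)


# HLE column from the table, rows 01-08.
hle = [0.533, 0.457, 0.443, 0.396, 0.430, 0.401, 0.406, 0.410]

fixed = [bar_fraction_fixed(s) for s in hle]
relative = [bar_fraction_relative(s, hle) for s in hle]

for rank, (f, r) in enumerate(zip(fixed, relative), start=1):
    print(f"{rank:02d}  fixed={f:.3f}  relative={r:.3f}")
```

Under the fixed reference, the top HLE score fills only about half the bar, which keeps the remaining headroom on the benchmark visible. Under relative scaling, the same score would fill the whole bar and the other bars would be stretched in proportion.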

Today's Themes

What's Moving Today

  1. Whether export controls on frontier AI models create durable competitive moats or simply accelerate indigenous capability development in excluded markets.
  2. How active conflicts are becoming live data environments for military AI doctrine, and which observers are paying the closest attention.
  3. The growing gap between what AI can do for an informed individual navigating a medical system and what that system offers by default.
  4. Whether "Mythos-like" model development in Asia signals genuine capability convergence or a rebranding of existing architectures under political pressure.
The Daily Signal

Today's Briefing
