Llama 4 Behemoth

Meta's teacher model (still training): 288B active parameters, 16 experts, and roughly 2T total parameters. Meta reports that it outperforms GPT-4.5 and Claude Sonnet 3.7 on several STEM benchmarks. It is used to distill Scout and Maverick.

CATEGORY: Reasoning

CONTEXT: 1M

RELEASED: April 5, 2025

Key Features

288B active / ~2T total params
Outperforms GPT-4.5 and Claude Sonnet 3.7 on STEM benchmarks, per Meta
Distillation teacher for Llama 4 Scout and Maverick
Mixture-of-experts (MoE) with 16 experts
Not yet publicly released
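
As a rough illustration of what the active/total split means, the sketch below computes the share of parameters used for each token from the two figures listed above. The "~2T" total is approximate, so the result is approximate too; the function name `active_fraction` is hypothetical and not part of any Meta tooling.

```python
# Figures taken from the card above; "~2T" is approximate, so the result is too.
ACTIVE_PARAMS = 288e9   # 288B active parameters per token
TOTAL_PARAMS = 2e12     # roughly 2T total parameters


def active_fraction(active: float, total: float) -> float:
    """Return the share of total parameters used for a single token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"{fraction:.1%} of parameters active per token")
```

Under these figures, about one seventh of the model's weights take part in any single forward pass, which is the usual reason a mixture-of-experts model is described by both numbers.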


The Model Signal report for Llama 4 Behemoth is still being compiled. It is synthesized automatically from Signal + Noise wire coverage and the provider announcement, so check back as the wire mentions this model.