MODEL SIGNAL · META

Llama 4 Behemoth

Meta's teacher model, still in training at the time of announcement: 288B active parameters, 16 experts, and roughly 2T total parameters. Meta reports that it outperforms GPT-4.5 and Claude Sonnet 3.7 on STEM benchmarks. It is used to distill Scout and Maverick.

CATEGORY: Reasoning
CONTEXT: 1M
RELEASED: April 5, 2025
Key Features
  • 288B active / ~2T total params
  • Best-in-class STEM benchmark results (per Meta)
  • Distillation teacher for Llama 4
  • Mixture-of-experts (MoE) architecture with 16 experts
  • Not yet publicly released
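
To put the parameter counts in perspective, here is a minimal sketch of the arithmetic behind "288B active / ~2T total". It uses only the two figures listed above and treats the approximate 2T total as exactly 2 trillion, which is an assumption. The function name is illustrative and not part of any Meta tooling.

```python
# Figures taken from the feature list above; TOTAL_PARAMS is approximate (~2T).
ACTIVE_PARAMS = 288e9   # parameters used per token
TOTAL_PARAMS = 2e12     # assumed to be exactly 2 trillion for this sketch


def active_fraction(active: float, total: float) -> float:
    """Return the share of all parameters that are active per token."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share of parameters: {fraction:.1%}")  # prints 14.4%
```

Under that assumption, roughly one parameter in seven is active for any given token. The listing does not say how many of the 16 experts are routed per token, so this ratio should not be read as an expert count.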

The Model Signal report for Llama 4 Behemoth is still being compiled. It is synthesized automatically from Signal + Noise wire coverage and the provider announcement, so check back as the wire mentions this model.