MODEL SIGNAL · META
Llama 4 Behemoth
Meta's teacher model (still training): 288B active parameters, 16 experts, and roughly 2T total parameters. Meta reports that it outperforms GPT-4.5 and Claude Sonnet 3.7 on several STEM benchmarks. Used to distill Scout and Maverick.
CATEGORY: Reasoning
CONTEXT: 1M
RELEASED: April 5, 2025
Key Features
- 288B active / ~2T total params
- Reported to outperform GPT-4.5 and Claude Sonnet 3.7 on STEM benchmarks
- Distillation teacher for Llama 4
- Mixture-of-experts (MoE) architecture with 16 experts
- Not yet publicly released
The Model Signal report for Llama 4 Behemoth is still compounding. It synthesizes automatically from Signal + Noise wire coverage and the provider announcement — check back as the wire mentions this model.