Model Signal
Model releases, translated into operating decisions.
A recurring Signal + Noise series on frontier, open, multimodal, and specialist models — what changed, where each model fits, what breaks, and what teams should do next.
VibeThinker-3B
VibeThinker-3B is a 3-billion-parameter open-weights reasoning model developed by WeiboAI.
GLM-5.2
GLM-5.2 is a 753-billion-parameter open-weights large language model from Z.ai, optimized for long-horizon coding tasks.
Varya
An open-weight video generation model optimized for Indian cultural contexts and highly cost-efficient inference.
Claude Fable 5
Anthropic's most powerful publicly released model — its first Mythos-class model available to enterprise and paid users. Leads on SWE-bench, knowledge work, and scientific research, scoring 10%+ above Opus 4.8 on key benchmarks.
Claude Mythos 5
Anthropic's frontier Claude Mythos 5 model was launched and subsequently disabled worldwide following a US government security directive.
Claude Opus 4.8
Builds on Opus 4.7 with stronger agentic reasoning, adaptive thinking, dynamic parallel workflows in Claude Code, mid-conversation system messages, and 2.5x fast mode at 3x lower cost.
Qwen3.7 Max
A frontier-discount agent substrate from Alibaba — strong agentic index, 1M context, and prompt-cache economics that change what becomes economical to automate.
DeepSeek V4 Flash
Reasoning-optimized flash variant of DeepSeek V4 — neck-and-neck with Kimi K2.6 on coding benchmarks at significantly lower latency.
GPT-5.5
OpenAI's latest flagship with noticeably stronger reasoning and autonomy than GPT-5.4. Available in standard and Pro variants. 1M context window, $5/$30 per 1M tokens for standard.
Kimi K2.6
1-trillion-parameter vision-language model from Moonshot AI. Highest-ranked open-weights model on the Artificial Analysis leaderboard (score 54). Designed for long-horizon agentic coding with plan-write-test-debug loops lasting days.
DeepSeek V4
DeepSeek's latest open-source flagship. Rivals closed frontier models on coding and reasoning benchmarks while maintaining extremely low inference cost. MoE architecture.
Mistral Small 4
Efficient small model with configurable reasoning — set reasoning_effort from none to high on the fly. 5x more params than Small 3 but only 6B active per token, 40% faster end-to-end.
Mercury 2
Fastest model on the Artificial Analysis leaderboard by output speed. Diffusion-based LLM architecture enabling massively parallel token generation — entirely different inference paradigm from autoregressive models.
Gemini 3.1 Pro
Google's most advanced reasoning model — doubles ARC-AGI-2 performance vs Gemini 3 Pro. Handles text, audio, images, video, and entire code repositories in a 1M context window.
Grok 4.20
Major 4.x update from xAI with 2M token context, lowest hallucination rate in the Grok line, 4-agent system, and 60% lower pricing than Grok 4.
Claude Sonnet 4.6
Most capable Sonnet yet — approaches Opus-level performance for coding, computer use, and document work at a mid-tier price point. 1M context window in beta.
GLM-5.1
Open-weights model from Z.ai (Zhipu AI) with strong web-dev coding performance. Ranks above Kimi K2.6 on the Code Arena WebDev leaderboard (1,534 Elo). Competitive on multilingual tasks.
Command A+
Cohere's enterprise-grade RAG and tool-use model. Lowest latency among major models per Artificial Analysis benchmarks. Optimized for enterprise document retrieval and structured outputs.
GPT-5.4
Previous OpenAI flagship, widely regarded as the best all-rounder with the largest developer ecosystem. Strong performance across coding, reasoning, and multimodal tasks.
Mistral Large 3
Mistral's most capable model — 41B active / 675B total sparse MoE. Achieves parity with top open-weight models with multimodal image understanding and best-in-class multilingual performance.
Imagen 4
Google's latest text-to-image model with photorealistic output, precise text rendering, and advanced style control. Powers Gemini's image generation and is available via Vertex AI.
Stable Diffusion 3.5 Large
Stability AI's best open-weights image generation model. 8B parameter MMDiT-X architecture with superior prompt adherence, diverse outputs, and quality rivaling Midjourney and DALL-E 3.
Sora 2
OpenAI's second-generation video + audio generation model. Physically accurate, cinematic-quality videos with synchronized dialogue and sound effects. Supports Cameo (insert yourself into scenes).
Qwen3-Coder-480B
Alibaba's most advanced agentic coding model — 480B total / 35B active MoE. Excels at full software dev pipelines, codebase debugging, and browser interaction. Context extendable to 1M.
Grok 4
xAI's flagship model with 100x training improvement over Grok 3. Leads ARC-AGI benchmarks, includes native tool use, real-time X search, and multi-agent coordination.
Gemini 2.5 Flash
Google's best-in-class efficiency model for high-volume tasks. Extremely fast with strong reasoning for its size, available via the Gemini API at low cost.
Qwen3-235B-A22B
Alibaba's flagship open-source hybrid reasoning MoE model. 235B total / 22B active params. Seamlessly switches between thinking and non-thinking modes. Trained on 36T tokens, supports 119 languages.
Llama 4 Behemoth
Meta's teacher model (still training) — 288B active params, 16 experts, ~2T total params. Outperforms GPT-4.5 and Claude Sonnet 3.7 on STEM benchmarks. Used to distill Scout and Maverick.
Llama 4 Scout
Meta's efficient open-weight multimodal model with an industry-leading 10M token context. 17B active params with 16 experts — fits on a single H100 GPU.
Llama 4 Maverick
Meta's open-weight MoE model with 17B active parameters and 128 experts. Best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash. Elo of 1417 on LMArena.
Model Signal tracks AI model releases across every provider. Each model has a compounding Model Signal report: a synthesized brief that auto-updates as the Signal + Noise wire mentions the model. Reports cover releases through an operator lens: who should care, where each model fits, what changes, and what does not.