Gemini 2.0 Flash scheduled for retirement

Gemini 2.0 Flash is marked deprecated. Suggested successor: gemini-2-5-flash.
gemini-2.0-flash Context: 1M Output: 8K gemini-2-5-flash Retires: Jun 1, 2026

Reverse-chronological model events generated from release dates, deprecation dates, and lineage links in the model dataset. Release events come from released. Retirement events come from deprecated_date. Lineage is shown when the model data includes a successor or predecessor.
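The event feed described above can be sketched in a few lines. This is an illustrative sketch, not the page's actual implementation: the field names (released, deprecated_date, successor, predecessor) come from the description above, while the sample records and the build_events helper are hypothetical.

```python
from datetime import date

# Minimal model records using the dataset fields named above.
# These sample entries are hypothetical.
models = [
    {"id": "example-model-1", "released": date(2025, 3, 1),
     "deprecated_date": None, "successor": "example-model-2"},
    {"id": "example-model-2", "released": date(2025, 9, 1),
     "deprecated_date": None, "predecessor": "example-model-1"},
    {"id": "legacy-model", "released": date(2024, 1, 15),
     "deprecated_date": date(2026, 6, 1), "successor": "example-model-2"},
]

def build_events(models):
    """Release events come from `released`; retirement events from `deprecated_date`."""
    events = []
    for m in models:
        events.append((m["released"], "released", m["id"]))
        if m.get("deprecated_date"):
            events.append((m["deprecated_date"], "retires", m["id"]))
    # Reverse chronological, as in the feed above.
    return sorted(events, reverse=True)

for when, kind, model_id in build_events(models):
    # Lineage is shown when the record includes a successor or predecessor.
    rec = next(m for m in models if m["id"] == model_id)
    lineage = ""
    if rec.get("successor"):
        lineage = f" -> {rec['successor']}"
    elif rec.get("predecessor"):
        lineage = f" (predecessor: {rec['predecessor']})"
    print(f"{when.isoformat()}  {kind:8}  {model_id}{lineage}")
```

Sorting the (date, kind, id) tuples in reverse puts the most recent event first, matching the feed's ordering.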
Claude 3 Haiku is marked deprecated. Suggested successor: claude-haiku-4-5.
claude-3-haiku-20240307 Context: 200K Output: 4K claude-haiku-4-5 Retires: Apr 19, 2026

Anthropic's flagship model for complex agents and coding. Extended + adaptive thinking, 1M context, 128K output.
claude-opus-4-6 Context: 1M Output: 128K claude-opus-4-5

Fast balanced model with extended + adaptive thinking. 1M context, 64K output.
claude-sonnet-4-6 Context: 1M Output: 64K claude-sonnet-4-5

Next-gen LLM designed for autonomous real-world productivity. 205K context, large output window.
minimax-m2.7 Context: 205K Output: 65K minimax-01

Cost-efficient model in the Gemini 3.1 family. 1M context, lowest pricing tier.
gemini-3.1-flash-lite-preview Context: 1M Output: 64K

Google's most capable model. 1M context, multimodal. Price doubles for context >200K.
gemini-3.1-pro-preview Context: 1M Output: 64K gemini-3-pro

Variant of Grok 4.20 for collaborative agent-based workflows. Multiple agents operate in parallel.
grok-4.20-multi-agent Context: 2M Output: 128K

xAI's newest flagship with industry-leading speed and agentic tool calling. 2M context, lowest hallucination rate.
grok-4.20 Context: 2M Output: 128K grok-4

Smaller 5.4 variant for coding, computer use, and subagents. 1.1M context.
gpt-5.4-mini Context: 1.1M Output: 64K

Cheapest model in the GPT-5.4 family for high-volume simple tasks.
gpt-5.4-nano Context: 1.1M Output: 64K

Highest-tier reasoning model with deep thinking. 1.1M context.
gpt-5.4-pro Context: 1.1M Output: 128K gpt-5-2-pro

OpenAI's most capable model for complex reasoning and coding. 1.1M context, multimodal.
gpt-5.4 Context: 1.1M Output: 128K gpt-5-3

DeepSeek's latest model via the deepseek-chat endpoint. V3.2 non-thinking mode. 128K context.
deepseek-chat Context: 128K Output: 8K deepseek-v3

DeepSeek V3.2 thinking mode via the deepseek-reasoner endpoint. Up to 64K output with reasoning chains.
deepseek-reasoner Context: 128K Output: 64K deepseek-r1

Chat-optimized variant of the GPT-5.3 series.
gpt-5.3-chat-latest Context: 128K Output: 32K

Kimi's most versatile model. Native multimodal architecture, vision + text, thinking and non-thinking modes. 256K context.
kimi-k2.5 Context: 256K Output: 32K kimi-k2

Qwen3.5 speed model. 1M context, lowest cost in the Qwen lineup.
qwen3.5-flash Context: 1M Output: 32K

Qwen3.5 series balanced model. Text/image/video input. 1M context, faster and cheaper than Qwen3-Max.
qwen3.5-plus Context: 1M Output: 32K

Zhipu's latest model. Optimized for coding and agent tasks. Available via z.ai and OpenRouter.
glm-5.1 Context: 128K Output: 16K glm-5

OpenAI's latest coding-focused model. 91.5% LiveCodeBench. 400K context.
gpt-5.3-codex Context: 400K Output: 64K gpt-5-2-codex

Mistral's latest coding-focused model. Agentic coding capabilities.
labs-devstral-small-2512 Context: 128K Output: 8K

Third-generation Flash model. 1M context, fast and affordable.
gemini-3-flash-preview Context: 1M Output: 64K

Fast variant of GLM-5. Lower cost, higher speed.
glm-5-turbo Context: 128K Output: 8K

Previous-generation GPT-5 flagship. 400K context.
gpt-5.2 Context: 400K Output: 64K gpt-5-1 Successor: gpt-5-3-codex

Previous-generation flagship. Extended thinking, 200K context.
claude-opus-4-5-20251101 Context: 200K Output: 64K claude-opus-4-1 Successor: claude-opus-4-6

Zhipu's GLM-5 model. Strong reasoning and coding. Available via the z.ai API.
glm-5 Context: 128K Output: 16K glm-4 Successor: glm-5-1

Fast non-reasoning variant. 2M context. Cost leader among frontier-class providers.
grok-4.1-fast Context: 2M Output: 32K grok-4-fast

Fastest Claude model with near-frontier intelligence. Extended thinking, 200K context, 64K output.
claude-haiku-4-5-20251001 Context: 200K Output: 64K claude-3-haiku

Previous-generation balanced model with extended thinking. 200K context.
claude-sonnet-4-5-20250929 Context: 200K Output: 64K claude-sonnet-4-0 Successor: claude-sonnet-4-6

Alibaba's flagship Qwen model. 262K context. Supports thinking mode with chain-of-thought. Tiered pricing by context length.
qwen3-max Context: 262K Output: 32K

Smallest 2.5 model. 1M context, best budget option in the Gemini lineup.
gemini-2.5-flash-lite Context: 1M Output: 64K

Previous-generation fast Grok. 2M context. Superseded by Grok 4.1 Fast.
grok-4-fast Context: 2M Output: 32K Successor: grok-4-1-fast

Premium reasoning model with extended thinking. 200K context, 32K output.
claude-opus-4-1-20250805 Context: 200K Output: 32K claude-opus-4-0 Successor: claude-opus-4-5

Original GPT-5 frontier model. 400K context.
gpt-5 Context: 400K Output: 32K gpt-4-1 Successor: gpt-5-1

xAI's coding-focused model. 256K context, optimized for code generation.
grok-code-fast-1 Context: 256K Output: 16K

MoE model with 1T total / 32B active params. Exceptional coding and agent capabilities. 256K context.
kimi-k2 Context: 256K Output: 16K moonshot-v1-128k Successor: kimi-k2-5

Thinking model based on K2. General agentic and reasoning capabilities, deep reasoning tasks.
kimi-k2-thinking Context: 256K Output: 32K

Mistral's flagship reasoning model with chain-of-thought capabilities. 128K context.
magistral-medium-2507 Context: 128K Output: 32K

Smaller reasoning model. Open-source. 128K context.
magistral-small-2507 Context: 128K Output: 32K

Medium-size coding model. Balanced capability and cost.
devstral-medium-2507 Context: 128K Output: 8K

Previous-generation Grok flagship. 256K context. Superseded by Grok 4.20.
grok-4 Context: 256K Output: 32K grok-3 Successor: grok-4-20

Smaller Grok 3 variant with reasoning. 131K context. Cost-efficient.
grok-3-mini Context: 131K Output: 32K grok-4-fast

First-generation Claude 4 flagship. Extended thinking, 200K context, 32K output.
claude-opus-4-20250514 Context: 200K Output: 32K claude-3-opus Successor: claude-opus-4-1

First-generation Claude 4 balanced model. Extended thinking, 200K context.
claude-sonnet-4-20250514 Context: 200K Output: 64K claude-3-5-sonnet Successor: claude-sonnet-4-5

Second-generation Flash. 1M context, excellent speed/cost ratio. Supports thinking budgets.
gemini-2.5-flash Context: 1M Output: 64K gemini-2-0-flash

Open-weight MoE Qwen3. 235B total, 22B active params. 128K context.
qwen/qwen3-235b-a22b Context: 128K Output: 8K

Open-weight dense Qwen3. 32B params, 128K context.
qwen/qwen3-32b Context: 128K Output: 8K

OpenAI's reasoning model. Excels at math, science, and complex multi-step problems. 85.3% GPQA.
o3 Context: 200K Output: 100K o1

Latest mini reasoning model. 83.2% GPQA, 85.9% coding benchmarks.
o4-mini Context: 200K Output: 100K o3-mini

Smaller GPT-4.1 model with 1M context. Cost-efficient.
gpt-4.1-mini-2025-04-14 Context: 1M Output: 32K

GPT-4.1 model with 1M context window. Good instruction following.
gpt-4.1-2025-04-14 Context: 1M Output: 32K gpt-4o Successor: gpt-5

Meta's MoE model with 400B parameters. 1M context, vision support. Strong performance at low cost.
meta-llama/llama-4-maverick Context: 1M Output: 64K llama-3-3-70b

Smaller MoE Llama 4 model. 109B params, massive 10M context window.
meta-llama/llama-4-scout Context: 10M Output: 64K

First stable Grok 3 release. 131K context. Superseded by Grok 4.
grok-3 Context: 131K Output: 32K grok-4

Previous-gen flagship with thinking budgets. 1M context. Price doubles for context >200K.
gemini-2.5-pro Context: 1M Output: 64K gemini-1-5-pro Successor: gemini-3-1-pro

Mid-size open-weight Gemma. 12B parameters, 128K context, vision.
google/gemma-3-12b-it Context: 128K Output: 8K

Smallest Gemma 3 model. 1B parameters, 32K context. Text only.
google/gemma-3-1b-it Context: 32K Output: 8K

Google's open-weight model. 27B parameters, 128K context, vision support.
google/gemma-3-27b-it Context: 128K Output: 8K

Small open-weight Gemma. 4B parameters, 128K context.
google/gemma-3-4b-it Context: 128K Output: 8K

Open-weight reasoning model. 32B params, chain-of-thought. Budget reasoning option.
qwen/qwq-32b Context: 128K Output: 16K

Open-weight small model. 24B parameters, 128K context, vision support.
mistral-small-2503 Context: 128K Output: 8K

Smaller reasoning model. Good balance of reasoning capability and cost. 79.1% GPQA.
o3-mini Context: 200K Output: 100K o4-mini

Open-weight multimodal model. 72B params, 128K context, strong vision capabilities.
qwen/qwen2.5-vl-72b-instruct Context: 128K Output: 8K

Open-weight reasoning model. Chain-of-thought with distilled variants. Superseded by V3.2 thinking.
deepseek-reasoner Context: 128K Output: 32K deepseek-v3-2-thinking

Previous-generation MiniMax model. 456B MoE with 1M context. Superseded by M2.7.
minimax-01 Context: 1M Output: 8K minimax-m2-7

Mistral's coding specialist. 256K context, strong code generation.
codestral-2501 Context: 256K Output: 8K

First Mistral Small 3 model. 24B params, text-only. Superseded by 3.1 with vision.
mistral-small-2501 Context: 128K Output: 8K mistral-small-3-1

Open-weight MoE model. 671B total, 37B active params. Superseded by V3.2 on the API.
deepseek-chat Context: 128K Output: 8K deepseek-v2 Successor: deepseek-v3-2

Deprecated. Will be shut down June 1, 2026. Migrate to Gemini 2.5 Flash.
gemini-2.0-flash Context: 1M Output: 8K gemini-2-5-flash Retires: Jun 1, 2026

Strong 70B dense model. 128K context. Best Llama 3.x text-only model.
meta-llama/llama-3.3-70b-instruct Context: 128K Output: 8K llama-3-1-70b Successor: llama-4-maverick

OpenAI's first reasoning model. Still available but superseded by o3.
o1 Context: 200K Output: 100K o3

Mistral's largest non-reasoning model. 123B parameters, 128K context.
mistral-large-2411 Context: 128K Output: 8K

Vision-capable variant of Mistral Large. 128K context, image understanding.
pixtral-large-2411 Context: 128K Output: 8K pixtral-12b

Open-weight coding specialist. 32B params, 128K context.
qwen/qwen2.5-coder-32b-instruct Context: 128K Output: 8K

Claude 3.5 generation balanced model. 200K context, 8K output. Still callable.
claude-3-5-sonnet-20241022 Context: 200K Output: 8K claude-3-sonnet Successor: claude-sonnet-4-0

11B multimodal model. 128K context. Good efficiency for vision tasks.
meta-llama/llama-3.2-11b-vision-instruct Context: 128K Output: 8K

90B multimodal model. 128K context. Best Llama 3.x vision model.
meta-llama/llama-3.2-90b-vision-instruct Context: 128K Output: 8K llama-4-maverick

Previous-gen open-weight model. 72B dense. Superseded by Qwen3.
qwen/qwen2.5-72b-instruct Context: 128K Output: 8K qwen3-235b

Open-weight 12B multimodal model. 128K context, image understanding.
pixtral-12b-2409 Context: 128K Output: 8K pixtral-large

Free/fast GLM model. 128K context. Minimal cost for basic tasks.
glm-4-flash Context: 128K Output: 4K

405B dense model. Was the largest open-weight model at release. Superseded by Llama 4.
meta-llama/llama-3.1-405b-instruct Context: 128K Output: 8K llama-4-maverick

70B dense model. Standard workhorse before Llama 3.3.
meta-llama/llama-3.1-70b-instruct Context: 128K Output: 8K llama-3-70b Successor: llama-3-3-70b

8B dense model. Smallest in the Llama 3.1 family. Good for local inference.
meta-llama/llama-3.1-8b-instruct Context: 128K Output: 8K llama-3-8b

Smaller GPT-4o variant. Very cost-effective for production use.
gpt-4o-mini-2024-07-18 Context: 128K Output: 16K gpt-4-1-mini

Previous-generation GLM flagship. 128K context. Superseded by GLM-5.
glm-4 Context: 128K Output: 8K glm-5

Vision-capable GLM-4 variant. 128K context, image understanding.
glm-4v Context: 128K Output: 8K glm-5

First-gen Flash model. 1M context. Superseded by 2.5 Flash.
gemini-1.5-flash Context: 1M Output: 8K gemini-2-5-flash

OpenAI's former flagship. Multimodal (text + image). 128K context.
gpt-4o-2024-08-06 Context: 128K Output: 16K gpt-4-turbo Successor: gpt-4-1

Previous-generation MoE model. 236B total params. Superseded by V3.
deepseek-v2 Context: 128K Output: 8K deepseek-v3

Original Llama 3 70B. Only 8K context. Superseded by 3.1 with 128K.
meta-llama/llama-3-70b-instruct Context: 8K Output: 4K llama-3-1-70b

Original Llama 3 8B. Only 8K context. Superseded by 3.1 with 128K.
meta-llama/llama-3-8b-instruct Context: 8K Output: 4K llama-3-1-8b

GPT-4 Turbo with 128K context and vision. Superseded by GPT-4o.
gpt-4-turbo-2024-04-09 Context: 128K Output: 4K gpt-4 Successor: gpt-4o

Deprecated fast model. Will be retired April 19, 2026. Migrate to Claude Haiku 4.5.
claude-3-haiku-20240307 Context: 200K Output: 4K claude-haiku-4-5 Retires: Apr 19, 2026

Original Moonshot model. 128K context. Superseded by the K2 series.
moonshot-v1-128k Context: 128K Output: 8K kimi-k2

Original Claude 3 flagship model. 200K context. Still callable but superseded.
claude-3-opus-20240229 Context: 200K Output: 4K claude-opus-4-0

Original Claude 3 balanced model. 200K context.
claude-3-sonnet-20240229 Context: 200K Output: 4K claude-3-5-sonnet

First model with 2M context. Tiered pricing (doubles for >128K). Still available.
gemini-1.5-pro Context: 2M Output: 8K gemini-2-5-pro

Pioneering open-weight MoE model. 8 experts, 7B each. 32K context.
open-mixtral-8x7b Context: 32K Output: 8K

Mistral's first open-weight model. 7B params, 32K context. Still used for local inference.
open-mistral-7b Context: 32K Output: 8K

Original GPT-4 model. 8K context. Still available but expensive for its capability.
gpt-4 Context: 8K Output: 4K gpt-4-turbo
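The two retirement events in this feed (gemini-2.0-flash and claude-3-haiku-20240307) suggest a simple migration check before issuing an API call. A minimal sketch, assuming you maintain the deprecation table yourself: the resolve_model helper is hypothetical and not part of any provider SDK, and mapping the feed's successor slugs to callable catalog ids (e.g. gemini-2-5-flash to gemini-2.5-flash) is an assumption.

```python
from datetime import date

# Deprecations taken from the retirement events above:
# retired model id -> (suggested successor id, retirement date).
# Successor ids are assumed mappings from the feed's slugs to catalog ids.
DEPRECATIONS = {
    "gemini-2.0-flash": ("gemini-2.5-flash", date(2026, 6, 1)),
    "claude-3-haiku-20240307": ("claude-haiku-4-5", date(2026, 4, 19)),
}

def resolve_model(model_id, today=None):
    """Return the model id to call, following successor links for retired models.

    Before the retirement date the original id still works, so we only warn;
    on or after it, we switch to the suggested successor.
    """
    today = today or date.today()
    seen = set()  # guard against cyclic successor links
    while model_id in DEPRECATIONS and model_id not in seen:
        seen.add(model_id)
        successor, retires = DEPRECATIONS[model_id]
        if today < retires:
            print(f"warning: {model_id} retires on {retires.isoformat()}; "
                  f"consider migrating to {successor}")
            return model_id
        model_id = successor
    return model_id

# After June 1, 2026 the Gemini 2.0 Flash id resolves to its successor.
print(resolve_model("gemini-2.0-flash", today=date(2026, 7, 1)))  # gemini-2.5-flash
```

The loop handles chained deprecations (a successor that is itself later retired) while the seen set prevents infinite loops on malformed data.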