changelogs.info
OpenClaw Claude Code Codex Gemini Kilo Code Hermes Models Dispatches
Model changelog

Release and retirement timeline

Reverse chronological model events generated from release dates, deprecation dates, and lineage links in the model dataset.

101 total events
11 providers
99 release events
2 retirement events
// filters

Narrow the event stream

Release events come from released. Retirement events come from deprecated_date. Lineage is shown when the model data includes a successor or predecessor.

Event type
Provider
// timeline

Reverse chronological events

101 events shown

June 2026

1 events
Google Jun 1, 2026

Gemini 2.0 Flash scheduled for retirement

deprecation deprecated balanced

Gemini 2.0 Flash is marked deprecated. Suggested successor: gemini-2-5-flash.

Model: Gemini 2.0 Flash API: gemini-2.0-flash Context: 1M Output: 8K
Successor: gemini-2-5-flash Retires: Jun 1, 2026

April 2026

1 events
Anthropic Apr 19, 2026

Claude 3 Haiku scheduled for retirement

deprecation deprecated fast

Claude 3 Haiku is marked deprecated. Suggested successor: claude-haiku-4-5.

Model: Claude 3 Haiku API: claude-3-haiku-20240307 Context: 200K Output: 4K
Successor: claude-haiku-4-5 Retires: Apr 19, 2026

March 2026

17 events
Anthropic Mar 24, 2026

Claude Opus 4.6 released

release active frontier

Anthropic's flagship model for complex agents and coding. Extended + adaptive thinking, 1M context, 128K output.

Model: Claude Opus 4.6 API: claude-opus-4-6 Context: 1M Output: 128K
Predecessor: claude-opus-4-5
Anthropic Mar 24, 2026

Claude Sonnet 4.6 released

release active balanced

Fast balanced model with extended + adaptive thinking. 1M context, 64K output.

Model: Claude Sonnet 4.6 API: claude-sonnet-4-6 Context: 1M Output: 64K
Predecessor: claude-sonnet-4-5
Minimax Mar 18, 2026

MiniMax M2.7 released

release active balanced

Next-gen LLM designed for autonomous real-world productivity. 205K context, large output window.

Model: MiniMax M2.7 API: minimax-m2.7 Context: 205K Output: 65K
Predecessor: minimax-01
Google Mar 9, 2026

Gemini 3.1 Flash-Lite released

release preview fast

Cost-efficient model in the Gemini 3.1 family. 1M context, lowest pricing tier.

Model: Gemini 3.1 Flash-Lite API: gemini-3.1-flash-lite-preview Context: 1M Output: 64K
Google Mar 9, 2026

Gemini 3.1 Pro released

release preview frontier

Google's most capable model. 1M context, multimodal. Price doubles for context >200K.

Model: Gemini 3.1 Pro API: gemini-3.1-pro-preview Context: 1M Output: 64K
Predecessor: gemini-3-pro
xAI Mar 9, 2026

Grok 4.20 Multi-Agent released

release active frontier

Variant of Grok 4.20 for collaborative agent-based workflows. Multiple agents operate in parallel.

Model: Grok 4.20 Multi-Agent API: grok-4.20-multi-agent Context: 2M Output: 128K
xAI Mar 9, 2026

Grok 4.20 released

release active frontier

xAI's newest flagship with industry-leading speed and agentic tool calling. 2M context, lowest hallucination rate.

Model: Grok 4.20 API: grok-4.20 Context: 2M Output: 128K
Predecessor: grok-4
OpenAI Mar 5, 2026

GPT-5.4 Mini released

release active balanced

Smaller 5.4 variant for coding, computer use, and subagents. 1.1M context.

Model: GPT-5.4 Mini API: gpt-5.4-mini Context: 1.1M Output: 64K
OpenAI Mar 5, 2026

GPT-5.4 Nano released

release active fast

Cheapest model in the GPT-5.4 family for high-volume simple tasks.

Model: GPT-5.4 Nano API: gpt-5.4-nano Context: 1.1M Output: 64K
OpenAI Mar 5, 2026

GPT-5.4 Pro released

release active reasoning

Highest-tier reasoning model with deep thinking. 1.1M context.

Model: GPT-5.4 Pro API: gpt-5.4-pro Context: 1.1M Output: 128K
Predecessor: gpt-5-2-pro
OpenAI Mar 5, 2026

GPT-5.4 released

release active frontier

OpenAI's most capable model for complex reasoning and coding. 1.1M context, multimodal.

Model: GPT-5.4 API: gpt-5.4 Context: 1.1M Output: 128K
Predecessor: gpt-5-3
DeepSeek Mar 1, 2026

DeepSeek V3.2 released

release active balanced

DeepSeek's latest model via deepseek-chat endpoint. V3.2 non-thinking mode. 128K context.

Model: DeepSeek V3.2 API: deepseek-chat Context: 128K Output: 8K
Predecessor: deepseek-v3
DeepSeek Mar 1, 2026

DeepSeek V3.2 Thinking released

release active reasoning

DeepSeek V3.2 thinking mode via deepseek-reasoner endpoint. Up to 64K output with reasoning chains.

Model: DeepSeek V3.2 Thinking API: deepseek-reasoner Context: 128K Output: 64K
Predecessor: deepseek-r1
OpenAI Mar 1, 2026

GPT-5.3 Chat released

release active balanced

Chat-optimized variant of the GPT-5.3 series.

Model: GPT-5.3 Chat API: gpt-5.3-chat-latest Context: 128K Output: 32K
Moonshot AI Mar 1, 2026

Kimi K2.5 released

release active multimodal

Kimi's most versatile model. Native multimodal architecture, vision + text, thinking and non-thinking modes. 256K context.

Model: Kimi K2.5 API: kimi-k2.5 Context: 256K Output: 32K
Predecessor: kimi-k2
Qwen Mar 1, 2026

Qwen3.5 Flash released

release active fast

Qwen3.5 speed model. 1M context, lowest cost in Qwen lineup.

Model: Qwen3.5 Flash API: qwen3.5-flash Context: 1M Output: 32K
Qwen Mar 1, 2026

Qwen3.5 Plus released

release active balanced

Qwen3.5 series balanced model. Text/image/video input. 1M context, faster and cheaper than Qwen3-Max.

Model: Qwen3.5 Plus API: qwen3.5-plus Context: 1M Output: 32K

February 2026

2 events
ZAI Feb 1, 2026

GLM-5.1 released

release active frontier

Zhipu's latest model. Optimized for coding and agent tasks. Available via z.ai and OpenRouter.

Model: GLM-5.1 API: glm-5.1 Context: 128K Output: 16K
Predecessor: glm-5
OpenAI Feb 1, 2026

GPT-5.3 Codex released

release active coding

OpenAI's latest coding-focused model. 91.5% LiveCodeBench. 400K context.

Model: GPT-5.3 Codex API: gpt-5.3-codex Context: 400K Output: 64K
Predecessor: gpt-5-2-codex

December 2025

4 events
Mistral Dec 1, 2025

Devstral Small 2 released

release active coding

Mistral's latest coding-focused model. Agentic coding capabilities.

Model: Devstral Small 2 API: labs-devstral-small-2512 Context: 128K Output: 8K
Google Dec 1, 2025

Gemini 3 Flash released

release preview balanced

Third-generation Flash model. 1M context, fast and affordable.

Model: Gemini 3 Flash API: gemini-3-flash-preview Context: 1M Output: 64K
ZAI Dec 1, 2025

GLM-5 Turbo released

release active fast

Fast variant of GLM-5. Lower cost, higher speed.

Model: GLM-5 Turbo API: glm-5-turbo Context: 128K Output: 8K
OpenAI Dec 1, 2025

GPT-5.2 released

release legacy frontier

Previous-generation GPT-5 flagship. 400K context.

Model: GPT-5.2 API: gpt-5.2 Context: 400K Output: 64K
Predecessor: gpt-5-1 Successor: gpt-5-3-codex

November 2025

3 events
Anthropic Nov 1, 2025

Claude Opus 4.5 released

release legacy frontier

Previous-generation flagship. Extended thinking, 200K context.

Model: Claude Opus 4.5 API: claude-opus-4-5-20251101 Context: 200K Output: 64K
Predecessor: claude-opus-4-1 Successor: claude-opus-4-6
ZAI Nov 1, 2025

GLM-5 released

release active frontier

Zhipu's GLM-5 model. Strong reasoning and coding. Available via z.ai API.

Model: GLM-5 API: glm-5 Context: 128K Output: 16K
Predecessor: glm-4 Successor: glm-5-1
xAI Nov 1, 2025

Grok 4.1 Fast released

release active fast

Fast non-reasoning variant. 2M context. Cost leader among frontier-class providers.

Model: Grok 4.1 Fast API: grok-4.1-fast Context: 2M Output: 32K
Predecessor: grok-4-fast

October 2025

1 events
Anthropic Oct 1, 2025

Claude Haiku 4.5 released

release active fast

Fastest Claude model with near-frontier intelligence. Extended thinking, 200K context, 64K output.

Model: Claude Haiku 4.5 API: claude-haiku-4-5-20251001 Context: 200K Output: 64K
Predecessor: claude-3-haiku

September 2025

4 events
Anthropic Sep 29, 2025

Claude Sonnet 4.5 released

release legacy balanced

Previous-generation balanced model with extended thinking. 200K context.

Model: Claude Sonnet 4.5 API: claude-sonnet-4-5-20250929 Context: 200K Output: 64K
Predecessor: claude-sonnet-4-0 Successor: claude-sonnet-4-6
Qwen Sep 23, 2025

Qwen3 Max released

release active frontier

Alibaba's flagship Qwen model. 262K context. Supports thinking mode with chain-of-thought. Tiered pricing by context length.

Model: Qwen3 Max API: qwen3-max Context: 262K Output: 32K
Google Sep 1, 2025

Gemini 2.5 Flash-Lite released

release active fast

Smallest 2.5 model. 1M context, best budget option in the Gemini lineup.

Model: Gemini 2.5 Flash-Lite API: gemini-2.5-flash-lite Context: 1M Output: 64K
xAI Sep 1, 2025

Grok 4 Fast released

release legacy fast

Previous-generation fast Grok. 2M context. Superseded by Grok 4.1 Fast.

Model: Grok 4 Fast API: grok-4-fast Context: 2M Output: 32K
Predecessor: grok-4-1-fast Successor: grok-4-1-fast

August 2025

3 events
Anthropic Aug 5, 2025

Claude Opus 4.1 released

release legacy frontier

Premium reasoning model with extended thinking. 200K context, 32K output.

Model: Claude Opus 4.1 API: claude-opus-4-1-20250805 Context: 200K Output: 32K
Predecessor: claude-opus-4-0 Successor: claude-opus-4-5
OpenAI Aug 1, 2025

GPT-5 released

release legacy frontier

Original GPT-5 frontier model. 400K context.

Model: GPT-5 API: gpt-5 Context: 400K Output: 32K
Predecessor: gpt-4-1 Successor: gpt-5-1
xAI Aug 1, 2025

Grok Code Fast 1 released

release active coding

xAI's coding-focused model. 256K context, optimized for code generation.

Model: Grok Code Fast 1 API: grok-code-fast-1 Context: 256K Output: 16K

July 2025

6 events
Moonshot AI Jul 11, 2025

Kimi K2 released

release active balanced

MoE model with 1T total / 32B active params. Exceptional coding and agent capabilities. 256K context.

Model: Kimi K2 API: kimi-k2 Context: 256K Output: 16K
Predecessor: moonshot-v1-128k Successor: kimi-k2-5
Moonshot AI Jul 11, 2025

Kimi K2 Thinking released

release active reasoning

Thinking model based on K2. General agentic and reasoning capabilities, deep reasoning tasks.

Model: Kimi K2 Thinking API: kimi-k2-thinking Context: 256K Output: 32K
Mistral Jul 10, 2025

Magistral Medium 1.1 released

release active reasoning

Mistral's flagship reasoning model with chain-of-thought capabilities. 128K context.

Model: Magistral Medium 1.1 API: magistral-medium-2507 Context: 128K Output: 32K
Mistral Jul 10, 2025

Magistral Small 1.1 released

release active reasoning

Smaller reasoning model. Open-source. 128K context.

Model: Magistral Small 1.1 API: magistral-small-2507 Context: 128K Output: 32K
Mistral Jul 1, 2025

Devstral Medium 1.0 released

release active coding

Medium-size coding model. Balanced capability and cost.

Model: Devstral Medium 1.0 API: devstral-medium-2507 Context: 128K Output: 8K
xAI Jul 1, 2025

Grok 4 released

release legacy frontier

Previous-generation Grok flagship. 256K context. Superseded by Grok 4.20.

Model: Grok 4 API: grok-4 Context: 256K Output: 32K
Predecessor: grok-3 Successor: grok-4-20

June 2025

1 events
xAI Jun 1, 2025

Grok 3 Mini released

release legacy fast

Smaller Grok 3 variant with reasoning. 131K context. Cost-efficient.

Model: Grok 3 Mini API: grok-3-mini Context: 131K Output: 32K
Successor: grok-4-fast

May 2025

3 events
Anthropic May 14, 2025

Claude Opus 4 released

release legacy frontier

First-generation Claude 4 flagship. Extended thinking, 200K context, 32K output.

Model: Claude Opus 4 API: claude-opus-4-20250514 Context: 200K Output: 32K
Predecessor: claude-3-opus Successor: claude-opus-4-1
Anthropic May 14, 2025

Claude Sonnet 4 released

release legacy balanced

First-generation Claude 4 balanced model. Extended thinking, 200K context.

Model: Claude Sonnet 4 API: claude-sonnet-4-20250514 Context: 200K Output: 64K
Predecessor: claude-3-5-sonnet Successor: claude-sonnet-4-5
Google May 1, 2025

Gemini 2.5 Flash released

release active balanced

Second-generation Flash. 1M context, excellent speed/cost ratio. Supports thinking budgets.

Model: Gemini 2.5 Flash API: gemini-2.5-flash Context: 1M Output: 64K
Predecessor: gemini-2-0-flash

April 2025

9 events
Qwen Apr 28, 2025

Qwen3 235B-A22B released

release active open-weight

Open-weight MoE Qwen3. 235B total, 22B active params. 128K context.

Model: Qwen3 235B-A22B API: qwen/qwen3-235b-a22b Context: 128K Output: 8K
Qwen Apr 28, 2025

Qwen3 32B released

release active open-weight

Open-weight dense Qwen3. 32B params, 128K context.

Model: Qwen3 32B API: qwen/qwen3-32b Context: 128K Output: 8K
OpenAI Apr 16, 2025

o3 released

release active reasoning

OpenAI's reasoning model. Excels at math, science, and complex multi-step problems. 85.3% GPQA.

Model: o3 API: o3 Context: 200K Output: 100K
Predecessor: o1
OpenAI Apr 16, 2025

o4 Mini released

release active reasoning

Latest mini reasoning model. 83.2% GPQA, 85.9% coding benchmarks.

Model: o4 Mini API: o4-mini Context: 200K Output: 100K
Predecessor: o3-mini
OpenAI Apr 14, 2025

GPT-4.1 Mini released

release legacy balanced

Smaller GPT-4.1 model with 1M context. Cost-efficient.

Model: GPT-4.1 Mini API: gpt-4.1-mini-2025-04-14 Context: 1M Output: 32K
OpenAI Apr 14, 2025

GPT-4.1 released

release legacy balanced

GPT-4.1 model with 1M context window. Good instruction following.

Model: GPT-4.1 API: gpt-4.1-2025-04-14 Context: 1M Output: 32K
Predecessor: gpt-4o Successor: gpt-5
Meta Apr 5, 2025

Llama 4 Maverick released

release active open-weight

Meta's MoE model with 400B active parameters. 1M context, vision support. Strong performance at low cost.

Model: Llama 4 Maverick API: meta-llama/llama-4-maverick Context: 1M Output: 64K
Predecessor: llama-3-3-70b
Meta Apr 5, 2025

Llama 4 Scout released

release active open-weight

Smaller MoE Llama 4 model. 109B active params, massive 10M context window.

Model: Llama 4 Scout API: meta-llama/llama-4-scout Context: 10M Output: 64K
xAI Apr 1, 2025

Grok 3 released

release legacy frontier

First stable Grok 3 release. 131K context. Superseded by Grok 4.

Model: Grok 3 API: grok-3 Context: 131K Output: 32K
Successor: grok-4

March 2025

7 events
Google Mar 25, 2025

Gemini 2.5 Pro released

release active frontier

Previous-gen flagship with thinking budgets. 1M context. Price doubles for context >200K.

Model: Gemini 2.5 Pro API: gemini-2.5-pro Context: 1M Output: 64K
Predecessor: gemini-1-5-pro Successor: gemini-3-1-pro
Google Mar 12, 2025

Gemma 3 12B released

release active open-weight

Mid-size open-weight Gemma. 12B parameters, 128K context, vision.

Model: Gemma 3 12B API: google/gemma-3-12b-it Context: 128K Output: 8K
Google Mar 12, 2025

Gemma 3 1B released

release active open-weight

Smallest Gemma 3 model. 1B parameters, 32K context. Text only.

Model: Gemma 3 1B API: google/gemma-3-1b-it Context: 32K Output: 8K
Google Mar 12, 2025

Gemma 3 27B released

release active open-weight

Google's open-weight model. 27B parameters, 128K context, vision support.

Model: Gemma 3 27B API: google/gemma-3-27b-it Context: 128K Output: 8K
Google Mar 12, 2025

Gemma 3 4B released

release active open-weight

Small open-weight Gemma. 4B parameters, 128K context.

Model: Gemma 3 4B API: google/gemma-3-4b-it Context: 128K Output: 8K
Qwen Mar 5, 2025

QwQ 32B released

release active reasoning

Open-weight reasoning model. 32B params, chain-of-thought. Budget reasoning option.

Model: QwQ 32B API: qwen/qwq-32b Context: 128K Output: 16K
Mistral Mar 1, 2025

Mistral Small 3.1 released

release active open-weight

Open-weight small model. 24B parameters, 128K context, vision support.

Model: Mistral Small 3.1 API: mistral-small-2503 Context: 128K Output: 8K

January 2025

6 events
OpenAI Jan 31, 2025

o3 Mini released

release legacy reasoning

Smaller reasoning model. Good balance of reasoning capability and cost. 79.1% GPQA.

Model: o3 Mini API: o3-mini Context: 200K Output: 100K
Successor: o4-mini
Qwen Jan 28, 2025

Qwen2.5 VL 72B released

release active open-weight

Open-weight multimodal model. 72B params, 128K context, strong vision capabilities.

Model: Qwen2.5 VL 72B API: qwen/qwen2.5-vl-72b-instruct Context: 128K Output: 8K
DeepSeek Jan 20, 2025

DeepSeek R1 released

release legacy reasoning

Open-weight reasoning model. Chain-of-thought with distilled variants. Superseded by V3.2 thinking.

Model: DeepSeek R1 API: deepseek-reasoner Context: 128K Output: 32K
Successor: deepseek-v3-2-thinking
Minimax Jan 15, 2025

MiniMax-01 released

release legacy balanced

Previous-generation MiniMax model. 456B MoE with 1M context. Superseded by M2.7.

Model: MiniMax-01 API: minimax-01 Context: 1M Output: 8K
Successor: minimax-m2-7
Mistral Jan 1, 2025

Codestral 25.01 released

release active coding

Mistral's coding specialist. 256K context, strong code generation.

Model: Codestral 25.01 API: codestral-2501 Context: 256K Output: 8K
Mistral Jan 1, 2025

Mistral Small 3.0 released

release legacy open-weight

First Mistral Small 3 model. 24B params, text-only. Superseded by 3.1 with vision.

Model: Mistral Small 3.0 API: mistral-small-2501 Context: 128K Output: 8K
Successor: mistral-small-3-1

December 2024

4 events
DeepSeek Dec 26, 2024

DeepSeek V3 released

release legacy open-weight

Open-weight MoE model. 671B total, 37B active params. Superseded by V3.2 on API.

Model: DeepSeek V3 API: deepseek-chat Context: 128K Output: 8K
Predecessor: deepseek-v2 Successor: deepseek-v3-2
Google Dec 11, 2024

Gemini 2.0 Flash released

release deprecated balanced

Deprecated. Will be shut down June 1, 2026. Migrate to Gemini 2.5 Flash.

Model: Gemini 2.0 Flash API: gemini-2.0-flash Context: 1M Output: 8K
Successor: gemini-2-5-flash Retires: Jun 1, 2026
Meta Dec 6, 2024

Llama 3.3 70B released

release active open-weight

Strong 70B dense model. 128K context. Best Llama 3.x text-only model.

Model: Llama 3.3 70B API: meta-llama/llama-3.3-70b-instruct Context: 128K Output: 8K
Predecessor: llama-3-1-70b Successor: llama-4-maverick
OpenAI Dec 5, 2024

o1 released

release legacy reasoning

OpenAI's first reasoning model. Still available but superseded by o3.

Model: o1 API: o1 Context: 200K Output: 100K
Successor: o3

November 2024

3 events
Mistral Nov 1, 2024

Mistral Large 2.1 released

release active frontier

Mistral's largest non-reasoning model. 123B parameters, 128K context.

Model: Mistral Large 2.1 API: mistral-large-2411 Context: 128K Output: 8K
Mistral Nov 1, 2024

Pixtral Large released

release active multimodal

Vision-capable variant of Mistral Large. 128K context, image understanding.

Model: Pixtral Large API: pixtral-large-2411 Context: 128K Output: 8K
Predecessor: pixtral-12b
Qwen Nov 1, 2024

Qwen2.5 Coder 32B released

release active coding

Open-weight coding specialist. 32B params, 128K context.

Model: Qwen2.5 Coder 32B API: qwen/qwen2.5-coder-32b-instruct Context: 128K Output: 8K

October 2024

1 events
Anthropic Oct 22, 2024

Claude 3.5 Sonnet released

release legacy balanced

Claude 3.5 generation balanced model. 200K context, 8K output. Still callable.

Model: Claude 3.5 Sonnet API: claude-3-5-sonnet-20241022 Context: 200K Output: 8K
Predecessor: claude-3-sonnet Successor: claude-sonnet-4-0

September 2024

4 events
Meta Sep 25, 2024

Llama 3.2 11B Vision released

release active open-weight

11B multimodal model. 128K context. Good efficiency for vision tasks.

Model: Llama 3.2 11B Vision API: meta-llama/llama-3.2-11b-vision-instruct Context: 128K Output: 8K
Meta Sep 25, 2024

Llama 3.2 90B Vision released

release active open-weight

90B multimodal model. 128K context. Best Llama 3.x vision model.

Model: Llama 3.2 90B Vision API: meta-llama/llama-3.2-90b-vision-instruct Context: 128K Output: 8K
Successor: llama-4-maverick
Qwen Sep 19, 2024

Qwen2.5 72B released

release legacy open-weight

Previous-gen open-weight model. 72B dense. Superseded by Qwen3.

Model: Qwen2.5 72B API: qwen/qwen2.5-72b-instruct Context: 128K Output: 8K
Successor: qwen3-235b
Mistral Sep 1, 2024

Pixtral 12B released

release active open-weight

Open-weight 12B multimodal model. 128K context, image understanding.

Model: Pixtral 12B API: pixtral-12b-2409 Context: 128K Output: 8K
Successor: pixtral-large

August 2024

1 events
ZAI Aug 1, 2024

GLM-4 Flash released

release active fast

Free/fast GLM model. 128K context. Minimal cost for basic tasks.

Model: GLM-4 Flash API: glm-4-flash Context: 128K Output: 4K

July 2024

4 events
Meta Jul 23, 2024

Llama 3.1 405B released

release legacy open-weight

405B dense model. Was the largest open-weight model at release. Superseded by Llama 4.

Model: Llama 3.1 405B API: meta-llama/llama-3.1-405b-instruct Context: 128K Output: 8K
Successor: llama-4-maverick
Meta Jul 23, 2024

Llama 3.1 70B released

release legacy open-weight

70B dense model. Standard workhorse before Llama 3.3.

Model: Llama 3.1 70B API: meta-llama/llama-3.1-70b-instruct Context: 128K Output: 8K
Predecessor: llama-3-70b Successor: llama-3-3-70b
Meta Jul 23, 2024

Llama 3.1 8B released

release legacy open-weight

8B dense model. Smallest in the Llama 3.1 family. Good for local inference.

Model: Llama 3.1 8B API: meta-llama/llama-3.1-8b-instruct Context: 128K Output: 8K
Predecessor: llama-3-8b
OpenAI Jul 18, 2024

GPT-4o Mini released

release legacy fast

Smaller GPT-4o variant. Very cost-effective for production use.

Model: GPT-4o Mini API: gpt-4o-mini-2024-07-18 Context: 128K Output: 16K
Successor: gpt-4-1-mini

June 2024

2 events
ZAI Jun 1, 2024

GLM-4 released

release legacy balanced

Previous-generation GLM flagship. 128K context. Superseded by GLM-5.

Model: GLM-4 API: glm-4 Context: 128K Output: 8K
Successor: glm-5
ZAI Jun 1, 2024

GLM-4V released

release legacy multimodal

Vision-capable GLM-4 variant. 128K context, image understanding.

Model: GLM-4V API: glm-4v Context: 128K Output: 8K
Successor: glm-5

May 2024

3 events
Google May 14, 2024

Gemini 1.5 Flash released

release legacy balanced

First-gen Flash model. 1M context. Superseded by 2.5 Flash.

Model: Gemini 1.5 Flash API: gemini-1.5-flash Context: 1M Output: 8K
Successor: gemini-2-5-flash
OpenAI May 13, 2024

GPT-4o released

release legacy balanced

OpenAI's former flagship. Multimodal (text + image). 128K context.

Model: GPT-4o API: gpt-4o-2024-08-06 Context: 128K Output: 16K
Predecessor: gpt-4-turbo Successor: gpt-4-1
DeepSeek May 1, 2024

DeepSeek V2 released

release legacy open-weight

Previous-generation MoE model. 236B total params. Superseded by V3.

Model: DeepSeek V2 API: deepseek-v2 Context: 128K Output: 8K
Successor: deepseek-v3

April 2024

3 events
Meta Apr 18, 2024

Llama 3 70B released

release legacy open-weight

Original Llama 3 70B. Only 8K context. Superseded by 3.1 with 128K.

Model: Llama 3 70B API: meta-llama/llama-3-70b-instruct Context: 8K Output: 4K
Successor: llama-3-1-70b
Meta Apr 18, 2024

Llama 3 8B released

release legacy open-weight

Original Llama 3 8B. Only 8K context. Superseded by 3.1 with 128K.

Model: Llama 3 8B API: meta-llama/llama-3-8b-instruct Context: 8K Output: 4K
Successor: llama-3-1-8b
OpenAI Apr 9, 2024

GPT-4 Turbo released

release legacy balanced

GPT-4 Turbo with 128K context and vision. Superseded by GPT-4o.

Model: GPT-4 Turbo API: gpt-4-turbo-2024-04-09 Context: 128K Output: 4K
Predecessor: gpt-4 Successor: gpt-4o

March 2024

2 events
Anthropic Mar 7, 2024

Claude 3 Haiku released

release deprecated fast

Deprecated fast model. Will be retired April 19, 2026. Migrate to Claude Haiku 4.5.

Model: Claude 3 Haiku API: claude-3-haiku-20240307 Context: 200K Output: 4K
Successor: claude-haiku-4-5 Retires: Apr 19, 2026
Moonshot AI Mar 1, 2024

Moonshot v1 128K released

release legacy balanced

Original Moonshot model. 128K context. Superseded by K2 series.

Model: Moonshot v1 128K API: moonshot-v1-128k Context: 128K Output: 8K
Successor: kimi-k2

February 2024

3 events
Anthropic Feb 29, 2024

Claude 3 Opus released

release legacy frontier

Original Claude 3 flagship model. 200K context. Still callable but superseded.

Model: Claude 3 Opus API: claude-3-opus-20240229 Context: 200K Output: 4K
Successor: claude-opus-4-0
Anthropic Feb 29, 2024

Claude 3 Sonnet released

release legacy balanced

Original Claude 3 balanced model. 200K context.

Model: Claude 3 Sonnet API: claude-3-sonnet-20240229 Context: 200K Output: 4K
Successor: claude-3-5-sonnet
Google Feb 15, 2024

Gemini 1.5 Pro released

release legacy frontier

First model with 2M context. Tiered pricing (doubles for >128K). Still available.

Model: Gemini 1.5 Pro API: gemini-1.5-pro Context: 2M Output: 8K
Successor: gemini-2-5-pro

December 2023

1 events
Mistral Dec 11, 2023

Mixtral 8x7B released

release legacy open-weight

Pioneering open-weight MoE model. 8 experts, 7B each. 32K context.

Model: Mixtral 8x7B API: open-mixtral-8x7b Context: 32K Output: 8K

September 2023

1 events
Mistral Sep 27, 2023

Mistral 7B released

release legacy open-weight

Mistral's first open-weight model. 7B params, 32K context. Still used for local inference.

Model: Mistral 7B API: open-mistral-7b Context: 32K Output: 8K

March 2023

1 events
OpenAI Mar 14, 2023

GPT-4 released

release legacy frontier

Original GPT-4 model. 8K context. Still available but expensive for its capability.

Model: GPT-4 API: gpt-4 Context: 8K Output: 4K
Successor: gpt-4-turbo