99 models across 11 providers · 57 live or preview
claude-opus-4-6 Anthropic's flagship model for complex agents and coding. Extended + adaptive thinking, 1M context, 128K output.
claude-sonnet-4-6 Fast balanced model with extended + adaptive thinking. 1M context, 64K output.
claude-haiku-4-5-20251001 Fastest Claude model with near-frontier intelligence. Extended thinking, 200K context, 64K output.
claude-opus-4-5-20251101 Previous-generation flagship. Extended thinking, 200K context.
claude-opus-4-1-20250805 Premium reasoning model with extended thinking. 200K context, 32K output.
claude-opus-4-20250514 First-generation Claude 4 flagship. Extended thinking, 200K context, 32K output.
claude-3-opus-20240229 Original Claude 3 flagship model. 200K context. Still callable but superseded.
claude-sonnet-4-5-20250929 Previous-generation balanced model with extended thinking. 200K context.
claude-sonnet-4-20250514 First-generation Claude 4 balanced model. Extended thinking, 200K context.
claude-3-5-sonnet-20241022 Claude 3.5 generation balanced model. 200K context, 8K output. Still callable.
claude-3-sonnet-20240229 Original Claude 3 balanced model. 200K context.
claude-3-haiku-20240307 Deprecated fast model. Will be retired April 19, 2026. Migrate to Claude Haiku 4.5.
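
The Anthropic entries above expose extended thinking through the Messages API. A minimal sketch, assuming the standard anthropic Python SDK and an ANTHROPIC_API_KEY in the environment; the token budgets are illustrative values, not recommendations.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=16000,  # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user", "content": "Plan a refactor of a 50-file codebase."}],
    )

    # The reply interleaves thinking blocks with ordinary text blocks.
    for block in response.content:
        if block.type == "text":
            print(block.text)
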
deepseek-reasoner DeepSeek V3.2 thinking mode via deepseek-reasoner endpoint. Up to 64K output with reasoning chains.
deepseek-chat DeepSeek's latest model via deepseek-chat endpoint. V3.2 non-thinking mode. 128K context.
deepseek-reasoner Earlier build of this endpoint (DeepSeek-R1). Open-weight reasoning model with chain-of-thought and distilled variants. Superseded by V3.2 thinking.
deepseek-chat Earlier build of this endpoint (DeepSeek-V3). Open-weight MoE model, 671B total / 37B active params. Superseded by V3.2 on the API.
deepseek-v2 Previous-generation MoE model. 236B total params. Superseded by V3.
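
Both DeepSeek endpoints above speak the OpenAI-compatible wire format; the reasoner additionally returns its chain-of-thought in a reasoning_content field. A sketch assuming the openai Python SDK and a DEEPSEEK_API_KEY.

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                    base_url="https://api.deepseek.com")

    resp = client.chat.completions.create(
        model="deepseek-reasoner",  # swap in "deepseek-chat" for non-thinking mode
        messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    )

    msg = resp.choices[0].message
    print(msg.reasoning_content)  # chain-of-thought, reasoner only
    print(msg.content)            # final answer
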
gemini-2.5-pro Previous-gen flagship with thinking budgets. 1M context. Price doubles for context >200K.
gemini-2.5-flash Second-generation Flash. 1M context, excellent speed/cost ratio. Supports thinking budgets.
gemini-2.5-flash-lite Smallest 2.5 model. 1M context, best budget option in the Gemini lineup.
google/gemma-3-12b-it Mid-size open-weight Gemma. 12B parameters, 128K context, vision.
google/gemma-3-27b-it Google's open-weight model. 27B parameters, 128K context, vision support.
google/gemma-3-4b-it Small open-weight Gemma. 4B parameters, 128K context.
google/gemma-3-1b-it Smallest Gemma 3 model. 1B parameters, 32K context. Text only.
gemini-3.1-pro-preview Google's most capable model. 1M context, multimodal. Price doubles for context >200K.
gemini-3-flash-preview Third-generation Flash model. 1M context, fast and affordable.
gemini-3.1-flash-lite-preview Cost-efficient model in the Gemini 3.1 family. 1M context, lowest pricing tier.
gemini-1.5-pro First model with 2M context. Tiered pricing (doubles for >128K). Still available.
gemini-1.5-flash First-gen Flash model. 1M context. Superseded by 2.5 Flash.
gemini-2.0-flash Deprecated. Will be shut down June 1, 2026. Migrate to Gemini 2.5 Flash.
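
The thinking budgets mentioned on the Gemini 2.5 entries are set per request. A sketch assuming the google-genai Python SDK and a GEMINI_API_KEY; the 1024-token budget is illustrative (0 disables thinking on models that allow it).

    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY

    resp = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Summarize the tradeoffs of MoE vs dense models.",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=1024)
        ),
    )
    print(resp.text)
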
meta-llama/llama-4-scout Smaller MoE Llama 4 model. 109B total / 17B active params, massive 10M context window.
meta-llama/llama-4-maverick Meta's MoE model. 400B total / 17B active params. 1M context, vision support. Strong performance at low cost.
meta-llama/llama-3.3-70b-instruct Strong 70B dense model. 128K context. Best Llama 3.x text-only model.
meta-llama/llama-3.2-11b-vision-instruct 11B multimodal model. 128K context. Good efficiency for vision tasks.
meta-llama/llama-3.2-90b-vision-instruct 90B multimodal model. 128K context. Best Llama 3.x vision model.
meta-llama/llama-3.1-405b-instruct 405B dense model. Was the largest open-weight model at release. Superseded by Llama 4.
meta-llama/llama-3.1-70b-instruct 70B dense model. Standard workhorse before Llama 3.3.
meta-llama/llama-3.1-8b-instruct 8B dense model. Smallest in the Llama 3.1 family. Good for local inference.
meta-llama/llama-3-70b-instruct Original Llama 3 70B. Only 8K context. Superseded by 3.1 with 128K.
meta-llama/llama-3-8b-instruct Original Llama 3 8B. Only 8K context. Superseded by 3.1 with 128K.
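
The meta-llama/... slugs above follow gateway-style naming, so the open-weight Llamas are usually reached through an OpenAI-compatible host (OpenRouter, or a self-hosted vLLM server). A sketch against OpenRouter; the base_url and OPENROUTER_API_KEY are assumptions about your deployment, not part of the listing.

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENROUTER_API_KEY"],
                    base_url="https://openrouter.ai/api/v1")

    resp = client.chat.completions.create(
        model="meta-llama/llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Explain KV-cache reuse in one paragraph."}],
    )
    print(resp.choices[0].message.content)
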
minimax-m2.7 MiniMax's latest model, aimed at autonomous real-world agent workflows. 205K context, large output window.
minimax-01 Previous-generation MiniMax model. 456B MoE with 1M context. Superseded by M2.7.
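
Before sending long inputs to minimax-m2.7, it is worth checking them against the 205K window listed above. A rough pre-flight sketch; the 4-characters-per-token ratio is a generic heuristic, not MiniMax's tokenizer.

    MAX_CONTEXT_TOKENS = 205_000  # from the minimax-m2.7 entry above
    CHARS_PER_TOKEN = 4           # crude heuristic

    def fits_context(prompt: str, reserved_output_tokens: int = 8_000) -> bool:
        """True if the prompt plus reserved output likely fits the window."""
        estimated_input = len(prompt) / CHARS_PER_TOKEN
        return estimated_input + reserved_output_tokens <= MAX_CONTEXT_TOKENS

    print(fits_context("word " * 100_000))  # 500K chars ≈ 125K tokens: fits
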
codestral-2501 Mistral's coding specialist. 256K context, strong code generation.
labs-devstral-small-2512 Mistral's latest coding-focused model. Agentic coding capabilities.
devstral-medium-2507 Medium-size coding model. Balanced capability and cost.
mistral-large-2411 Mistral's largest non-reasoning model. 123B parameters, 128K context.
magistral-medium-2507 Mistral's flagship reasoning model with chain-of-thought capabilities. 128K context.
magistral-small-2507 Smaller reasoning model. Open-source. 128K context.
pixtral-large-2411 Vision-capable variant of Mistral Large. 128K context, image understanding.
mistral-small-2503 Open-weight small model. 24B parameters, 128K context, vision support.
pixtral-12b-2409 Open-weight 12B multimodal model. 128K context, image understanding.
mistral-small-2501 First Mistral Small 3 model. 24B params, text-only. Superseded by 3.1 with vision.
open-mixtral-8x7b Pioneering open-weight MoE model. 8 experts, 7B each. 32K context.
open-mistral-7b Mistral's first open-weight model. 7B params, 32K context. Still used for local inference.
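
Codestral is typically driven in fill-in-the-middle mode rather than chat, which is what editor integrations use for inline completion. A sketch assuming the mistralai Python SDK and a MISTRAL_API_KEY.

    import os
    from mistralai import Mistral

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    resp = client.fim.complete(
        model="codestral-2501",
        prompt="def fibonacci(n: int) -> int:\n    ",
        suffix="\n\nprint(fibonacci(10))",  # completion fills the gap between prompt and suffix
    )
    print(resp.choices[0].message.content)
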
kimi-k2-thinking Thinking model based on K2, built for general agentic work and deep reasoning tasks.
kimi-k2.5 Kimi's most versatile model. Native multimodal architecture, vision + text, thinking and non-thinking modes. 256K context.
kimi-k2 MoE model with 1T total / 32B active params. Exceptional coding and agent capabilities. 256K context.
moonshot-v1-128k Original Moonshot model. 128K context. Superseded by K2 series.
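
The Kimi models above are served over an OpenAI-compatible endpoint, so streaming works the usual way. A sketch assuming the openai SDK and a MOONSHOT_API_KEY; the base_url shown is the international endpoint, verify it for your region.

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["MOONSHOT_API_KEY"],
                    base_url="https://api.moonshot.ai/v1")

    stream = client.chat.completions.create(
        model="kimi-k2",
        messages=[{"role": "user", "content": "Draft a release note for an agent framework."}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
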
gpt-5.3-codex OpenAI's latest coding-focused model. 91.5% LiveCodeBench. 400K context.
gpt-5.4 OpenAI's most capable model for complex reasoning and coding. 1.1M context, multimodal.
gpt-5.4-pro Highest-tier reasoning model with deep thinking. 1.1M context.
o3 OpenAI's reasoning model. Excels at math, science, and complex multi-step problems. 85.3% GPQA.
o4-mini Latest mini reasoning model. 83.2% GPQA, 85.9% coding benchmarks.
gpt-5.4-mini Smaller 5.4 variant for coding, computer use, and subagents. 1.1M context.
gpt-5.3-chat-latest Chat-optimized variant of the GPT-5.3 series.
gpt-5.4-nano Cheapest model in the GPT-5.4 family for high-volume simple tasks.
gpt-5.2 Previous-generation GPT-5 flagship. 400K context.
gpt-5 Original GPT-5 frontier model. 400K context.
gpt-4 Original GPT-4 model. 8K context. Still available but expensive for its capability.
o3-mini Smaller reasoning model. Good balance of reasoning capability and cost. 79.1% GPQA.
o1 OpenAI's first reasoning model. Still available but superseded by o3.
gpt-4.1-2025-04-14 GPT-4.1 model with 1M context window. Good instruction following.
gpt-4.1-mini-2025-04-14 Smaller GPT-4.1 model with 1M context. Cost-efficient.
gpt-4o-2024-08-06 OpenAI's former flagship. Multimodal (text + image). 128K context.
gpt-4-turbo-2024-04-09 GPT-4 Turbo with 128K context and vision. Superseded by GPT-4o.
gpt-4o-mini-2024-07-18 Smaller GPT-4o variant. Very cost-effective for production use.
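
OpenAI's reasoning-capable entries are driven through the Responses API, where reasoning depth is a per-request knob. A sketch using gpt-5 from the list above; assumes the openai SDK and an OPENAI_API_KEY, and the effort level is illustrative.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY

    resp = client.responses.create(
        model="gpt-5",
        reasoning={"effort": "medium"},  # trade latency/cost for reasoning depth
        input="Design a schema for a multi-tenant audit log.",
    )
    print(resp.output_text)
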
qwen/qwen2.5-coder-32b-instruct Open-weight coding specialist. 32B params, 128K context.
qwen3-max Alibaba's flagship Qwen model. 262K context. Supports thinking mode with chain-of-thought. Tiered pricing by context length.
qwen/qwq-32b Open-weight reasoning model. 32B params, chain-of-thought. Budget reasoning option.
qwen3.5-plus Qwen3.5 series balanced model. Text/image/video input. 1M context, faster and cheaper than Qwen3-Max.
qwen3.5-flash Qwen3.5 speed model. 1M context, lowest cost in Qwen lineup.
qwen/qwen3-235b-a22b Open-weight MoE Qwen3. 235B total, 22B active params. 128K context.
qwen/qwen3-32b Open-weight dense Qwen3. 32B params, 128K context.
qwen/qwen2.5-vl-72b-instruct Open-weight multimodal model. 72B params, 128K context, strong vision capabilities.
qwen/qwen2.5-72b-instruct Previous-gen open-weight model. 72B dense. Superseded by Qwen3.
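
qwen3-max above bills by context-length tier, so a router should price a request before sending it. A sketch of that bookkeeping; the tier boundaries and per-token rates are placeholders, not Qwen's published prices.

    from typing import NamedTuple

    class Tier(NamedTuple):
        max_input_tokens: int   # tier applies up to this many input tokens
        usd_per_mtok_in: float  # placeholder rate, USD per million input tokens

    HYPOTHETICAL_TIERS = [
        Tier(32_000, 1.0),
        Tier(128_000, 2.0),
        Tier(262_000, 4.0),  # 262K matches the listed window; the rates do not
    ]

    def estimate_input_cost(input_tokens: int) -> float:
        for tier in HYPOTHETICAL_TIERS:
            if input_tokens <= tier.max_input_tokens:
                return input_tokens / 1_000_000 * tier.usd_per_mtok_in
        raise ValueError("prompt exceeds the 262K context window")

    print(f"${estimate_input_cost(200_000):.2f}")  # lands in the top tier: $0.80
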
grok-code-fast-1 xAI's coding-focused model. 256K context, optimized for code generation.
grok-4.20 xAI's newest flagship with industry-leading speed and agentic tool calling. 2M context, lowest hallucination rate.
grok-4.20-multi-agent Variant of Grok 4.20 for collaborative agent-based workflows. Multiple agents operate in parallel.
grok-4.1-fast Fast non-reasoning variant. 2M context. Cost leader among frontier-class providers.
grok-4 Previous-generation Grok flagship. 256K context. Superseded by Grok 4.20.
grok-3 First stable Grok 3 release. 131K context. Superseded by Grok 4.
grok-4-fast Previous-generation fast Grok. 2M context. Superseded by Grok 4.1 Fast.
grok-3-mini Smaller Grok 3 variant with reasoning. 131K context. Cost-efficient.
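
The Grok entries advertise agentic tool calling, which on xAI's OpenAI-compatible endpoint uses standard function-calling syntax. A sketch assuming an XAI_API_KEY; the get_weather tool is a made-up illustration.

    import json
    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["XAI_API_KEY"],
                    base_url="https://api.x.ai/v1")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="grok-4",
        messages=[{"role": "user", "content": "Do I need an umbrella in Oslo today?"}],
        tools=tools,
    )

    for call in resp.choices[0].message.tool_calls or []:
        print(call.function.name, json.loads(call.function.arguments))
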
glm-5.1 Zhipu's latest model. Optimized for coding and agent tasks. Available via z.ai and OpenRouter.
glm-5 Zhipu's GLM-5 model. Strong reasoning and coding. Available via z.ai API.
glm-5-turbo Fast variant of GLM-5. Lower cost, higher speed.
glm-4-flash Free/fast GLM model. 128K context. Minimal cost for basic tasks.
glm-4v Vision-capable GLM-4 variant. 128K context, image understanding.
glm-4 Previous-generation GLM flagship. 128K context. Superseded by GLM-5.
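
Because the GLM family spans a flagship, a turbo tier, and a free/fast tier, a simple fallback chain is a natural way to consume it. A sketch; the z.ai base_url here is an assumption to verify against Zhipu's current docs.

    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["ZAI_API_KEY"],
                    base_url="https://api.z.ai/api/paas/v4")  # assumed endpoint

    def complete(prompt: str) -> str:
        # Flagship first, then cheaper/faster tiers on failure.
        for model in ("glm-5", "glm-5-turbo", "glm-4-flash"):
            try:
                resp = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp.choices[0].message.content
            except Exception:
                continue  # rate limit or outage: try the next tier
        raise RuntimeError("all GLM tiers failed")

    print(complete("Summarize GLM-5 in one sentence."))
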