Llama 4 Scout
Smaller MoE Llama 4 model: 17B active parameters (109B total), massive 10M-token context window.
All tracked Meta models — pricing, context windows, lifecycle state, and release history in one place.
Llama 4 Scout: Smaller MoE Llama 4 model. 17B active parameters (109B total), massive 10M-token context window.
Llama 4 Maverick: Meta's larger MoE model with 17B active parameters (400B total). 1M context, vision support. Strong performance at low cost.
Llama 3.3 70B: Strong 70B dense model. 128K context. Best Llama 3.x text-only model.
Llama 3.2 11B Vision: 11B multimodal model. 128K context. Good efficiency for vision tasks.
Llama 3.2 90B Vision: 90B multimodal model. 128K context. Best Llama 3.x vision model.
Llama 3.1 405B: 405B dense model. The largest open-weight model at release; superseded by Llama 4.
Llama 3.1 70B: 70B dense model. The standard workhorse before Llama 3.3.
Llama 3.1 8B: 8B dense model. Smallest in the Llama 3.1 family. Good for local inference.
Llama 3 70B: Original Llama 3 70B. Only 8K context; superseded by 3.1 with 128K.
Llama 3 8B: Original Llama 3 8B. Only 8K context; superseded by 3.1 with 128K.
Predecessor → successor chains tracked for Meta models.
Llama 3 8B → Llama 3.1 8B
Llama 3 70B → Llama 3.1 70B → Llama 3.3 70B
Llama 3.1 405B → Llama 4
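The successor links stated in the catalog can be followed programmatically. A minimal sketch, assuming a plain dict of predecessor → successor names (the mapping below only encodes links the catalog itself states; `upgrade_path` is a hypothetical helper):

```python
# Predecessor -> successor links, as stated in the catalog descriptions above.
SUCCESSOR = {
    "Llama 3 8B": "Llama 3.1 8B",
    "Llama 3 70B": "Llama 3.1 70B",
    "Llama 3.1 70B": "Llama 3.3 70B",
    "Llama 3.1 405B": "Llama 4",
}

def upgrade_path(model: str) -> list[str]:
    """Follow successor links until a model with no tracked successor."""
    path = [model]
    while path[-1] in SUCCESSOR:
        path.append(SUCCESSOR[path[-1]])
    return path

# upgrade_path("Llama 3 70B") walks the full 3 -> 3.1 -> 3.3 chain;
# a model with no tracked successor returns just itself.
```

Because each model has at most one tracked successor, a dict walk suffices; a many-to-many lineage would need a graph instead.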