Summary

  • In April 2025, Meta launched Llama 4, the latest generation of its open-weight AI models, bringing several improvements over its predecessors.
  • One key new feature is a Mixture of Experts (MoE) architecture: only a fraction of the model’s parameters is activated for each token, making inference more computationally efficient.
  • Another improvement is native multimodality: the models process text and images together within a single architecture rather than through a separate vision add-on.
  • The new series also has an industry-leading context window, with some models supporting up to 10 million tokens — enough to take in over five million words of input at once.
  • Together, these updates position Llama 4 as a versatile and high-performance AI model that rivals or surpasses leading models in reasoning, coding, and other tasks.
  • It is rumoured that each ChatGPT query runs across multiple Nvidia GPUs, which adds cost and overhead. In contrast, the smallest Llama 4 model, Scout, is designed to run on a single Nvidia H100 GPU.
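
The efficiency gain from the MoE design comes from routing each token to only a few "expert" sub-networks instead of the full model. A minimal sketch of that routing idea, in plain Python — the expert count, top-k value, and toy experts here are illustrative assumptions, not Llama 4's actual configuration:

```python
# Illustrative Mixture-of-Experts routing sketch. Expert count, top_k,
# and the toy expert functions are assumptions for demonstration only.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_weights, top_k=2):
    """Route one token to its top_k experts and mix their outputs.

    token:          list[float], the token's hidden vector
    experts:        list of callables, each a small "expert" network
    router_weights: one weight vector per expert (logit = dot product)
    """
    # Router: one logit per expert for this token.
    logits = [sum(w * x for w, x in zip(wv, token)) for wv in router_weights]
    probs = softmax(logits)
    # Run only the top_k experts; the rest are skipped entirely,
    # which is where the compute savings come from.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(token)
    for i in chosen:
        gate = probs[i] / norm          # renormalised gating weight
        y = experts[i](token)           # expert's output for the token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy usage: 4 experts, but only 2 run per token.
experts = [
    lambda v: [2 * x for x in v],
    lambda v: [x + 1 for x in v],
    lambda v: [-x for x in v],
    lambda v: [x * x for x in v],
]
router_weights = [[0.5, 0.1], [0.2, 0.9], [-0.3, 0.4], [0.0, -0.2]]
out, chosen = moe_forward([1.0, 2.0], experts, router_weights, top_k=2)
print(chosen)  # indices of the 2 experts that actually ran
```

With 4 experts and top_k=2, only half the expert parameters are touched per token; production MoE models scale the same idea to many more experts, keeping active compute a small fraction of total parameters.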

By Alvin Wanjala
