Summary

  • Meta has partnered with Cerebras Systems to power its new Llama API, which delivers inference speeds up to 18 times faster than GPU-based solutions.
  • The Llama models can achieve over 2,600 tokens per second, compared to roughly 130 for ChatGPT and 25 for DeepSeek, according to Artificial Analysis benchmarks.
  • The API includes tools for fine-tuning and evaluating models, starting with the Llama 3.3 8B model, so developers can generate data, train on it, and test the quality of their custom models.
  • The collaboration marks Meta’s entry into the business of selling AI computation and creates a new revenue stream from its AI investments.
  • The Llama API is currently available to developers in limited preview, with a broader rollout expected in the coming weeks and months.

By Michael Nuñez