Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second
1 min read
Summary
Meta has partnered with Cerebras Systems to launch its new Llama API, which offers developers inference speeds up to 18 times faster than traditional GPU-based solutions.
The Llama models achieve over 2,600 tokens per second, compared with roughly 130 for ChatGPT and 25 for DeepSeek, according to benchmarks from Artificial Analysis.
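To put those benchmark figures in perspective, here is a minimal sketch of what each throughput means for response latency. The tokens-per-second rates come from the article's cited benchmarks; the 1,000-token response length is an assumed example size, not a figure from the article.

```python
# Illustrative arithmetic only. Throughput figures are the article's cited
# Artificial Analysis benchmark numbers; the 1,000-token request size is
# an assumption chosen for this example.
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a given decode throughput."""
    return num_tokens / tokens_per_second

rates = {"Llama on Cerebras": 2600, "ChatGPT": 130, "DeepSeek": 25}
for name, rate in rates.items():
    t = generation_time(1000, rate)
    print(f"{name}: {t:.1f} s for a 1,000-token response")
# → Llama on Cerebras: 0.4 s, ChatGPT: 7.7 s, DeepSeek: 40.0 s
```

At these rates, a long response that feels instantaneous on Cerebras hardware would take several seconds to stream from a GPU-backed service.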
The API includes tools for fine-tuning and evaluating models, starting with the Llama 3.3 8B model: developers can generate data, train on it, and test the quality of their custom models.
The collaboration marks Meta’s entry into the AI compute market, creating a new revenue stream from its AI investments.
Developers can currently access the Llama API in a limited preview, with Meta planning to expand access over time.