Bigger isn’t always better: Examining the business case for multi-million token LLMs
Summary
Artificial intelligence (AI) companies including OpenAI, Google DeepMind and MiniMax are racing to expand their models' context windows into the millions of tokens.
This allows models to process and retain more information, offering the potential for deeper comprehension and more seamless interactions.
However, there are challenges: early adopters such as JPMorgan Chase have found that models perform poorly on roughly 75% of their context, with performance collapsing to near zero beyond 32K tokens.
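One common way to observe this kind of degradation is a needle-in-a-haystack probe: bury a known fact in filler text and check whether the model can still recall it as the context grows. The sketch below illustrates the idea; query_model is a hypothetical hook for whatever LLM client is in use, not an API from the article, and words stand in as a rough proxy for tokens.

```python
import random

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call (wire this to your provider's API)."""
    raise NotImplementedError

def probe(context_words: int) -> bool:
    """Bury a known fact at a random depth in filler text and test recall."""
    needle = "The vault code is 4711."
    filler = ["lorem"] * context_words  # words as a rough proxy for tokens
    filler.insert(random.randrange(len(filler)), needle)
    prompt = " ".join(filler) + "\n\nWhat is the vault code?"
    return "4711" in query_model(prompt)

# Sweep context sizes to see where recall starts to collapse:
# for size in (4_000, 32_000, 128_000, 1_000_000):
#     hits = sum(probe(size) for _ in range(20))
#     print(size, hits / 20)
```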
There are also economic trade-offs versus retrieval-augmented generation (RAG): large prompts demand more powerful GPUs and cost more per query, whereas RAG, despite having to run multiple retrieval steps, is often the more cost-effective option.
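A rough back-of-envelope comparison makes that trade-off concrete. In the sketch below, every price, chunk size and overhead figure is an assumed placeholder for illustration, not a number from the article.

```python
PRICE_PER_1K_INPUT_TOKENS = 0.01  # assumed $/1K prompt tokens, not a vendor quote

def large_context_cost(document_tokens: int, question_tokens: int = 200) -> float:
    """Cost of stuffing the entire document into the prompt on every query."""
    return (document_tokens + question_tokens) / 1000 * PRICE_PER_1K_INPUT_TOKENS

def rag_cost(chunks: int = 5, chunk_tokens: int = 500,
             question_tokens: int = 200, retrieval_overhead: float = 0.001) -> float:
    """Cost of retrieving a few relevant chunks plus a small retrieval fee."""
    prompt_tokens = chunks * chunk_tokens + question_tokens
    return prompt_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS + retrieval_overhead

if __name__ == "__main__":
    doc = 1_000_000  # a million-token corpus squeezed into one prompt
    print(f"Large context: ${large_context_cost(doc):.2f} per query")  # $10.00
    print(f"RAG:           ${rag_cost():.4f} per query")               # $0.0280
```

Under these assumed prices, re-sending a million-token prompt costs orders of magnitude more per query than retrieving a handful of relevant chunks, which is why the economics often favour RAG for repeated queries.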
Beyond a certain size, the benefits of a larger context window diminish as computational costs rise, inference slows and usability degrades.
The optimal approach depends on the use case: large context windows suit deep document analysis, while RAG suits dynamic, cost-sensitive queries.