Summary

  • Artificial intelligence (AI) companies including OpenAI, Google DeepMind and MiniMax are racing to expand AI models’ context length.
  • This allows models to process and retain more information, offering the potential for deeper comprehension and more seamless interactions.
  • However, there are challenges: early adopters such as JPMorgan Chase have observed that models struggle to make use of roughly 75% of the supplied context, with performance collapsing to near zero beyond 32K tokens.
  • There are also economic trade-offs versus retrieval-augmented generation (RAG): large prompts require more powerful GPUs, while RAG adds multiple retrieval steps per query but is generally more cost-effective (see the sketch after this list).
  • There may come a point where the benefits of a larger context window diminish due to rising computational costs, slower inference and poorer usability.
  • The optimal approach will depend on the use case, with large context windows for analysis and RAG for dynamic queries.
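To make the cost trade-off in the third bullet concrete, here is a minimal back-of-the-envelope sketch comparing per-query input cost for stuffing a whole document into the prompt versus sending only retrieved chunks. All prices, token counts, and retrieval-overhead figures are hypothetical assumptions for illustration, not numbers from the article.

```python
# Hypothetical per-query cost comparison: long-context prompting vs. RAG.
# All constants below are illustrative assumptions, not article figures.

PRICE_PER_1K_INPUT_TOKENS = 0.003   # assumed $ per 1K input tokens
RETRIEVAL_COST_PER_STEP = 0.0005    # assumed $ per vector-search call


def long_context_cost(context_tokens: int, question_tokens: int = 200) -> float:
    """Entire document is placed in the prompt on every query."""
    return (context_tokens + question_tokens) / 1000 * PRICE_PER_1K_INPUT_TOKENS


def rag_cost(chunks: int = 5, chunk_tokens: int = 400,
             question_tokens: int = 200, retrieval_steps: int = 2) -> float:
    """Only the top-k retrieved chunks are sent, plus retrieval overhead."""
    prompt_tokens = chunks * chunk_tokens + question_tokens
    return (prompt_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
            + retrieval_steps * RETRIEVAL_COST_PER_STEP)


if __name__ == "__main__":
    print(f"100K-token context prompt: ${long_context_cost(100_000):.4f} per query")
    print(f"RAG (5 x 400-token chunks): ${rag_cost():.4f} per query")
```

Under these assumed prices, the long-context query costs roughly $0.30 while the RAG query costs under a cent; the gap narrows or reverses once retrieval infrastructure, re-ranking, and repeated follow-up queries over the same document are factored in, which is why the best choice depends on the use case.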

By Advitya Gemawat, Microsoft

Original Article