Summary

  • Researchers from Stanford University and Google DeepMind have created Step-Wise Reinforcement Learning (SWiRL), which they say can improve large language models (LLMs) so they can more effectively handle complex, multi-step tasks.
  • The team behind SWiRL said current methods for training LLMs tend to struggle with complex planning and tool integration, making them unsuitable for real-world applications that often require several steps to complete.
  • By generating synthetic data and using a specialised reinforcement learning algorithm, SWiRL trains LLMs to break complex problems down into a sequence of more manageable sub-tasks.
  • The team tested SWiRL on question-answering and mathematical reasoning tasks, showing it improved accuracy by 11-21% across a range of different datasets, including HotPotQA, GSM8K, MuSiQue and BeerQA.
  • The researchers said SWiRL could offer benefits to enterprises looking to integrate reasoning models into their applications and workflows.
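The core idea described above is assigning feedback to each intermediate step of a multi-step trajectory rather than only to the final answer. A toy sketch of that idea follows; it is not the paper's actual algorithm, and the `judge_step` function and its digit-based heuristic are purely hypothetical stand-ins for a learned step-level judge.

```python
def judge_step(step: str) -> float:
    """Hypothetical per-step judge: rewards steps that surface a number.
    A real system would use a learned model to score step quality."""
    return 1.0 if any(ch.isdigit() for ch in step) else 0.2

def step_wise_returns(trajectory, gamma=0.9):
    """Assign each step a discounted return computed from per-step rewards,
    instead of a single reward attached only to the final answer."""
    rewards = [judge_step(s) for s in trajectory]
    returns, g = [], 0.0
    for r in reversed(rewards):       # accumulate from the last step back
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A three-step tool-use trajectory: search, tool result, final answer.
trajectory = [
    "Search: population of France",
    "Tool result: 68 million",
    "Answer: 68 million",
]
print(step_wise_returns(trajectory))
```

Because every step receives its own return, a policy trained on these signals gets credit for useful intermediate actions (such as a good search query) even when a later step fails, which is the intuition behind step-wise training for multi-step tasks.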

By Ben Dickson
