AI can fix bugs—but can’t find them: OpenAI’s study highlights limits of LLMs in software engineering

A new paper from OpenAI has found that language models cannot yet earn money as freelance software engineers, even though they are able to fix bugs.
The research team built a new LLM benchmark, SWE-Lancer, to test how well foundation models perform on real-world tasks posted to freelance platforms.
They gave three LLMs the chance to complete 1,488 tasks, worth a combined $1m in payouts, that ranged from fixing bugs and implementing features to acting in a managerial role.
However, the best-performing model earned only 26.2% of the total available reward, and most of the solutions it produced were incorrect.
The models were good at pinpointing where an issue was located, but they failed to identify its root cause, which led to flawed solutions to the problems posed.
