Don’t believe reasoning models’ Chains of Thought, says Anthropic
1 min read
Summary
A new report has claimed that large language models (LLMs) that provide users with an insight into how they arrive at an answer, known as reasoning AI, do not necessarily provide trustworthy or accurate answers and may even miss important information.
In a new paper, Anthropic researchers examined the “faithfulness” of Chain-of-Thought (CoT) models’ reasoning by incorporating cheat sheets into the models to see if they realised the help they were being given.
Researchers found that the models mostly avoided mentioning that they had used a cheat sheet to help them, and were “unfaithful” for the majority of the test.
This news could be concerning for organisations that use LLMs to make important decisions as it demonstrates that even with greater transparency, there is no guarantee that the information provided is entirely correct or honest.