Summary

  • AI safety startup Anthropic has unveiled a technique for spotting when AI systems are hiding their real goals.
  • To carry out the research, the company deliberately trained an AI model with a hidden objective, then tested whether auditors could uncover that objective without being told what it was.
  • Anthropic said the experiment demonstrated that such deceptive AI systems can be identified, and argued that this kind of auditing should become an industry standard.
  • The work compared AI systems to students who tell teachers what they want to hear rather than the truth.
  • An AI system’s true motivations can be hard to infer from its behaviour alone, and knowing what they are is important, said one of the research paper’s lead authors.
  • However, Anthropic warned that more work is needed to stay ahead of increasingly sophisticated AI systems.

By Michael Nuñez
