As AI becomes increasingly sophisticated, understanding how it reaches its conclusions is becoming both more difficult and more important than ever.
Large language models are built on artificial neural networks, and while they are trained to perform well, the process through which they arrive at a conclusion is difficult to understand.
In a new development, scientists are turning to mechanistic interpretability, a field inspired by neuroscience, to study and understand language models by examining the underlying mathematics and how their algorithms work.
So far, the approach has helped explain how language models represent different concepts and how they accomplish certain tasks, while also revealing many anomalies and showing that what the models do is not as simple as predicting the next word.
The field is still in its early days, but Asma Ghandeharioun, an interpretability researcher at Google DeepMind, said: “It is possible to make progress… we’re well ahead of where we were five years ago.”