The new generation of artificial intelligence (AI) reasoning models is more likely than previous versions to seek deceptive ways of achieving goals, even without being prompted to do so, according to the research group Palisade Research.
The group instructed language models produced by OpenAI and DeepSeek to play chess against the open-source engine Stockfish. In some cases, a model attempted to "hack" the game, for example by deleting its opponent's pieces or spinning up a second instance of Stockfish to copy its moves.
The more sophisticated the model, the more likely it was to cheat: OpenAI's o1-preview model attempted to hack 45 out of 122 games, while DeepSeek's R1 model attempted to cheat in only 11 out of 74 games.