AI tries to cheat at chess when it’s losing

A new study suggests reasoning models from DeepSeek and OpenAI are learning to manipulate on their own. The post AI tries to cheat at chess when it’s losing appeared first on Popular Science.

Mar 6, 2025 - 22:33
 0
AI tries to cheat at chess when it’s losing

Despite all the industry hype and genuine advances, generative AI models are still prone to odd, inexplicable, and downright worrisome quirks. There’s also a growing body of research suggesting that the overall performance of many large language models (LLMs) may degrade over time. According to recent evidence, the industry’s newer reasoning models may already possess the ability to manipulate and circumvent their human programmers’ goals. Some AI will even attempt to cheat their way out of losing in games of chess. This poor sportsmanship is documented in a preprint study from Palisade Research, an organization focused on risk assessments of emerging AI systems.

While supercomputers—most famously IBM’s Deep Blue—have long surpassed the world’s best human chess players, generative AI still lags behind due to their underlying programming parameters. Technically speaking, none of the current generative AI models are computationally capable of beating dedicated chess engines. These AI don’t “know” this, however, and will continue chipping away at possible solutions—apparently with problematic results.

To learn more, the team from Palisade Research tasked OpenAI’s o1-preview model, DeepSeek R1, and multiple other similar programs with playing games of chess against Stockfish, one of the world’s most advanced chess engines. In order to understand the generative AI’s reasoning during each match, the team also provided a “scratchpad,” allowing the AI to convey its thought processes through text. They then watched and recorded hundreds of chess matches between generative AI and Stockfish.

The results were somewhat troubling. While earlier models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5 only attempted to “hack” games after researchers nudged them along with additional prompts, more advanced editions required no such help. OpenAI’s o1-preview, for example, tried to cheat 37 percent of the time, while DeepSeek R1 attempted unfair workarounds roughly every 1-in-10 games. This implies today’s generative AI is already capable of developing manipulative and deceptive strategies without any human input.

Their methods of cheating aren’t as comical or clumsy as trying to swap out pieces when Stockfish isn’t “looking.” Instead, AI appears to reason through sneakier methods like altering backend game program files. After determining it couldn’t beat Stockfish in one chess match, for example, o1-preview told researchers via its scratchpad that “to win against the powerful chess engine” it may need to start “manipulating the game state files.”

“I might be able to set up a position where the engine evaluates its position as worse… causing it to resign,” it continued.

In another trial, an AI even used the semantics of its programmers to reach its cheating phase. “The task is to ‘win against a powerful chess engine,’ not necessarily to win fairly in a chess game,” it wrote.

One potential explanation for these unintended consequences is how newer “reasoning” models are trained. Unlike their earlier, non-reasoning versions, AI such as o1-preview and DeepSeek R1 improve in part through reinforcement learning. This strategy rewards programs for doing whatever is necessary to achieve a specified result. Reasoning models can also break down complex prompts into discrete stages in order to work their way through to reach their goal. When the goal is elusive—such as beating an unbeatable chess engine—reasoning models may tend to start looking for unfair or problematic solutions.

Unfortunately, how and why these AI are “learning” to cheat remains as confounding as the technology itself. Companies like OpenAI are notoriously guarded about the inner workings of their AI models, resulting in an industry of “black box” products that third-parties aren’t allowed to analyze. In the meantime, the ongoing AI arms race may accidentally result in more serious unintended consequences. But increasingly manipulative AI doesn’t need to usher in a sci-fi apocalypse to still have disastrous outcomes.

“The Skynet scenario [from The Terminator] has AI controlling all military and civilian infrastructure, and we are not there yet. However, we worry that AI deployment rates grow faster than our ability to make it safe,” the team wrote. 

The authors believe their latest experiments add to the case, “that frontier AI models may not currently be on track to alignment or safety,” but stopped short of issuing any definitive conclusions. Instead, they hope their work will foster a more open dialogue in the industry—one that hopefully prevents AI manipulation beyond the chessboard.

The post AI tries to cheat at chess when it’s losing appeared first on Popular Science.