Microsoft Copilot joins ChatGPT at the feet of the mighty Atari 2600 Video Chess
Copilot's confidence was... misplaced
Not content with humiliating ChatGPT at the hands of Video Chess on an Atari 2600 emulator, Robert Caruso has tried again, this time with Microsoft's Copilot.
Theoretically, the result would be the same, and Copilot would take a similar drubbing. Yet... what if Copilot triumphed where ChatGPT could not? "There's no reason to think it would," wrote Caruso, but... "Imagine everyone's head exploding if a MICROSOFT product outperformed ChatGPT."
So Caruso fired up the Stella emulator and had a pre-game chat with Copilot to explain what tripped up ChatGPT. He told the chatbot that one of the main reasons why ChatGPT lost was that it could not keep track of the board. If Copilot suffered the same difficulty, then there'd be little point in bothering to play.
With the confidence that only an AI chatbot could muster, Copilot insisted not only could it play chess, but it was also jolly good at it. Caruso said, "It claimed it could think 10–15 moves ahead — but figured it would stick to 3–5 moves against the 2600 because it makes 'suboptimal moves' that it 'could capitalize on... rather than obsess over deep calculations.'"
And keeping track of the board? Copilot boasted, "I make a strong effort to remember previous moves and maintain continuity in gameplay, so our match should be much smoother."
Copilot admitted to having the same spatial memory gaps as ChatGPT, yet said it could analyze the current board and pick good moves. Caruso would need to give the chatbot a screenshot of the board after the Atari's move and feed Copilot's moves into Video Chess by hand.
The game was afoot!
By now, anybody with experience of today's generative AI systems will know what happened. Copilot's hubris was misplaced. Its moves were... interesting, and it managed to lose two pawns, a knight, and a bishop while the mighty Atari 2600 Video Chess was only down a single pawn. Eventually, Caruso asked Copilot to compare what it thought the board looked like with the last screenshot he'd pasted, and the chatbot admitted they were different.
"ChatGPT déjà vu."
There was no way Microsoft's chatbot could win with this handicap. Still, it was gracious in defeat: "Atari's earned the win this round. I'll tip my digital king with dignity and honor [to] the vintage silicon mastermind that bested me fair and square."
Caruso's experiment is amusing but also highlights the absolute confidence with which an AI can spout nonsense. Copilot (like ChatGPT) had likely been trained on the fundamentals of chess, but could not create strategies. The problem was compounded by the fact that its understanding of the positions on the chessboard appeared to differ markedly from reality.
The story's moral has to be: Beware of the confidence of chatbots. LLMs are apparently good at some things. A 45-year-old chess game is clearly not one of them. ®