Why Google Gemini’s Pokémon success isn’t all it’s cracked up to be
Summary
Google’s large language model, Gemini, has been playing Pokémon Blue on a Twitch stream and has picked up a number of accolades, including the game’s first completion by an AI model, in over 106,000 actions, and praise from Google CEO Sundar Pichai.
While this achievement may seem like an indicator of the development and growth of AI and LLM capabilities, there are caveats to take into account.
JoelZ, the developer behind the Gemini Pokémon stream, emphasizes that the run is not a benchmark for comparing models. He does not consider Pokémon a suitable trial of LLM capabilities; instead, he believes it is a test of memory and pattern recognition.
Key differences in the “framework” tools used in each gameplay experiment, such as the agent harness, which provides the LLM with more information about the game’s state, could be the reason for Gemini’s success.
However, the achievement of completing the game is only one metric of success, and there are many other indicators for the development and evolution of AI technology.
It is important to avoid direct comparison when analyzing the success and development of AI technologies, as they each have specific strengths and weaknesses.