This is a fascinating video about how AI can be used for more complex tasks. It isn’t about LLMs like ChatGPT and Google Gemini or voice assistants like Apple’s Siri. The AI in this project is closer to the models Boston Dynamics uses to build task-focused walking robots, something like what you see here: https://www.youtube.com/watch?v=L_4BPjLBF4E. That walking video is also strikingly similar to how humans learn to walk: the AI took almost 3,000 attempts to learn to walk like a 14-month-old toddler.
Here’s how it works in the Pokemon video. An AI is set up to play the original Pokemon Red. It can see the game screen and press every button the game uses. The problem Peter Whidden, the video’s creator, set out to solve was getting the AI to actually make progress in the game. Pokemon is a complex game, but children can play it, so in theory an AI should be able to play all the way to the end and defeat the Elite Four. After all, an insane hivemind managed to do it too.
This type of AI, trained through reinforcement learning, has its behavior (button presses) shaped by a set of rewards and punishments. The AI tries to get a “high score,” and Whidden decides what earns or loses points. He had the AI play the game many times over, with each run lasting a set number of simulated hours. At the end of each batch of runs, the training process works out which strategies scored best and updates the AI to favor them. The updated AI is then put back into the game to improve on its previous strategy.
Whidden started the AI off with a reward for finding new images on the screen, which was meant to encourage it to explore the game map. The first version got distracted by pretty flowers, the ocean shore, and people watching: animated flowers, moving water, and wandering characters changed the screen just enough to count as “new.” Whidden had to raise the threshold for what counted as a “new image” to stop the pointless sightseeing.
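To make the threshold idea concrete, here is a minimal Python sketch of a novelty reward. It assumes a screen is a flat list of grayscale pixel values and that a frame counts as “new” only if the fraction of pixels that differ from every previously seen frame exceeds a threshold. The names `NoveltyReward` and `pixel_difference` and the tiny 100-pixel frames are hypothetical illustrations, not Whidden’s actual code.

```python
from typing import List

Frame = List[int]  # hypothetical: a screen flattened to grayscale pixel values


def pixel_difference(a: Frame, b: Frame) -> float:
    """Fraction of pixels that differ between two frames of equal size."""
    changed = sum(1 for x, y in zip(a, b) if x != y)
    return changed / len(a)


class NoveltyReward:
    """Pays 1.0 for a frame that differs enough from every frame seen so far."""

    def __init__(self, threshold: float) -> None:
        # A higher threshold means small animations no longer count as "new".
        self.threshold = threshold
        self.seen: List[Frame] = []

    def __call__(self, frame: Frame) -> float:
        for old in self.seen:
            if pixel_difference(frame, old) <= self.threshold:
                return 0.0  # too close to a frame already seen
        self.seen.append(frame)
        return 1.0


base = [0] * 100                  # a 100-pixel stand-in for a screen
flower = base.copy()
flower[0] = 255                   # one animated tile changes 1% of the pixels
new_area = [255] * 50 + [0] * 50  # half the screen is different

loose, strict = NoveltyReward(threshold=0.0), NoveltyReward(threshold=0.2)
print([loose(f) for f in (base, flower, new_area)])   # [1.0, 1.0, 1.0]
print([strict(f) for f in (base, flower, new_area)])  # [1.0, 0.0, 1.0]
```

With the threshold at 0.0, a single animated tile earns a reward; at 0.2, only a substantially different screen does. Comparing each frame against every stored frame gets slow as the list grows, so a real implementation would likely need a faster similarity search.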
Next, Whidden had to introduce an additional reward. The AI had no strong incentive to catch Pokemon or win battles. It only wanted to see new screens, so it learned to run from battles and keep searching for new areas of the map. Whidden added an incentive for having a highly leveled team of Pokemon, which would encourage the AI to catch Pokemon and win battles to level its team up. Notably, this incentive was weighted more heavily than the new-image incentive, so the AI would prefer battling over sightseeing when given the option.
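One common way to combine several incentives is a weighted sum. The sketch below assumes the total reward is exploration plus party levels, each multiplied by a weight; the weight values and the `StepOutcome` and `shaped_reward` names are hypothetical, chosen only so that the level term counts for more than exploration, as described above.

```python
from dataclasses import dataclass

# Hypothetical weights: levels count for more than exploration.
EXPLORE_WEIGHT = 1.0
LEVEL_WEIGHT = 4.0


@dataclass
class StepOutcome:
    new_screens: int    # new screens discovered during this stretch of play
    levels_gained: int  # total party levels gained during this stretch of play


def shaped_reward(outcome: StepOutcome) -> float:
    """Weighted sum of the exploration and level incentives."""
    return EXPLORE_WEIGHT * outcome.new_screens + LEVEL_WEIGHT * outcome.levels_gained


run_away = StepOutcome(new_screens=2, levels_gained=0)
win_battle = StepOutcome(new_screens=0, levels_gained=1)
print(shaped_reward(run_away), shaped_reward(win_battle))  # 2.0 4.0
```

Under these assumed weights, winning a battle that gains one level is worth more than fleeing to discover two new screens.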
Whidden’s next change was a punishment for losing battles, because the AI was rushing into difficult fights and had yet to learn how to heal its Pokemon at a Pokemon Center. Unfortunately, the punishment backfired. The AI kept the same strategy, but when it was about to lose a battle it would simply stop pressing buttons. The battle never ended, so the punishment was never triggered. That wasn’t the intended effect, so Whidden removed the punishment.
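The backfire comes down to when the penalty fires. This sketch assumes each training run ends after a fixed time limit and that the penalty is applied only when a `"battle_lost"` event actually occurs; under those assumptions, an agent that stalls mid-battle finishes with a better score than one that fights and loses. The event names, `episode_return`, and the penalty size are all hypothetical.

```python
from typing import List

LOSS_PENALTY = -10.0  # hypothetical penalty for losing a battle


def episode_return(events: List[str]) -> float:
    """Total penalty for one run; a loss only counts once the battle actually ends."""
    return sum((LOSS_PENALTY for event in events if event == "battle_lost"), 0.0)


# Both runs enter a battle they cannot win. The first keeps pressing buttons and loses;
# the second stops pressing buttons, so the time limit ends the run mid-battle.
fights_on = ["battle_started", "attack", "attack", "battle_lost"]
stalls = ["battle_started", "attack", "idle", "idle", "idle"]
print(episode_return(fights_on), episode_return(stalls))  # -10.0 0.0
```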
Next, Whidden investigated why the AI wasn’t going to Pokemon Centers. At some point the AI had deposited a Pokemon into the PC, which lowered the total level of its party. That drop registered as a punishment, so afterwards the AI avoided Pokemon Centers entirely to avoid repeating the mistake. Whidden changed how level rewards worked so that the AI could only gain rewards from levels, never lose them by accident. Then the AI was sent off once more, and it finally learned to use Pokemon Centers to heal its injured little monsters.
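The fix can be illustrated by comparing two ways of rewarding levels. This sketch assumes the reward is computed from the party’s total level at each step: the naive version pays the change in that total, so depositing a Pokemon produces a negative reward, while the ratcheted version pays only when the total beats its previous best. Both function names and the level history are hypothetical, and the video’s exact rule may differ.

```python
from typing import List


def naive_level_rewards(level_totals: List[int]) -> List[int]:
    """Reward = change in party level total at each step; can go negative."""
    return [after - before for before, after in zip(level_totals, level_totals[1:])]


def ratcheted_level_rewards(level_totals: List[int]) -> List[int]:
    """Reward only the amount by which the party level total beats its previous best."""
    rewards = []
    best = level_totals[0]
    for total in level_totals[1:]:
        rewards.append(max(0, total - best))  # never negative
        best = max(best, total)
    return rewards


# The party total rises to 20, drops to 12 after a Pokemon is deposited at the PC,
# then returns to 20 and grows to 22.
history = [15, 20, 12, 20, 22]
print(naive_level_rewards(history))      # [5, -8, 8, 2]
print(ratcheted_level_rewards(history))  # [5, 0, 0, 2]
```

In the naive version the deposit is immediately followed by a -8, the kind of signal that could teach an AI to stay away from the PC. The ratcheted version never drops below zero, so using the PC costs nothing.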
The AI took 300 simulated days of training to discover how to use super effective moves. This let it defeat Brock with Squirtle, whose Water-type moves are super effective against Brock’s Rock-type Pokemon. The AI then made its way to Mt. Moon but got stuck, because the cave’s areas all look alike. The boring brown tunnels weren’t distinct enough to trigger the exploration reward, so the AI stopped making progress. Whidden ended the experiment there, since his original goal had been to defeat Brock.
Whidden does an excellent job explaining all of this with great visuals. The last third of the video explains how to create your own AI to play Pokemon. If you’d like to give it a try, Whidden linked a GitHub repository with tools and code to get you started: https://github.com/PWhiddy/PokemonRedExperiments
And if you’ve enjoyed his content, you can thank him with some tuna melt money: https://buymeacoffee.com/peterwhidden





