What’s up designers, and welcome back to Rempton Games. I know I haven’t posted in a while, but I am finally back with some new Pokemon AI content. Specifically, I decided to try my hand at making an AI (or actually, a couple AIs) for Pokemon Showdown. In this video we will first be taking a look at why and how I made these bots, and how you can make your own Showdown battle bots. After that we will take a look at them in action, where I will be pitting them head to head in a Pokemon AI Battle Royale. Finally, at the end of the video I will be answering some questions about my previous Pokemon Emerald AI project, and I have a major announcement of future work that you will definitely want to stick around for. If you want to skip to any of those sections you can check the chapters or the time stamps in the description.
Lets start by taking a look at how and why I made these AI. I assume most of you are already familiar with Pokemon Showdown, but for those who aren’t it is an online Pokemon battle simulator, where you can build Pokemon teams and compete against other players. I decided to make bots for Pokemon Showdown, rather than for ROMs like my previous project, because I really wanted to focus on the battling aspect, and not have to worry about factors such as navigating the world, solving puzzles, capturing Pokemon, etc. For the same reason, I chose to focus on the Random Battle format, in which both players are given a semi-random team of 6 Pokemon, to remove factors like Teambuilding and just focus on making the best decisions in battle.
Of course, I was far from the first person to have this idea, so the first step was doing research on the work that had previously been done in this field. There have been a handful of different research projects in the past with a similar goal of testing and designing Pokemon battle bots, many of them for Showdown specifically, and I’ll leave links to several in the description if you are interested in reading them.
Some of these previous projects even had open source Gitlab repositories available for download, which was incredibly helpful. The Pokemon showdown source code is already open source, but for this project I also ended up using a gitlab library called Poke-env, primarily written by Harris Sahovic. Poke-env basically made it easier to send messages and access information from Pokemon Showdown. It was incredibly user-friendly and well documented,and I would 100% recommend it to anyone interested in trying their own bots. A link to the repository is in the description.
The poke-env documentation includes a set of “Getting Started” tutorials to help users get acquainted with the library, and following these tutorials I created the first two of my bots – Random_Player and Max_Damage_Player. Both of these bots are incredibly simple – Random player literally just picks random moves each turn, and Max_Damage_Player simply chooses the move that has the highest base power. These bots are fine for educational purposes, but there was plenty of room for improvement.
The next bot I made was what I call a “rule based AI”. Rule based AIs basically make decisions based on a flowchart – if this is true do a particular action, otherwise do this other action. These AI can be quite complicated, and can be incredibly effective, but are also probably the least sophisticated. The effectiveness of these types of AI depends entirely on the knowledge of the person creating them, and the only way to make better decisions is to add more and better rules to the system.
I called this bot the “Smart Damage Player”, but at the end of the day it’s not actually that smart (except when compared to the dummies that the previous bots were) . In fact, this bot was a pretty simple bot that used relatively rudimentary “rules of thumb” to determine what attacks to use, and which Pokemon to switch to. Basically, it first uses info about type matchups and and speed tiers to decide whether it should switch to a different Pokemon. Then, if it decides not to switch it uses a simplified version of the damage formula to estimate how much damage each attack will do, and chooses the attack with the highest damage.
The second bot I made was a “search-based AI”. These AI make decisions by searching through a bunch of different possibilities, and choosing the best one. For games this is usually done by constructing a “game tree”, which basically maps out all of the possible moves that can be taken by each player. You start out at the top with the current state of the game, then below that you map out all of the possible actions that you can take this turn. Then, for each of those moves you look at all of the actions your opponent can take in response, then how you respond to those, and on and on until you reach the end of the game. Now you’ve mapped the entire game, and just have to follow the path that leads to victory! At least, in theory.
For a simple game like Tic-tac-toe it is possible to calculate the entire game tree. However, for anything more complicated this quickly becomes infeasible, due to the physical limitations of the computing hardware. For Pokemon, trying to construct the entire game tree is completely out of the question – even if we simplified the game so that neither player can switch until a Pokemon dies, that still leaves each player with 4 possible actions each turn – we call this number “the branching factor”. Assuming that the average Pokemon game lasts about 20 turns, and you have to calculate moves for both yourself and your opponent, this results in around 4 to the 40th power different final states, which is about 1.2 Septillion different states – and this a VERY conservative estimate.
Since we clearly can’t calculate the whole tree, we need to set a limit on how deep into the tree we go, and how many moves we look at. We also have a few different ways of searching through the game tree – for this project specifically, I chose to use a technique called Minimax. Minimax works by constructing a game tree down to your specified depth, then assigning each state a score. The bottom nodes in the tree, called “leaves”, get scored based on the state of the game at those nodes. For example, the score increases when I damage or knock out an opponent’s Pokemon, and decreases when my Pokemon are damaged or knocked out.
Once all the bottom nodes are scored, those scores work their way up the tree. For each of my opponent’s actions the score will be based on the minimum of all of the actions below it. This is because we assume my opponent wants to minimize my score. For each of my moves the score is the maximum of all of the child moves, because I obviously want to choose the action that maximizes my own score. Once the scores have worked their way all the way up to the top of the tree I will have determined the “safest” action to take – basically, the action that will get me the best outcome (assuming my opponent is playing optimally).
I had high hopes for my Minimax bot, but to be honest it didn’t do as well as I expected. With a search depth of 1 it performed about the same, or maybe a little bit better, than my rule-based bot. However, as I increased the depth of the search tree it actually started to perform worse, rather than better. There could be a few reasons for this. The first explanation is simply that the code has a bug in it somewhere that is causing the bot to make worse decisions. I tried to work out all of the bugs the best I could, but it is still entirely possible that I missed something.
The second possibility is simply that traditional game-tree search doesn’t translate perfectly to Pokemon. Game tree search is designed for games that alternate between one player and the next, but Pokemon doesn’t work this way. Instead, both players choose their actions simultaneously, and the game decides which order those actions should happen in. My program tried to work around these constraints, but it’s still possible that the simultaneous nature of the game makes game tree techniques less effective.
The final possible reason is simply that my bot doesn’t have enough information to make smart decisions. This could be for a few different reasons. The first is just the limitations of the program – there are lots of things, such as status effects, weather conditions, and entry hazards, that my bot simply doesn’t take into account. Maybe if it did, it would be able to make better decisions ( and this goes for my rule-based bot as well).
Another source of missing information is just the game itself. For one thing, in order to produce the game tree I try to take my opponent’s possible moves into account, but the simple fact is that I don’t know everything about my opponent. I don’t know what moves they have until they use them, I don’t know what Pokemon are on their team until they switch to them, and I don’t know exactly what stats their Pokemon have, which makes it impossible to accurately predict their actions. There is also a lot of randomness involved in Pokemon – attacks can miss, or get critical hits, some attacks have multiple hits, or a chance of a secondary effect (such as flinching or causing status effects). Even a simple attack like Tackle has some randomness involved with exactly how much damage it does. Because of all of this, it’s possible that the reason my predictions get worse as the tree gets deeper is simply because this randomness makes the game impossible to accurately predict.
One type of bot I did not create was an AI that uses machine learning. This is because I don’t think Pokemon is currently a very good application of machine learning, and my research seems to support this. For example, the paper “Showdown AI Competition” proposed a competition between several different Pokemon Showdown AIs, and compared several different techniques. In that paper there were two different reinforcement learning agents, and both performed pretty poorly against the other techniques that they faced. I simply think that they vast amount of state information a reinforcement learning bot would have to track (including nearly 900 different Pokemon and over 800 moves), as well as the randomness inherent in the game, make it a very non-ideal use of machine learning. There have been researchers that have developed effective machine learning AI for Pokemon-type games, but in my research these games always use very simplified versions of the game that eliminate many of its particular challenges.
Now that we have discussed the different bots and how they work, let’s see what they can do with a round-robin tournament! In this tournament there will be 5 competitors. Four of them we have already seen – the random-move bot, the simple max-damage bot, the rules-based bot, and the minimax bot. The last competitor was designed by Harris Sahovic as part of the poke-env library – it’s called the “Simple heuristics player”, and is basically a more advanced version of my rules-based bot.
I pitted each of these bots against every other bot in 1,000 battles. There are 10 different match-ups, for a total of 10,000 battles, and I have put the results in the table on-screen. As expected, the Random player is the clear loser, winning a vast majority of battles to every other bot. However, there are still a few interesting things to note. The first is that, even though it lost quite badly every time, it was still able to win some matches against all of its opponents. Its also interesting that the minimax player did particularly bad against the random player – probably because it is trying to predict the opponent’s actions, which is impossible against a player who doesn’t act predictably.
The next worst bot was also the next simplest – the max damage player, which lost to everything except the Random player. After that was the minimax player and the smart damage player, with the winner of our bot tournament being the heuristic player. The heuristic player definitely has the most in-depth decision making process, and takes several different factors of the game state into account.
However, I think there is still a lot of room for improvement. I have some ideas on how I can improve the decision-making abilities of both my minimax and rule based bots, but to be honest I think the main way I’m going to get big improvements in decision making is simply to increase the complexity of my rules and scoring functions. I think this project has really made me appreciate just how complicated of a game Pokemon is, and the incredibly number of factors that have to be taken into account to make smart decisions. I’ll probably follow up on this at some point in the future, but I expect future bots to require a huge amount of brute-force programming, and this project was already more than my schedule can handle.
That’s all I have to say about these bots, and I hope you enjoyed learning about them. However, before I end the video, I want to talk a bit about my previous Pokemon AI project, because I know a lot of people have a lot of questions, and I’ll try to answer some of the main ones.
One of the big questions I got about the Emerald bot is – how far did it make it on it’s own? And the answer is…not very far. That project was far from perfect, and was made during a very limited time window.On its own, I know it got far enough to get it’s first Pokemon and face the rival, but I don’t think it ever went much further than that. The software interface honestly wasn’t great – for example, just pressing the button long enough to step a single square in a direction was pretty finicky, and despite my best efforts it still usually ended up getting stuck somewhere.
However, I have since realized that trying to make a bot that handles every different aspect of playing Pokemon was doomed from the start. Even if I could make the mapping and battling functions work perfectly – a really big ask, especially considering how non-user friendly the interface was – it still wouldn’t be enough. Some people have asked how it deals with puzzles, and the answer is.. It doesn’t. Pokemon games are full of clever navigation puzzles – most of the gyms are designed like mazes, with moving floors and teleportation pads, and you are required to use HM abilities like strength and surf to progress. There really isn’t a simple way to deal with these types of hurdles.
Some of you have also asked how the project deals with backtracking, such as when you have to go back to previous areas in order to progress. It actually does pretty well with these types of problems – at least in theory. The search algorithm is designed to number the spaces that it visits, and if it runs out of unvisited spaces it is programmed to go back to the space it visited least-recently. I believe it successfully walked to the end of route 102 and back, and given enough time it would revisit every space. This would be a slow process, but the idea is that it would sort of loop around the world, making a little bit of progress every time. Of course, this would be impossible due to the issues I mentioned previously.
I know many of you were disappointed with my previous video, because you wanted to see an AI play completely through a Pokemon game. Unfortunately, I failed to deliver. The only way I can think to make this project feasible would be to basically make a rom-hacked version of Pokemon that was built to be “bot-friendly”. This is something that would take a huge amount of work, and isn’t really possible for me right now. HOWEVER, I do want to make a commitment – if my channel should ever grow to 100,000 subscribers, I promise to make a customized Pokemon game, upgrade my bot to play all the way through it, and post the complete play-through on my channel!
Will that ever happen? Who knows! But I do know that’s all I have for today. If you liked this video, or if you want to see an AI play all the way through a Pokemon game, them make sure to leave a like, subscribe, and maybe even tell a friend about the channel! If you want to see more, definitely check out my previous Pokemon AI video if you haven’t, and I also have a bunch of other Pokemon, programming, and game design videos on my channel I think you’ll like. And join me next time for the ultimate Settlers of Catan strategy guide! Until then, thank you so much for watching and I’ll see you all next time.