What’s up designers, and welcome back to Rempton Games. What you just saw was an artificial intelligence that I built at the end of 2018 to autonomously play Pokemon Emerald. Today, I want to share with you all how I designed and built this AI, talk about some of the principles behind it, and even look at some ways this AI could be further improved upon.
Before we jump into those details, however, I just want to say a big thank you to all of you watching this. Although this channel is still very small, it has been growing thanks to viewers like you sharing, commenting, and liking these videos, and that means a lot to me. I also really appreciate the kind words and even the critiques that you have provided in the comments. With that out of the way, let's get started.
Before I dig into the details of how I programmed this particular agent, we should briefly go over some background on AI. For most people, when you hear the term AI your mind probably goes to something like HAL 9000 from 2001: A Space Odyssey, or maybe GLaDOS from Portal – machines that think and learn like a human being. Those types of machines are what are called “General AI” or “Strong AI”, and those sorts of machines or programs are still a long way away.
Instead, in modern computer science we use various types of “weak AI”: systems that are generally pretty good at a specific task but can really ONLY do that task. Among weak AI there is a huge variety of different techniques that can be used, from very simple ones that just follow a predetermined series of steps (and may not be considered “intelligent” at all), to advanced machine learning techniques used by something like Amazon Alexa.
While all of these different techniques work wildly differently, they all require three things. First, they require some way to collect data, through digital messages or sensors such as cameras, microphones, etc. Second, they all process this data in some way in order to come up with a decision. Finally, they require some way to actually execute their decision, such as moving a robot arm or saying a verbal response through a speaker.
These are the same three things that we need to build our Pokemon-playing AI. We somehow need to allow our AI to read data from the game, manipulate that data, and then send instructions back into the game. To accomplish this I decided to use a program called Gym Retro. This program was created by OpenAI, an AI research organization, for the purpose of conducting AI research using retro video games. It is actually intended to be used for something called Reinforcement Learning, which is an AI technique that allows an agent to slowly learn a task over time by rewarding it when it does well at the task and punishing it when it does poorly. However, my approach for this project was a bit different, and I mostly used Gym Retro as a platform to interact with the game.
The way Gym Retro works, you need three things. First, you need a ROM of the game you are working with. A ROM is basically a digital copy of a game that is created by copying all of the data from the original cartridge. This makes the ROM basically identical to the original game, and lets you play it on a different device – in this case, my computer instead of a Game Boy Advance.
In order to actually run the ROM you need an emulator, which digitally mimics the hardware of the original console in the same way that a ROM mimics the software. Finally, you need the OpenAI Gym Retro software itself. This software basically acts as a wrapper around the emulator that allows you to read the data at specific memory locations, and manipulate that data using Python scripts.
Gym Retro allows us to get the three things we need to program our AI. First, we have a file called data.json that allows us to read data from the game. Each piece of data that we are reading has a name, a memory address that tells us where to read from, and a datatype that tells us how to interpret that data. As you can see from this file, we can use this to collect data such as the stats of each Pokemon in our team, or our character’s X and Y position.
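To give you an idea of what this looks like, here is a minimal sketch of a Gym Retro data.json. The variable names and addresses below are made up for illustration – they are not the real Emerald offsets – but the structure (an "info" section mapping each name to an address and a type string like "<u2", meaning a little-endian unsigned 2-byte integer) matches the format Gym Retro uses:

```json
{
  "info": {
    "mudkip_hp": {
      "address": 12345678,
      "type": "<u2"
    },
    "player_x": {
      "address": 12345900,
      "type": "<u2"
    },
    "player_y": {
      "address": 12345902,
      "type": "<u2"
    }
  }
}
```

Every variable declared here shows up by name in the info dictionary that Gym Retro hands back on each step, so the Python script never has to touch raw addresses directly.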
Next, we have another file called Emerald.py. This is the file that will actually be determining how our AI agent performs, and honestly it’s a bit of a mess but I was doing the best I could at the time. Because this is where the majority of the action in this program takes place there is a lot going on, and I will dig more into exactly what this program is doing in just a bit.
The final component we need is a file called scenario.json. This file specifies the various actions that our AI agent can take. In this case, those actions correspond to various buttons that exist on the Game Boy Advance, and when the AI sends a command the emulator treats that as if that particular button was pressed.
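In code, sending a button press means handing the environment's step function a binary vector with one entry per button. A small helper for building that vector might look like the sketch below – the button ordering here is an assumption for illustration (in practice you would read it from the environment itself), and the function name is my own:

```python
def press(buttons, *pressed):
    """Build a multi-binary action vector: 1 for every button we want
    held down this frame, 0 for everything else."""
    return [1 if b in pressed else 0 for b in buttons]

# Hypothetical GBA button layout, for illustration only.
GBA_BUTTONS = ["B", "SELECT", "START", "UP", "DOWN", "LEFT", "RIGHT", "A", "L", "R"]

action = press(GBA_BUTTONS, "A")  # e.g. press A to advance a dialogue box
```

The emulator then treats each 1 in that vector as the corresponding button being held for that frame.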
Now that we know what tools and files we need, let’s look into how we actually write those files. I’ll start with our data file – this file lets us keep track of several important pieces of information about our game, but first we need to locate that information. I’ll be the first to admit that I’m not much of a data-miner – in fact, working on this project is the only time I’ve really done anything of the sort, which made this process difficult. However, I will show you the method I used that was enough to get me by, for the most part.
The way I located the data I needed was by using a part of Gym Retro called the Integration UI. This program basically lets you play through the game manually while keeping track of various locations in memory. To find the memory location you want, you first need to know what value you expect it to have. For example, let’s suppose I wanted to locate my Mudkip’s health value. I can use the search bar in the Integration UI to make a new search, and call it MudkipHealth. For value, I know that its current health is 27, so I search for that. That brings up a whole bunch of different values, and each of these represents a location in memory that has a value of 27. I need to narrow it down, so what I’m going to do is get in a battle and let my health drop. Now I can search again for this new value – this time, it only searches among the locations it had already found, which should significantly narrow it down. Now we only have a handful of potential locations. Notice that a lot of these potential locations are actually the same memory address – this is because the same address can be read in several different ways, which represent different data-types. We just need to choose the data-type that matches what we are looking for – in this case, an unsigned little-endian integer.
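The narrowing search described above is simple enough to sketch in a few lines of Python. Here `memory` stands in for a snapshot of the emulator's RAM as an address-to-value dictionary – a simplification, since the Integration UI does this scanning for you – and the function name is my own:

```python
def narrow(candidates, memory, target):
    """One round of the narrowing search: keep only the addresses
    whose current value matches what we expect."""
    if candidates is None:
        # First search: scan every address in the snapshot.
        return {addr for addr, val in memory.items() if val == target}
    # Later searches only re-check addresses that already matched.
    return {addr for addr in candidates if memory.get(addr) == target}
```

Each round after a change in game state (like taking damage) intersects the candidate set with the new expected value, so a few rounds usually whittle thousands of matches down to one.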
Locating the data this way can be time consuming, but luckily I didn’t have to find every single value by hand. One way to speed up this process is to use resources like Bulbapedia that have information on the different data structures used in the Gen 3 Pokemon games. Using this information, I know that once I locate the HP value of my Pokemon I should find their Attack value four bytes after. I should also be able to find the next Pokemon in my party 100 bytes after, and so forth. By using this data, I only need to search for a few key values to find most of the information I need. Once I have found the values I am looking for I can not only use them for my AI program, but I can also do fun stuff like making all of my Mudkip’s stats 999.
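The offset arithmetic here is straightforward: once one anchor address is known, the rest follow. A rough sketch, using the two offsets mentioned above (4 bytes from current HP to Attack, 100 bytes per party slot – the function name and the example base address are my own):

```python
def stat_addresses(hp_addr, slot=0):
    """Given the address of the lead Pokemon's current-HP field,
    derive related addresses from the known Gen 3 struct offsets."""
    base = hp_addr + 100 * slot   # each party Pokemon occupies 100 bytes
    return {
        "current_hp": base,
        "attack":     base + 4,   # Attack sits 4 bytes after current HP
    }
```

So searching for a single HP value effectively unlocks the whole party's stats.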
Now that we have located the data our AI needs, let’s look at how the actual agent itself is designed. The way I see it, Pokemon is basically split into two main parts – navigating the overworld, and battling. Each of these tasks is very different, so I actually use two different techniques to handle these two modes.
Let’s first look at the task of navigating the overworld in Pokemon. A big part of playing Pokemon is walking around the overworld from town to town, and the AI had to be able to navigate the world somehow. This meant it had to have some idea of where it was, where it needed to go, and also needed to be able to deal with complications such as dialogue boxes, cut-scenes, and walking in and out of buildings.
I know that somewhere in the game’s memory is information about each area map and how they are connected, and if I could access that information my agent would be able to navigate in a much more intelligent manner. Unfortunately I am still a novice dataminer, and the technique that I have been using requires me to already know the value of the memory location I am searching for. Because of this, I had to come up with a different solution.
Luckily, I was able to locate memory values that keep track of the player’s X and Y positions, by using the assumption that their starting position at the beginning of the game would be considered “0, 0”. While this information isn’t much to go off of, I was able to develop a navigation system with two main parts – mapping, and path-finding.
Because I am unable to access the game’s internal map data, I decided to have my character create their own maps. Every time the AI moves (or attempts to move) they learn a little bit more about the world around them. If they can walk to that new square they mark that space as walkable, if they can’t they mark it as an obstacle, and if stepping on a new square takes them to a new location (such as a doorway that takes you inside a building) they mark it as a warp. Each time they move to a new square the AI updates its internal map to reflect this new information, and these maps can be used for pathfinding.
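That map-building rule can be sketched in a few lines. This is a simplified reconstruction rather than the actual code from Emerald.py – the names are my own, and I am assuming screen-style coordinates where Y grows downward:

```python
WALKABLE, OBSTACLE, WARP = "walkable", "obstacle", "warp"
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def record_move(tile_map, pos, direction, new_pos, warped):
    """Update the agent's hand-built map after one attempted step."""
    dx, dy = MOVES[direction]
    target = (pos[0] + dx, pos[1] + dy)
    if warped:
        tile_map[target] = WARP      # stepping here changed maps (a doorway)
    elif new_pos == pos:
        tile_map[target] = OBSTACLE  # we didn't move, so the tile is blocked
    else:
        tile_map[new_pos] = WALKABLE
    return tile_map
```

Calling this after every attempted step gradually fills in a dictionary of known tiles, which is exactly the map the pathfinder consumes.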
The second component is pathfinding. The agent doesn’t really have any idea where it is “supposed” to go, so it makes up for this by simply going everywhere. It basically has two goals. First, if there are any spaces that it can reach that are still unknown, it will try to go to the closest of those spaces. If there aren’t any unknown spaces that it can reach it will backtrack and go back to the square that it visited the least recently. Using this method, it should eventually reach every space.
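Assuming the map is the tile dictionary built up during exploration, and that we also track when each tile was last visited, that goal selection could be sketched roughly like this (all names here are my own, not from the actual script):

```python
def choose_destination(pos, tile_map, visit_time):
    """Pick where to go next: the nearest unexplored frontier tile,
    or failing that, the tile we visited least recently."""
    def dist(p):
        return abs(p[0] - pos[0]) + abs(p[1] - pos[1])

    # Frontier: unknown tiles adjacent to tiles we already know are walkable.
    frontier = {(x + dx, y + dy)
                for (x, y), kind in tile_map.items() if kind == "walkable"
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if (x + dx, y + dy) not in tile_map}
    if frontier:
        return min(frontier, key=dist)
    # No frontier reachable: backtrack to the least-recently-visited tile.
    return min(visit_time, key=visit_time.get)
```

Because every move either uncovers a new frontier tile or refreshes a visit timestamp, repeating this loop eventually sweeps the whole reachable map.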
Once the agent has selected a destination, it uses a pathfinding algorithm known as A* to actually find a path and navigate to that destination. While I am not going to go into all the details of A* search here (there is a Computerphile video that I will link to that I’m sure does a fantastic job of explaining it), the really brief explanation is that it builds the path one square at a time, always expanding the square with the lowest combined cost: the distance traveled so far plus an estimate of the distance still remaining. This algorithm is very commonly used for navigation, since it is guaranteed to give an optimal path (as long as that estimate never overestimates) and it is very time efficient.
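For the curious, here is a compact A* sketch over the kind of grid map the agent builds, using Manhattan distance as the heuristic (which never overestimates on a 4-connected grid, so paths stay optimal). This is a generic textbook version, not the literal code from my script:

```python
import heapq

def astar(walkable, start, goal):
    """A* search on a 4-connected grid; walkable is a set of (x, y) tiles."""
    def h(p):  # Manhattan-distance estimate of remaining distance
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), start)]       # priority queue keyed on cost + h
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            path = []                    # walk the parent links back to start
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        x, y = current
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in walkable:
                continue
            new_cost = cost[current] + 1
            if nxt not in cost or new_cost < cost[nxt]:
                cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(frontier, (new_cost + h(nxt), nxt))
    return None  # destination unreachable with what we know so far
```

The returned list of tiles translates directly into a sequence of D-pad presses for the emulator.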
Putting all these pieces together, the agent moves around the world by picking a destination, pathfinding to that destination, and building up a map of the world around them as they go. However, moving around the world is only half of what we need it to do. As this is a Pokemon AI, it of course also needs to be able to battle.
I’m going to be upfront and confess that when I was working on this I was not really able to implement the battle system that I dreamed of. This is because, due to my limited experience datamining, I was unable to locate certain data that is necessary for my design to work. With that information I would have been able to implement the system I am about to describe, but keep in mind that from here the discussion is more hypothetical – this is how I would design an AI battle system, but it has not yet been built.
My concept would basically use a game tree to determine the most effective action to take each turn. A game tree basically goes over every possible action that could be taken during a turn – each attack that your Pokemon could perform, each Pokemon you could switch to, perhaps even actions such as running away or using an item – and assigns a score to it. In this instance, for example, the score would take into account how much damage you can do to your opponent – more damage is better, with a bonus for knocking out one of their Pokemon. However, it would also subtract points for the damage your opponent could do to you, and apply a big penalty if one of your own Pokemon would faint. In order to determine these scores, the AI would need to know information such as the types of your Pokemon and the opponent’s Pokemon, your Pokemon’s stats, and the type and damage value of every move available. It could use this information to calculate how much damage each attack would do, on average, in both directions. It could then look several moves ahead, and find the course of action that is likely to deal the most damage to your opponent while causing the least damage to your own Pokemon. This might mean choosing the most effective attacks for your current Pokemon, or switching to a Pokemon that has a more advantageous match-up.
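Since this part was never built, the following is purely a sketch of the idea, with made-up numbers: a scoring function like the one described, plus a tiny lookahead that assumes the opponent always picks its strongest response. A real version would derive the damage figures from stats, move power, and type match-ups rather than take them as inputs:

```python
def score_turn(dealt, taken, ko_them, ko_us):
    """Hypothetical per-turn score: reward damage dealt, penalize damage
    taken, with a bonus for a knockout and a bigger penalty for fainting."""
    return dealt - taken + (100 if ko_them else 0) - (150 if ko_us else 0)

def best_move(our_moves, their_moves, our_hp, their_hp, depth=2):
    """Tiny game tree: score each of our moves against the opponent's
    strongest reply, looking `depth` turns ahead."""
    if depth == 0 or our_hp <= 0 or their_hp <= 0:
        return None, 0
    best = (None, float("-inf"))
    for name, dealt in our_moves.items():
        taken = max(their_moves.values())  # pessimistic: their best option
        s = score_turn(dealt, taken, dealt >= their_hp, taken >= our_hp)
        if dealt < their_hp and taken < our_hp:
            # Both sides survive, so add the best score of the next turn.
            _, future = best_move(our_moves, their_moves,
                                  our_hp - taken, their_hp - dealt, depth - 1)
            s += future
        if s > best[1]:
            best = (name, s)
    return best
```

Even this toy version captures the key behavior: it will prefer a move that sets up a knockout next turn over one that simply maximizes damage right now.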
There is a lot more I could talk about with this system, but I think that covers most of the important bases. I’m sure many of you still have a lot of questions, and I will try to answer those in the comments down below. If enough of you have questions or want to hear more, maybe I’ll eventually make a follow up video to respond to those, so please let me know if you are interested in hearing more. I also have a number of other projects I have worked on over the years, so let me know if you found this interesting and maybe we can talk about those some time.
That’s all I have for today. Once again thank you so much for watching this video. If you liked it, please leave a like, and subscribe so you don’t miss more videos like this in the future. If you want to see more, check out my other videos, like my previous one where I look at some of the tricky (and controversial) economic problems surrounding the price of games and the game industry. And join me next time, for another installment of my Game Designer Spotlight series, this time focusing on Richard Garfield. Until then, thank you so much for watching and I’ll see you all next time!