Game-Playing AI
Screenshot of game on N64
For our final project in our AI class, my project partner and I chose to write a program that plays Dr. Mario, a puzzle game similar to Tetris. At the start of a game, "germs" are randomly placed on the board, and the objective is to eliminate them using the randomly colored "pills" that fall from the top of the board. The player controls each pill while it is dropping. Pills and germs are eliminated by aligning four of the same color in a row. The player wins when all of the germs are eliminated, and loses if the pills overflow the top of the board.
Typically a human player plays the game through increasing levels, in which the pills drop faster and there are more germs to kill. We simplified the problem: our goal was to win the game using the minimum number of pills for a given number of initial germs, while keeping the computation time comparable to a human playing in real time.
There were two main components to the program: one that computed the best action by simulating future pills and choosing the action with the most favorable outcome, and another that learned how to evaluate whether an outcome is "favorable". My main contribution was the first component, which I implemented using Expectimax with function approximation. The learning component was mainly implemented by my partner and used TD-learning.
Expectimax
In Dr. Mario, a player can only take a small number of actions: namely, moving the pill that is dropping, and rotating it. For each of these actions, the Expectimax algorithm simulated future outcomes up to a certain number of plays ahead and computed the action's expected value. The algorithm then chose the action with the highest expected value, and this calculation was repeated every turn to find the next optimal action.
In order for the algorithm to "look ahead" a number of plays, there was one challenge: besides the next pill shown in the preview on screen, the colors of future pills are completely random. Simulating the next pill (depth 1) was therefore not a problem, since it was deterministic. For simulation depths beyond that, however, the algorithm simulated all 6 possible pill types and weighted the expected values by their probabilities.
The diagram below is from an earlier Monte-Carlo version of our algorithm. It differs in that it does not branch out to simulate every possible future pill, but the overall structure is quite similar.
Diagram showing how our Monte-Carlo algorithm worked. For the Expectimax algorithm, the tree kept branching past depth 1.
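To make the idea concrete, here is a minimal sketch of the expectimax step in Python. It is illustrative only, not our actual project code: the board representation and the helpers legal_placements, apply_placement, and evaluate are simplified stand-ins, and the 6 pill types are assumed to be uniformly distributed.

```python
# Illustrative expectimax sketch, not the project's actual code. The board
# representation and the helper functions are simplified stand-ins.

PILL_TYPES = ["RR", "RY", "RB", "YY", "YB", "BB"]  # the 6 unordered color pairs
PILL_PROB = 1.0 / len(PILL_TYPES)                  # assume a uniform distribution


def legal_placements(board, pill):
    """Stand-in: enumerate (column, rotation) placements for a pill."""
    return [(col, rot) for col in range(8) for rot in range(2)]


def apply_placement(board, pill, placement):
    """Stand-in: drop the pill, clear any four-in-a-row matches, return the new board."""
    return board


def evaluate(board):
    """Stand-in for the learned heuristic (a weighted sum of board features)."""
    return 0.0


def expectimax(board, pill, depth):
    """Best value achievable by placing `pill`, looking `depth` plays ahead."""
    best = float("-inf")
    for placement in legal_placements(board, pill):
        nxt = apply_placement(board, pill, placement)
        if depth <= 1:
            value = evaluate(nxt)  # leaf: fall back on the heuristic
        else:
            # Chance node: future pill colors are random, so average the values
            # over all 6 pill types, weighted by their (equal) probabilities.
            value = sum(PILL_PROB * expectimax(nxt, p, depth - 1)
                        for p in PILL_TYPES)
        best = max(best, value)
    return best


def choose_action(board, current_pill, next_pill, depth=3):
    """Place the current pill; the previewed next pill is known (deterministic),
    while pills beyond that are treated as chance nodes."""
    def value_of(placement):
        return expectimax(apply_placement(board, current_pill, placement),
                          next_pill, depth - 1)

    return max(legal_placements(board, current_pill), key=value_of)
```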
Function Approximation
Sample of features used to evaluate the board.
Given that the simulation tree branched into 6 different branches at every depth, the computational cost of simulating to depth d grew as O(6^d). It was therefore not feasible to simulate every pill until the end of the game, since a game consisted of somewhere around 40 plays. Instead, we used function approximation to evaluate the state of the board beyond a certain depth. This heuristic was computed from feature vectors and feature weights, and the weights were tuned using a machine learning method called TD-learning, which was implemented by my partner.
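As a rough illustration of what such a heuristic looks like (the feature names and weights below are made up for the example, not the ones we actually used), the evaluation is just a dot product between a feature vector extracted from the board and a weight vector:

```python
# Illustrative sketch of a feature-based board evaluation; the features and
# weights here are hypothetical examples, not the project's actual ones.

FEATURE_WEIGHTS = {
    "germs_remaining":     -5.0,  # fewer germs left is better
    "max_column_height":   -1.0,  # tall stacks risk overflowing the board
    "same_color_adjacent":  2.0,  # same-color neighbors set up future clears
}


def extract_features(board):
    """Stand-in feature extractor; a real one would scan the board grid."""
    return {
        "germs_remaining":     board.get("germs", 0),
        "max_column_height":   board.get("height", 0),
        "same_color_adjacent": board.get("pairs", 0),
    }


def evaluate(board):
    """Heuristic value of a board: weighted sum (dot product) of its features."""
    features = extract_features(board)
    return sum(FEATURE_WEIGHTS[name] * value for name, value in features.items())


# Example: 4 germs left, tallest column of height 6, 3 same-color adjacencies
# evaluates to (-5 * 4) + (-1 * 6) + (2 * 3) = -20.
print(evaluate({"germs": 4, "height": 6, "pairs": 3}))  # -> -20.0
```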
Results
Our algorithm was quite successful. My project partner, who was a very experienced player of this game, averaged 128.6 pills to win the game. The AI program that we built, on the other hand, used 103.5 pills to win. It's incredible how a relatively simple algorithm could beat an experienced human player at a non-trivial game. It was also interesting to watch the AI play, since it came up with moves that seemed unintuitive, yet made sense.
Read the full report for the class [here].