MiniMax Tree

Introduction

The next step up from using a Priority Map (explained here), where only the next move is examained, is to use a MiniMax Tree that looks many moves in advance. A MiniMax Tree is simply a tree whose nodes represent the state of the game. Each node can have any number of children, and the lines to those child nodes represent moves by players. Each layer in the tree is called a ply and the plies alternate from player1 to player2. This type of AI works best when there is "perfect information" about the game. By that I mean both players must be able to see the entire state of the game. So games like checkers,tic-tac-toe and chess work well with this system, but something like dominos or poker would not work as well since each player has some state of the game that the other does not know about. For this discussion we will always assume player1 is the player evaluating the next move.

How it works.

When it is time for player1 to make a move a tree is created with the root node being the current state (ply1). All possible moves by player1 from that state create the next ply of nodes (ply2). The next ply (ply3) contains the state after all the possible moves by player2 following the moves in ply2. The pattern continues, generally for a fix depth of plies. Even a simple game like checkers or ConnectFour can build a tree far too large to store in memory or to traverse in a reasonable amount of time. I'll mention more about reducing the nodes traveled and the size of the tree later. For now let's assume there are leaf nodes representing an end state that can be given a value. The value in each leaf represents the outcome of the game from player1's perspective. A higher value means a better outcome of player1 and a lower means a worse outcome. The valuation of the end nodes is game specific and won't be discussed here.

It would make sense from the standpoint of player1 to pick the highest value of the leaf nodes. However half of the nodes to be traveled through are reached by a move from player2 and it is clearly not in player2's best interest to pick high values for player1. This causes an alternating system of evaluations as the values are bubbled up to top of the tree. Player1 is interested in maximizing the point value and player2 is interested in minimzing those values. At alternating plies the value passed up to the parent will be the highest value, and on the others it will be the lowest value. This comes from the assumption that each player will make the best move available to them.

Once the tree is constructed (this can be done on the fly, but we'll treat it as if the tree is complete for now), a depth first search is done on the tree with the goal being propagating the values from the leaf nodes to the top of the tree. However each ply has a different set of rules as to which value to send to the parent. The plies representing player1's moves, the even plies, are considered the MAX plies. The odd plies (excluding the root ply) are considered the MIN plies. The MAX plies are going to send the highest score from that ply upward and the MIN plies will send the lowest.

Let's take a look at a basic MiniMax algorithm. In the following pseudocode the work is done in 2 very similar methods. evaluateMax takes a node and then cycles through the children looking for the highest value, evaluateMin does the same except it looks for the lowest value.

        // returns the node corresponding to the state after the best move
        //   assumes Node has a way of storing the move.
  1:    Node doMiniMax([in]tree) {
  2:      return evaluateMax(tree.root);
  3:    }
  4:
  5:    Node evaluateMax([in]node) {
  6:      bestNode = negativeInfinityNode                 // keep track of the node with the best value
  7:      children[] = node.getChildren()                 // get the next set of states
  8:      if (no children) {                              // this is a leaf
  9:        computeValue(node)                            // generate the game specific value for a node
 10:        return node                                   // all done here
 11:      }
 12:      for (each child) {                              // loop through all the children -> can optimize this!!
 13:        tempNode = evaluateMin(children[i])           // get the min node from the next ply
 14:        if (tempNode.value > bestNode.value)          // is it a greater value than our current?
 15:          bestNode = tempNode                         // update the bestNode
 16:      }
 17:      if (node is not root) {                         // should only be false for ply2
 18:        node.value = bestNode.value                   // set the parent's value to the max child value
 19:        return node                                   // keep from just passing the leaf node up
 20:      }
 21:      return bestNode                                 // for ply2 return the actual node back
 22:    }
 23:
 24:    Node evaluateMin([in]node) {
 25:      bestNode = positiveInfinityNode                 // keep track of the node with the best value
 26:      children[] = node.getChildren()                 // get the next set of states
 27:      if (no children) {                              // this is a leaf
 28:        computeValue(node)                            // generate the game specific value for a node
 29:        return node                                   // all done here
 30:      }
 31:      for (each child) {                              // loop through all the children -> can optimize this!!
 32:        tempNode = evaluateMax(children[i])           // get the max node from the next ply
 33:        if (tempNode.value < bestNode.value)          // is it a smaller value than our current?
 34:          bestNode = tempNode                         // update the bestNode
 35:      }
 36:      if (node is not root) {                         // should never be false
 37:        node.value = bestNode.value                   // set the parent's vale to the min child value
 38:        return node                                   // keep from just passing the leaf node up
 39:      }
 40:      return bestNode
 41:    }
        

Looking at the algorithm, it is a little confusing trying to match the method calls to the nodes. Because it appears that the nodes in ply2, the MAX ply, get passed to the evaluateMin method. This does happen, but the node passed into the two methods is actually the parent and it is the value of the children that gets minimized or maximized. I'll walk through a call to explain:

  • doMiniMax(tree)
    Start things off by passing in the tree to evaluate
  • evaluateMax(n1)
    We want the maximum value move starting from this state
    • children[] contains n2, n3, n4
      These are the states the we can reach from our current state. We will return the highest value out of the 3 states.
    • evaluateMin(n2)
      From this state it is Player2's move and they will want to minimize our score.
      • children[] contains n5, n6, n7
        These are the states reached by Player2's moves. We will return the lowest value of these nodes.
      • evaluateMax(n5)... value of 10
        Just like evaluateMax(n1), shorthanding a return value here.
      • evaluateMax(n6)... value of 13
        Just like evaluateMax(n1), shorthanding a return value here.
      • evaluateMax(n7)... value of 8
        Just like evaluateMax(n1), shorthanding a return value here.
      • return n7 value of 8
        After completeing the loop we return the node with the lowest value.
    • evaluateMin(n3) ... value of 5
      Just like evaluateMin(n2), shorthanding a return value here.
    • evaluateMin(n4) ... value of 15
      Just like evaluateMin(n2), shorthanding a return value here.
    • return n4 value of 15
      After completing the loop we return the node with the highest value.
  • return n4 value of 15

How big does the tree get?

Consider that a tree of all moves in a tic-tac-toe game will have more than 9! nodes in it (362,880) and that is just a game with 9 squares where all you do is remove places to move. In a game like chess, where pieces can move back and forth without eliminating moves, the trees even just a couple levels deep can get to overwhelming sizes. The opening move in chess has 20 different options alone.

How do you keep it under control?

As written above this algorithm will traverse the entire tree which in all but the most trivial cases will be far too lengthy. It's also the case that many times all nodes will not need to be investigated due to the alternating pattern of min and max value searching. A shortcut can be used called Alpha-beta shortcutting. Let's revise the algorithm to include Alpha-beta shortcuts and then describe how the shortcutting works.

        // returns the node corresponding to the state after the best move
        //   assumes Node has a way of storing the move.
  1:    Node doMiniMax([in]tree) {
  2:      return evaluateMax(positiveInfinityNode, tree.root);
  3:    }
  4:
  5:    Node evaluateMax([in]nodeToBeat, [in]node) {
  6:      bestNode = negativeInfinityNode                 // keep track of the node with the best value
  7:      children[] = node.getChildren()                 // get the next set of states
  8:      if (no children) {                              // this is a leaf
  9:        computeValue(node)                            // generate the game specific value for a node
 10:        return node                                   // all done here
 11:      }
 12:      for (each child) {                              // loop through all the children -> can optimize this!!
 13:        tempNode = evaluateMin(bestNode, children[i]) // get the min node from the next ply
 14:        if (tempNode.value > nodeToBeat.value) }      // on calls from min we can stop if we are greater than
 15:          node.value = tempNode.value                 // what evaluateMin is already going to return
 16:          return node
 17:        }
 18:        if (tempNode.value > bestNode.value)          // is it a greater value than our current?
 19:          bestNode = tempNode                         // update the bestNode
 20:      }
 21:      if (node is not root) {                         // should only be false for ply2
 22:        node.value = bestNode.value                   // set the parent's value to the max child value
 23:        return node                                   // keep from just passing the leaf node up
 24:      }
 25:      return bestNode                                 // for ply2 return the actual node back
 26:    }
 27:
 28:    Node evaluateMin([in]nodeToBeat, [in]node) {
 29:      bestNode = positiveInfinityNode                 // keep track of the node with the best value
 30:      children[] = node.getChildren()                 // get the next set of states
 31:      if (no children) {                              // this is a leaf
 32:        computeValue(node)                            // generate the game specific value for a node
 33:        return node                                   // all done here
 34:      }
 35:      for (each child) {                              // loop through all the children -> can optimize this!!
 36:        tempNode = evaluateMax(bestNode, children[i]) // get the max node from the next ply
 37:        if (tempNode.value < nodeToBeat.value) }      // on calls from max we can stop if we are less than
 38:          node.value = tempNode.value                 // what evaluateMax is already going to return
 39:          return node
 40:        }
 41:        if (tempNode.value < bestNode.value)          // is it a smaller value than our current?
 42:          bestNode = tempNode                         // update the bestNode
 43:      }
 44:      if (node is not root) {                         // should never be false
 45:        node.value = bestNode.value                   // set the parent's vale to the min child value
 46:        return node                                   // keep from just passing the leaf node up
 47:      }
 48:      return bestNode
 49:    }
        

Only 8 new lines and a couple of extra variables and we have enabled Alpha-Beta Shortcuts. All we needed to do was add a check to see if the tempNode value is less than or greater than, depending on the Min/Max bias, a nodeToBeat variable passed in by the parent. If we meet the condition we can return early because we know this particular branch will not be selected. This can be a huge savings in computation time and in memory, if the nodes are created dynamically.

The idea behind Alpha-Beta shortcutting is that the parent node passes in the current best value. If the best value in the child is already out of range then no more works needs to be done. It works because the child and parent are working in opposition. Let's walk through a short example:

I'll step through the algorithm on the tree above to help explain.

  1. doMiniMax() calls evaluateMax(n1) for the first time, passing in positiveInfinity for the node to beat.
  2. evaluateMax(n1) sets bestNode to negativeInfinity and calls evaluateMin(n2) passing in bestNode(-inf)
  3. evaluateMin(n2) sets bestNode to positiveInfinity and calls evaluateMax(n5) passing in bestNode(+inf)
  4. This reachs n5, a leaf which returns the value of 15
  5. evaluateMin(n2) has no more children, sets its value to 15 and returns
  6. evaluateMax(n1) sets bestNode to 15 and calls evaluateMin(n3) passing in bestNode(15)
  7. evaluateMin(n3) sets bestNode to positiveInfinity and calls evaluateMax(n6) passing in bestNode(+inf)
  8. n6 is a leaf, returns the value of 20
  9. evaluateMin(n3) sets bestNode to 20 and calls evaluateMax(n7) passing in bestNode(20)
  10. evaluateMax(n7) sets bestNode to negativeInfinity and calls evaluateMin(n10) passing in bestNode(-inf)
  11. n10 is a leaf and returns a value of 17
  12. evaluateMax(n7) sets bestNode to 17, 17 is not greater than nodeToBeat(20) so it continues calling evaluateMin(n11) with bestNode(17)
  13. n11 is a leaf and returns a value of 25
  14. evaluateMax(n7) checks and sees 25 is greater than the nodeToBeat. It shortcuts and doesn't call to n12, but returns with a value of 25 knowing it will not get picked by the min parent above.
  15. evaluateMin(n3) checks and sees 25 is not less than nodeToBeat(15) so calls evaluateMax(n8) passing in bestNode(20)
  16. n8 is a leaf and returns a value of 4
  17. evaluateMin(n3) checks and 4 is less than the nodeToBeat(15). It shortcuts and doesn't call n9. It sets a value of 4 and returns knowing it will not get picked by the max parent above.
  18. evaluateMax(n1) checks and sees 4 is not greater than +inf and calls evaluateMin(n4) passing in bestNode(15).
  19. n4 is a leaf and returns a value of 12.
  20. evaluateMax(n1) gets to the end of it's children, sets its value to that of bestNode(15) and returns to doMiniMax()
It takes a few runs through the algorithm to figure out how it works, but it is actually pretty elegant and has the potential to save lots of cycles.

Do I really have to walk the whole tree?

The next optimization to look at is to limit the depth of the tree searched and to define some way of evaluating a branch if it does not end in a leaf already. For instance in a chess match it is impossible to search the entire tree for an outcome all the way to the end, there are just too many options. Instead one might limit the search to 3 nodes deep and then evaluate the board to see if the outcome is positive or not. Some factors could be pieces captured, pieces lost, position gained, reference to a particular gambit or position, number of pieces in striking position, number of pieces in a spot to be captured or make a capture, etc...

This is a pretty simple optimization to make. It consists of keeping track of which ply you are in and comparing that number with a global variable the defines how deep you will go. Other ideas include counting the number of nodes searched and stopping after a certain number and setting a timer that will short-circuit the search after a defined amount of time (much like in a chess match).

These are the changes to the listing above that will add depth checking to the alpha-beta cutoff MiniMax algorthim. Lines 8 and 31 are the important changes. It is at that point that when we reach the max depth we stop the traversal and head back up the tree. Be careful not to modify the arguement passed into the two methods without reverting or you'll get only the left side of the tree. That is why I choose to use a local variable to modify and pass on to the node's children. I also opted to just decrement a depth counter versus counting up and comparing to a global. Either way will work, but this way provides more flexibility in the depth searched.

  1:    Node doMiniMax([in]maxPlyDepth, [in]tree) {
  2:      return evaluateMax(maxPlyDepth, positiveInfinityNode, tree.root);

  5:    Node evaluateMax([in]plyDepth, [in]nodeToBeat, [in]node) {
 5+:      curPlyDepth = plyDepth-1                        // keep track of the depth (don't modify the arguement)

  8:      if (--curPlyDepth ==0 || no children) {         // we're at max depth or this is a leaf

 13:        tempNode = evaluateMin(curPlyDepth,
                                   bestNode, children[i]) // get the min node from the next ply


 28:    Node evaluateMin([in]plyDepth, [in]maxPlyDepth, [in]nodeToBeat, [in]node) {
28+:      curPlyDepth = plyDepth-1                        // keep track of the depth (don't modify the arguement)

 31:      if (--curPlydepth || no children) {             // we're at max depth or this is a leaf

 36:        tempNode = evaluateMax(curPlyDepth,
                                   bestNode, children[i]) // get the max node from the next ply
        

Yet another optimization is to store states in a lookup table the allow the quick reference to their value. Particularly when a game can have pieces that move back the way they came it is possible to have pieces in the same position again. In such a case a lookup table can speed the decision process along. This is a pretty advanced optimization however and could have an entire article (or series of them) written for it and I will leave that for later.

References

[1] AI Depot article - MiniMax Explained
[2] Generation5 - MiniMax Trees
[3] Generation5 - Basic Othello Strategy
[4] Generation5 - Simple Board Game AI
[5] Caltech - Minimax and Alpha Beta Template
[6] AlphaBeta cuttoff discussion
[7] MiniMax and Alpha-Beta Cutoff