The term 'zero-sum' is confusing, but it makes sense if you imagine that each player is charged an entry fee of 1/2 (for chess). 'Constant-sum game' would have been a better name.
class: middle
Tic-Tac-Toe game tree
.width-80[]
.question[What is an optimal strategy (or perfect play)? How do we find it?]
Assumptions
We assume a deterministic, turn-taking, two-player zero-sum game with perfect information.
e.g., Tic-Tac-Toe, Chess, Checkers, Go, etc.
We will call our two players MAX and MIN. MAX moves first.
The minimax value $\text{minimax}(s)$ is the largest achievable payoff (for MAX) from state $s$, assuming an optimal adversary (MIN).
.center.width-100[]
The optimal next move (for MAX) is to take the action that maximizes the minimax value in the resulting state.
Assuming that MIN is an optimal adversary, this choice maximizes the worst-case outcome for MAX.
This is equivalent to not making an assumption about the strength of the opponent.
???
Blackboard.
class: middle
.width-100[]
class: middle
Properties of Minimax
Completeness:
Yes, if tree is finite.
Optimality:
Yes, if MIN is an optimal opponent.
What if MIN is suboptimal?
Show that MAX will do even better.
What if MIN is suboptimal and predictable?
Other strategies might do better than Minimax. However, they may do worse against an optimal opponent.
class: middle
Minimax efficiency
Assume $\text{minimax}(s)$ is implemented using its recursive definition.
How efficient is minimax?
Time complexity: same as DFS, i.e., $O(b^m)$.
Space complexity:
$O(bm)$, if all actions are generated at once, or
$O(m)$, if actions are generated one at a time.
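To illustrate the $O(m)$ case, here is a hedged sketch in which successors are produced lazily by a generator, so only one child per level is alive at any time. The game is a hypothetical subtraction game chosen for brevity (players alternately take 1 to 3 stones; whoever takes the last stone wins); it is not part of the lecture.

```python
from typing import Iterator

def successors(pile: int) -> Iterator[int]:
    """Yield successor piles lazily, one at a time."""
    for take in (1, 2, 3):
        if take <= pile:
            yield pile - take

def minimax(pile: int, is_max: bool) -> int:
    """Utility for MAX: +1 if MAX takes the last stone, -1 otherwise."""
    if pile == 0:
        # The player who just moved took the last stone and wins.
        return -1 if is_max else +1
    # Generator expression: each child is evaluated and discarded
    # before the next one is generated.
    values = (minimax(p, not is_max) for p in successors(pile))
    return max(values) if is_max else min(values)

print(minimax(5, True))  # 1
print(minimax(4, True))  # -1
```

The running time is still exponential in the depth; only the memory footprint changes.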
.question[Do we need to explore the whole game tree?]
Pruning
.center.width-70[]
.width-100[]
Therefore, it is possible to compute the correct minimax decision without looking at every node in the tree.
class: middle
.center.width-80[]
class: middle
.grid[
.kol-2-3[
We want to compute $v = \text{minimax}(n)$, for $\text{player}(n) = \text{MIN}$.
We loop over $n$'s children.
The minimax values are being computed one at a time and $v$ is updated iteratively.
Let $\alpha$ be the best value (i.e., the highest) at any choice point along the path for MAX.
If $v$ becomes lower than $\alpha$, then $n$ will never be reached in actual play.
Therefore, we can stop iterating over $n$'s remaining children.
]
.kol-1-3[
.center.width-100[]]
]
???
Go back to the previous slide and the transition from (d) to (e).
If the minimax value $v$ for MIN becomes lower than the best value $\alpha$ for MAX, then $n$ will never be reached.
class: middle
Similarly, $\beta$ is defined as the best value (i.e., lowest) at any choice point along the path for MIN. We can halt the expansion of a MAX node as soon as $v$ becomes larger than $\beta$.
$\alpha$-$\beta$ pruning
Updates the values of $\alpha$ and $\beta$ as the path is expanded.
Prune the remaining branches (i.e., terminate the recursive calls) as soon as the value of the current node is known to be worse than the current $\alpha$ or $\beta$ value for MAX or MIN, respectively.
???
If the minimax value $v$ for MAX becomes larger than the best value $\beta$ for MIN, then $n$ will never be reached.
$\alpha$-$\beta$ search
.width-90[]
???
Note that MAX plays first, hence the first call to MAX-VALUE in the main function.
class: middle
Properties of $\alpha$-$\beta$ search
Pruning has no effect on the minimax values. Therefore, completeness and optimality are preserved from Minimax.
Time complexity:
The effectiveness depends on the order in which the states are examined.
If states could be examined in perfect order, then $\alpha$-$\beta$ search examines only $O(b^{m/2})$ nodes to pick the best move, vs. $O(b^m)$ for minimax.
$\alpha$-$\beta$ can solve a tree twice as deep as minimax can in the same amount of time.
Equivalent to an effective branching factor $\sqrt{b}$.
Space complexity: $O(m)$, as for Minimax.
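The effect of ordering can be observed on a tiny example. The sketch below is self-contained and rests on assumptions: trees are nested lists (a number is a terminal utility), and the helper `search` returns the root value together with the number of leaves examined. Both trees contain the same leaves; only the order of the children differs.

```python
import math

def alphabeta(node, is_max, alpha, beta, counter):
    """Alpha-beta value of `node`; counter[0] counts the leaves examined."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        w = alphabeta(child, not is_max, alpha, beta, counter)
        if is_max:
            v = max(v, w)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v

def search(tree):
    """Return (root value, number of leaves examined), with MAX at the root."""
    counter = [0]
    value = alphabeta(tree, True, -math.inf, math.inf, counter)
    return value, counter[0]

best_first = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]
worst_first = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]
print(search(best_first))   # (3, 5)
print(search(worst_first))  # (3, 9)
```

Both orderings return the same value, but the favourable ordering examines 5 of the 9 leaves while the unfavourable one examines all of them.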
Game tree size
.center.width-30[]
Chess:
$b \approx 35$ (approximate average branching factor)
$d \approx 100$ (depth of a game tree for typical games)
$b^d \approx 35^{100} \approx 10^{154}$.
For $\alpha$-$\beta$ search with perfect ordering, we get $b^{d/2} \approx 35^{50} \approx 10^{77}$.
Finding the exact solution with Minimax remains intractable.
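The orders of magnitude above can be checked quickly by working with logarithms rather than the huge numbers themselves. The helper name `log10_nodes` is an assumption made for this check.

```python
import math

def log10_nodes(b: float, depth: float) -> float:
    """Return log10(b ** depth), computed without building the huge number."""
    return depth * math.log10(b)

print(round(log10_nodes(35, 100), 1))  # 154.4, i.e. 35^100 is about 10^154
print(round(log10_nodes(35, 50), 1))   # 77.2, i.e. 35^50 is about 10^77
```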
Transposition table
Repeated states occur frequently because of transpositions: distinct permutations of the move sequence end in the same position.
Similarly to the closed set in Graph-Search (Lecture 2), it is worth storing the evaluation of a state such that further occurrences of the state do not have to be recomputed.
.question[What data structure should be used to efficiently store and look-up values of positions?]
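One common answer is a hash table keyed on the position. The sketch below uses a Python `dict` for plain minimax on a hypothetical subtraction game (players alternately take 1 to 3 stones; whoever takes the last stone wins). The game and the name `minimax_tt` are assumptions made for illustration. Note that the key must include the player to move, since the same pile can be reached by either player.

```python
from typing import Dict, Tuple

def minimax_tt(pile: int, is_max: bool, table: Dict[Tuple[int, bool], int]) -> int:
    """Minimax with a transposition table mapping (pile, player to move) to a value."""
    key = (pile, is_max)
    if key in table:
        return table[key]  # position already evaluated via another move order
    if pile == 0:
        # The player who just moved took the last stone and wins.
        value = -1 if is_max else +1
    else:
        values = [minimax_tt(pile - take, not is_max, table)
                  for take in (1, 2, 3) if take <= pile]
        value = max(values) if is_max else min(values)
    table[key] = value
    return value

table: Dict[Tuple[int, bool], int] = {}
print(minimax_tt(40, True, table))  # -1
```

Without the table, a pile of 40 would require exploring an astronomically large tree; with it, each (pile, player) pair is evaluated once. When the table is combined with $\alpha$-$\beta$ pruning, stored values may only be bounds, which requires extra bookkeeping not shown here.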
Imperfect real-time decisions
Under time constraints, searching for the exact solution is not feasible in most realistic games.
Solution: cut the search earlier.
Replace the $\text{utility}(s)$ function with a heuristic evaluation function $\text{eval}(s)$ that estimates the state utility.
Replace the terminal test by a cutoff test that decides when to stop expanding a state.
.center.width-100[]
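A depth-limited version can be sketched as follows. Everything here is illustrative: the game is a hypothetical subtraction game (take 1 to 3 stones; whoever takes the last stone wins), the cutoff test is a fixed depth limit, and the evaluation function deliberately returns 0 ("unknown") for every non-terminal state.

```python
from typing import List, Tuple

State = Tuple[int, bool]  # (stones left, True if MAX is to move)

def successors(state: State) -> List[State]:
    pile, max_to_move = state
    return [(pile - take, not max_to_move) for take in (1, 2, 3) if take <= pile]

def cutoff_test(state: State, depth: int, limit: int) -> bool:
    """Stop at terminal states or once the depth limit is reached."""
    return state[0] == 0 or depth >= limit

def evaluate(state: State) -> float:
    """Exact utility at terminal states, a crude estimate (0) elsewhere."""
    pile, max_to_move = state
    if pile == 0:
        return -1.0 if max_to_move else 1.0  # the previous mover won
    return 0.0

def h_minimax(state: State, depth: int, limit: int) -> float:
    if cutoff_test(state, depth, limit):
        return evaluate(state)
    values = [h_minimax(s, depth + 1, limit) for s in successors(state)]
    return max(values) if state[1] else min(values)

print(h_minimax((5, True), 0, limit=2))   # 0.0
print(h_minimax((5, True), 0, limit=10))  # 1.0
```

With a limit of 2 the search cannot see that MAX wins from a pile of 5, so it reports the neutral estimate; with a limit beyond the end of the game it recovers the exact minimax value.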
.question[Can $\alpha$-$\beta$ search be adapted to implement H-Minimax?]
???
Yes.
Replace the if-statements that apply the terminal test with if-statements that apply the cutoff test, and return the evaluation instead of the utility.
class: middle
Evaluation functions
An evaluation function $\text{eval}(s)$ returns an estimate of the expected utility of the game from a given position $s$.
The computation must be short (that is the whole point: to search faster).
Ideally, the evaluation should order states in the same way as in Minimax.
The evaluation values may be different from the true minimax values, as long as order is preserved.
In non-terminal states, the evaluation function should be strongly correlated with the actual chances of winning.
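As one hedged example, a very simple chess evaluation counts material using the conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9). The function name, the string encoding of piece lists and the choice of White as MAX are assumptions made for this sketch; practical evaluation functions combine many more features.

```python
# Conventional material values; the king is not counted.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_eval(white_pieces: str, black_pieces: str) -> int:
    """Material balance from White's (MAX's) point of view.

    Each argument lists a side's remaining pieces as letters, e.g. "QRRPPP".
    Letters without a value (such as "K") count as 0.
    """
    white = sum(PIECE_VALUES.get(p, 0) for p in white_pieces)
    black = sum(PIECE_VALUES.get(p, 0) for p in black_pieces)
    return white - black

start = "KQRRBBNNPPPPPPPP"
print(material_eval(start, start))    # 0
print(material_eval("KQRP", "KRP"))   # 9
```

Such a function is cheap to compute and correlates with the chances of winning, but it ignores position entirely, which is why it is only an estimate.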
???
Like for heuristics in search, evaluation functions can be learned using machine learning algorithms.
class: middle
Quiescence
.center.width-70[]
These states only differ in the position of the rook at lower right.
However, Black has the advantage in (a), but not in (b).
If the search stops in (b), Black will not see that White's next move is to capture its queen, gaining the advantage.
Cutoff should only be applied to positions that are quiescent.
i.e., states that are unlikely to exhibit wild swings in value in the near future.
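One way to express this is a cutoff test that refuses to stop in non-quiescent states. The sketch below is a toy illustration under assumptions: each node is a dict with a static `eval`, an optional `quiet` flag (default `True`) and optional `children`; the flag stands in for a real quiescence check such as "no capture is pending".

```python
def search(node, depth, limit, is_max, use_quiescence=True):
    """Depth-limited minimax that, optionally, only cuts off at quiet nodes."""
    children = node.get("children", [])
    at_limit = depth >= limit
    quiet = node.get("quiet", True) or not use_quiescence
    if not children or (at_limit and quiet):
        return node["eval"]
    values = [search(c, depth + 1, limit, not is_max, use_quiescence)
              for c in children]
    return max(values) if is_max else min(values)

tree = {
    "eval": 0,
    "children": [
        # Looks great statically (+9), but the opponent can recapture.
        {"eval": 9, "quiet": False, "children": [{"eval": -1}]},
        # A modest but stable position.
        {"eval": 1, "quiet": True, "children": [{"eval": 2}]},
    ],
}
print(search(tree, 0, limit=1, is_max=True, use_quiescence=False))  # 9
print(search(tree, 0, limit=1, is_max=True, use_quiescence=True))   # 1
```

Without the quiescence check, the search trusts the misleading static value of the first move. In practice the extension past the depth limit also has to be bounded, which this sketch does not do.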
The horizon effect
Evaluation functions are always imperfect.
If the search does not look deep enough, bad moves may appear to be good moves (as estimated by the evaluation function) because their consequences are hidden beyond the search horizon.
and vice-versa!
Often, the deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters.