Shubham's Blog

Week 3 of SoK 2025

This week, I focused on integrating the Monte Carlo Tree Search (MCTS) algorithm into the MankalaEngine. The primary goal was to test the performance of the MCTS-based agent against various existing algorithms in the engine. Let's dive into what MCTS is, how it works, and what I discovered during the testing phase.

What is Monte Carlo Tree Search (MCTS)?

The Monte Carlo Tree Search (MCTS) is a heuristic search algorithm used for decision-making in sequential decision problems. It incrementally builds a search tree and simulates multiple random moves at each step to evaluate potential outcomes. These simulations help the algorithm determine the most promising move to make.

How Does MCTS Work?

MCTS operates through four key steps:

1. Selection

The algorithm starts at the root node (representing the current game state) and traverses down the tree to a leaf node (an unexplored state). During this process, the algorithm selects child nodes using a specific strategy.

A popular strategy for node selection is Upper Confidence Bounds for Trees (UCT). The UCT formula helps balance exploration and exploitation by selecting nodes based on the following equation:

UCT = mean + C × sqrt(ln(N) / n)

Where:

2. Expansion

Once the algorithm reaches a leaf node, it expands the tree by adding one or more child nodes representing potential moves or decisions that can be made from the current state.

3. Simulation

The algorithm then performs a simulation or rollout from the newly added child node. During this phase, the algorithm randomly plays out a series of moves (typically following a simple strategy) until the game reaches a terminal state (i.e., win, loss, or draw).

This is where the Monte Carlo aspect of MCTS shines. By simulating many random games, the algorithm gains insights into the likely outcomes of different actions.

4. Backpropagation

After the simulation ends, the results are propagated back up the tree, updating the nodes with the outcome of the simulation. This allows the algorithm to adjust the expected rewards of the parent nodes based on the result of the child node’s simulation.

read more here - MCTS

Implementing MCTS in C++ for MankalaEngine

With a solid understanding of the algorithm, I began implementing MCTS in C++. The initial step involved integrating the MCTS logic into the benchmark utility of the MankalaEngine. After resolving a series of issues and running multiple tests, the code was functioning as expected.

Testing Results

I compared the performance of the MCTS agent against other existing agents in the MankalaEngine, such as Minimax, MTDF, and Random agents. Here’s how the MCTS agent performed:

Random Agent (Player 1) vs. MCTS (Player 2)

MCTS (Player 1) vs. Random Agent (Player 2)

MCTS vs. Minimax & MTDF

Key Improvements for MCTS

While MCTS performed well against the Random Agent, there is still room for improvement, especially in its simulation phase. Currently, the algorithm uses a random policy for simulations, which can be inefficient. To improve performance, we can:

Upcoming Tasks

In the upcoming week, I plan to: