Combining Tree Search with Value Prediction Models Capable of Estimating Their Uncertainty

Maciej Świechowski

The paper: [Paper.pdf] [Link to IEEE Xplore]

The details are in the paper, which has been accepted to the IEEE Transactions on Games journal, but due to the publication queue, it will receive volume and page numbers not earlier than December 2025.

I will update the bibliography data accordingly. The current is:

@article{swiechowski2025combining,

title={{Combining Tree Search With Value Prediction Models Capable of Estimating Their Uncertainty}},

author={{\'S}wiechowski, Maciej},

journal={IEEE Transactions on Games},

year={2025},

pages={1-11},

doi={10.1109/TG.2025.3561016},

note={(early access)},

publisher={IEEE}}

 

I want to thank the anonymous reviewers of this paper for their numerous valuable suggestions for improvement, particularly those concerning the experimental verification.

The idea: prerequisites

 

The approach utilizes a tree search algorithm -- specifically, the paper examines Monte Carlo Tree Search (MCTS) in conjunction with a model that satisfies two conditions:

1.              The model is a value prediction model. It predicts game results (terminal scores) as the tree search algorithm would if computational power was not an issue.

- the predicted scores must be in the same domain as the actual game outcomes.

2.      The model is capable of estimating the uncertainty of its prediction. In other words: how confident the model is about the predicted game outcome.

As long as such a model is available or can be developed (e.g., trained), the proposed approach can be applied.

 

If you want to learn more about Monte Carlo Tree Search, I recommend the survey papers:

1.      From 2022: https://www.cyberiada.eu/mcts_survey.html (Link to Springer Nature)

2.      From 2012: Link to IEEE Xplore

The idea: algorithm

 

The method assumes using a model in a uniform fashion - depending on a dynamic test.

First, set a threshold for uncertainty estimation. If the uncertainty value in a given state is lower than the threshold, the model is considered sufficiently certain and can be relied upon.

Now, at each step of the search algorithm perform a check:

model_uncertainty(current_state) ≤ threshold

If this condition is satisfied, immediately terminate the current iteration and return the score predicted by the model:

score = model.predict()

Instead of uncertainty, we can use the opposite term of 'certainty' or 'confidence' and then invert the condition to exceed the threshold.


This applies both to the Selection and Simulation (Monte Carlo) phases of the algorithm.

If this occurs during the Selection phase, the node is not expanded further unless the actual game follows its path (in which case it becomes the new root node).

A screenshot of a computer

AI-generated content may be incorrect.

 

Information with solid fillThis procedure results in MCTS iterations of highly varying lengths.

Consequently, it is preferable to control the simulation budget with finer granularity, using steps (forward model calls) or allocated time, rather than the number of iterations.

 

A diagram of a tree

AI-generated content may be incorrect.

The prediction model is applied conditionally, based on the model's uncertainty in a given state. The left subtree illustrates an example of its use during the selection phase of MCTS. In the middle tree, the model is not used at all, as all uncertainty checks fell below the required threshold, resulting in the score being propagated from the natural terminal state. In the right subtree, the first state in which the model was sufficiently confident was encountered during the simulation (Monte Carlo) phase.

Setting the uncertainty threshold

 

Setting the uncertainty estimation threshold is highly specific to the game (or problem, in general) in question.

The paper places significant emphasis on analyzing various thresholds and their implications, based on the actual quality of the models.

For example:

-          what happens if a high threshold is set, causing the model to be deemed unreliable in many states? Does the search algorithm still benefit from it?

-          what if a low threshold is set while the uncertainty estimation is inaccurate, leading to numerous "false positive" estimations? Does this negatively impact performance?

The goal of this research was not only to propose a new method but also to thoroughly analyze multiple scenarios, as illustrated above. Specifically, it examines various combinations of uncertainty estimation quality, true model quality, and the chosen threshold.

Selected findings of the paper

 

v Whether the proposed approach improves the standard (vanilla) MCTS depends on several factors:

o   The strength of the baseline MCTS.

o   The accuracy of the predictive model.

o   The frequency of the predictive model's use == how often the model indicates that uncertainty is below the established threshold.

o   The precision of the uncertainty estimation. In particular, if there are many false positives?

  by false positives, we understand states, in which the model is certain, but its output is inaccurate.

 

v It is preferable to rely on search than using inaccurate models.

False positive uncertainty estimations are more detrimental than false negatives.

Employing models less frequently but ensuring their use in scenarios where they provide high accuracy is more beneficial than frequent use of less accurate models.

To prevent the model from being utilized too often, one can establish a high confidence threshold.

 

v Models with a prediction accuracy of 0.95 or higher are considered safe to use.

We have not encountered any scenario in which employing a model with an accuracy of 0.95 or higher did not lead to an improvement.

While weaker models might also be sufficient, their effectiveness is scenario-dependent and requires experimentation.

v Weaker players benefit more from broad selection of prediction models (even less accurate ones).

Especially those that perform fewer iterations per move.

This trend has been consistently observed across all experiments.

 

v The method requires much fewer iterations to be a competent player.
By implementing the proposed approach, even with a smaller computational budget (e.g., 100 iterations), we can achieve the performance level of a standard MCTS that requires a substantially larger computational budget (e.g., 1000 iterations).

It is like a shortcut to performance!

v Increasing the number of MCTS iterations from 100 to 1000 has a more substantial impact on the score than increasing from 1000 to 10000

 

v In most cases (unless the accuracy is > 0.95) it is advantageous to combine tree-search with prediction models.

If models are not perfectly accurate, the optimal frequency of model use, which depends on accuracy, becomes a non-monotonic function.

The optimal frequency falls within the (0%, 100%) interval but is not located at its extremes (unless the accuracy is at extremes: 0 or 1).

For instance, in our first experiment, with an accuracy of 0.95, the optimal frequency is 0.8. As accuracy diminishes, the optimal frequency also decreases, thereby allowing the search process to correct some of the model's errors.

 

v Achieving high absolute accuracy is typically very challenging.
whereas, it is more feasible to train a model that frequently signals a lack of confidence yet provides highly accurate predictions when it confidently asserts.

What matters most is the conditional accuracy of the model when its confidence exceeds the predetermined threshold.


v The method has the potential to become the state-of-the-art tree search approach.

It is generally stronger than the pure search-based approach (vanilla MCTS) especially with lower simulation budgets.

It is generally stronger than an AlphaZero-inspired evaluation method especially with weaker prediction models.

 

v The method can be used with various types of prediction models and different ideas for uncertainty estimation.

For instance, our experiments included:

o   Synthetic models for function optimization based on perturbing the optimal value with noise with magnitude inversely proportional to model accuracy.

o   Stockfish for Chess, where the advantage of any player serves as the predictor for certainty.

o   A Random Forest model trained for Othello using data from games of the top-1000 players on the e-Othello platform.

For uncertainty estimation, we employed the so-called prediction intervals method, specifically adapted for Random Forests.

 

v The paper also presents further findings concerning performance in specific problems: optimizing functions, Chess and Othello.

 

Empirical Results

In the paper, you will find experimental results that support the findings.

The following methods were tested:

       The proposed method with various parameterizations.

       Baseline vanilla MCTS.

       An AlphaZero-inspired evaluation method:

o   This is an MCTS-based player that employs a simplified AlphaZero-inspired leaf evaluation approach. When a node is first created, the player backpropagates the model s prediction for that node. On subsequent visits to the same node, it ignores the value function prediction, instead selecting a child node and traversing to it as in the standard MCTS algorithm.

o   I express my gratitude to the reviewers for suggesting this approach.

Three large-scale experiments were conducted:

1.      Experiments in a single-player game of function optimization.

o   The goal is to find the function s maximum.

o   Actions are subdivisions of the domain.

o   Three different functions are used.

This experiment verifies fundamental properties of the method. For example, how combinations of the frequency of model use and its true accuracy perform.

A screenshot of a computer

AI-generated content may be incorrect.

Here are the normalized and averaged results across all functions. The proposed method is parameterized with model's frequency of estimated certainty (how often it is used to terminate iterations) and model's prediction accuracy -- and the AlphaZero-inspired method (abbreviated as AZ-Ins-LE in the last rows). Each table corresponds to a different simulation budget for MCTS. The scores are aggregated across the three functions and normalized to the range [0,1]. The first row (probability of use equal to 0) serves as the baseline variant of the MCTS algorithm. The colors light turquoise, dark blue, and gold are used to highlight instances where the proposed method surpasses the baseline MCTS, AZ-Ins-LE, or both, respectively.

 

 

2.      Experiments in Chess:

o   Stockfish was used as the prediction model. Although it can function as a standalone player, only its prediction capabilities were utilized in this research.

-         there is a time parameter (40ms vs. 200ms) that controls the strength of the model

o   Uncertainty is based on the magnitude of the advantage for a given player (thus, it does not account for the uncertainty of draws).

-         there is an advantage parameter that controls how often the model is used

o   Refer to the paper for additional details about this setup.

A table with numbers and letters

AI-generated content may be incorrect.

The results for chess. Each cell contains an average [0,1] (where 0.5 denotes equal performance) of the proposed method against four comparative approaches. Calculations for 200 repeats, starting positions switched after 100

 

 

3.      Experiments in Othello (Reversi):

o   A custom Random Forest model was trained to serve as the prediction model.

-   there is a training dataset size parameter that controls the strength of the model

o   Uncertainty is derived from prediction intervals.

-   there is a prediction interval length parameter that controls how often the model is used

o   Refer to the paper for additional details about this setup.

A table with numbers and a few words

AI-generated content may be incorrect.

The results for Othello. Each cell contains an average score [0,1] (where 0.5 denotes equal performance) of the proposed method against four comparative approaches. Calculations for 200 repeats, starting positions switched after 100.

 

Established methods of combining tree search with (heuristic) evaluation models

 

Combining tree search with heuristic evaluation dates back to the first checkers programs.

Here are examples of integrating external models with the MCTS algorithm (specifically). Bibliographic references supporting each example, along with additional details, are included in the paper.

In the Monte Carlo phase:

       Early termination

       Complete replacement

       Epsilon-greedy

       Altering policy distribution

       Action pruning

In the Selection phase:

       Weighted evaluation

       Weighted exploration (alteration to the UCB/UCT algorithm)

       Progressive widening

       Move sorting

In addition, in the paper, we discuss related methods of utilizing uncertainty within the MCTS algorithm.