Combining
Tree Search with Value Prediction Models Capable of Estimating Their
Uncertainty
Maciej Świechowski
The paper: [Paper.pdf] [Link to IEEE Xplore]
The details are in the paper, which has been accepted to the IEEE Transactions on Games journal, but due to the publication queue, it will receive volume and page numbers not earlier than December 2025.
I will update the bibliography data accordingly. The current is:
|
@article{swiechowski2025combining, title={{Combining Tree Search With Value Prediction Models Capable of Estimating Their Uncertainty}}, author={{\'S}wiechowski, Maciej}, journal={IEEE Transactions on Games}, year={2025}, pages={1-11}, doi={10.1109/TG.2025.3561016}, note={(early access)}, publisher={IEEE}} |
I want to thank the anonymous reviewers of this paper for their numerous valuable suggestions for improvement, particularly those concerning the experimental verification.
The approach
utilizes a tree search algorithm -- specifically, the paper examines Monte
Carlo Tree Search (MCTS) in conjunction with a model that satisfies two
conditions:
1.
The
model is a value prediction model. It predicts game results (terminal
scores) as the tree search algorithm would if computational power was not an
issue.
- the predicted
scores must be in the same domain as the actual game outcomes.
2. The model is capable
of estimating the uncertainty of its prediction. In other words:
how confident the model is about the predicted game outcome.

As long as
such a model is available or can be developed (e.g., trained), the proposed
approach can be applied.
If you want
to learn more about Monte Carlo Tree Search, I recommend the survey papers:
1. From 2022: https://www.cyberiada.eu/mcts_survey.html
(Link to
Springer Nature)
2. From 2012: Link to IEEE
Xplore
The
method assumes using a model in a uniform fashion - depending on a dynamic
test.
First, set a
threshold for uncertainty estimation. If the uncertainty value in a given state
is lower than the threshold, the model is considered sufficiently certain and can be
relied upon.
Now, at each
step of the search algorithm perform a check:
model_uncertainty(current_state) ≤ threshold
If this
condition is satisfied, immediately terminate the current iteration and return
the score predicted by the model:
score = model.predict()
Instead of uncertainty, we can use the opposite term of 'certainty' or 'confidence' and then invert the condition to exceed the threshold.
This applies
both to the Selection and Simulation (Monte Carlo) phases of the algorithm.
If this
occurs during the Selection phase, the node is not expanded further unless the
actual game follows its path (in which case it becomes the new root node).

This procedure results in MCTS
iterations of highly varying lengths.
Consequently,
it is preferable to control the simulation budget with finer granularity, using
steps (forward model calls) or allocated time, rather than the number of
iterations.

The
prediction model is applied conditionally, based on the model's uncertainty in
a given state. The left subtree illustrates an example of its use during the
selection phase of MCTS. In the middle tree, the model is not used at all, as
all uncertainty checks fell below the required threshold, resulting in the
score being propagated from the natural terminal state. In the right subtree,
the first state in which the model was sufficiently confident was encountered
during the simulation (Monte Carlo) phase.
Setting the uncertainty
estimation threshold is highly specific to the game (or problem, in general) in
question.
The paper
places significant emphasis on analyzing various thresholds and their
implications, based on the actual quality of the models.
For example:
-
what happens if a high
threshold is set, causing the model to be deemed unreliable in many states?
Does the search algorithm still benefit from it?
-
what if a low threshold
is set while the uncertainty estimation is inaccurate, leading to numerous
"false positive" estimations? Does this negatively impact
performance?
The goal of
this research was not only to propose a new method but also to thoroughly
analyze multiple scenarios, as illustrated above. Specifically, it examines
various combinations of uncertainty estimation quality, true model
quality, and the chosen threshold.
v Whether the proposed approach
improves the standard (vanilla) MCTS depends on several factors:
o
The
strength of the baseline MCTS.
o
The
accuracy of the predictive model.
o
The
frequency of the predictive model's use == how often the model indicates that
uncertainty is below the established threshold.
o
The
precision of the uncertainty estimation. In particular, if there are many false
positives?
by false positives, we understand
states, in which the model is certain, but its output is inaccurate.
v It is preferable to rely on search
than using inaccurate models.
False
positive uncertainty estimations are more detrimental than false negatives.
Employing
models less frequently but ensuring their use in scenarios where they provide
high accuracy is more beneficial than frequent use of less accurate models.
To
prevent the model from being utilized too often, one can establish a high
confidence threshold.
v Models with a prediction accuracy of
0.95 or higher are considered safe to use.
We
have not encountered any scenario in which employing a model with an accuracy
of 0.95 or higher did not lead to an improvement.
While
weaker models might also be sufficient, their effectiveness is
scenario-dependent and requires experimentation.
v Weaker players benefit more from
broad selection of prediction models (even less accurate ones).
Especially
those that perform fewer iterations per move.
This
trend has been consistently observed across all experiments.
v The method requires much fewer
iterations to be a competent player.
By implementing the proposed approach, even with a smaller computational budget
(e.g., 100 iterations), we can achieve the performance level of a standard MCTS
that requires a substantially larger computational budget (e.g., 1000
iterations).
It is
like a shortcut to performance!
v Increasing the number of MCTS iterations from 100 to
1000 has a more substantial impact on the score than increasing from 1000 to
10000
v In most cases (unless the accuracy is
> 0.95) it is advantageous to combine tree-search with prediction models.
If
models are not perfectly accurate, the optimal frequency of model use, which
depends on accuracy, becomes a non-monotonic function.
The
optimal frequency falls within the (0%, 100%) interval but is not located at
its extremes (unless the accuracy is at extremes: 0 or 1).
For
instance, in our first experiment, with an accuracy of 0.95, the optimal
frequency is 0.8. As accuracy diminishes, the optimal frequency also decreases,
thereby allowing the search process to correct some of the model's errors.
v Achieving high absolute accuracy is
typically very challenging.
whereas, it is more feasible to train a model that frequently signals a lack of
confidence yet provides highly accurate predictions when it confidently
asserts.
What
matters most is the conditional accuracy of the model when its confidence
exceeds the predetermined threshold.
v The method has the potential to
become the state-of-the-art tree search approach.
It is
generally stronger than the pure search-based approach (vanilla MCTS)
especially with lower simulation budgets.
It is
generally stronger than an AlphaZero-inspired evaluation method especially with
weaker prediction models.
v The method can be used with various
types of prediction models and different ideas for uncertainty estimation.
For instance, our experiments included:
o
Synthetic
models for function optimization based on perturbing the optimal value with
noise with magnitude inversely proportional to model accuracy.
o
Stockfish
for Chess, where the advantage of any player serves as the predictor for
certainty.
o
A
Random Forest model trained for Othello using data from games of the top-1000
players on the e-Othello platform.
For uncertainty estimation, we employed the so-called
prediction intervals method, specifically adapted for Random Forests.
v The paper also presents further
findings concerning performance in specific problems: optimizing functions,
Chess and Othello.
In the
paper, you will find experimental results that support the findings.
The
following methods were tested:
The proposed method with
various parameterizations.
Baseline vanilla MCTS.
An AlphaZero-inspired
evaluation method:
o
This is an MCTS-based
player that employs a simplified AlphaZero-inspired leaf evaluation approach.
When a node is first created, the player backpropagates the model
s prediction for that node. On subsequent visits to the same node, it
ignores the value function prediction, instead selecting a child node and
traversing to it as in the standard MCTS algorithm.
o
I express my gratitude
to the reviewers for suggesting this approach.
Three
large-scale experiments were conducted:
1.
Experiments in a
single-player game of function optimization.
o
The goal is to find the function s
maximum.
o
Actions are subdivisions
of the domain.
o
Three different
functions are used.
This experiment verifies fundamental properties of the
method. For example, how combinations of the frequency of model use and its
true accuracy perform.

Here are the normalized and averaged
results across all functions. The proposed method is parameterized with model's
frequency of estimated certainty (how often it is used to terminate iterations)
and model's prediction accuracy -- and the AlphaZero-inspired method
(abbreviated as AZ-Ins-LE in the last rows). Each table corresponds to a
different simulation budget for MCTS. The scores are aggregated across the
three functions and normalized to the range [0,1]. The first row (probability
of use equal to 0) serves as the baseline variant of the MCTS algorithm. The
colors light turquoise, dark blue, and gold are used to highlight instances
where the proposed method surpasses the baseline MCTS, AZ-Ins-LE, or both,
respectively.
2.
Experiments in
Chess:
o
Stockfish was used as
the prediction model. Although it can function as a standalone player, only its
prediction capabilities were utilized in this research.
-
there
is a time parameter (40ms vs. 200ms) that controls the strength of the model
o
Uncertainty
is based on the magnitude of the advantage for a given player (thus, it does
not account for the uncertainty of draws).
-
there
is an advantage parameter that controls how often the model is used
o
Refer to the paper for
additional details about this setup.

The results for chess. Each cell
contains an average [0,1] (where 0.5 denotes equal performance) of the proposed
method against four comparative approaches. Calculations for 200 repeats,
starting positions switched after 100
3.
Experiments in
Othello (Reversi):
o A custom Random Forest
model was trained to serve as the prediction model.
-
there
is a training dataset size parameter that controls the strength of the model
o
Uncertainty is derived
from prediction intervals.
-
there
is a prediction interval length parameter that controls how often the model is
used
o
Refer to the paper for
additional details about this setup.

The results for Othello. Each cell
contains an average score [0,1] (where 0.5 denotes equal performance) of the
proposed method against four comparative approaches. Calculations for 200
repeats, starting positions switched after 100.
Combining
tree search with heuristic evaluation dates back to the first checkers
programs.
Here are
examples of integrating external models with the MCTS algorithm (specifically).
Bibliographic references supporting each example, along with additional
details, are included in the paper.
In the
Monte Carlo phase:
Early termination
Complete replacement
Epsilon-greedy
Altering policy
distribution
Action pruning
In the
Selection phase:
Weighted evaluation
Weighted exploration
(alteration to the UCB/UCT algorithm)
Progressive widening
Move sorting
In addition,
in the paper, we discuss related methods of utilizing uncertainty within the
MCTS algorithm.