Submitted: 01 May 2026
Posted: 06 May 2026
Abstract
Keywords:
1. Introduction
2. Background and Conceptual Foundations
2.1. Decision Quality in Gaming Contexts
2.2. Optimality and Its Variants in Games
2.3. Robustness in Gaming Search Algorithms
2.4. Evaluation Metrics as Conceptual Anchors
2.5. Evaluation Metrics as Conceptual Anchors
3. Gaming Specific Taxonomy and Design Space
3.1. Design Space Axes for Gaming Search Algorithms
3.1.1. Interaction Topology
- Single-agent planning/search: The algorithm optimizes decisions against a static environment or implicit dynamics, as in pathfinding and navigation tasks [23,25,26,27]. Decision quality is measured through solution quality and efficiency, while optimality is often defined in terms of shortest paths or minimal cost.
- Multi-agent (cooperative or competitive) interaction: The algorithm operates in environments with multiple interacting agents, such as RTS games, where coordination, partial observability, and real-time constraints dominate [17,21,28]. Here, decision quality emphasizes collective effectiveness and adaptability, while optimality is usually approximate or implicit.
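To make the single-agent case concrete, the following is a minimal sketch of A* [25] on a 4-connected grid, where optimality means minimal path cost and efficiency can be read from the number of expanded nodes. The grid encoding (`'#'` for walls), the function name `astar`, and the returned `(cost, expanded)` pair are assumptions made for illustration, not part of any surveyed system.

```python
import heapq


def astar(grid, start, goal):
    """Return (cost, expanded) for a shortest 4-connected path.

    grid is a list of equal-length strings; '#' marks a wall (hypothetical
    encoding). cost is None when the goal is unreachable.
    """
    def h(cell):
        # Manhattan distance: admissible and consistent for unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best_g = {start: 0}
    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    expanded = 0
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry: a cheaper route to this cell was found
        expanded += 1
        if cell == goal:
            return g, expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None, expanded
```

Under these assumptions, the returned cost corresponds to the solution quality measure and `expanded` to the search effort measure used in the pathfinding literature.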
3.1.2. Information and Uncertainty Structure
3.2. Computational and Temporal Regime
- Training time plus play time regimes: Learning augmented systems invest significant computation offline or online during training, then combine fast inference with limited search during gameplay [5,20,22]. Optimality is evaluated empirically rather than analytically, and robustness depends on generalization beyond training conditions.
3.3. Source of Search Guidance
3.4. Practical Clusters of Gaming Search Algorithms
3.4.1. Spatial Pathfinding and Navigation Search
3.4.2. Adversarial Game Tree Search
3.4.3. Monte Carlo Tree Search and Bandit Planning
3.4.4. Optimization and Metaheuristic Search (Including Rolling Horizon)
3.4.5. Learning Augmented Search and Neural Planning
3.4.6. Uncertainty, Partial Observability, and Search Games
3.5. Multi Label Classification and Hybridization
3.6. Role of the Taxonomy in the Survey
4. Historical Evolution of Decision Quality, Optimality, and Robustness
4.1. Analytic Foundations and Early Search Theory (1940s–1960s)
4.2. The Emergence of Heuristic and Adversarial Search (1970s–1990s)
4.3. Scaling, Engineering, and Practical Dominance (1990s–Early 2000s)
4.4. Sampling Based Planning and Statistical Decision Quality (mid-2000s–2010s)
4.5. Learning Augmented Search and Empirical Optimality (2010s–Present)
4.6. Persistent Uncertainty Centric Traditions
4.7. Synthesis
5. Cluster by Cluster Deep Analysis
5.1. Cluster 1: Spatial Pathfinding and Navigation Search
5.1.1. Foundations and Decision Quality
5.1.2. Optimality: From Exact Guarantees to Approximate Solutions
5.1.3. Robustness Under Dynamic and Real-Time Constraints
5.1.4. Evolution Within the Cluster
5.1.5. Comparative Insights and Trade Offs
5.2. Cluster 2: Adversarial Game Tree Search
5.2.1. Foundations and Decision Quality in Adversarial Settings
5.2.2. Optimality and the Minimax Paradigm
5.2.3. Robustness Through Worst Case Reasoning
5.2.4. Scaling Limits and the Transition to Engineering Solutions
5.2.5. Evolution Within the Cluster
5.2.6. Comparative Insights and Trade Offs
5.3. Cluster 3: Monte Carlo Tree Search and Bandit Planning
5.3.1. Decision Quality Through Sampling and Statistical Estimation
5.3.2. Optimality: Asymptotic Guarantees and Practical Approximation
5.3.3. Robustness to Uncertainty, Opponents, and Computation
5.3.4. Evolution Within the Cluster
5.3.5. Comparative Insights and Trade Offs
5.4. Cluster 4: Optimization and Metaheuristic Search (Including Rolling Horizon)
5.4.1. Decision Quality as Objective Driven Search
5.4.2. Optimality: Approximation, Practicality, and the Limits of Guarantees
5.4.3. Robustness Through Stochasticity, Diversity, and Graceful Degradation
1. Stochastic exploration: Random mutation, recombination, and probabilistic acceptance mechanisms reduce sensitivity to local minima and noisy evaluations, which are prevalent in simulation driven gaming settings.
2. Population diversity: Maintaining multiple candidate solutions enhances robustness to deceptive reward landscapes and supports contingency planning when the environment changes or when evaluation is uncertain.
3.
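The first two mechanisms can be sketched with a small rolling horizon evolutionary planner. This is a hedged illustration only: the forward model `step`, the objective `score`, and all parameter defaults are hypothetical, and the operators shown (truncation elitism, uniform crossover, per-gene mutation) are one simple choice among many.

```python
import random


def rollout_score(state, plan, step, score):
    """Apply every action of a plan with the forward model, then score."""
    for action in plan:
        state = step(state, action)
    return score(state)


def rhea_plan(state, actions, step, score, horizon=8, pop_size=10,
              generations=20, mutation_rate=0.2, rng=None):
    """Evolve fixed-length action sequences; return (first_action, plan, value)."""
    rng = rng or random.Random()

    def fitness(plan):
        return rollout_score(state, plan, step, score)

    # Population diversity: several independent random plans.
    population = [[rng.choice(actions) for _ in range(horizon)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:max(1, pop_size // 2)]  # elitism keeps the best plans
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.choice(elite), rng.choice(elite)
            # Uniform crossover followed by stochastic per-gene mutation.
            child = [rng.choice((a, b)) for a, b in zip(p1, p2)]
            child = [rng.choice(actions) if rng.random() < mutation_rate else a
                     for a in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best[0], best, fitness(best)
```

In a rolling horizon loop, only the first returned action would be executed before replanning from the new state, so a poor plan is corrected at the next decision step rather than followed to completion.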
5.4.4. Evolution Within the Cluster
5.4.5. Comparative Insights and Trade Offs
- Decision quality vs. interpretability: Metaheuristics can find strong decisions without explicit modeling assumptions, but the resulting strategies may be harder to interpret or justify compared to heuristic minimax or A* style methods.
- Optimality vs. flexibility: These methods rarely offer formal optimality guarantees; however, they adapt easily to novel objective functions, constraints, and game mechanics—an important advantage in diverse game genres.
- Robustness vs. sample efficiency: Population based and stochastic exploration can be robust to noise and deceptive landscapes, but may require many evaluations to reach high quality decisions, making compute efficiency a critical limitation under tight real-time budgets.
5.5. Cluster 5: Learning Augmented Search and Neural Planning
5.5.1. Decision Quality: From Handcrafted Evaluation to Learned Priors
5.5.2. Optimality: Empirical Dominance Versus Formal Guarantees
- Conditional optimality relative to learned components: Given a learned policy/value/model, the search procedure may be optimal with respect to the induced surrogate objective, but not necessarily optimal in the underlying game.
- Asymptotic or limit optimality replaced by scaling laws: Improvements are often tied to compute scaling (more training, larger models, deeper search), rather than to convergence proofs.
5.5.3. Robustness: Generalization, Distribution Shift, and Uncertainty
- Distribution shift and overfitting: A learned policy or model can be highly effective in distribution yet degrade under scenario changes, rule variations, or different opponent styles.
- Model bias and compounding error: Learned model planning can suffer when model inaccuracies systematically distort search, potentially producing confident but incorrect decisions [20].
- Partial observability sensitivity: In RTS environments with fog of war, robustness requires maintaining performance under hidden information and incomplete state estimation. Map feature learning in partially observable RTS settings highlights this challenge directly, indicating that robust decision making depends on representations that remain informative under occlusion and uncertainty [17].
- Scale brittleness: Robustness in modern RTS settings includes scaling across map sizes and scenario configurations. Approaches explicitly focused on scale flexibility underscore that robustness is not merely about noise tolerance but about maintaining competence under structural changes in game instances [21]. Survey analyses of RTS deep reinforcement learning emphasize that evaluation practices vary widely and that robustness claims often depend heavily on experimental protocol and environment configuration [35].
5.5.4. Evolution Within the Cluster
1. Learned evaluation and action priors integrated with search: Neural policy/value guidance combined with tree search elevated decision quality in Go by targeting search effort effectively [5].
2. Self play as a general mechanism for producing strategic priors: Removal of human data dependencies established self play as a scalable pathway to strong decision quality and broad opponent robustness within a domain [22].
3. Learned dynamics models enabling planning beyond explicit simulators: Planning with learned models extended the search paradigm to settings where dynamics are not directly encoded or where simulation is expensive, changing the role of search from tree expansion over known transitions to planning over learned latent dynamics [20].
4.
5. Mastery of complex imperfect information games through model free MARL (Stratego/DeepNash) [57].
6. The emergence of search free grandmaster level play via large scale language modelling: a development that may herald a new phase in which the role of explicit search is fundamentally reconsidered [34].
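The first development in this list, learned priors steering tree search, is often implemented with a PUCT-style selection rule of the kind associated with the AlphaGo family [5,22]. The sketch below assumes a commonly cited form, Q(s,a) + c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)); the dictionary interface, the function name, and the value of `c_puct` are illustrative assumptions rather than the configuration of any particular system.

```python
import math


def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Pick the child action maximising Q + U at a single tree node.

    priors:       action -> prior probability from a (hypothetical) policy net
    visit_counts: action -> number of simulations through that child
    value_sums:   action -> sum of backed-up values for that child
    """
    total_visits = sum(visit_counts.values())

    def score(action):
        n = visit_counts[action]
        # Mean backed-up value; unvisited children default to 0 here.
        q = value_sums[action] / n if n > 0 else 0.0
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * priors[action] * math.sqrt(total_visits) / (1 + n)
        return q + u

    # Ties (for example at an unvisited node) resolve to the first action.
    return max(priors, key=score)
```

The rule shows how guidance strength and failure severity are coupled: an accurate prior concentrates simulations on good moves, while a confidently wrong prior keeps the bonus term small for the correct move until enough visits accumulate elsewhere.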
5.5.5. Comparative Insights and Trade Offs
- Decision quality vs. compute and data dependence: The highest decision quality in this cluster often relies on substantial training compute and data generation through self play or large scale interaction [20,22]. This contrasts with classical search methods whose performance depends more directly on runtime search budgets than on training pipelines.
- Robustness to strategic diversity vs. brittleness to distribution shift: Self play and learned guidance can enhance robustness to opponent diversity, yet systems may remain brittle under shifts in environment structure, partial observability, or scale changes unless explicitly designed and evaluated for these regimes [17,21].
- Guidance strength vs. failure severity: When learned priors are accurate, they dramatically improve search efficiency; when incorrect, they can misdirect search systematically, potentially reducing robustness below that of less informed but more conservative methods [20].
5.6. Cluster 6: Uncertainty, Partial Observability, and Search Games
5.6.1. Decision Quality Under Uncertainty: Belief Conditioned Effectiveness
5.6.2. Optimality: From Exact Solutions to Model Relative Optimal Policies
5.6.3. Robustness as a First Class Objective
5.6.4. Evolution Within the Cluster
- Modern partial observability games (2020s): Explicit partial observability game constructs such as PO-OSG extend the cluster into broader AI/game theoretic contexts where uncertainty, incentives, and belief based decision making are central [30].
5.6.5. Comparative Insights and Trade Offs
- Versus adversarial game tree search: Minimax provides robustness to a worst case opponent under perfect information, but search games address robustness to hidden state and uncertainty, a different and often harder condition. The trade off is that formal minimax optimality is replaced by belief or model relative optimal policies [3,6,37].
- Versus learning augmented methods: Learning can produce powerful priors for partially observable environments, but this cluster emphasizes that robustness requires principled handling of hidden information and evaluation across uncertainty distributions [2,38,57]. Modern partial observability formulations underscore that strong empirical performance alone does not resolve the conceptual difficulty of defining optimal behavior under partial observability [30].
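What "belief relative" means in the comparisons above can be illustrated with the simplest search-theoretic setting: a stationary hidden target, a searcher that inspects one cell per step, and an imperfect sensor. The sketch applies Bayes' rule after an unsuccessful look and then makes a myopic choice. The cell labels, the single shared detection probability, and both function names are assumptions for illustration, and the greedy rule is not claimed to be optimal in general.

```python
def update_belief_after_miss(belief, searched_cell, detect_prob):
    """Posterior over the target's cell after an unsuccessful look.

    belief maps cell -> prior probability; detect_prob is the chance of
    detecting the target when looking in the cell that actually holds it.
    """
    posterior = dict(belief)
    # A miss is only informative about the searched cell.
    posterior[searched_cell] = belief[searched_cell] * (1.0 - detect_prob)
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("a miss is impossible under this belief")
    return {cell: p / total for cell, p in posterior.items()}


def greedy_search_cell(belief, detect_prob):
    """Myopic choice: the cell with the highest immediate detection chance."""
    return max(belief, key=lambda cell: belief[cell] * detect_prob)
```

For example, with a prior of 0.5, 0.3, and 0.2 over cells A, B, and C and a detection probability of 0.8, a miss in A shifts the posterior to roughly 0.17, 0.50, and 0.33, so the next myopic look goes to B. The decision is optimal only relative to the belief and sensor model, which is the sense of model relative optimality discussed in this cluster.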
6. Cross Cluster Comparative Analysis
6.1. Decision Quality Across Clusters: From Structural Guarantees to Empirical Strength
6.2. Optimality: From Exact Solutions to Conditional and Empirical Notions
6.3. Robustness: Expanding the Notion of Reliability
6.4. Hybridization and Migration of Ideas
6.5. Trade Offs and Design Frontiers
- Decision quality vs. optimality: Exact guarantees often limit scalability, while empirical methods achieve higher quality in practice without proofs.
- Optimality vs. robustness: Worst case guarantees can reduce adaptability, whereas robust empirical performance may sacrifice formal correctness.
- Robustness vs. efficiency: Stochastic and population based methods improve resilience but may increase computational cost.
- Learning driven quality vs. generalization robustness: Learned priors elevate performance but risk brittleness under distribution shift.
6.6. Synthesis
7. Evolution of Evaluation Methodologies in Gaming Search
7.1. Early Analytic and Correctness Oriented Evaluation
7.2. Empirical Performance and Competitive Benchmarks
7.3. Anytime Behavior and Real-Time Constraints
7.4. Statistical Evaluation and Sampling Based Metrics
7.5. Learning Centric and Distributional Evaluation
7.6. Evaluation in Uncertain and Search Game Domains
7.7. Synthesis and Open Challenges
8. Game Type Driven Insights
8.1. Board Games: Perfect Information and Strategic Optimality
8.2. Navigation and Spatial Games: Real-Time Optimality Under Environmental Dynamics
8.3. Real-Time Strategy Games: Multi Agent Complexity and Partial Observability
8.4. Stochastic and Imperfect Information Games: Belief Aware Decision Making
8.5. General and Multi Domain Games: Scalability and Transfer
8.6. Synthesis: How Game Type Shapes Algorithmic Priorities
9. Open Challenges and Future Research Directions
9.1. Reconciling Empirical Performance with Principled Optimality
- Resource bounded optimality frameworks that explicitly account for computation, uncertainty, and learning dynamics.
- Policy level optimality criteria for learned systems that connect empirical dominance to equilibrium concepts under realistic assumptions.
- Formal analysis of hybrid systems, where learned priors guide search but do not fully replace it, clarifying how guarantees degrade or persist under approximation.
9.2. Robustness Under Distribution Shift and Model Mismatch
- Robust generalization guarantees for learned policies and value functions in games with changing dynamics or information structures.
- Evaluation protocols that stress test robustness, moving beyond narrow benchmarks to scenario ensembles that reveal brittleness.
9.3. Unified Evaluation Methodologies Across Game Types
- Multi dimensional evaluation frameworks that jointly report quality, optimality approximation, robustness, and computational cost.
- Standardized yet flexible benchmarks that accommodate diverse game types while preserving comparability.
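One way to picture a multi dimensional report is a small aggregation over per-episode records that yields one number per dimension. The sketch below is purely illustrative: the record fields, the use of a reference score to approximate an optimality gap, and the choice of the worst scenario mean as a robustness proxy are all assumptions, not an established protocol.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    scenario: str           # label of the test scenario (hypothetical)
    score: float            # score achieved by the evaluated agent
    reference_score: float  # best known or optimal score for the episode
    seconds: float          # wall-clock decision time spent in the episode


def summarize(results):
    """Jointly report quality, optimality gap, robustness, and cost."""
    by_scenario = {}
    for r in results:
        by_scenario.setdefault(r.scenario, []).append(r.score)
    scenario_means = {name: mean(scores) for name, scores in by_scenario.items()}
    return {
        "quality_mean_score": mean(r.score for r in results),
        "optimality_gap_mean": mean(r.reference_score - r.score for r in results),
        # Robustness proxy: average score in the hardest scenario.
        "robustness_worst_scenario": min(scenario_means.values()),
        "cost_mean_seconds": mean(r.seconds for r in results),
    }
```

Reporting the four values side by side makes trade offs visible that a single win rate or average score would hide, for instance an agent with high mean quality but a weak worst scenario.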
9.4. Scaling to Multi Agent, Real-Time, and Partially Observable Games
- Hierarchical and decomposed search learning hybrids that separate strategic, tactical, and operational decision layers.
- Coordination aware evaluation and training for multi agent decision quality beyond independent or loosely coupled policies.
- Belief space planning at scale, integrating partial observability directly into search and learning architectures.
9.5. Interpretable and Controllable Gaming Search Systems
- Interpretable search learning hybrids, where learned components provide guidance without obscuring decision rationale.
- Constraint aware and controllable learning, allowing designers to impose safety, style, or fairness constraints on decision making.
- Post hoc analysis tools that connect observed game play behavior to underlying search and learning dynamics.
9.6. Toward Unified Frameworks for Quality, Optimality, and Robustness
- Meta reasoning and adaptive computation allocation, where algorithms dynamically balance search depth, exploration, and robustness based on game context and uncertainty.
- Cross cluster synthesis, combining worst case reasoning from adversarial search, statistical robustness from MCTS, and inductive bias from learning augmented methods.
- Game aware algorithm selection and configuration, where the structure of the target game guides the prioritization of quality, optimality, and robustness.
9.7. Outlook
10. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| MCTS | Monte Carlo Tree Search |
| RTS | Real-Time Strategy |
| UCT | Upper Confidence bounds applied to Trees |
| FPS | First Person Shooter |
| HPA* | Hierarchical Pathfinding A* |
| LLM | Large Language Model |
| RL | Reinforcement Learning |
| MARL | Multi Agent Reinforcement Learning |
| PO-OSG | Partially Observable Off Switch Game |
| GPU | Graphics Processing Unit |
| FAB | Forward And Backward (Washburn’s moving-target search algorithm) |
| LP | Linear Programming |
| DP | Dynamic Programming |
| R2PS | Worst Case Robust Real-Time Pursuit Strategies |
| LLM-A* | Large Language Model enhanced A* |
Appendix A
| Title | Year | Cluster(s) | Game Type | Key Metrics (families) |
|---|---|---|---|---|
| Search and Screening [12] | 1946 | C6 | Stochastic / Imperfect information | Runtime; Search effort; Robustness / Detection |
| Programming a Computer for Playing Chess [3] | 1950 | C2 | Board games | Not specified |
| The theory of search III: the optimum distribution of searching effort [13] | 1957 | C6 | Stochastic / Imperfect information | Path cost / Optimality; Runtime; Robustness / Detection |
| A Note on Two Problems in Connexion with Graphs [23] | 1959 | C1, C2 | Navigation / Grid maps | Path cost / Optimality; Search effort |
| A Formal Basis for the Heuristic Determination of Minimum Cost Paths [25] | 1968 | C1, C2 | Navigation / Grid maps | Path cost / Optimality; Search effort |
| An Analysis of Alpha-Beta Pruning [6] | 1975 | C1, C2 | Board games | Path cost / Optimality; Search effort |
| Search for a Moving Target: The FAB Algorithm [14] | 1983 | C6 | Stochastic / Imperfect information | Search effort; Robustness / Detection |
| Depth-First Iterative-Deepening: An Optimal Admissible Tree Search [10] | 1985 | C1, C2 | Navigation / Grid maps | Path cost / Optimality; Search effort |
| The history heuristic and alpha-beta search enhancements in practice [29] | 1989 | C2 | Board games | Search effort |
| Real-Time Heuristic Search [18] | 1990 | C1 | Navigation / Grid maps | Runtime; Search effort |
| An optimal branch-and-bound procedure for the constrained path moving target search problem [36] | 1990 | C6 | Stochastic / Imperfect information | Path cost / Optimality; Search effort; Robustness / Detection |
| A linear programming approach to the search game on a network with a mobile hider [37] | 1992 | C6 | Stochastic / Imperfect information | Search effort; Robustness / Detection |
| Optimal and Efficient Path Planning for Partially-Known Environments [31] | 1994 | C1, C6 | Navigation / Grid maps | Path cost / Optimality; Runtime; Robustness / Detection |
| Best-first fixed-depth minimax algorithms [7] | 1996 | C2 | Board games | Search effort |
| Differential Games (Book) [59] | 1999 | C1, C2, C6 | Search / Pursuit Evasion | Path cost / Optimality; Runtime; Robustness / Detection |
| Deep Blue [4] | 2002 | C2 | Board games | Not specified |
| D* Lite [26] | 2002 | C1 | Navigation / Grid maps | Path cost / Optimality; Runtime; Search effort |
| Near Optimal Hierarchical Path-Finding [27] | 2004 | C1 | Navigation / Grid maps | Path cost / Optimality; Runtime |
| Three States and a Plan: The A.I. of F.E.A.R. [11] | 2006 | C1 | Navigation / Grid maps | Runtime; Robustness / Detection |
| A Two-Sided Optimization for Theater-Ballistic Missile Defense [61] | 2005 | C6 | Search / Pursuit Evasion | Runtime; Robustness / Detection |
| Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search [8] | 2006 | C3, C5 | Not specified | Search effort |
| Bandit Based Monte-Carlo Planning [9] | 2006 | C3, C5 | Not specified | Search effort |
| Online Graph Pruning for Pathfinding on Grid Maps [43] | 2011 | C1 | Navigation / Grid maps | Runtime; Search effort |
| Jump Point Search [24] | 2011 | C1 | Navigation / Grid maps | Runtime; Search effort |
| Information Set Monte Carlo Tree Search [15] | 2012 | C3, C6 | Stochastic / Imperfect information | Search effort; Robustness / Detection |
| A Survey of Monte Carlo Tree Search Methods [32] | 2012 | C3 | General | Not specified |
| Rolling Horizon Evolution versus Tree Search for Navigation in Single-Player Real-Time Games [19] | 2013 | C1, C4 | Navigation / Grid maps | Runtime; Search effort |
| SEARCH GAMES: LITERATURE AND SURVEY [16] | 2016 | C6 | Stochastic / Imperfect information | Robustness / Detection |
| A Search Allocation Game with Private Information of Initial Target Position [58] | 2015 | C6 | Stochastic / Imperfect information | Robustness / Detection |
| Simulation and Comparison of Efficiency in Pathfinding Algorithms in Games [42] | 2015 | C1 | Navigation / Grid maps | Runtime; Path cost / Optimality |
| Mastering the Game of Go with Deep Neural Networks and Tree Search [5] | 2016 | C3, C5 | Board games | Win rate / Elo; Search effort |
| Mastering the game of Go without human knowledge [22] | 2017 | C5, C3 | Board games | Win rate / Elo; Search effort; Robustness / Detection |
| A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play [1] | 2018 | C2, C3, C5 | Chess, Shogi, Go | Win rate / Elo; Search effort (simulations) |
| Guiding Monte Carlo Tree Search by Scripts in Real-Time Strategy Games [28] | 2019 | C3, C5 | RTS | Win rate / Elo; Runtime; Search effort |
| Superhuman AI for multiplayer poker [33] | 2019 | C3, C6 | Poker (Texas Hold’em) | Win rate / Elo; Expected value (bb/100) |
| Analysis of Statistical Forward Planning Methods in Pommerman [51] | 2019 | C3, C4 | Not specified | Score / Reward; Win rate |
| Action Guidance with MCTS for Deep Reinforcement Learning [50] | 2019 | C3, C5 | Atari (ALE) | Score / Reward |
| Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model [20] | 2020 | C5, C2, C3 | General / Multi-domain | Score / Reward; Search effort; Robustness / Detection |
| Rolling Horizon Evolutionary Algorithms for General Video Game Playing [52] | 2020 | C4 | Not specified | Score / Reward; Win rate |
| Deep Reinforcement Learning for General Game Playing [55] | 2020 | C3, C5 | Not specified | Win rate / Elo |
| A Review on Informed Search Algorithms for Video Games Pathfinding [41] | 2020 | C1 | Navigation / Grid maps | Path length / path cost; Node expansions; Runtime |
| Path Finding and Map Feature Learning in RTS Games with Partial Observability [17] | 2021 | C5, C1, C6 | RTS | Robustness / Detection; Runtime |
| Solving Partially Observable Stochastic Shortest-Path Games [39] | 2021 | C6 | General | Optimality gap; Convergence guarantees |
| Mastering Atari Games with Limited Data [54] | 2021 | C3, C5 | Atari (ALE) | Score / Reward; Sample efficiency (100k interactions) |
| A Systematic Review and Analysis of Intelligence-Based Pathfinding Algorithms in Video Games [44] | 2022 | C1, C4, C5 | General / Multi-domain | Not specified |
| Supervised and Reinforcement Learning from Observations in Reconnaissance Blind Chess [38] | 2022 | C5, C6 | Chess (imperfect info variant) | Win rate / Elo |
| Player of Games [2] | 2022 | C1, C2, C6 | Chess, Go, Poker (Texas Hold’em) | Win rate / Elo |
| Mastering the game of Stratego with model-free multiagent reinforcement learning [57] | 2022 | C5, C6 | Stratego | Win rate / Elo |
| Generalized Entropy and Solution Information for Measuring Puzzle Difficulty [53] | 2023 | C4 | Not specified | Robustness / Detection |
| Monte Carlo Tree Search: a review of recent modifications and applications [49] | 2023 | C3 | Not specified | Not specified |
| Deep Reinforcement Learning in Real-Time Strategy Games: A Systematic Literature Review [35] | 2024 | C5, C4 | RTS | Robustness / Detection |
| Optimizing Monte Carlo Tree Search for Parallel Computing on GPUs [48] | 2025 | C3 | Not specified | Runtime; Search effort |
| LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning [45] | 2024 | C1, C5 | Navigation / Grid maps | Path length / path cost; Node expansions; Runtime |
| Monte Carlo Tree Search with Boltzmann Exploration [46] | 2024 | C3 | Go | Win rate / Elo; Rollouts / simulations |
| AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws [56] | 2024 | C2, C5 | Chess, Go | Win rate / Elo; Compute scaling (FLOPs) |
| Grandmaster-Level Chess Without Search [34] | 2024 | C2, C5 | Chess | Win rate / Elo |
| Enhancing Deep Reinforcement Learning for Scale Flexibility in Real-Time Strategy Games [21] | 2025 | C5, C1 | RTS | Robustness / Detection |
| Partially Observable Off-Switch Games [30] | 2025 | C6 | Stochastic / Imperfect information | Robustness / Detection |
| Monte Carlo Tree Search for Knowledge Graph Reasoning [60] | 2025 | C3 | Not specified | Search effort |
| Novelty in Monte Carlo Tree Search [47] | 2025 | C3 | Not specified | Score / Reward; Search effort (simulations) |
| R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability [40] | 2025 | C6 | Search / pursuit evasion games | Capture time; Robustness guarantees |
References
- Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140–1144.
- Schmid, M.; Moravcik, M.; Burch, N.; Kadlec, R.; Davidson, J.; Waugh, K.; Bard, N.; Timbers, F.; Lanctot, M.; Holland, Z.; et al. Player of Games. Science Advances 2022.
- Shannon, C.E. XXII. Programming a computer for playing chess. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1950, 41, 256–275.
- Campbell, M.; Hoane, A.J., Jr.; Hsu, F.h. Deep Blue. Artif. Intell. 2002, 134, 57–83.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Knuth, D.E.; Moore, R.W. An analysis of alpha-beta pruning. Artif. Intell. 1975, 6, 293–326.
- Plaat, A.; Schaeffer, J.; Pijls, W.; De Bruin, A. Best-first fixed-depth minimax algorithms. Artif. Intell. 1996, 87, 255–293.
- Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In Proceedings of the International conference on computers and games; Springer, 2006; pp. 72–83.
- Kocsis, L.; Szepesvári, C. Bandit based monte-carlo planning. In Proceedings of the European conference on machine learning; Springer, 2006; pp. 282–293.
- Korf, R.E. Depth-first iterative-deepening: An optimal admissible tree search. Artif. Intell. 1985, 27, 97–109.
- Orkin, J. Three states and a plan: the AI of FEAR. In Proceedings of the Game developers conference, CMP Game Group San Jose, California, 2006; Vol. 2006, p. 4.
- Koopman, B.O. Search and screening; Number no. 56 in OEG report. In Operations Evaluation Group, Office of the Chief of Naval Operations; Navy Dept: Washington, D.C, 1946.
- Koopman, B.O. The theory of search: III. The optimum distribution of searching effort. Oper. Res. 1957, 5, 613–626.
- Washburn, A.R. Search for a Moving Target: The FAB Algorithm. Oper. Res. 1983, 31, 739–751.
- Cowling, P.I.; Powley, E.J.; Whitehouse, D. Information set monte carlo tree search. IEEE Trans. Comput. Intell. AI Games 2012, 4, 120–143.
- Hohzaki, R. SEARCH GAMES: LITERATURE AND SURVEY. J. Oper. Res. Soc. Jpn. 2016, 59, 1–34.
- Pan, H. Pathfinding and Map Feature Learning in RTS Games with Partial Observability. In Proceedings of the AIIDE Workshops, 2021.
- Korf, R.E. Real-time heuristic search. Artif. Intell. 1990, 42, 189–211.
- Perez, D.; Samothrakis, S.; Lucas, S.; Rohlfshagen, P. Rolling horizon evolution versus tree search for navigation in single-player real-time games. In Proceedings of the 15th annual conference on Genetic and evolutionary computation, 2013; pp. 351–358.
- Schrittwieser, J.; Antonoglou, I.; Hubert, T.; Simonyan, K.; Sifre, L.; Schmitt, S.; Guez, A.; Lockhart, E.; Hassabis, D.; Graepel, T.; et al. Mastering atari, go, chess and shogi by planning with a learned model. Nature 2020, 588, 604–609.
- Lemos, M.L.H.D.; Tavares, A.R.; Marcolino, L.S.; Chaimowicz, L.; et al. Enhancing deep reinforcement learning for scale flexibility in real-time strategy games. Entertain. Comput. 2025, 52, 100843.
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359.
- Dijkstra, E.W. A note on two problems in connexion with graphs. Edsger Wybe Dijkstra His Life Work Leg. 2022, 287–290.
- Harabor, D.; Grastien, A. The JPS pathfinding system. In Proceedings of the International Symposium on Combinatorial Search, 2012; Vol. 3, pp. 207–208.
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107.
- Koenig, S.; Likhachev, M. D* lite. In Proceedings of the Eighteenth national conference on Artificial intelligence, 2002; pp. 476–483.
- Botea, A.; Müller, M.; Schaeffer, J. Near optimal hierarchical path-finding. J. Game Dev. 2004, 1, 1–30.
- Yang, Z.; Ontanón, S. Guiding Monte Carlo tree search by scripts in real-time strategy games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment; 2019; Vol. 15, pp. 100–106.
- Schaeffer, J. The history heuristic and alpha-beta search enhancements in practice. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 1203–1212.
- Garber, A.; Subramani, R.; Luu, L.; Bedaywi, M.; Russell, S.; Emmons, S. The partially observable off-switch game. In Proceedings of the AAAI Conference on Artificial Intelligence; 2025; Vol. 39, pp. 27304–27311.
- Stentz, A. Optimal and efficient path planning for partially-known environments. In Proceedings of the 1994 IEEE international conference on robotics and automation. IEEE, 1994; pp. 3310–3317.
- Browne, C.B.; Powley, E.; Whitehouse, D.; Lucas, S.M.; Cowling, P.I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43.
- Brown, N.; Sandholm, T. Superhuman AI for multiplayer poker. Science 2019, 365, 885–890.
- Ruoss, A.; Delétang, G.; Li, S.; Czarnecki, W.M.; Gretton, A.; Vinyals, O. Grandmaster-Level Chess Without Search. arXiv 2024, arXiv:2402.04494.
- Barros e Sa, G.C.; Madeira, C.A.G. Deep reinforcement learning in real-time strategy games: a systematic literature review. Appl. Intell. 2025, 55, 243.
- Eagle, J.N.; Yee, J.R. An optimal branch-and-bound procedure for the constrained path, moving target search problem. Oper. Res. 1990, 38, 110–114.
- Anderson, E.J.; Aramendia, M. A linear programming approach to the search game on a network with mobile hider. SIAM J. Control Optim. 1992, 30, 675–694.
- Bertram, T.; Fürnkranz, J.; Müller, M. Supervised and reinforcement learning from observations in reconnaissance blind chess. In Proceedings of the 2022 IEEE Conference on Games (CoG); IEEE, 2022; pp. 311–318.
- Tomášek, P.; Horák, K.; Aradhye, A.; Bošanský, B.; Chatterjee, K. Solving partially observable stochastic shortest-path games. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI), 2021; pp. 4182–4189.
- Lu, R.; Shi, R.; Zhu, Y.; Zhao, D. R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability. arXiv 2025, arXiv:2511.17367. [Google Scholar]
- Kapi, A.Y. A Review on Informed Search Algorithms for Video Games Pathfinding. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 2589–2598. [Google Scholar] [CrossRef]
- Noori, A.; Moradi, F. Simulation and Comparison of Efficency in Pathfinding algorithms in Games. Ciência e Nat. 2015, 37, 230–238.
- Harabor, D.; Grastien, A. Online graph pruning for pathfinding on grid maps. In Proceedings of the AAAI Conference on Artificial Intelligence; 2011; Vol. 25, pp. 1114–1119.
- Lawande, S.R.; Jasmine, G.; Anbarasi, J.; Izhar, L.I. A systematic review and analysis of intelligence-based pathfinding algorithms in the field of video games. Appl. Sci. 2022, 12, 5499.
- Meng, S.; Wang, Y.; Yang, C.F.; Peng, N.; Chang, K.W. LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024; pp. 10246–10263.
- Painter, M.; Baioumy, M.; Hawes, N.; Lacerda, B. Monte Carlo Tree Search with Boltzmann Exploration. In Proceedings of the Advances in Neural Information Processing Systems, 2024; Vol. 37.
- Baier, H.; Kaisers, M. Novelty in Monte Carlo Tree Search. IEEE Trans. Games 2025.
- Klęsk, P. MCTS-NC: A thorough GPU parallelization of Monte Carlo Tree Search implemented in Python via numba.cuda. SoftwareX 2025, 30, 102139.
- Świechowski, M.; Godlewski, K.; Sawicki, B.; Mańdziuk, J. Monte Carlo tree search: a review of recent modifications and applications. Artif. Intell. Rev. 2023, 56, 2497–2562.
- Kartal, B.; Hernandez-Leal, P.; Taylor, M.E. Action Guidance with MCTS for Deep Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment; 2019; Vol. 15, pp. 153–159.
- Perez-Liebana, D.; Gaina, R.D.; Drageset, O.; Ilhan, E.; Balla, M.; Lucas, S.M. Analysis of statistical forward planning methods in Pommerman. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment; 2019; Vol. 15, pp. 66–72.
- Gaina, R.D.; Devlin, S.; Lucas, S.M.; Pérez-Liébana, D. Rolling horizon evolutionary algorithms for general video game playing. IEEE Trans. Games 2020, 14, 232–242.
- Shen, J.; Sturtevant, N.R. Generalized entropy and solution information for measuring puzzle difficulty. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment; 2024; Vol. 20, pp. 117–126.
- Ye, W.; Liu, S.; Kurutach, T.; Abbeel, P.; Gao, Y. Mastering Atari Games with Limited Data. In Proceedings of the Advances in Neural Information Processing Systems, 2021; Vol. 34, pp. 25476–25488.
- Goldwaser, A.; Thielscher, M. Deep reinforcement learning for general game playing. In Proceedings of the AAAI Conference on Artificial Intelligence; 2020; Vol. 34, pp. 1701–1708.
- Neumann, O.; Gros, C. AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws. arXiv 2024, arXiv:2412.11979.
- Perolat, J.; De Vylder, B.; Hennes, D.; Taez, E.; Strub, F.; Meunier, V.; Lanctot, M.; Munos, R.; Gruslys, A.; Lockhart, E.; et al. Mastering the game of Stratego with model-free multiagent reinforcement learning. Science 2022, 378, 990–996.
- Hohzaki, R.; Joo, K. A search allocation game with private information of initial target position. J. Oper. Res. Soc. Jpn. 2015, 58, 353–375.
- Isaacs, R. Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization; Courier Corporation, 1999.
- Liu, L. Monte Carlo tree search for graph reasoning in large language model agents. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025; pp. 4966–4970.
- Brown, G.; Carlyle, M.; Diehl, D.; Kline, J.; Wood, K. A two-sided optimization for theater ballistic missile defense. Oper. Res. 2005, 53, 745–763.







| Cluster | Core Idea | Decision Quality | Optimality | Robustness | Typical Game Types |
|---|---|---|---|---|---|
| C1: Pathfinding / Navigation | Heuristic search + replanning for spatial movement | Path cost/length; smoothness; responsiveness under time constraints | Exact (admissible heuristics) or bounded/approximate via abstraction and replanning | Stability under dynamic maps + compute variability; anytime behavior | Navigation, grid maps, game-world traversal |
| C2: Adversarial Game Tree | Minimax search, pruning, heuristic evaluation | Move/strategy strength vs. opponents; depth-limited state evaluation | Minimax-correct under pruning; practical depth-limited “optimality” | Worst-case opponent modeling; conservative safety against exploitation | Board games (Chess-like), perfect-information adversarial |
| C3: MCTS / Bandits | Sampling-based planning with exploration-exploitation control | Empirical win rate/score; improvement with simulation budget | Asymptotic convergence under assumptions; finite-budget approximation | Robustness to stochasticity + variable compute budgets (anytime) | Large-branching games; stochastic games; some RTS via abstractions |
| C4: Metaheuristics / Rolling Horizon | Budgeted optimization over action sequences (e.g., evolutionary RH) | Objective-driven performance under tight budgets; diversity of candidate plans | Approximate best-so-far within budget; horizon-limited | Graceful degradation; diversity-driven resilience; noise tolerance | Real-time single-player control; puzzle/optimization-like game settings |
| C5: Learning Augmented Search | Learned priors (policy/value/model) guide or replace parts of search | High empirical strength at scale; learned strategic structure | Benchmark-defined “near-optimality”; guarantees typically replaced by empirical dominance | Generalization and distributional robustness become central (shift, scale, fog of war) | Go/Chess/Shogi/Atari-like; complex RTS and multi-agent domains |
| C6: Uncertainty / Search Games | Belief- and model-based policies for hidden state and moving targets | Belief-conditioned effectiveness; detection/capture probability; expected performance | Model-relative optimality (LP/DP/control); equilibrium under incomplete information | Primary objective: robust performance under uncertainty, partial observability, adversarial hiding | Pursuit-evasion; moving-target search; fog-of-war-like uncertainty settings |
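
The C2 optimality entry, that pruning leaves the minimax value intact, can be illustrated with a minimal sketch. The toy tree `TREE` and the `stats` leaf counter below are hypothetical and chosen only for illustration; they are not taken from any surveyed paper.

```python
from math import inf

# Hypothetical toy game tree: nested lists are internal nodes, numbers are
# leaf evaluations from the maximizing player's point of view.
TREE = [[3, 5], [2, [9, 1]], [0, 7]]


def minimax(node, maximizing=True):
    """Exhaustive minimax value of a node (no pruning)."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing=True, alpha=-inf, beta=inf, stats=None):
    """Minimax value with alpha-beta pruning; stats counts visited leaves."""
    if not isinstance(node, list):
        if stats is not None:
            stats["leaves"] += 1
        return node
    best = -inf if maximizing else inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, stats)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            # The remaining siblings cannot change the value at the root.
            break
    return best


stats = {"leaves": 0}
root_value = alphabeta(TREE, stats=stats)
```

On this toy tree both procedures return the same root value (3), while the pruned search evaluates 4 of the 7 leaves. With a depth limit and a heuristic evaluation at the cutoff, the same equality holds only relative to that evaluation, which is the practical sense of “optimality” in the table.
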

| Cluster | Representative papers |
|---|---|
| C1: Pathfinding / Navigation | [10,11,17,18,23,24,25,26,27,31,41,42,43,44,45] |
| C2: Adversarial Game Tree | [1,2,3,4,6,7,20,29,34,56,59] |
| C3: MCTS / Bandits | [1,5,8,9,15,19,20,28,32,33,46,47,48,49,50,51,54,55,60] |
| C4: Metaheuristics / Rolling Horizon | [12,13,19,29,51,52,53] |
| C5: Learning Augmented Search | [1,2,5,17,20,21,22,34,35,38,45,50,54,55,56,57] |
| C6: Uncertainty / Search Games | [2,12,13,14,15,16,17,30,33,36,37,38,39,40,57,58,59,61] |

| Trade-off | Cluster tendency A | Cluster tendency B | Survey insight |
|---|---|---|---|
| Optimality vs. Scalability | C1/C2 (stronger guarantees) | C3/C5 (scales to huge spaces) | As games grow, strict guarantees become conditional or asymptotic; empirical dominance often replaces proof-based optimality. |
| Worst-case robustness vs. Empirical robustness | C2 (worst-case opponent) | C3/C5 (statistical + learned robustness) | Worst-case models can be conservative; sampling/learning broaden robustness but create new failure modes (shift, model bias). |
| Quality vs. Time budget (anytime behavior) | C1/C4 (explicit anytime design) | C2 (depth-limited) | Real-time domains reward algorithms that degrade gracefully; quality must be measured as a function of compute, not a single score. |
| Model-relative optimality vs. Robustness under mismatch | C6 (optimal under model) | C5 (learns from data, risks shift) | C6 delivers principled belief-conditional policies but depends on assumptions; learning can generalize but must be stress-tested distributionally. |
| Handcrafted guidance vs. Learned guidance | C1/C2 (heuristics/eval) | C5 (learned priors/models) | Learned priors can collapse branching factors and raise decision quality, but robustness demands evaluation across scenarios and scales. |
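
The insight that quality should be reported as a function of compute rather than as a single score can be sketched as a budget sweep. The example below is an illustration under stated assumptions: `plan_value` is a hypothetical toy objective, and `anytime_random_search` is a plain random search standing in for any anytime planner (it is not a rolling horizon evolutionary algorithm from the surveyed work).

```python
import random


def plan_value(plan):
    """Hypothetical objective: reward for matching an alternating 0/1 pattern."""
    return sum(1 for i, action in enumerate(plan) if action == i % 2)


def anytime_random_search(budget, horizon=10, seed=0):
    """Best plan value found after evaluating `budget` random candidate plans."""
    rng = random.Random(seed)
    best = 0
    for _ in range(budget):
        plan = [rng.randint(0, 1) for _ in range(horizon)]
        best = max(best, plan_value(plan))
    return best


def quality_profile(budgets, seeds=range(20)):
    """Mean best-so-far value per budget, averaged over random seeds."""
    profile = {}
    for budget in budgets:
        scores = [anytime_random_search(budget, seed=s) for s in seeds]
        profile[budget] = sum(scores) / len(scores)
    return profile


# Quality reported as a curve over compute budgets, not as a single number.
profile = quality_profile([1, 4, 16, 64, 256])
```

Because each seed replays the same candidate stream, a larger budget only extends the set of plans seen, so the resulting curve is non-decreasing. Comparing two algorithms by such curves shows where one overtakes the other as the budget grows, which a single score at one budget would hide.
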

| Metric Category | Examples | Clusters where common |
|---|---|---|
| Decision quality (outcome) | Win rate; tournament results; score/return; task success rate | C2, C3, C5 (also C4) |
| Decision quality (solution quality) | Path cost/length; suboptimality gap; plan quality over horizon | C1, C4 |
| Optimality / guarantees | Minimax correctness; admissibility; convergence (asymptotic); equilibrium notions (model-relative) | C1, C2, C3, C6 |
| Robustness (distributional) | Performance across maps/scales; scenario sweeps; robustness to hidden information; variance across seeds | C3, C5, C6 |
| Robustness (search-theoretic) | Detection probability; expected capture time; coverage under uncertainty | C6 |
| Resource efficiency | Runtime; time per move; nodes expanded; rollouts/second; memory usage; GPU throughput | C1, C2, C3 (also C4, C5) |
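
Several of the metrics above reduce to short formulas. The helpers below are a minimal sketch; the function names are hypothetical, and the Wilson score interval is one common choice for reporting uncertainty on a win rate, not a method prescribed by the surveyed papers.

```python
import math
import statistics


def suboptimality_gap(cost, optimal_cost):
    """Relative excess cost of a returned path over the optimal path cost."""
    return (cost - optimal_cost) / optimal_cost


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is roughly 95% confidence)."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


def seed_spread(scores):
    """Mean and sample standard deviation of scores across random seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


gap = suboptimality_gap(cost=12.0, optimal_cost=10.0)
low, high = wilson_interval(wins=60, games=100)
mean_score, std_score = seed_spread([1.0, 2.0, 3.0])
```

Reporting the interval and the seed spread alongside the point estimate makes outcome metrics comparable across papers that use different numbers of games or seeds.
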

| Game type | Dominant clusters | Primary challenges | Typical metrics |
|---|---|---|---|
| Board games (Chess/Go/Shogi) | C2, C3, C5 | Huge game trees; long-horizon strategy; opponent strength; compute constraints | Win rate; Elo/rating; depth; rollouts; node expansions |
| Navigation / Grid maps | C1 (sometimes C5) | Large maps; dynamic obstacles; real-time constraints; smoothness and stability | Path cost/length; runtime; node expansions; replanning latency |
| Real-time strategy (RTS) | C3, C5 (plus hybrids) | Partial observability; multi-agent coordination; large branching; real-time action selection; scale changes | Win rate; scenario success; time per decision; generalization across maps/scales |
| Stochastic / Imperfect-information games | C3, C6 (sometimes C5) | Hidden state; randomness; belief maintenance; opponent unpredictability | Expected return; robustness across scenarios; variance; detection probability |
| Search / Pursuit-evasion style domains | C6 (sometimes C1/C3 as tools) | Moving targets; sensing uncertainty; adversarial hiding; resource allocation | Detection probability; expected capture time; coverage; worst-case guarantees |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).