Submitted: 20 May 2024 | Posted: 20 May 2024
Abstract
Keywords:
1. Introduction
2. Materials & Methods

2.1. Setup and Environment Preparation
2.2. Design and Functionality of the Q-Network
2.3. Implementing Experience Replay for Stable Learning
2.4. Role and Functions of the DQN Agent
2.5. Execution and Maintenance of the Training Loop
2.6. Performance Evaluation of the Agent
2.7. Visualizing the Agent’s Gameplay
2.8. Proposed Optimization Algorithm for Hyperparameter Tuning
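The methods outline above follows the standard DQN pipeline: a Q-network (Section 2.2), an experience replay buffer for stable learning (Section 2.3), and a training loop driven by an epsilon-greedy policy (Section 2.5). As a minimal sketch of the replay and exploration components — buffer capacity, batch size, and the epsilon schedule below are illustrative assumptions, not the settings reported in this paper:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done)
    transitions, sampled uniformly at random for each training mini-batch."""

    def __init__(self, capacity=10_000):  # capacity is an illustrative assumption
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions, which is what stabilises Q-learning.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_by_step(step, eps_start=1.0, eps_end=0.05, decay_steps=5_000):
    """Linearly decayed exploration rate (schedule values are assumptions)."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

In a full agent, each environment step would `push` one transition and, once the buffer holds at least one batch, `sample` a mini-batch to fit the Q-network against its bootstrapped targets.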

3. Results
3.1. Outcomes of the ESO Algorithm

4. Discussion & Implications
5. Conclusion & Future Work
References
| Iteration | SOA Reward | EVO Reward | ESO Reward |
|---|---|---|---|
| 1 | 200.0 | 150.0 | 400.0 |
| 2 | 350.0 | 250.0 | 550.0 |
| 3 | 400.0 | 300.0 | 650.0 |
| 4 | 450.0 | 350.0 | 750.0 |
| 5 | 500.0 | 400.0 | 850.0 |
| 6 | 600.0 | 450.0 | 950.0 |
| 7 | 700.0 | 500.0 | 1050.0 |
| 8 | 800.0 | 550.0 | 1100.0 |
| Algorithm | Reward Achieved | Time Taken (seconds) |
|---|---|---|
| SOA | 840.0 | 45 |
| EVO | 640.0 | 50 |
| ESO | 1100.0 | 32 |
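The two tables can be summarised with a reward-per-second figure; the sketch below recomputes it from the tabulated final rewards and times (the rate metric itself is our illustration, not one reported in the paper):

```python
# Final reward and wall-clock time for each algorithm, copied from the table.
results = {
    "SOA": {"reward": 840.0, "time_s": 45},
    "EVO": {"reward": 640.0, "time_s": 50},
    "ESO": {"reward": 1100.0, "time_s": 32},
}


def reward_rate(entry):
    """Reward earned per second of tuning time."""
    return entry["reward"] / entry["time_s"]


# ESO both reaches the highest reward and does so in the least time,
# so it dominates on the combined metric as well.
rates = {name: round(reward_rate(r), 2) for name, r in results.items()}
best = max(results, key=lambda name: reward_rate(results[name]))
```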
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).