Submitted: 25 June 2023
Posted: 26 June 2023
Abstract
Keywords:
1. Introduction
- It reduces the human effort required, since many ML practitioners spend considerable time tuning hyper-parameters, especially for large datasets or complex ML algorithms with many hyper-parameters.
- It improves the performance of ML models. Many ML hyper-parameters have different optimal values on different datasets or problems.
- It improves the reproducibility of models and studies. ML algorithms can only be compared fairly when the same level of hyper-parameter tuning is applied to each; applying the same HPO method to several ML algorithms therefore also helps identify the most suitable ML model for a specific problem.
- It covers three well-known machine learning algorithms (SVM, RF, and KNN) and their main hyper-parameters.
- It reviews common HPO techniques, including their strengths and limitations, to help practitioners select a suitable technique for a given ML model in practice.
- It investigates the impact of HPO techniques on the overall accuracy of landslide susceptibility mapping.
- It compares the accuracy gained by moving from baseline (default) parameters to tuned parameters for the three machine learning methods.
Study Area
Landslide Conditioning Factors
2. Methodology
3. Hyper-Parameters
3.1. Discrete Hyper-Parameter
3.2. Continuous Hyper-Parameter
3.3. Conditional Hyper-Parameters
3.4. Categorical Hyper-Parameters
3.5. Big Hyper-Parameter Configuration Space with Different Types of Hyper-Parameters
4. Hyper-Parameter Optimization Techniques
4.1. Babysitting
4.2. Grid Search
- Start with a large search space and a large step size.
- Using the well-performing hyper-parameter settings found so far, narrow the search space and reduce the step size.
- Repeat step 2 until an optimum is reached.
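The coarse-to-fine procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: `toy_objective` is a hypothetical stand-in for a validation score, and the hyper-parameter names `C` and `gamma`, the number of grid points, and the number of rounds are all assumptions.

```python
import itertools

def grid_search(objective, grid):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def linspace(lo, hi, num):
    """Return `num` evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]

def coarse_to_fine(objective, bounds, points=5, rounds=3):
    """Step 1: wide region, large step. Steps 2-3: shrink around the best point, repeat."""
    best_params, best_score = None, float("-inf")
    for _ in range(rounds):
        grid = {n: linspace(lo, hi, points) for n, (lo, hi) in bounds.items()}
        params, score = grid_search(objective, grid)
        if score > best_score:
            best_params, best_score = params, score
        # New region: one grid step either side of the best value, clipped to the old bounds.
        new_bounds = {}
        for n, (lo, hi) in bounds.items():
            step = (hi - lo) / (points - 1)
            new_bounds[n] = (max(lo, best_params[n] - step), min(hi, best_params[n] + step))
        bounds = new_bounds
    return best_params, best_score

def toy_objective(p):
    """Hypothetical score surface with its maximum at C = 12, gamma = 0.3."""
    return -((p["C"] - 12.0) ** 2) / 100.0 - (p["gamma"] - 0.3) ** 2

best, score = coarse_to_fine(toy_objective, {"C": (0.1, 50.0), "gamma": (0.0, 1.0)})
print(best, score)
```

Each round evaluates `points ** len(bounds)` configurations, which is why the cost of grid search grows quickly with the number of hyper-parameters.
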
4.3. Random Search
Bayesian Optimization
- Create a surrogate probabilistic model of the target function.
- Find the best hyper-parameter values on the surrogate model.
- Apply these hyper-parameter values to the true objective function for evaluation.
- Update the surrogate model with the new observations.
- Repeat steps 2 through 4 until the maximum number of iterations is reached.
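The loop above can be illustrated with a small, dependency-free sketch. It is not the setup used in the study: it assumes a single continuous hyper-parameter scaled to [0, 1], a Gaussian-process surrogate with a fixed squared-exponential kernel, expected improvement as the acquisition function, and a hypothetical `toy_objective` in place of a real validation score.

```python
import math
import random

def rbf(a, b, length=0.15):
    """Squared-exponential kernel with unit prior variance (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:])) / M[r][r]
    return x

def fit_surrogate(xs, ys, jitter=1e-4):
    """Fit a GP to the centred observations (steps 1 and 4 of the list above)."""
    n = len(xs)
    mean_y = sum(ys) / n
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    K_inv = [solve(K, col) for col in identity]  # K is symmetric, so columns equal rows
    alpha = solve(K, [y - mean_y for y in ys])
    return mean_y, K_inv, alpha

def predict(xs, model, x):
    """Posterior mean and standard deviation of the surrogate at x."""
    mean_y, K_inv, alpha = model
    k = [rbf(a, x) for a in xs]
    var = max(1.0 - dot(k, [dot(row, k) for row in K_inv]), 1e-12)
    return mean_y + dot(k, alpha), math.sqrt(var)

def expected_improvement(mean, std, best):
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, lo, hi, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        model = fit_surrogate(xs, ys)  # build or update the surrogate
        best = max(ys)
        candidates = [x for x in grid if x not in xs]
        # Step 2: pick the candidate that maximises the acquisition on the surrogate.
        x_next = max(candidates,
                     key=lambda x: expected_improvement(*predict(xs, model, x), best))
        xs.append(x_next)
        ys.append(objective(x_next))  # step 3: evaluate the true objective
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], list(zip(xs, ys))

def toy_objective(x):
    """Hypothetical score with its maximum at x = 0.7."""
    return -((x - 0.7) ** 2)

best_x, best_y, history = bayes_opt(toy_objective, 0.0, 1.0)
print(best_x, best_y, len(history))
```

In practice a maintained library would replace the hand-written surrogate; the sketch only makes the fit, propose, evaluate, update cycle explicit.
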
4.4. BO-GP
4.5. BO-TPE
4.6. Metaheuristic Algorithms
4.7. Genetic Algorithm (GA)
- Randomly initialize the population, chromosomes, and genes, which represent the whole search space, the hyper-parameters, and the hyper-parameter values, respectively.
- Define the fitness function, which reflects the main objective of the ML model, and use it to evaluate each individual in the current generation.
- Apply genetic operators such as selection, crossover, and mutation to produce a new generation containing the next hyper-parameter configurations to be evaluated.
- Repeat steps 2 and 3 until the termination criterion is met.
- Terminate and output the best hyper-parameter configuration.
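A compact sketch of these steps follows. It is illustrative only: the two integer ranges are borrowed from the RF rows of the search-space table, while the population size, mutation rate, tournament selection, single-elite strategy, and the `toy_fitness` function are assumptions rather than the study's settings.

```python
import random

# Integer ranges assumed inclusive; names follow the RF rows of the search-space table.
SPACE = {"n_estimators": (10, 100), "max_depth": (5, 50)}

def random_individual(rng):
    """One chromosome: a gene (value) for every hyper-parameter."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SPACE.items()}

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SPACE}

def mutate(ind, rng, rate=0.2):
    """Reset each gene to a random value in its range with probability `rate`."""
    return {name: (rng.randint(*SPACE[name]) if rng.random() < rate else value)
            for name, value in ind.items()}

def tournament(pop, fitness, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def genetic_search(fitness, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]  # step 1: initialization
    history = []
    for _ in range(generations):                              # step 4: fixed budget
        elite = max(pop, key=fitness)                         # step 2: evaluate fitness
        history.append(fitness(elite))
        children = [                                          # step 3: new generation
            mutate(crossover(tournament(pop, fitness, rng),
                             tournament(pop, fitness, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children                              # keep the elite unchanged
    best = max(pop, key=fitness)                              # step 5: output the best
    return best, fitness(best), history

def toy_fitness(ind):
    """Hypothetical fitness with its maximum at n_estimators = 60, max_depth = 20."""
    return -((ind["n_estimators"] - 60) ** 2) / 100.0 - (ind["max_depth"] - 20) ** 2 / 10.0

best, best_fit, history = genetic_search(toy_fitness)
print(best, best_fit)
```

Carrying the elite into the next generation guarantees that the best fitness never decreases from one generation to the next.
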
4.8. Particle Swarm Optimization (PSO)
5. Mathematical and Hyper-Parameter Optimization
5.1. Mathematical Optimization
5.2. Hyper-Parameter Optimization
- Choose the objective function and the performance metrics.
- Identify the hyper-parameters that need tuning, note their types, and select a suitable optimization technique.
- Train the ML model with the default hyper-parameter configuration or common values to obtain a baseline model.
- Start the optimization with a large search space, chosen through manual testing and/or domain knowledge, as the feasible hyper-parameter domain.
- If necessary, explore additional search spaces, or narrow the search space to the regions where well-performing hyper-parameter values have been found.
- Finally, return the hyper-parameter configuration that achieves the best performance.
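As a rough illustration of this workflow, the sketch below tunes `n_neighbors` of a scikit-learn `KNeighborsClassifier`. The synthetic dataset from `make_classification` is a stand-in, not the landslide data of this study, and the coarse grid, the ±3 refinement window, and 5-fold cross-validated accuracy are assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption); replace with the real feature matrix and labels.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=0)

def tune(grid):
    """Grid-search n_neighbors over `grid` using 5-fold cross-validated accuracy."""
    search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": grid},
                          scoring="accuracy", cv=5)
    search.fit(X, y)
    return search.best_params_["n_neighbors"], search.best_score_

# Baseline: default hyper-parameters (n_neighbors=5), same metric and folds.
baseline = cross_val_score(KNeighborsClassifier(), X, y, scoring="accuracy", cv=5).mean()

# Broad search space with a large stride over [1, 20].
coarse_k, coarse_score = tune([1, 5, 9, 13, 17])

# Narrowed search space around the best coarse value, clipped to [1, 20].
fine_grid = list(range(max(1, coarse_k - 3), min(20, coarse_k + 3) + 1))
best_k, best_score = tune(fine_grid)

print(f"baseline={baseline:.4f} coarse={coarse_score:.4f} tuned={best_score:.4f} k={best_k}")
```

Because the coarse grid contains the default value and the fine grid contains the best coarse value, the tuned score cannot fall below the baseline on the same folds.
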
6. Hyper-Parameters in Machine Learning Models
6.1. KNN
6.2. SVM
6.3. Random Forest (Tree Based Models)
7. Results
| Optimization Algorithm | Accuracy | CT (s) |
|---|---|---|
| GS | 0.90730 | 4.70 |
| RS | 0.92663 | 3.91 |
| BO-GP | 0.93266 | 16.94 |
| BO-TPE | 0.94112 | 1.43 |
| GA | 0.94957 | 4.90 |
| PSO | 0.95923 | 3.12 |

| Optimization Algorithm | Accuracy | CT (s) |
|---|---|---|
| BO-TPE | 0.95289 | 0.55 |
| BO-GP | 0.94565 | 5.78 |
| PSO | 0.90277 | 0.43 |
| GA | 0.90277 | 1.18 |
| RS | 0.89855 | 0.73 |
| GS | 0.89794 | 1.23 |

| Optimization Algorithm | Accuracy | CT (s) |
|---|---|---|
| BO-GP | 0.90247 | 1.21 |
| BO-TPE | 0.89462 | 2.23 |
| PSO | 0.89462 | 1.65 |
| GA | 0.88194 | 2.43 |
| RS | 0.88194 | 6.41 |
| GS | 0.78925 | 7.68 |

7.1. Landslide Susceptibility Maps
7.1.1. Random Forest


7.1.2. KNN


7.1.3. SVM


8. Conclusions
Funding
Data Availability Statement
Conflicts of Interest



| Factors | Classes | Class Percentage (%) | Landslide Percentage (%) | Reclassification |
|---|---|---|---|---|
| Slope (°) | Very Gentle Slope < 5° | 17.36 | 21.11 | Geometrical interval reclassification |
| | Gentle Slope 5°–15° | 20.87 | 28.37 | |
| | Moderately Steep Slope 15°–30° | 26.64 | 37.89 | |
| | Steep Slope 30°–45° | 24.40 | 10.90 | |
| | Escarpments > 45° | 10.71 | 1.73 | |
| Aspect | Flat (−1) | 22.86 | 7.04 | Remained unmodified (as in source data) |
| | North (0–22) | 21.47 | 7.03 | |
| | Northeast (22–67) | 14.85 | 5.00 | |
| | East (67–112) | 8.00 | 11.86 | |
| | Southeast (112–157) | 5.22 | 14.3 | |
| | South (157–202) | 2.84 | 14.40 | |
| | Southwest (202–247) | 6.46 | 12.41 | |
| | West (247–292) | 7.19 | 16.03 | |
| | Northwest (292–337) | 11.07 | 11.96 | |
| Land Cover | Dense Conifer | 0.38 | 12.73 | |
| | Sparse Conifer | 0.25 | 12.80 | |
| | Broadleaved, Conifer | 1.52 | 10.86 | |
| | Grasses/Shrubs | 25.54 | 10.3 | |
| | Agriculture Land | 5.78 | 10.40 | |
| | Soil/Rocks | 56.55 | 14.51 | |
| | Snow/Glacier | 8.89 | 12.03 | |
| | Water | 1.06 | 16.96 | |
| Geology | Cretaceous sandstone | 13.70 | 6.38 | |
| | Devonian-Carboniferous | 12.34 | 5.80 | |
| | Chalt Group | 1.43 | 8.43 | |
| | Hunza plutonic unit | 4.74 | 10.74 | |
| | Paragneisses | 11.38 | 11.34 | |
| | Yasin group | 10.80 | 10.70 | |
| | Gilgit complex | 5.80 | 9.58 | |
| | Trondhjemite | 15.65 | 9.32 | |
| | Permian massive limestone | 6.51 | 6.61 | |
| | Permanent ice | 12.61 | 3.51 | |
| | Quaternary alluvium | 0.32 | 8.65 | |
| | Triassic massive limestone and dolomite | 1.58 | 7.80 | |
| | Snow | 3.08 | 2.00 | |
| Proximity to Stream (m) | 0–100 m | 19.37 | 18.52 | Geometrical interval reclassification |
| | 100–200 | 10.26 | 21.63 | |
| | 200–300 | 10.78 | 25.16 | |
| | 300–400 | 13.95 | 26.12 | |
| | 400–500 | 18.69 | 6.23 | |
| | >500 | 26.92 | 2.34 | |
| Proximity to Road (m) | 0–100 m | 81.08 | 25.70 | |
| | 100–200 | 10.34 | 25.19 | |
| | 200–300 | 6.72 | 27.09 | |
| | 300–400 | 1.25 | 12.02 | |
| | 400–500 | 0.60 | 10.00 | |
| Proximity to Fault (m) | 0–1000 m | 29.76 | 27.30 | |
| | 2000–3000 | 36.25 | 37.40 | |
| | >3000 | 34.15 | 35.03 | |

| ML Model | Hyper-Parameter | Type | Search Space |
|---|---|---|---|
| RF Classifier | n_estimators | Discrete | [10, 100] |
| | max_depth | Discrete | [5, 50] |
| | min_samples_split | Discrete | [2, 11] |
| | min_samples_leaf | Discrete | [1, 11] |
| | criterion | Categorical | ['gini', 'entropy'] |
| | max_features | Discrete | [1, 64] |
| SVM Classifier | C | Continuous | [0.1, 50] |
| | kernel | Categorical | ['linear', 'poly', 'rbf', 'sigmoid'] |
| KNN Classifier | n_neighbors | Discrete | [1, 20] |
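
The search spaces in the table above can be written down directly as a configuration object and sampled, which is the core of random search. The sketch below is an illustration under stated assumptions: the `("int" | "float" | "cat", ...)` encoding, the helper names, and `toy_objective` are hypothetical, and the intervals are assumed to be inclusive.

```python
import random

# Search spaces transcribed from the table; the tuple encoding is an assumption.
SEARCH_SPACE = {
    "RF": {
        "n_estimators": ("int", 10, 100),
        "max_depth": ("int", 5, 50),
        "min_samples_split": ("int", 2, 11),
        "min_samples_leaf": ("int", 1, 11),
        "criterion": ("cat", ["gini", "entropy"]),
        "max_features": ("int", 1, 64),
    },
    "SVM": {
        "C": ("float", 0.1, 50.0),
        "kernel": ("cat", ["linear", "poly", "rbf", "sigmoid"]),
    },
    "KNN": {"n_neighbors": ("int", 1, 20)},
}

def sample_config(space, rng):
    """Draw one configuration: uniform over each range, uniform over each category list."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "int":
            config[name] = rng.randint(spec[1], spec[2])   # discrete, inclusive bounds
        elif kind == "float":
            config[name] = rng.uniform(spec[1], spec[2])   # continuous
        else:
            config[name] = rng.choice(spec[1])             # categorical
    return config

def random_search(objective, space, n_trials=30, seed=0):
    """Evaluate n_trials independent random configurations and keep the best one."""
    rng = random.Random(seed)
    trials = [sample_config(space, rng) for _ in range(n_trials)]
    return max(trials, key=objective), trials

def toy_objective(cfg):
    """Hypothetical score: favours the rbf kernel and C close to 10."""
    return (1.0 if cfg["kernel"] == "rbf" else 0.0) - abs(cfg["C"] - 10.0) / 50.0

best_svm, svm_trials = random_search(toy_objective, SEARCH_SPACE["SVM"])
rf_samples = [sample_config(SEARCH_SPACE["RF"], random.Random(i)) for i in range(50)]
print(best_svm)
```

Because every trial is drawn independently, the trials can be evaluated in parallel, but no trial benefits from the results of earlier ones.
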

| HPO Method | Strengths | Limitations | Time Complexity |
|---|---|---|---|
| GS | Straightforward. | Time-consuming; efficient only with categorical HPs. | O() |
| RS | More efficient than GS; supports parallelism. | Does not take prior results into account; inefficient with conditional HPs. | O(n) |
| BO-GP | Fast convergence for continuous HPs. | Poor parallelization ability; inefficient with conditional HPs. | |
| BO-TPE | Efficient with all types of HPs; preserves conditional dependencies. | Poor parallelization ability. | |
| GA | Efficient with all types of HPs; does not require good initialization. | Poor parallelization ability. | |
| PSO | Efficient with all types of HPs; supports parallelization. | Requires proper initialization. | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).