Submitted: 16 December 2025
Posted: 17 December 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
3. Materials and Methods
- The full dataset is randomly permuted.
- The permuted data are then divided into, for example, ten mutually disjoint subsets of equal size.
- In each of the ten iterations, one subset serves as the validation set, while the remaining nine form the training set.
- when the procedure is repeated, each element of the dataset appears in the validation set multiple times (once per repetition),
- variability resulting from random data partitioning is significantly reduced,
- model performance estimates are more stable and reliable.
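Repeating the permute, split and rotate procedure with a fresh permutation each time can be sketched in plain Python as follows. The configuration of ten folds repeated ten times is an assumption, consistent with the 100 iterations counted in the table of selected clustering algorithms; scikit-learn's RepeatedKFold provides the same behaviour. The function names are hypothetical.

```python
import random
from collections import Counter

def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, val_idx) for k-fold cross-validation repeated n_repeats times."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # 1. Randomly permute the dataset (represented by its row indices).
        order = list(range(n_samples))
        rng.shuffle(order)
        # 2. Split into n_splits disjoint subsets of (nearly) equal size.
        folds = [order[k::n_splits] for k in range(n_splits)]
        # 3. Each subset serves once as the validation set; the rest form the training set.
        for k in range(n_splits):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield sorted(train_idx), sorted(folds[k])

def validation_counts(splits):
    """How many times each sample index appeared in a validation set."""
    return Counter(i for _, val_idx in splits for i in val_idx)

splits = list(repeated_kfold_indices(n_samples=50))
print(len(splits))                               # 100 iterations (10 folds x 10 repeats)
print(set(validation_counts(splits).values()))   # {10}: every sample validated 10 times
```

Because every repetition uses a different permutation, each sample is validated once per repetition against a differently composed training set, which is what reduces the dependence of the estimate on a single random partition.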
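- The scaler (e.g., Standard Scaling or Min-Max Scaling) is fitted on the training set only.
- The scaler parameters estimated on the training set are then applied to transform the validation set, so that no information from the validation data leaks into preprocessing.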
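Fitting the scaling parameters on the training fold and reusing them on the validation fold can be sketched with NumPy. This mirrors calling fit on scikit-learn's StandardScaler or MinMaxScaler with the training data and transform on the validation data; the helper names are hypothetical.

```python
import numpy as np

def fit_standard_scaler(X_train):
    """Per-feature mean and standard deviation, estimated on the training fold only."""
    X_train = np.asarray(X_train, dtype=float)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return X_train.mean(axis=0), std

def fit_minmax_scaler(X_train):
    """Per-feature minimum and range, estimated on the training fold only."""
    X_train = np.asarray(X_train, dtype=float)
    low = X_train.min(axis=0)
    span = X_train.max(axis=0) - low
    span[span == 0] = 1.0
    return low, span

def apply_scaler(X, shift, scale):
    """Transform any split with parameters learned on the training fold."""
    return (np.asarray(X, dtype=float) - shift) / scale

X_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 40.0]])
X_val = np.array([[7.0, 25.0]])
low, span = fit_minmax_scaler(X_train)
print(apply_scaler(X_val, low, span))  # [[1.5 0.5]]
```

Note that validation values may fall outside the [0, 1] range after Min-Max Scaling; this is expected, since the validation fold is not allowed to influence the scaler parameters.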
- K-Means,
- Gaussian Mixture Model,
- Agglomerative Clustering.
- For each algorithm, cluster labels are generated only for the training data.
- Based on these labels, the silhouette coefficient is calculated (also only on the training data).
- The algorithm with the highest silhouette value is selected as the optimal clustering method for the given fold.
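The fold-level selection of the clustering method can be sketched with scikit-learn as follows. The three candidates are the algorithms listed above and the silhouette coefficient is computed on the training data only; the number of clusters (n_clusters=2), the random seed and n_init=10 are assumptions rather than settings reported in the text, and select_clustering is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

def select_clustering(X_train, n_clusters=2, random_state=0):
    """Fit each candidate on the training fold and keep the one with the highest silhouette."""
    candidates = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state),
        "Gaussian Mixture Model": GaussianMixture(
            n_components=n_clusters, random_state=random_state
        ),
        "Agglomerative Clustering": AgglomerativeClustering(n_clusters=n_clusters),
    }
    results = {}
    for name, model in candidates.items():
        labels = model.fit_predict(X_train)  # cluster labels for the training data only
        if np.unique(labels).size < 2:
            continue  # the silhouette coefficient is undefined for a single cluster
        results[name] = (silhouette_score(X_train, labels), model, labels)
    best = max(results, key=lambda name: results[name][0])
    return best, results[best]
```

The returned model and training labels are what the next step needs to assign pseudo-labels to the validation fold.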
- if the algorithm had a prediction mechanism (e.g., K-Means, GMM), its .predict() method was used;
- for Agglomerative Clustering, which has no such method, pseudo-labels were assigned according to the nearest centroid of the clusters formed in the training set.
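The two labelling routes above can be sketched as a single helper: it calls .predict() when the fitted model offers one (scikit-learn's KMeans and GaussianMixture do, AgglomerativeClustering does not) and otherwise falls back to the nearest training-cluster centroid. Euclidean distance is an assumption, and both function names are hypothetical.

```python
import numpy as np

def nearest_centroid_labels(X_train, train_labels, X_val):
    """Assign each validation sample the label of the closest training-cluster centroid."""
    X_train = np.asarray(X_train, dtype=float)
    X_val = np.asarray(X_val, dtype=float)
    train_labels = np.asarray(train_labels)
    clusters = np.unique(train_labels)
    centroids = np.vstack([X_train[train_labels == c].mean(axis=0) for c in clusters])
    # Euclidean distance from every validation sample to every centroid: shape (n_val, k).
    dists = np.linalg.norm(X_val[:, None, :] - centroids[None, :, :], axis=2)
    return clusters[dists.argmin(axis=1)]

def pseudo_labels_for_validation(model, X_train, train_labels, X_val):
    """Use .predict() when the fitted model has it; otherwise use the nearest centroid."""
    if hasattr(model, "predict"):
        return model.predict(X_val)  # e.g., KMeans, GaussianMixture
    return nearest_centroid_labels(X_train, train_labels, X_val)  # e.g., AgglomerativeClustering
```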
Supervised classification models are trained on:
- normalized training data,
- pseudo-labels obtained from clustering.
- The models are then evaluated on the validation set, using the pseudo-labels assigned to the validation data of the fold as the reference labels.
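One fold of the supervised stage can be sketched as below: each classifier is trained on the normalized training data with its clustering pseudo-labels and scored by accuracy against the validation pseudo-labels. Only three of the classifiers from the study are shown, with scikit-learn default hyperparameters (an assumption, as the settings are not given here); the others could be added to the dictionary in the same way. clone gives every fold a fresh, unfitted copy so no fitted state carries over between iterations.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# A subset of the compared classifiers, with default hyperparameters (assumption).
CLASSIFIERS = {
    "Logistic Regression": LogisticRegression(),
    "k-NN Algorithm": KNeighborsClassifier(),
    "Decision Trees": DecisionTreeClassifier(random_state=0),
}

def evaluate_fold(X_train, y_train_pseudo, X_val, y_val_pseudo, classifiers=CLASSIFIERS):
    """Train on training pseudo-labels, return accuracy on validation pseudo-labels."""
    scores = {}
    for name, template in classifiers.items():
        clf = clone(template)  # fresh, unfitted copy for every fold
        clf.fit(X_train, y_train_pseudo)
        scores[name] = accuracy_score(y_val_pseudo, clf.predict(X_val))
    return scores
```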
- for each classification model, the mean and standard deviation of accuracy are calculated,
- the frequency of each clustering algorithm being selected as the “best” in a given fold is analyzed,
- final clustering is performed on the full dataset using the globally selected best algorithm.
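The aggregation across iterations can be sketched with the standard library. Two points are assumptions not stated in the text: the population standard deviation (pstdev) is used, and the "globally selected best algorithm" is taken to be the one selected most often across folds. The numbers in the usage example are illustrative only, not results from the study.

```python
from collections import Counter
from statistics import mean, pstdev

def summarise_runs(fold_scores, fold_best_algorithms):
    """Aggregate per-fold results.

    fold_scores: one {classifier name: accuracy} dict per iteration.
    fold_best_algorithms: the clustering algorithm selected in each iteration.
    """
    accuracy = {
        name: (mean(s[name] for s in fold_scores), pstdev([s[name] for s in fold_scores]))
        for name in fold_scores[0]
    }
    frequency = Counter(fold_best_algorithms)
    global_best = frequency.most_common(1)[0][0]  # most frequently selected algorithm
    return accuracy, frequency, global_best

# Illustrative toy input (hypothetical values).
acc, freq, best = summarise_runs(
    [{"A": 1.0, "B": 0.5}, {"A": 0.5, "B": 0.5}],
    ["K-Means", "K-Means", "Gaussian Mixture Model"],
)
print(best)  # K-Means
```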
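$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}\left[f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right)\right]$$

where:
- $\phi_i$ - SHAP value for feature $i$,
- $F$ - the set of all features,
- $S$ - a subset of features not containing feature $i$,
- $|S|$ - the number of elements in set $S$,
- $|F|$ - the total number of features,
- $f_S(x_S)$ - the model prediction using only the features in set $S$,
- $f_{S \cup \{i\}}(x_{S \cup \{i\}})$ - the model prediction using the features in set $S$ plus feature $i$,
- $x_S$ - values of the features in set $S$.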
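The definitions above describe the Shapley-value formula behind SHAP. For a handful of features it can be evaluated exactly by enumerating every subset S of F that excludes feature i, as in the sketch below. The callable value(S) stands in for f_S(x_S); how a prediction is obtained with only the features in S (for example, by averaging over background data) is not specified here and is left as an assumption. The toy model is hypothetical. Exact enumeration grows exponentially with |F|, which is why the shap library relies on model-specific or approximate explainers such as TreeExplainer and KernelExplainer.

```python
from itertools import combinations
from math import factorial

def exact_shap(value, features, i):
    """Exact Shapley value of feature i.

    phi_i = sum over subsets S of F without i of
            |S|! (|F| - |S| - 1)! / |F|! * [f_{S+i}(x_{S+i}) - f_S(x_S)],
    where value(S) plays the role of f_S(x_S).
    """
    others = [f for f in features if f != i]
    n = len(features)
    phi = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            S = frozenset(subset)
            phi += weight * (value(S | {i}) - value(S))
    return phi

# Hypothetical toy model: additive contributions plus an interaction between a and b.
CONTRIB = {"a": 1.0, "b": 2.0, "c": 3.0}

def toy_value(S):
    return sum(CONTRIB[f] for f in S) + (1.0 if {"a", "b"} <= S else 0.0)

phi = {f: exact_shap(toy_value, list(CONTRIB), f) for f in CONTRIB}
```

In this toy case the interaction term is split equally between a and b, so the values are approximately 1.5, 2.5 and 3.0, and they sum to the difference between the prediction with all features and with none.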
4. Results
- Green (Above): 10 candidates
- Green (Below): 29 candidates
- Yellow (Above): 0 candidates
- Yellow (Below): 47 candidates
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest



| No | Criterion | Criterion Evaluation Scale | | | | | |
|---|---|---|---|---|---|---|---|
| f1 | Education | primary | secondary | post-secondary | higher (bachelor/engineer) | higher (master's) | PhD or higher |
| f2 | Completion of a degree program related to Data Science (Data Science, Computer Science, Artificial Intelligence, Mathematics) | yes | no | | | | |
| f3 | Experience in Data Science or related fields | Number of years | | | | | |
| f4 | Total professional experience | Number of years | | | | | |
| f5 | Knowledge of Python | 1 | 2 | 3 | 4 | 5 | |
| f6 | Knowledge of R | 1 | 2 | 3 | 4 | 5 | |
| f7 | Knowledge of SQL | 1 | 2 | 3 | 4 | 5 | |
| f8 | Knowledge of NoSQL databases (MongoDB, Cassandra) | | | | | | |
| f9 | Knowledge of ML tools (scikit-learn, TensorFlow, PyTorch) | 1 | 2 | 3 | 4 | 5 | |
| f10 | Knowledge of statistics | 1 | 2 | 3 | 4 | 5 | |
| f11 | Dashboard creation skills (Power BI, Tableau) | 1 | 2 | 3 | 4 | 5 | |
| f12 | Knowledge of Big Data technologies (Spark, Hadoop) | 1 | 2 | 3 | 4 | 5 | |
| f13 | Knowledge of cloud tools (AWS, Azure) | 1 | 2 | 3 | 4 | 5 | |
| f14 | Knowledge of machine learning algorithms | 1 | 2 | 3 | 4 | 5 | |
| f15 | English language proficiency | 1 | 2 | 3 | 4 | 5 | |
| f16 | Knowledge of version control systems (e.g., Git) | 1 | 2 | 3 | 4 | 5 | |
| f17 | Knowledge of MLOps and CI/CD tools (MLflow, Docker, Airflow) | 1 | 2 | 3 | 4 | 5 | |
| f18 | Published scientific articles | yes | no | | | | |

| Criterion | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 |
|---|---|---|---|---|---|---|---|---|---|
| Minimum value | Higher (bachelor/engineer) | YES | 2 years | 2 years | 3 | 1 | 2 | 1 | 2 |

| Criterion | f10 | f11 | f12 | f13 | f14 | f15 | f16 | f17 | f18 |
|---|---|---|---|---|---|---|---|---|---|
| Minimum value | 3 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | YES |
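A candidate profile can be screened against these minimum values with a per-criterion comparison. The sketch below assumes that a threshold is met when the candidate's value is greater than or equal to the minimum, that "YES" criteria require a yes answer, and that education is encoded ordinally in the order of the evaluation-scale table. The numeric codes, dictionary names and unmet_criteria function are hypothetical, and the text does not state that candidates were screened in exactly this way.

```python
# Ordinal encoding of the education scale (f1), in the order of the scale table;
# the numeric codes are a hypothetical choice.
EDUCATION_LEVELS = {
    "primary": 1,
    "secondary": 2,
    "post-secondary": 3,
    "higher (bachelor/engineer)": 4,
    "higher (master's)": 5,
    "PhD or higher": 6,
}

# Minimum values from the table; yes/no criteria as True/False, experience in years.
MINIMUM_VALUES = {
    "f1": EDUCATION_LEVELS["higher (bachelor/engineer)"],
    "f2": True,
    "f3": 2, "f4": 2, "f5": 3, "f6": 1, "f7": 2, "f8": 1, "f9": 2,
    "f10": 3, "f11": 1, "f12": 1, "f13": 1, "f14": 2, "f15": 2,
    "f16": 2, "f17": 1, "f18": True,
}

def unmet_criteria(candidate, minimums=MINIMUM_VALUES):
    """Return the criteria for which the candidate falls below the minimum value."""
    unmet = []
    for criterion, minimum in minimums.items():
        value = candidate[criterion]
        if isinstance(minimum, bool):
            # A "YES" minimum requires a yes answer from the candidate.
            if minimum and not value:
                unmet.append(criterion)
        elif value < minimum:
            unmet.append(criterion)
    return unmet

# Hypothetical candidate who meets every minimum except f5 and f18.
print(unmet_criteria(dict(MINIMUM_VALUES, f5=2, f18=False)))  # ['f5', 'f18']
```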

| Algorithm | Number of iterations in which the algorithm achieved the highest silhouette coefficient |
|---|---|
| K-Means | 96 |
| Gaussian Mixture Model | 4 |
| Agglomerative Clustering | 0 |

| Algorithm | Average Accuracy | Standard Deviation of Accuracy |
|---|---|---|
| Naive Bayes Classifier | 0.7933 | 0.23 |
| Linear Support Vector Machine | 0.9750 | 0.09 |
| Nonlinear Support Vector Machine | 0.9717 | 0.09 |
| Decision Trees | 0.8475 | 0.19 |
| k-NN Algorithm | 0.9525 | 0.11 |
| Logistic Regression | 0.9883 | 0.06 |
| Random Forests | 0.9517 | 0.11 |
| Gradient Boosting | 0.8633 | 0.20 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).