Submitted: 17 January 2025
Posted: 17 January 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Preprocessing and Keyword Extraction
2.3. Embedding Models
2.4. Similarity Computation
2.5. Evaluation Metrics
2.5.1. Shannon Entropy
2.5.2. Gini Coefficient
2.6. Software and Environment
3. Results
3.1. Case Study
- Article-Level Similarity: For each model, we retrieved the ten articles (out of the 5,000) most similar to the test article and recorded the journals in which they were published.
- Journal-Level Similarity: For each model, we computed the average similarity of each journal’s articles to the test article and ranked the journals by that average.
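The two rankings described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes cosine similarity as the similarity measure, and the function names (`cosine_similarities`, `top_k_articles`, `rank_journals_by_mean`) and the toy vectors and journal labels are hypothetical.

```python
import numpy as np


def cosine_similarities(query, matrix):
    """Cosine similarity between one query vector and every row of matrix."""
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ query


def top_k_articles(sims, journals, k=10):
    """Article-level view: journals of the k articles most similar to the query."""
    order = np.argsort(sims)[::-1][:k]
    return [(journals[i], float(sims[i])) for i in order]


def rank_journals_by_mean(sims, journals):
    """Journal-level view: mean similarity per journal, highest first."""
    grouped = {}
    for name, score in zip(journals, sims):
        grouped.setdefault(name, []).append(float(score))
    means = {name: sum(scores) / len(scores) for name, scores in grouped.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy stand-ins for article embeddings and their journals (assumed data).
article_vectors = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
)
article_journals = ["A", "A", "B", "B", "C"]
test_vector = np.array([1.0, 0.0])

sims = cosine_similarities(test_vector, article_vectors)
article_level = top_k_articles(sims, article_journals, k=2)
journal_level = rank_journals_by_mean(sims, article_journals)
```

The article-level list rewards a journal for having a few very close matches, whereas the journal-level ranking rewards a journal whose articles are close to the test article on average, so the two views need not agree.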
3.1.1. Article-Level Findings
3.1.2. Journal-Level Suggestions
3.1.3. Overlap Among Model Recommendations
3.2. Quantitative Comparison
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest


| Model | Journal of Top Article | Top Journal by Avg Similarity | Avg Journal Similarity | Journal in Top 10 | Encoding Time (s) |
|---|---|---|---|---|---|
| minilm-l12 | Journal of preventive medicine and hygiene | International journal of environmental research and public health | 0.852531 | False | 9 |
| minilm-l6 | The Journal of hospital infection | Risk management and healthcare policy | 0.848661 | False | 9 |
| mpnet | International journal of environmental research and public health | International journal of environmental research and public health | 0.926116 | False | 98 |
| multi-qa-distilbert | European review for medical and pharmacological sciences | Risk management and healthcare policy | 0.858523 | False | 54 |
| roberta | The Journal of hospital infection | International journal of environmental research and public health | 0.886216 | False | 52 |

| Model | Top Article | Max Similarity | Mean Similarity | Shannon’s Entropy | Gini | Journal in Top 10 |
|---|---|---|---|---|---|---|
| minilm-l12 | 0.799 | 0.765548 | 0.67±0.04 | 3.24 | 0.036077 | 12.5 |
| minilm-l6 | 0.775 | 0.746310 | 0.66±0.04 | 3.24 | 0.034712 | 14.5 |
| mpnet | 0.831 | 0.794203 | 0.71±0.04 | 3.24 | 0.029997 | 14.5 |
| multi-qa-distilbert | 0.755 | 0.717609 | 0.63±0.04 | 3.24 | 0.034632 | 12.5 |
| roberta | 0.803 | 0.773727 | 0.68±0.04 | 3.24 | 0.033623 | 12.5 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).