Preprint
Article

This version is not peer-reviewed.

Topic Analysis of the Literature Reveals Research Trends: The Case of Periodontics

A peer-reviewed article of this preprint also exists.

Submitted: 21 October 2024

Posted: 21 October 2024

Abstract
Periodontics is a complex field characterized by a constantly growing body of research, posing a challenge for researchers and stakeholders striving to stay abreast of its evolving literature. Traditional bibliometric surveys, while accurate, are labor-intensive and do not scale to such rapidly expanding domains. The aim of this study was to explore the use of a topic-modeling algorithm to investigate the structure of the periodontics research field. To this purpose, we employed BERTopic to automatically sift through a substantial corpus of periodontics manuscripts indexed in MEDLINE from 2009 to 2024 and identify topics using two sets of settings, to get a broad and a highly detailed view of the field, respectively. With the low granularity settings, BERTopic identified and categorized 31 topics, unveiling the internal structure and the relations between publication macroareas. When we used the high granularity settings, we identified 2428 topics, which revealed hotspots within the field. Linear regression analysis highlighted novel research areas such as artificial intelligence and exosomes. In conclusion, BERTopic offers a flexible tool to gain insights into periodontics and its most recent trends, and can serve as a useful instrument to approach the field and direct future research.

1. Introduction

Periodontics is a discipline that aims at preserving, and possibly restoring, the integrity of the supporting structures of the teeth [1]. This specialized branch of dentistry operates at the crossroads of various scientific fields, including oral medicine, oral surgery, and tissue regeneration [2]. Drawing knowledge and techniques from such diverse areas, periodontics has to rely on multidisciplinary approaches to address the intricate pathophysiology of periodontal tissues [3].
The field of periodontics is evolving rapidly, and its scientific literature has expanded markedly over the past decade [4]. This surge, while indicative of progress, also presents a formidable challenge to scholars: the sheer volume of published articles makes it increasingly difficult to retrieve pertinent information and stay abreast of cutting-edge innovations [5]. This hyperpublication trend mirrors a similar phenomenon occurring across all medical disciplines [6], fueled by a multitude of factors that include, besides scientific advancements, the growth of the global scientific community, career incentives toward publication, and the emergence of novel publishing models that support these hypertrophic publishing habits [7].
Investigating such a dynamic and expansive terrain therefore entails understanding its evolving research themes and the trends that animate the scientific community. Narrative reviews are, and will presumably remain for the foreseeable future, the prime tool to get an overview of any specific topic in the field, but new methods are required to understand the epistemic structure of this whole area of science. Traditional search methodologies, such as manual searches of peer-reviewed journals or popular databases like MEDLINE [8], risk becoming less effective when used alone against this overwhelming volume of information traffic or, to borrow an effective expression from A. Appadurai, against such a hectic infoscape [9]. Automated procedures, on the other hand, such as those relying on topic modeling, represent a useful tool for a more comprehensive analysis of the scientific output [10].
Topic modeling is a machine learning task that consists of extracting the subject (the ‘about’) from unlabelled documents [11], i.e., in our case, scientific articles. This makes it possible to automatically screen large datasets of publications and classify them according to their topics, which could also speed up the identification of articles of interest [12,13]. While various quantitative methods have been employed for this task in the past [14], recent advancements in deep learning, particularly the use of embeddings, have opened new frontiers in neural approaches to topic modeling [15]. Deep learning models, trained on extensive corpora, assign vector representations to words or even sentences based on contextual proximity, yielding dense embeddings that capture semantic similarities with previously unattained levels of performance [16].
To analyze the rather intricate field of periodontics, its most relevant lines of research, and their diachronic development, the present investigation relied on BERTopic, an algorithm introduced by Grootendorst in 2022 [17] that leverages BERT (Bidirectional Encoder Representations from Transformers) embeddings. BERT, introduced by Google in 2018 [18], is built around the attention mechanism and quickly surpassed previous embedding algorithms, e.g. Word2vec [19] or GloVe [20], in several tasks [21,22]. To improve the human readability of the topic representations, we used the OpenHermes-2.5-Mistral Large Language Model, a form of artificial intelligence that can express a topic as a brief phrase, instead of chaining a few representative keywords into a topic label, as BERTopic does by default [23,24,25].
The purpose of this study is therefore to use topic modeling to map out the past and current research pursuits in periodontics, to identify prevailing trends, and contribute to a more profound understanding of this very dynamic field. The overarching goal of our investigation is to provide a tool to dynamically monitor a whole field of dental research, which could prove very useful in directing novel research efforts and allocating resources to promising new areas.

2. Materials and Methods

2.1. Dataset

Data were generated, processed, and analyzed in Google Colab Pro notebooks running Python 3.10.12 [26] on a T4 GPU [27]. The corpus was compiled with the Biopython library [28] by querying MEDLINE through the Entrez.esearch function. The query used was:
periodont*[All Fields] OR parodont[All Fields] OR periodont*[MeSH Terms]
To obtain the data we needed, we implemented an iterative process through Entrez.efetch spanning publication years from 2009 to 2024 and, within each year, querying the database month by month. We thus retrieved PubMed ID (PMID), title, publication year, authors, abstract, and MeSH keywords for all publications, and organized them into a structured pandas dataframe [29].
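The month-by-month retrieval described above can be sketched as follows. This is a minimal illustration, not the authors' notebook: the helper names, the `email` and `retmax` arguments, and the choice of `datetype="pdat"` (publication date) are assumptions. Splitting the search into monthly windows keeps each result set small, which is one plausible reason for iterating in this way.

```python
import calendar

# Query string as reported in the text.
QUERY = "periodont*[All Fields] OR parodont[All Fields] OR periodont*[MeSH Terms]"


def monthly_windows(first_year, last_year):
    """Yield (mindate, maxdate) strings in YYYY/MM/DD form, one pair per month."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            yield f"{year}/{month:02d}/01", f"{year}/{month:02d}/{last_day:02d}"


def fetch_month(mindate, maxdate, email, retmax=10000):
    """Return parsed PubMed records for one window (requires network access).

    The function name and the email/retmax arguments are hypothetical.
    """
    from Bio import Entrez  # imported lazily so the helper above works offline

    Entrez.email = email
    handle = Entrez.esearch(db="pubmed", term=QUERY, datetype="pdat",
                            mindate=mindate, maxdate=maxdate, retmax=retmax)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return []
    handle = Entrez.efetch(db="pubmed", id=",".join(ids), retmode="xml")
    records = Entrez.read(handle)
    handle.close()
    return records["PubmedArticle"]


# One window per month from January 2009 to December 2024.
windows = list(monthly_windows(2009, 2024))
```

Each record returned by `fetch_month` could then be flattened into one row (PMID, title, year, authors, abstract, MeSH terms) of a pandas dataframe.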
The analysis of the publications was conducted on their titles. Our decision to place a primary focus on titles stemmed from the recognition that they serve as succinct and concentrated summaries [30]. Authors intentionally craft titles to encapsulate the core theme or essence of their work, making them useful to capture the fundamental topics of each publication [31].

2.2. Data Analysis

The dataset we retrieved from MEDLINE underwent only minimal preprocessing, mainly the removal of entries without titles. Unlike previous publications, we did not lowercase the titles [32]. Notably, stopwords (common grammatical words with little semantic content of their own) were intentionally retained to leverage BERTopic's ability to create contextual embeddings [33].
Embeddings, within natural language processing, are dense vectors that encapsulate the semantic meaning of words within a multidimensional space [34]. Unlike earlier embedding algorithms such as word2vec [35], BERT can discern the context of a word and generate distinct embeddings for a term depending on its contextual meaning [36,37]. This is particularly useful for capturing the often nuanced semantics of titles.
Subsequently, we relied on BERTopic to generate a comprehensive list of topics from the titles of publications in the dataset. BERTopic comprises a series of steps that include transformer embedding models, dimensionality reduction, clustering, and cluster tagging through c-TF-IDF [29]. BERTopic can accommodate a wide range of embedding models; for this study, we selected ‘all-mpnet-base-v2’, a freely available model hosted on Hugging Face and trained on a corpus of over 1 billion sentence pairs, which we had previously used on biomedical datasets [38].
As these embeddings are high-dimensional, they need to undergo dimensionality reduction for more efficient clustering. We relied on UMAP (Uniform Manifold Approximation and Projection), which is based on mathematical concepts from topology [39]: it constructs a high-dimensional graph representing the original data and then optimizes the layout of this graph in a lower-dimensional space. Documents were then clustered with HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), which detects areas of higher semantic density in the meaning space, creating clusters (and thus topics) in an unsupervised fashion [40]. c-TF-IDF was then applied to highlight the most salient keywords in each cluster. TF-IDF, a well-known algorithm in computational linguistics, is commonly used to generate document vectors by increasing the weight given to specific words and reducing the importance of very common words, such as grammatical words [41]. c-TF-IDF differs from TF-IDF in that term frequencies are computed over a cluster of documents rather than over a single document [42].
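To make the class-based weighting concrete, the following is a minimal sketch in plain Python. It assumes the weighting W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf(t, c) is the count of term t in cluster c, f(t) is its count across all clusters, and A is the average number of words per cluster. The toy titles and the whitespace tokenizer are illustrative assumptions, and BERTopic's own implementation may differ in details such as normalization.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """clusters: dict mapping a topic id to the list of titles assigned to it."""
    # Treat every cluster as one long document and count its terms.
    counts = {topic: Counter(" ".join(titles).lower().split())
              for topic, titles in clusters.items()}
    # Frequency of each term across all clusters.
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(counts)
    return {topic: {term: tf * math.log(1 + avg_words / total[term])
                    for term, tf in counter.items()}
            for topic, counter in counts.items()}


# Toy data (hypothetical titles, not taken from the dataset).
toy = {
    0: ["periodontal ligament stem cells", "stem cells and periodontal regeneration"],
    1: ["dental implant placement", "periodontal outcomes of implant placement"],
}
weights = c_tf_idf(toy)
top_terms = {topic: sorted(scores, key=scores.get, reverse=True)[:2]
             for topic, scores in weights.items()}
```

In this toy example "periodontal" occurs in both clusters, so it is down-weighted relative to terms that are frequent in only one of them.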
BERTopic allows topic representations to be fine-tuned through additional representation models, and we adopted the KeyBERT and Maximal Marginal Relevance (MMR) models. KeyBERT is an algorithm developed by Maarten Grootendorst, BERTopic’s creator, that uses the transformers library to extract keywords efficiently [43]. Briefly, once the documents to analyze have been embedded using a pre-trained BERT model, candidate keywords and n-grams are extracted from them using Bag of Words approaches (such as TF-IDF) [44] and embedded too; the similarity between the document embeddings and the keyword embeddings is then computed, and the keywords with the highest similarity are selected. The MMR model is designed to select appropriate keywords with higher overall diversity [45].
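The trade-off that MMR makes between relevance and diversity can be illustrated with a short sketch. It assumes the common formulation in which each candidate is scored as (1 − diversity) × similarity to the document minus diversity × its highest similarity to the keywords already selected. The two-dimensional vectors are invented for illustration, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mmr(doc_vec, candidates, top_n=2, diversity=0.5):
    """candidates: dict mapping a keyword to its embedding vector."""
    relevance = {kw: cosine(doc_vec, vec) for kw, vec in candidates.items()}
    # Start from the keyword most similar to the document.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(top_n, len(candidates)):
        best_kw, best_score = None, -math.inf
        for kw, vec in candidates.items():
            if kw in selected:
                continue
            # Penalize candidates that resemble an already selected keyword.
            redundancy = max(cosine(vec, candidates[s]) for s in selected)
            score = (1 - diversity) * relevance[kw] - diversity * redundancy
            if score > best_score:
                best_kw, best_score = kw, score
        selected.append(best_kw)
    return selected


# Invented toy embeddings: two near-duplicate keywords and one distinct keyword.
doc = [1.0, 0.2]
keywords = {
    "periodontal ligament": [0.9, 0.2],
    "periodontal ligaments": [1.0, 0.12],
    "bone regeneration": [0.6, 0.8],
}
diverse = mmr(doc, keywords, top_n=2, diversity=0.7)
greedy = mmr(doc, keywords, top_n=2, diversity=0.0)
```

With `diversity=0.0` the selection reduces to plain similarity ranking and keeps both near-duplicates, whereas a higher value replaces the second near-duplicate with the less similar but more informative keyword.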

2.3. LLM labeling

We also used the OpenHermes-2.5-Mistral Large Language Model [46] to generate more readable labels for the topics, instead of a sequence of keywords. A large language model (LLM) is a form of artificial intelligence engineered to understand, interpret, and produce human language in a manner that is coherent, contextually appropriate, and similar to human responses [47]. As these models are often very large and require substantial computational resources, quantized LLMs have more recently appeared [48], and we chose the OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf quantization, freely available on Hugging Face. With quantization, the set of values of a model’s parameters is mapped to a smaller set; in our case, 32-bit parameters are reduced to 4-bit parameters, with a relatively small loss of performance [49]. To work, LLMs need a prompt, i.e. an input from the user [50]. This prompt acts as a starting point for the model to generate relevant text based on the information provided. We set the following prompt:
""" Q:
I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords: '[KEYWORDS]'.
Based on the above information, can you give a short label of the topic of at most 5 words?
A:
"""
Stopwords were subsequently removed, but only after topic creation, using the scikit-learn CountVectorizer class [51].
To get an overview of the structure of periodontal research, after some manual tuning, we decided to use the following set-up for UMAP dimensionality reduction and HDBSCAN clustering:
- UMAP metric: cosine distance;
- size of the neighborhood: 25;
- number of components: 10;
- HDBSCAN clustering metric: Euclidean;
- minimum cluster size: 250.
To get a high granularity insight into periodontics topics, we changed BERTopic parameters as follows:
- UMAP metric: cosine distance;
- size of the neighborhood: 25;
- number of components: 5;
- HDBSCAN clustering metric: Euclidean;
- minimum cluster size: 5.
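The two parameter sets listed above could be passed to BERTopic roughly as follows. This is a sketch under stated assumptions rather than the authors' code: the `build_topic_model` helper is hypothetical, the MMR `diversity` value is a placeholder, the `KeyBERTInspired` class is used here as BERTopic's KeyBERT-style representation, and the LLM labeling step and any unreported settings (e.g. random seeds) are left out.

```python
# Values taken from the two lists above.
LOW_GRANULARITY = {"n_neighbors": 25, "n_components": 10, "min_cluster_size": 250}
HIGH_GRANULARITY = {"n_neighbors": 25, "n_components": 5, "min_cluster_size": 5}


def build_topic_model(settings):
    """Assemble a BERTopic pipeline for one of the two parameter sets."""
    # Heavy dependencies are imported lazily, only when a model is built.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer("all-mpnet-base-v2"),
        umap_model=UMAP(n_neighbors=settings["n_neighbors"],
                        n_components=settings["n_components"],
                        metric="cosine"),
        hdbscan_model=HDBSCAN(min_cluster_size=settings["min_cluster_size"],
                              metric="euclidean"),
        # Stopwords are dropped only when topic keywords are counted,
        # i.e. after the documents have been embedded and clustered.
        vectorizer_model=CountVectorizer(stop_words="english"),
        representation_model={
            "KeyBERT": KeyBERTInspired(),
            "MMR": MaximalMarginalRelevance(diversity=0.3),  # placeholder value
        },
    )
```

Calling `build_topic_model(LOW_GRANULARITY).fit_transform(titles)` on a list of title strings would then return the topic assigned to each title.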
Data were visualized using BERTopic’s built-in functions and the matplotlib [52] and seaborn [53] libraries. The chord plot was created using the Bokeh library and the chord diagram support of HoloViews [54]. To investigate trends in research topics, we fitted a linear regression model with the linregress function of the SciPy library [55], with the publication year in different time intervals as the independent variable and the number of papers in a given topic as the dependent variable. The slope was taken as an indicator of how quickly the number of papers belonging to a given topic rose in the time interval.
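As an illustration of the trend measure, the sketch below computes the ordinary least-squares slope of yearly paper counts in plain Python; `scipy.stats.linregress(years, counts).slope` returns the same quantity. The topic names and counts are invented for the example.

```python
def slope(years, counts):
    """Ordinary least-squares slope of counts regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Invented yearly counts for two hypothetical topics.
years = [2019, 2020, 2021, 2022, 2023]
papers_per_year = {
    "topic A": [2, 4, 6, 8, 10],  # steadily growing
    "topic B": [9, 8, 9, 8, 9],   # oscillating, no net growth
}
slopes = {topic: slope(years, counts) for topic, counts in papers_per_year.items()}
# Topics ranked from fastest to slowest growth in the chosen interval.
ranking = sorted(slopes, key=slopes.get, reverse=True)
```

Restricting `years` to different intervals changes the slope, which is why the interval has to be stated alongside any ranking of fast-growing topics.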

3. Results

3.1. Low granularity analysis: setting the stage

The generated dataset comprised 93971 articles published from 2009 to 2024. We analyzed the articles published in 2024 separately, because the number of published papers was not comparable to that of the preceding years. Unsurprisingly, the distribution of papers over the years showed a progressive increase in the number of publications in the field (Figure 1), which corresponds to what is generally known about the life science and biomedical sectors.
Topic modeling algorithms can be tuned through many parameters, which affect the number of topics the system identifies. To get a macroscopic overview of the research space in periodontics, we first set the minimum cluster size at n=250, which means that the HDBSCAN clustering step only recognized dense areas of at least 250 papers as distinct groups. With these settings, BERTopic identified 31 low granularity topics (Supplementary Table S1). Topics were numbered with integers from 0 to 29, in addition to a “-1” null topic collecting all the unclassified documents. This group alone contained 43271 titles, which is almost half of the total number of papers. Far from representing a failure of the algorithm, this high number of unclassified papers is the direct consequence of setting such a high threshold for clusters: these 43271 articles did not belong to any dense group of at least 250 manuscripts.
By default, BERTopic creates document clusters based on the similarity of their embedding representations and describes each cluster with the four most frequent keywords found in it; this constitutes a topic. Besides this crude method, we included additional topic representations obtained with the KeyBERT algorithm, to improve keyword quality, with the MMR algorithm, which increases keyword diversity and thus possibly captures more nuances of the topic, and with the LLM label, whose purpose is to provide a more immediate, comprehensive, and overall ‘human’ description of the topic (Table S1).
The most common topic refers, perhaps unsurprisingly, to periodontal regeneration and stem cells (#0 Periodontal Stem Cells Regeneration). Topic #1 Peri-Implant Soft Tissue Stability revolves around implants, an area that is admittedly only tangentially related to periodontics (but close enough for the PubMed search to pick it up and include it in our dataset), while #2 Oral Health & Quality of Life focuses on oral health and periodontal diseases. Simply going through a ranking of topics, however, does not provide the kind of insight we were aiming for, so we set out to investigate the topological structure of research topics in periodontics within the semantic space. This is made possible by the very nature of embeddings, which, once properly reduced to two-dimensional vectors, can be used as Cartesian coordinates.

3.2. Low magnification overview of the research landscape

Figures 2-5 illustrate the distribution of these low granularity topic clusters within a Cartesian semantic space, as identified by BERTopic with the low granularity settings, together with their LLM labels. LLM labels were preferred over BERTopic’s default labels in our figures because they provide a more readable rendering of the topics. Each topic in Fig. 2-5 is represented as a grey circle whose size is proportional to the number of items it includes. The proximity of the circles indicates their semantic closeness, with the distance between clusters reflecting topic divergence. Some topics appear to be very closely related, even overlapping, so that 4 main topic groupings or clusters can be identified (Fig. 2A).
The first cluster (Fig. 2B) contains 7 topics, including a broad and vast one (#2 Oral Health & Quality of Life, n=6649), about the epidemiology of periodontal diseases, as suggested by its descriptor keywords:
KeyBERT ['periodontal health', 'oral health', 'periodontal diseases', 'periodontal disease', 'periodontitis', 'periodontal therapy', 'dental care', 'periodontal', 'oral hygiene', 'aggressive periodontitis'],
and as confirmed by browsing through the titles in this topic, e.g.
Disparities and social determinants of periodontal diseases [56],
Validity of individual self-report oral health measures in assessing periodontitis for causal research applications [57],
Periodontal Health Knowledge and Oral Health-Related Quality of Life in Caribbean Adults [58]
This cluster also contains topics related to the prevention of periodontal disease, i.e. #23 Toothbrush Plaque Removal, which includes papers such as:
Effects of professional toothbrushing among patients with gingivitis [59],
and #28 Dentin Hypersensitivity Management, e.g.:
Effect of milk as a mouthwash on dentin hypersensitivity after non-surgical periodontal treatment [60]
This topic cluster further includes research on the effects of health conditions or habits on periodontal disease, i.e. topics #25 HIV and Periodontal Disease, #19 COVID-19 and Dental Practice, or #13 Smoking and Periodontal Disease.
Interestingly, this cluster of topics also contains topic #14 Periodontal disease and pregnancy complications, which again has marked epidemiological traits, e.g.:
A Six-Month Single-Center Study in 2021 on Oral Manifestations during Pregnancy in Bhubaneswar, India [61],
Unfavourable beliefs about oral health and safety of dental care during pregnancy: a systematic review [62], or
Periodontal pathogens of the interdental microbiota in a 3 month pregnant population with an intact periodontium [63]
Cluster 2 (Fig. 3A) contains 10 topics that are more closely related to surgery, surgical methods, and their outcomes. A closer look at its topics (Fig. 3B) indicates that this cluster includes a vast topic (n=7847) related to peri-implant soft tissues (#1 Peri-Implant Soft Tissue Stability). If we examine its descriptors more carefully, it becomes apparent that the LLM may have only partially captured the nature of this topic, as its KeyBERT keywords are:
['implant placement', 'dental implant', 'dental implants', 'implants placed', 'immediate implant', 'peri implant', 'peri implantitis', 'ridge augmentation', 'alveolar ridge', 'implant supported'],
indicating that the articles of this topic are more generally related to implants and implant-related challenges and not as focused on soft tissue stability as the LLM suggested. A quick glance at randomly chosen titles from this group confirms this impression, e.g.:
Predictive factors for the treatment success of peri-implantitis: a protocol for a prospective cohort study [64],
Evaluation of Peri-Implant Parameters and Functional Outcome of Immediately Placed and Loaded Mandibular Overdentures: A 5-year Follow-up Study [65],
Immediate implant placement into infected and noninfected extraction sockets: a pilot study [66].
It is thus less surprising that BERTopic identifies another closely associated topic, #15 Sinus Augmentation, in the same cluster, together with #21 Titanium Surface Studies. This cluster also comprises topic #7 Bond Strength of Dental Restorations, which is described with a surprisingly specific label by LLM, though KeyBERT and MMR once again reveal a broader scope:
KeyBERT ['ceramic crowns', 'resin composites', 'dental prostheses', 'composite resin', 'resin composite', 'composite restorations', 'denture', 'resins', 'zirconia crowns', 'resin based']
MMR ['zirconia', 'resin', 'composite', 'ceramic', 'restorations', 'strength', 'crowns', 'bond', 'bond strength', 'adhesive']
and title examination confirms this impression:
The use of zirconium and feldspathic porcelain in the management of the severely worn dentition: a case report [67], or
Influence of hydrothermal aging on the shear bond strength of 3D printed denture-base resin to different relining materials [68].
This topic, when considered as a whole, appears thus to contain mostly material-centered reports.
This same cluster contains another substantial topic (#3 Giant Cell Granuloma Cases, n=3796), which, despite the focus on giant cell granuloma, as highlighted in the LLM-generated description, actually contains reports on a much vaster array of oral diseases:
KeyBERT ['cell granuloma', 'granuloma', 'pyogenic granuloma', 'gingival fibromatosis', 'cell carcinoma', 'ossifying fibroma', 'giant cell', 'squamous cell', 'ameloblastoma', 'gingival']
A representative selection of titles supports this view:
Diagnosis and management of exuberant palatal pyogenic granuloma in a systemically compromised patient - Case report [69],
Radicular Cyst: A Cystic Lesion Involving the Hard Palate [70],
Management of Chronic Inflammatory Gingival Enlargement: A Short Review and Case Report [71],
which firmly situates this topic group in the Oral Surgery field.
This at least partially explains why this topic is semantically close to #6 Cleft Lip and Palate Treatment, as they are both eminently surgical topics. This topic cluster contains 5 additional topics (#4 Antimicrobial Photodynamic Therapy with Diode Laser, #12 Gingival Recession Treatment, #9 Root Canal Therapy Outcomes, #29 Smile Esthetics, and #4 Cone Beam Computed Tomography Applications), which are mostly centered on surgical approaches to periodontal tissue and tend to concentrate in the right part of the diagram.
Figure 4 shows the structure of the third low granularity topic cluster, which contains 5 topics, broadly related to the microbiology of periodontal diseases (#6 Porphyromonas Gingivalis Effects, #16 Periodontal pathogens and inflammation, #11 Oral Microbiome and Health), to mouthwashes (#8 Chlorhexidine & herbal mouthwash), which are intuitively associated with the reduction of the microbiological load, and to a more general topic on periodontal health and probiotics (#17 Probiotics Periodontal Health). This group also contains two closely related topics, #6 and #16, and, judging by the LLM labels alone, one might assume that topic #6 is a subset of topic #16.
It is therefore once again necessary to investigate the content of these two topics by comparing their keyword descriptors; topic #16 is characterized by the following keywords:
KeyBERT ['aggregatibacter actinomycetemcomitans', 'actinomycetemcomitans fusobacterium', 'induced aggregatibacter', 'actinomycetemcomitans leukotoxin', 'actinomycetemcomitans infection', 'pathogen aggregatibacter', 'actinomyces', 'actinobacillus', 'actinomycosis', 'actinomycetemcomitans']
The great majority of these keywords revolve around one single bacterial species and would suggest that this topic could be most aptly labelled after it. However, the MMR algorithm is specifically designed to increase the diversity in the keywords used for the topic representation, and, in this case, it yielded:
MMR ['aggregatibacter', 'actinomycetemcomitans', 'aggregatibacter actinomycetemcomitans', 'fusobacterium', 'nucleatum', 'fusobacterium nucleatum', 'prevotella', 'jp2', 'leukotoxin', 'serotype'].
MMR thus reveals that this topic contains what could be broadly considered as articles about oral microbiology. A glance at a selection of titles for this group confirms that reports on Aggregatibacter actinomycetemcomitans are indeed very numerous, but other bacterial species have been investigated as well:
The prevalence of Fusobacterium nucleatum subspecies in the oral cavity stratifies by local health status [72],
Bacteriome analysis of Aggregatibacter actinomycetemcomitans-JP2 genotype-associated Grade C periodontitis in Moroccan adolescents [73],
The role of NLRP3 in regulating gingival epithelial cell responses evoked by Aggregatibacter actinomycetemcomitans [74].
Works on Porphyromonas gingivalis are, on the contrary, decisively predominant in topic #6, as in:
Gingival fibroblast activation by Porphyromonas gingivalis is driven by TLR2 and is independent of the LPS-TLR4 axis [75], or
Emergence of Antibiotic-Resistant Porphyromonas gingivalis in United States Periodontitis Patients [76].
Even the MMR algorithm does not detect any extra bacterial species among the keywords for this group:
MMR ['porphyromonas', 'porphyromonas gingivalis', 'gingivalis', 'lipopolysaccharide', 'gingivalis lipopolysaccharide', 'cells', 'induced', 'human', 'expression', 'response'].
Interestingly, works on P. gingivalis have been clustered in a topic of their own and thus do not represent just another bacterial species in a larger microbiology group. This is most likely a consequence of the extreme abundance of this literature (n=1792 in this dataset), which made it possible for BERTopic to create a topic just for these works, isolating them from the rest of the microbiological literature (topic #16 comprises only 664 articles).
Figure 5 shows the last topic group, which mostly contains Periomedicine topics (#10 Diabetes and Periodontal Disease, #18 Periodontal-Cardiovascular Disease Link, #20 Rheumatoid Arthritis and Periodontal Disease) and topics centered on the association of periodontal disease with systemic conditions (and possibly, more specifically, the effects of periodontal disease on systemic diseases or diseases localized in other regions of the organism), such as #24 Periodontitis-CKD association, #27 Bisphosphonate-related Osteonecrosis, and #26 Vitamin D and Periodontitis, but also, perhaps unexpectedly, the very large topic #0 Periodontal Stem Cells Regeneration (n=11715). On closer inspection, its keywords are as follows:
KeyBERT ['periodontal regeneration', 'periodontal tissue', 'human periodontal', 'periodontal ligament', 'bone regeneration', 'osteogenic differentiation', 'stem cells', 'periodontal disease', 'stem cell', 'tissue regeneration']
MMR ['cells', 'periodontal', 'stem', 'stem cells', 'ligament', 'periodontal ligament', 'bone', 'human', 'regeneration', 'periodontitis']
which confirms that this topic is mostly related to periodontal regeneration and its cellular basis. Representative titles conform to this view:
Multipotent adult progenitor cells acquire periodontal ligament characteristics in vivo [77],
Novel gene-activated matrix with embedded chitosan/plasmid DNA nanoparticles encoding PDGF for periodontal tissue engineering [78],
Cementogenesis and the induction of periodontal tissue regeneration by the osteogenic proteins of the transforming growth factor-beta superfamily [79],
although in vitro or wet lab subjects are also present in this group, as:
The spatial transcriptomic landscape of human gingiva in health and periodontitis [80], or
Emerging roles of exosomes in oral diseases progression [81].
It can be speculated that the focus on cellular and molecular mechanisms might have caused BERTopic to cluster this topic in this otherwise differently oriented group.

3.3. Relations between low granularity topics

To gain deeper insights into how these research areas are related to each other, we ran a cosine similarity check on embeddings of the titles in the dataset, after averaging them by topic, and we identified all topics that exhibited a similarity >0.9.
We represented the associations as a chord plot in Figure 6, which shows that papers in topic #17 Probiotics Periodontal Health on average display, unsurprisingly, a high degree of similarity to papers in both topics #11 Oral Microbiome and Health and #8 Chlorhexidine & herbal mouthwash, given their shared emphasis on the microbiological dimension of periodontics. Similarly intuitive is the association between topic #18 Periodontal-Cardiovascular Disease Link and both topics #24 Periodontitis-CKD association and #10 Diabetes and Periodontal Disease, as they are all centered on the link between periodontal health and systemic conditions. Papers in topic #23 Toothbrush Plaque Removal display high similarity to papers in topic #2 Oral Health & Quality of Life. The association between #9 Root Canal Therapy Outcomes and topic #4 Cone Beam Computed Tomography Applications (which is also related to topic #1 Peri-Implant Soft Tissue Stability) is only apparently surprising, because topic #4 also includes radiographic studies, which focus on techniques that are necessary for endodontic diagnosis.
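The similarity check behind the chord plot can be sketched as follows: title embeddings are averaged by topic, and topic pairs whose averaged vectors have a cosine similarity above 0.9 are kept. The function names and the two-dimensional toy vectors are assumptions made for illustration; real title embeddings would have hundreds of dimensions.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_topic_pairs(embeddings_by_topic, threshold=0.9):
    """Return the topic pairs whose averaged embeddings exceed the threshold."""
    centroids = {topic: centroid(vectors)
                 for topic, vectors in embeddings_by_topic.items()}
    return [(a, b) for a, b in combinations(sorted(centroids), 2)
            if cosine(centroids[a], centroids[b]) > threshold]


# Invented toy embeddings for three topics (two titles each).
toy_embeddings = {
    17: [[1.0, 0.1], [0.9, 0.2]],
    11: [[1.0, 0.3], [0.8, 0.2]],
    9: [[0.1, 1.0], [0.2, 0.9]],
}
pairs = similar_topic_pairs(toy_embeddings)
```

Each retained pair corresponds to one chord in the plot; topics without any pair above the threshold remain unconnected.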

3.4. Zooming in on research niches: increasing granularity

This low level of detail, however, is insufficient to provide useful insights into the development of periodontal research, and it leaves many articles unclustered in the null topic group. To address that, we changed the settings of the dimensionality reduction protocol, by making it more sensitive to the local context (UMAP n_neighbors=5) and we reduced the minimum acceptable cluster size in the HDBSCAN algorithm to 5. As a result, we obtained 2552 topics, including a -1 null topic group which now consisted of 27793 articles.
Having so many topics makes a comprehensive analysis more difficult and comes with some caveats. If we increase the sensitivity of the algorithm to more subtle areas of semantic density, we risk picking up local structures in the data that do not reflect substantial differences in topics and may be generated by, e.g., lexical differences. Even a brief visual inspection shows that multiplying the number of topics may lead to very close topics, such as topic #13 Bisphosphonate-related Osteonecrosis and topic #72 Osteonecrosis Medication Jaw. In such cases, comparing the descriptor keywords may again help to understand whether BERTopic is picking up real differences or just noise. BERTopic does offer a function to merge selected topics, if the investigators deem it necessary, but merging is also risky, as it may erase subtle nuances of meaning captured by the algorithm.
To approach this problem systematically, we encoded the LLM labels with the all-mpnet-base-v2 sentence transformer model and ran a cosine similarity check using the util.community_detection() tool of the sentence_transformers library, with a threshold of 0.9, which yielded a list of very similar LLM labels. After examining the topics and their KeyBERT keyword descriptors, we decided that 85 merges were appropriate and thus obtained 2428 new topics, plus a -1 null topic. Table 1 shows the top 10 topics by size in this higher granularity analysis after merging, while Supplementary Table 2 contains the list of the merged topics.
With these parameters, the largest topic of investigation by total number of publications is #0 Periodontitis and Porphyromonas Gingivalis, with 1870 articles, followed by #1 Sinus Floor Elevation with 899 titles and #2 Periodontitis and Pregnancy Outcomes with 742 papers, which are topics that, unsurprisingly, were already present in our previous low granularity analysis.
Instead of focusing on assessing the semantic structure of the whole dataset of 2428 topics, we decided to investigate their evolution, to identify research trends and shed some light on the dynamics of the field. The number of topics increased progressively, from 1198 topics in 2009 to 1745 topics in 2023 (Fig. S1), meaning that some topics have emerged in recent years, but also that not all topics survived until 2023, because not all 2537 labelled topics are represented in 2023.
3.5 A changing mosaic: the evolution of research topics
Most topics comprise articles published across the whole range of years from 2009 to 2023, but some topics are very recent and only began to be investigated after 2009, as shown in Fig. 7A, which summarizes the number of topics that first appeared in each year.
Table 2 summarizes the 12 most recent research areas, which appeared in 2020 or 2021, as identified by BERTopic. Topics #1438, #1440, and #167 (COVID Periodontitis, Mouse SARS-CoV- Infection Models and Periodontitis COVID- Associations) are, unsurprisingly, among the youngest topics, because their oldest papers date to 2020, when the COVID-19 pandemic emerged.
In that same year, articles were first published on topics #1421 Artificial Intelligence in Dentistry, #2212 Mediterranean Diet and Gingival Health, #2216 Photobiomodulation in Orthodontics, #1540 Ferroptosis and Periodontitis: Research Progress, #2348 NLRP inflammasome role in bone diseases and vascular calcification, #2224 Oral health and work productivity in Japanese workers, and #1846 Submucosal injection for orthodontic acceleration. Some topics, however, are even younger and appeared only in 2021, such as the implant-related #2113 Nanocurcumin Implant Healing.
Conversely, some topics have been abandoned over the years (Figure 7B). The last papers of topic #2119 Dentine Hypersensitivity Studies in China that BERTopic identified in our dataset date back to 2012, and BERTopic did not identify any article belonging to topics #1680 Fcgamma Receptors and Periodontitis Polymorphisms, #2094 Nicotine-induced periodontitis, or #2376 Periodontal wound healing with rhGDF after 2013. Over the years, most topics have exhibited some growth, although with noticeable oscillations. Figure 8 shows the number of publications that appeared in academic journals and were indexed in MEDLINE for the top 10 topics of our dataset.
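The appearance and disappearance of topics can be derived from the first and last publication year of each topic. The sketch below shows one possible way to compute them. The function names are hypothetical and the (topic, year) records are invented toy data, except that the last year of topic #2119 and the first year of topic #2113 follow the values reported above.

```python
from collections import Counter


def topic_lifespans(records):
    """Return {topic: (first_year, last_year)} from (topic, year) pairs,
    skipping the -1 null topic."""
    spans = {}
    for topic, year in records:
        if topic == -1:
            continue
        first, last = spans.get(topic, (year, year))
        spans[topic] = (min(first, year), max(last, year))
    return spans


def topics_per_boundary_year(spans, which):
    """Count topics by first (which=0) or last (which=1) publication year."""
    return Counter(span[which] for span in spans.values())


# Toy records (assumed values): one (topic id, publication year) per article.
records = [
    (0, 2009), (0, 2016), (0, 2023),
    (2119, 2010), (2119, 2012),
    (2113, 2021), (2113, 2023),
    (-1, 2015),
]

spans = topic_lifespans(records)
appeared = topics_per_boundary_year(spans, which=0)   # cf. Fig. 7A
last_seen = topics_per_boundary_year(spans, which=1)  # cf. Fig. 7B
```

Counting topics by their last year treats any topic still active in the final year of the dataset as surviving, so only earlier last years point to abandoned topics.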
Topics #0 Periodontitis and Porphyromonas Gingivalis and #1 Sinus Floor Elevation have grown slightly but steadily over the analyzed time frame, particularly in the last 5 years, while topic #2 Periodontitis and Pregnancy Outcomes experienced a publication peak around 2013, then declined, and only in the last few years has started to show some increase in the number of publications per year. Unsurprisingly, #4 COVID Dentistry appeared only in 2020. Notably, however, it peaked in 2021 and then experienced a downturn, with a marked decrease in the number of publications in 2023. Other topics, e.g. #5 Cleft Lip or #7 Aggregatibacter Actinomycetemcomitans, have remained remarkably stable over the years.
3.6 Uncovering trends
To systematically examine publication trends in periodontal research over the whole dataset, articles from 2009 to 2023 were analyzed for each topic. Linear regression was fitted to the number of publications per year, with the slope of the fitted line representing the rate of growth for a specific topic. This systematic approach provides valuable insights into the dynamic trajectories of various research topics within the field of periodontics over the 15-year period.
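As an illustration of this step, the following minimal sketch computes the least-squares slope and Pearson's r for yearly publication counts. The function name slope_and_r is hypothetical and the yearly counts are invented toy series, not values from our dataset.

```python
import math


def slope_and_r(years, counts):
    """Least-squares slope of counts on years, and Pearson's r."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    # r is undefined for a constant series; 0.0 is returned by convention here.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, r


years = [2019, 2020, 2021, 2022, 2023]

# Toy series (assumed values): one steadily growing topic and one that
# rises sharply and then declines.
steady_slope, steady_r = slope_and_r(years, [2, 5, 8, 11, 14])
peaked_slope, peaked_r = slope_and_r(years, [0, 10, 20, 12, 6])
```

The second series shows why the slope should be read together with r: a topic that peaks and then declines can still have a positive slope, but the fit is poor, so the slope alone overstates its growth.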
Supplementary Table 3 shows the top 20 topics, sorted by their slope, in the 2009-2023 interval. The exploration of COVID-19's impact stands out as the most rapidly expanding subject in periodontics, evidenced by a steep slope of 11.87 (Pearson’s r = 0.79) and 449 associated papers (Figure 9A). As already evidenced in Figure 8, however, the number of articles published in this field peaked around 2021 and is now dropping; this slope value, therefore, should be interpreted with due caution. It is followed by the domain of deep learning, #154 Deep Learning Dentistry Diagnosis (Fig. 9B), comprising 71 papers, with a notable growth rate indicated by a slope of 4.97 (Pearson’s r = 0.91). BERTopic identified another A.I.-related topic that stands out for its rapid growth, i.e. #174 AI in Dentistry (n = 67).
The two topics are admittedly close, and we must refer to their descriptor keywords to get some insights into their differences. As for #154 Deep Learning Dentistry Diagnosis we have the following keywords:
KeyBERT ['detection periodontal', 'prediction periodontal', 'dental imaging', 'tooth segmentation', 'periodontitis deep', 'classification dental', 'automatic dental', 'lesions dentnet', 'learning dental', 'deep convolutional']
This indicates that the group is mostly composed of articles that investigate the use of artificial neural networks for image analysis in periodontitis, e.g.
A two-stage deep learning architecture for radiographic staging of periodontal bone loss [82]
For #174 AI in Dentistry, we have the following keywords:
KeyBERT ['intelligence dentistry', 'dentistry artificial', 'intelligence periodontal', 'dental research', 'intelligence dental', 'intelligence periodontology', 'dental diagnosis', 'assisted dental', 'dental monitoring', 'endodontics qualitative']
This suggests the inclusion of more general manuscripts on the different applications of artificial intelligence, e.g.
Artificial intelligence in dental education: ChatGPT's performance on the periodontic in-service examination [83]
Interestingly, the fifth most rapidly growing topic is again centered on the pandemic (#167 Periodontitis COVID- Associations), with a slope of 3.30 (Pearson’s r = 0.74).
Another swiftly advancing topic is exosomes, with topic #102 Stem cell-derived exosomes for bone regeneration (Fig. 9C), a relatively recent area of interest with 96 papers and a slope of 3.70, showing a strong correlation (Pearson’s r = 0.99). Topic #282 Extracellular Vesicle Regeneration is also related to exosomes, though it forms a smaller niche of just 48 articles focused on the therapeutic applications of exosomes, and it ranks 11th by growth rate. While a linear regression model offers simplicity, it may not fully capture the nuances in annual paper production over a 15-year interval, especially considering that many of the growing topics are younger. To address this, we also examined a shorter time interval to better understand the evolving trends.
Table 3 zeroes in on a more specific timeframe, covering only articles from 2019 to 2023. COVID-19 (Fig. 9D) still exhibits the highest slope value (20.9, although its Pearson’s r = 0.60 reflects the decrease in the number of papers belonging to this topic in the last 2 years), trailed at some distance by #174 AI in Dentistry (slope = 7.70, Pearson’s r = 0.93, Fig. 9E), which now precedes topic #154 Deep Learning Dentistry Diagnosis, although the latter still has a high slope (6.1, Pearson’s r = 0.92). Topic #21 Periodontitis-Alzheimers Disease (226 papers) emerges as the third fastest-growing topic with a slope of 7.70 (Pearson’s r = 0.90, Fig. 9F), followed by research on probiotics (topic #23 Probiotics and periodontal health, slope = 6.3, r = 0.92) and two closely ranking topics about periodontal soft tissues, topic #26 Gingival Thickness Assessment and topic #6 Gingival Recession Treatment, with slopes of 5.1 and 5, respectively. When only this short period is considered, a few other topics stand out, such as #18 Subgingival Microbiome, which comprises a non-negligible 252 articles and is growing with a slope of 4.3 (Pearson’s r = 0.68), the considerably large #10 Oral Microbiome Analysis (n = 285), and the smaller #218 Clear aligners periodontal health, which focuses on periodontal considerations in patients who undergo this orthodontic treatment and includes titles such as:
Periodontal health during clear aligners treatment: a systematic review [84], or
Periodontal and restorative considerations with clear aligner treatment to establish a more favorable restorative environment [85].
When analyzed separately, 444 articles were published in 2024, with 144 papers belonging to the -1 unlabelled topic group. The most represented topic is #0 Periodontitis and Porphyromonas Gingivalis with 9 articles, followed by clusters #102 Stem cell-derived exosomes for bone regeneration and #21 Periodontitis-Alzheimers Disease with 5 articles, cluster #3 Diabetes and Periodontal Disease with 4 articles, and topics #17 Medicinal Plant Extracts for Oral Care, #135 Diet and Periodontal Disease, #23 Probiotics and periodontal health, #336 3D printed denture base properties, #5 Cleft Lip Nasoalveolar Molding, and #119 Pulmonary Periodontitis Association with 3 papers (Table 4).

4. Discussion

Topic modeling algorithms open up significant possibilities for data analysis, particularly as novel tools such as sentence transformers become more capable of capturing subtle meaning differences within analyzed text corpora [38]. The tool of choice significantly influences the type and quality of outcomes, and in this context, BERTopic emerges with a plethora of parameter customization options, which range from the selection of sentence transformers, dimensionality reduction and clustering preferences to the incorporation of supervised, semi-supervised, or purely guided approaches, thus granting considerable flexibility in refining the modeling process [17]. BERTopic stands out for its ability to handle diverse paper corpora without requiring extensive preprocessing, or text cleaning, enabling investigators to quickly analyze significantly larger datasets compared to the past [15,16].
However, BERTopic presents some challenges, particularly in dealing with a substantial number of unclassified documents marked as -1 in the algorithm output, which exceeded 43000 titles when using large group clustering, and 22000 when we increased the granularity of our algorithm. These unclassified documents derive from our choice of HDBSCAN as the clustering algorithm for BERTopic. HDBSCAN uses point density to compute clusters in an unsupervised fashion and has the clear advantage of not requiring operators to pre-set the number of clusters, unlike well-known unsupervised algorithms such as K-Means. On the other hand, HDBSCAN will not force every point into a cluster, as K-Means would, and will keep data points that do not fit into any cluster as unlabelled. These unlabelled articles therefore result either from the inability of BERTopic/HDBSCAN to assign them to an existing topic group because of their semantic ambiguity, or from the inability to form new clusters, despite their semantics, because these would not contain enough manuscripts to meet the size threshold. In the first case, the semantics of these titles may not be completely captured by the sentence transformers, and better trained encoders might provide better results; in the latter case, however, the -1 null topic group contains articles that investigate an area that is still (or has remained) largely unexplored. Scaling down the minimum topic size from 250 articles to 5 articles brought 21000 articles out of the -1 null topic group, indicating that at least some of the papers in this group belonged to small niches. This would argue for further reducing the topic size, possibly pushing it to its limit, with n=2.
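The effect of the minimum topic size on the null group can be illustrated with a deliberately simplified toy model. It is not HDBSCAN: it assumes that documents have already been grouped into candidate clusters, and it only shows how a size threshold sends small groups to the -1 label. The function name apply_min_topic_size and the group sizes are invented for illustration.

```python
def apply_min_topic_size(candidate_groups, min_topic_size):
    """Assign a topic id to each document; groups smaller than
    `min_topic_size` are sent to the -1 null topic (toy model only)."""
    labels = {}
    next_topic = 0
    # Larger groups receive lower topic ids, mirroring size-sorted output.
    for group in sorted(candidate_groups, key=len, reverse=True):
        if len(group) >= min_topic_size:
            topic = next_topic
            next_topic += 1
        else:
            topic = -1
        for doc_id in group:
            labels[doc_id] = topic
    return labels


def count_unlabelled(labels):
    """Number of documents left in the -1 null topic."""
    return sum(1 for topic in labels.values() if topic == -1)


# Toy candidate groups (assumed sizes: 300, 40, 6 and 3 documents).
groups = [
    list(range(0, 300)),
    list(range(300, 340)),
    list(range(340, 346)),
    list(range(346, 349)),
]

coarse = apply_min_topic_size(groups, min_topic_size=250)
fine = apply_min_topic_size(groups, min_topic_size=5)
```

With the coarse threshold, only the largest group becomes a topic and the remaining 49 documents stay unlabelled. With the fine threshold, three topics are formed and only the 3 documents of the smallest group remain in the null topic.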
This, however, introduces a potential risk of fragmenting the research landscape into an exceedingly high number of groups that differ only in their lexical choices, or in subtleties that provide background noise rather than real insight. Investigators should therefore strike a balance between the need for granularity and an excessive fragmentation of the documents.
To address this issue and make readers more aware of the caveats that this method entails, together with its significant potential, we used two sets of settings. The first yielded a few clearly distinguished large groups that could be easily mapped within the research landscape, although they were mostly over-arching themes in periodontics. The second set was an attempt to get an in-depth view of smaller research niches, with the purpose of delving deeper into the vibrant undergrowth of emerging research trends.
According to the low granularity setting, periodontics research is arranged along 4 big topic axes, which could be tentatively labelled: 1) Patient management and hygiene; 2) Periodontal (and implant) surgery; 3) Oral microbiology, and 4) Periomedicine. The first and the last topic groups may be more challenging to characterize based on our data, because they appear to overlap to some extent. The first topic cluster, though it contains a topic on Periodontal disease and low birth weight in pregnant women, which would be intuitively perceived as closer to Periomedicine and thus to cluster #4, is mostly centered on patient management and the epidemiological aspects of periodontal disease. Two semantic poles can actually be identified in this first cluster, one comprising topics about toothbrushing and hygiene management (#23 Toothbrush Plaque Removal and #28 Dentin Hypersensitivity Management), and one revolving around patient management by periodontists, including therapy and the challenges raised by the recent health emergencies (e.g. #19 COVID-19 and Dental Practice). We have chosen the Periomedicine label for topic cluster #4 to highlight the presence of topics investigating the links between periodontal disease and systemic diseases, although this cluster also comprises a very large topic on periodontal regeneration and stem cells. Presumably, the semantic proximity of these groups lies in their focus on the cellular mechanisms of bone physiology and its relation to general metabolism and the immune system, the underlying theme that runs across all these topics.
Taken as a whole, cluster #4 also contains two areas of topic density, one mostly focused on the associations between periodontal disease and systemic diseases (#10 Diabetes and Periodontal Disease, #18 Periodontal-Cardiovascular Disease Link, #20 Rheumatoid Arthritis and Periodontal Disease, and #24 Periodontitis-CKD association) and one centered on bone, bone regeneration, osteonecrosis, and vitamin D, with topics #0, #27, and #26.
The second topic cluster is hybrid in nature, and it highlights the deep connections that tie periodontics to implant dentistry. Many topics in this cluster focus on endosseous implants and related research (e.g. #21 Titanium Surface Studies), while periodontal surgery and maxillofacial surgery of the lip and palate form a constellation of topics gravitating around a core of implant research. In this cluster too, two main areas of semantic density can be identified, with implant dentistry on one side and oral surgery on the other (Fig. 3B).
The third topic cluster, unlike the previous ones, cannot really be divided into further poles of semantic attraction; its layout rather suggests a continuity of meaning that ranges from topic #6 Porphyromonas Gingivalis Effects, to studies on the Oral Microbiome and Probiotics, to studies on infections by Aggregatibacter and other pathogenic bacterial species, to #8 Chlorhexidine and herbal mouthwash. The apparently disconcerting semantic distance between topic #6 Porphyromonas Gingivalis Effects and the apparently close topic #16 Aggregatibacter Actinomycetemcomitans Infections
This low level of granularity highlighted broad themes, a sort of high-level internal structure in periodontics research, but proved insufficient to reveal research niches and inner dynamics in areas of knowledge expansion. To this purpose, we decided to increase the granularity of our analysis by allowing HDBSCAN to identify small clusters, formed by as few as 5 articles; while this did not impose an artificially high number of topics per se (which could be done in BERTopic), it gave the clustering algorithm improved sensitivity to identify many more topics, up to 2550 (2428 after merging), and to assign over 20000 articles that had previously been classified as -1 null topic to new topics. Applying these new settings, while not without drawbacks, was necessary to uncover small research niches, such as topic #2113 Nanocurcumin Implant Healing, which first appeared in 2021, and to isolate growth trends. On the other hand, the strong increase in the number of topics also generated topics that frankly appeared to overlap, both in their LLM descriptors and in their keywords. Setting an acceptable threshold of similarity is arbitrary and depends on the specific purposes of the investigators, who could be more interested in the focus of the literature on specific clinical or preclinical areas, or even in the way a certain area is epistemologically understood, conceived, and presented to the scientific community. We decided to rely on a relatively strict criterion of similarity, i.e. the cosine similarity between the embeddings of the LLM description tags of the topics, and to merge near-identical topics that exceeded 0.9 similarity. This did not eliminate all potentially overlapping topics, but we preferred to maintain a certain degree of similarity to also gain insight into the wording that investigators have preferred over time.
Trends are of particular interest, as they can act as indicators of those areas that are subjected to more intense investigation and are increasingly attracting the attention of the scientific community. Unsurprisingly, the topics related to COVID-19 and dentistry and the association between periodontal disease and the COVID-19 pandemic have experienced a steep surge over the last 4 years, as the pandemic broke out and raged all over the world, forcing practitioners, scholars, and legislators alike to rethink the whole healthcare system. However, even in this area, different subsets were identified, with different dynamics. Although the trend in the last 5 years, as assessed by linear regression, is overall positive, the number of articles dedicated to SARS-CoV-2 infection and its associations, either epidemiological or pathological, with dentistry and periodontitis decreased over the course of 2023, as the emergency slowly subsided.
Research on technological innovations has also increased strongly in recent years, with topics #174 AI in Dentistry and #154 Deep Learning Dentistry Diagnosis taking center stage in the middle of the A.I. revolution that is permeating many human activities, in both work and leisure. A closer inspection of the articles in these two topics reveals that the differences between these groups are indeed as subtle as they sound. Deep learning is a machine learning strategy that relies on algorithms known as artificial neural networks (ANNs). ANNs are structured over several layers of neurons (referred to as deep layers, hence the term deep learning) and they underpin much of the more general research area called Artificial Intelligence, which aims at creating algorithms capable of performing tasks intelligently, i.e. algorithms that can learn, be trained, and respond appropriately, as human intelligence would do. Articles in topic #174 AI in Dentistry are usually quite general in nature and include reviews of the different applications of intelligent algorithms in all areas of dentistry, but some papers in this group actually explore applications based on deep learning architectures. The divide between topics #174 and #154 is therefore sometimes blurred from a technical point of view and is mostly supported by the different wording chosen by the authors, which does, however, reflect a change in attitude that has been occurring in the last few years. When we considered the longer, 15-year interval, topic #154 Deep Learning Dentistry Diagnosis had the higher slope, a sign that the increase in the number of publications on ANN applications in dentistry has been under way for some time and corresponds to the (re)birth of deep learning after years of pessimism about the potential of these techniques.
However, when we considered a shorter, 5-year window, topic #174 AI in Dentistry leads. This wording is becoming popular within the wider cultural phenomenon of the A.I. revolution and is possibly considered an increasingly appealing way to title a paper, not least to increase its chances of publication. In addition, as is the case with this manuscript, AI is a broader term that describes not only the underlying neural architecture (which is becoming a ‘detail’) but also the wider nature of the tools that have been used, e.g. LLMs, which can be fully considered AI.
Topic #102 Stem cell-derived exosomes for bone regeneration is also on the rise, especially in the 15-year time frame, where it ranks 3rd, ahead of #21 Periodontitis-Alzheimers Disease and #23 Probiotics and periodontal health, which have experienced a steep increase in the last 5 years, where they rank 3rd and 4th, respectively. When only the papers published in 2024 are considered, which are obviously far fewer at the time of writing, topic #0 Periodontitis and Porphyromonas Gingivalis leads, followed by Periodontitis-Alzheimers Disease, Stem cell-derived exosomes for bone regeneration, and Diabetes and Periodontal Disease, but also Medicinal Plant Extracts for Oral Care. This highlights how the most recent trends focus on broadening the context in which periodontal diseases are to be understood, but also on further elucidating the intimate mechanisms that underlie the pathological processes and on evaluating novel treatment avenues, including exosomes.

5. Conclusions

The present work demonstrated the use of BERTopic in combination with different representation models, including an LLM, to explore a research field. By using specific sets of settings, we were able to highlight the internal structure of periodontics research, its main areas, and the relations between these research areas. We were also able to zoom in and analyze the literature dataset at a finer granularity, which allowed us to isolate individual research themes and investigate their trends over time, highlighting more recent topics as well as topics that have disappeared. Although this method still has significant limitations, as a portion of the articles are not labelled and are collected in a generic null group, this kind of analysis is fast and simple and offers potential insights into the composition and dynamics of a whole research field.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. Table S1: List of the low granularity topics in the 2009-2023 period sorted by topic size (Count = number of papers in the cluster). The table includes the KeyBERT and MMR keywords, and the LLM-generated label; Table S2: List of the low granularity topics in the 2009-2023 period sorted by topic size (Count = number of papers in the cluster). The table includes the KeyBERT and MMR keywords, and the LLM-generated label; Table S3: List of the top 20 topics in the 2009-2023 period sorted by slope of the linear regression of the number of publications/year and the goodness of fit, measured by Pearson’s r.

Author Contributions

Conceptualization, C.G., M.M. and E.C.; methodology, C.G.; software, C.G.; formal analysis, C.G. and M.T.C.; data curation, S.G. and M.T.C.; writing—original draft preparation, C.G. and M.M.; writing—review and editing, E.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dentino, A.; Lee, S.; Mailhot, J.; Hefti, A.F. Principles of Periodontology. Periodontol 2000 2013, 61, 16–53. [Google Scholar] [CrossRef] [PubMed]
  2. Raj, S.C.; Tabassum, S.; Mahapatra, A.; Patnaik, K. Interdisciplinary Periodontics. In Periodontology-Fundamentals and Clinical Features; IntechOpen, 2021 ISBN 1838806792.
  3. Lyons, K.M.; Darby, I. Interdisciplinary Periodontics: The Multidisciplinary Approach to the Planning and Treatment of Complex Cases. Periodontol 2000 2017, 74, 7–10. [Google Scholar] [CrossRef] [PubMed]
  4. Landhuis, E. Scientific Literature: Information Overload. Nature 2016, 535, 457–458. [Google Scholar] [CrossRef] [PubMed]
  5. Stephens, K.S.; White, B.P. Keeping Up With the Literature: New Solutions for an Old Problem. J Pharm Pract 2022, 08971900221131907. [Google Scholar] [CrossRef]
  6. Larsen, P.; Von Ins, M. The Rate of Growth in Scientific Publication and the Decline in Coverage Provided by Science Citation Index. Scientometrics 2010, 84, 575–603. [Google Scholar] [CrossRef]
  7. Clapham, P. Publish or Perish. Bioscience 2005, 55, 390–391. [Google Scholar] [CrossRef]
  8. Bramer, W.M.; Rethlefsen, M.L.; Kleijnen, J.; Franco, O.H. Optimal Database Combinations for Literature Searches in Systematic Reviews: A Prospective Exploratory Study. Syst Rev 2017, 6, 1–12. [Google Scholar] [CrossRef]
  9. Appadurai, A. Modernity at Large: Cultural Dimensions of Globalization; U of Minnesota Press, 1996; Vol. 1; ISBN 145290006X.
  10. Delen, D.; Crossland, M.D. Seeding the Survey and Analysis of Research Literature with Text Mining. Expert Syst Appl 2008, 34, 1707–1720. [Google Scholar] [CrossRef]
  11. Vayansky, I.; Kumar, S.A.P. A Review of Topic Modeling Methods. Inf Syst 2020, 94, 101582. [Google Scholar] [CrossRef]
  12. Kavvadias, S.; Drosatos, G.; Kaldoudi, E. Supporting Topic Modeling and Trends Analysis in Biomedical Literature. J Biomed Inform 2020, 110, 103574. [Google Scholar] [CrossRef]
  13. Cao, Q.; Cheng, X.; Liao, S. A Comparison Study of Topic Modeling Based Literature Analysis by Using Full Texts and Abstracts of Scientific Articles: A Case of COVID-19 Research. Library Hi Tech 2023, 41, 543–569. [Google Scholar] [CrossRef]
  14. Abdelrazek, A.; Eid, Y.; Gawish, E.; Medhat, W.; Hassan, A. Topic Modeling Algorithms and Applications: A Survey. Inf Syst 2023, 112, 102131. [Google Scholar] [CrossRef]
  15. Kherwa, P.; Bansal, P. Topic Modeling: A Comprehensive Review. ICST Transactions on Scalable Information Systems 2018, 0, 159623. [Google Scholar] [CrossRef]
  16. Basmatkar, P.; Maurya, M. An Overview of Contextual Topic Modeling Using Bidirectional Encoder Representations from Transformers. In; 2022; pp. 489–504.
  17. Grootendorst, M. BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv preprint arXiv:2203.05794, arXiv:2203.05794 2022.
  18. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Proceedings of the 2019 Conference of the North; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar]
  19. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781, arXiv:1301.3781 2013.
  20. Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing 2014, 1532–1543. [Google Scholar]
  21. Yuan, W.; Lei, Y.; Guo, X. Research on Text Similarity Calculation Based on BERT and Word2Vec. In Proceedings of the ICETIS 2022; 2022, 7th International Conference on Electronic Technology and Information Science; VDE; pp. 1–4.
  22. Shen, Y.; Liu, J. Comparison of Text Sentiment Analysis Based on Bert and Word2vec. In Proceedings of the 2021 IEEE 3rd International Conference on Frontiers Technology of Information and Computer (ICFTIC); IEEE; 2021; pp. 144–147. [Google Scholar]
  23. Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large Language Models in Medicine. Nat Med 2023, 29, 1930–1940. [Google Scholar] [CrossRef]
  24. Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y. A Survey on Evaluation of Large Language Models. ACM Trans Intell Syst Technol 2023. [Google Scholar] [CrossRef]
  25. Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D. Emergent Abilities of Large Language Models. arXiv preprint arXiv:2206.07682, arXiv:2206.07682 2022.
  26. Bassi, S. A Primer on Python for Life Science Researchers. PLoS Comput Biol 2007, 3, e199. [Google Scholar] [CrossRef]
  27. Jia, Z.; Maggioni, M.; Smith, J.; Scarpazza, D.P. Dissecting the NVidia Turing T4 GPU via Microbenchmarking. arXiv preprint arXiv:1903.07486, arXiv:1903.07486 2019.
  28. Cock, P.J.A.; Antao, T.; Chang, J.T.; Chapman, B.A.; Cox, C.J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B. Biopython: Freely Available Python Tools for Computational Molecular Biology and Bioinformatics. Bioinformatics 2009, 25, 1422–1423. [Google Scholar] [CrossRef]
  29. Mckinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the Proceedings of the 9th Python in Science Conference; pp. 201051–56.
  30. Cook, D.A.; Beckman, T.J.; Bordage, G. A Systematic Review of Titles and Abstracts of Experimental Studies in Medical Education: Many Informative Elements Missing. Med Educ 2007, 41, 1074–1081. [Google Scholar] [CrossRef]
  31. Hartley, J. Planning That Title: Practices and Preferences for Titles with Colons in Academic Articles. Libr Inf Sci Res 2007, 29, 553–568. [Google Scholar] [CrossRef]
  32. Guizzardi, S.; Colangelo, M.T.; Mirandola, P.; Galli, C. Modeling New Trends in Bone Regeneration, Using the BERTopic Approach. Regenerative Med 2023, 18, 719–734. [Google Scholar] [CrossRef] [PubMed]
  33. Saif, H.; Fernandez, M.; He, Y.; Alani, H. On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter. 2014.
  34. Gutiérrez, L.; Keith, B. A Systematic Literature Review on Word Embeddings. In Proceedings of the Trends and Applications in Software Engineering: Proceedings of the 7th International Conference on Software Process Improvement (CIMPS 2018) 7; pp. 2019132–141.
  35. Wang, S.; Zhou, W.; Jiang, C. A Survey of Word Embeddings Based on Deep Learning. Computing 2020, 102, 717–740. [Google Scholar] [CrossRef]
  36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv Neural Inf Process Syst 2017, 30. [Google Scholar]
  37. Liu, Q.; Kusner, M.J.; Blunsom, P. A Survey on Contextual Embeddings. arXiv preprint arXiv:2003.07278, arXiv:2003.07278 2020.
  38. Galli, C.; Donos, N.; Calciolari, E. Performance of 4 Pre-Trained Sentence Transformer Models in the Semantic Query of a Systematic Review Dataset on Peri-Implantitis. Information 2024, 15, 68. [Google Scholar] [CrossRef]
  39. McInnes, L.; Healy, J.; Melville, J. Umap: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426, arXiv:1802.03426 2018.
  40. McInnes, L.; Healy, J.; Astels, S. Hdbscan: Hierarchical Density Based Clustering. J. Open Source Softw. 2017, 2, 205. [Google Scholar] [CrossRef]
  41. Qaiser, S.; Ali, R. Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. Int J Comput Appl 2018, 181, 25–29. [Google Scholar] [CrossRef]
  42. Xu, D.D.; Wu, S.B. An Improved TFIDF Algorithm in Text Classification. Applied Mechanics and Materials 2014, 651, 2258–2261. [Google Scholar] [CrossRef]
  43. Issa, B.; Jasser, M.B.; Chua, H.N.; Hamzah, M. A Comparative Study on Embedding Models for Keyword Extraction Using KeyBERT Method. In Proceedings of the 2023 IEEE 13th International Conference on System Engineering and Technology (ICSET); IEEE; 2023; pp. 40–45. [Google Scholar]
  44. Zhang, Y.; Jin, R.; Zhou, Z.-H. Understanding Bag-of-Words Model: A Statistical Framework. International journal of machine learning and cybernetics 2010, 1, 43–52. [Google Scholar] [CrossRef]
  45. Bennani-Smires, K.; Musat, C.; Hossmann, A.; Baeriswyl, M.; Jaggi, M. Simple Unsupervised Keyphrase Extraction Using Sentence Embeddings. arXiv preprint arXiv:1801.04470 2018.
  46. Teknium. Teknium/OpenHermes-2.5-Mistral-7B. Available online: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B (accessed on 10 February 2024).
  47. Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large Language Models in Medicine. Nat Med 2023, 29, 1930–1940. [Google Scholar] [CrossRef]
  48. Park, S.; Choi, J.; Lee, S.; Kang, U. A Comprehensive Survey of Compression Algorithms for Language Models. arXiv preprint arXiv:2401.15347 2024.
  49. Kaddour, J.; Harris, J.; Mozes, M.; Bradley, H.; Raileanu, R.; McHardy, R. Challenges and Applications of Large Language Models. arXiv preprint arXiv:2307.10169 2023.
  50. Meskó, B. Prompt Engineering as an Important Emerging Skill for Medical Professionals: Tutorial. J Med Internet Res 2023, 25, e50638. [Google Scholar] [CrossRef]
  51. Akre, P.; Malu, R.; Jha, A.; Tekade, Y.; Bisen, W. Sentiment Analysis Using Opinion Mining on Customer Review. International Journal of Engineering and Management Research 2023, 13, 41–44. [Google Scholar]
  52. Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput Sci Eng 2007, 9. [Google Scholar] [CrossRef]
  53. Waskom, M. Seaborn: Statistical Data Visualization. J Open Source Softw 2021, 6. [Google Scholar] [CrossRef]
  54. Lavanya, A.; Gaurav, L.; Sindhuja, S.; Seam, H.; Joydeep, M.; Uppalapati, V.; Ali, W.; SD, V.S. Assessing the Performance of Python Data Visualization Libraries: A Review. 2023.
  55. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat Methods 2020, 17, 261–272. [Google Scholar] [CrossRef]
  56. Albandar, J.M. Disparities and Social Determinants of Periodontal Diseases. Periodontol 2000 2024. [Google Scholar] [CrossRef]
  57. Bond, J.C.; Casey, S.M.; McDonough, R.; McLone, S.G.; Velez, M.; Heaton, B. Validity of Individual Self-report Oral Health Measures in Assessing Periodontitis for Causal Research Applications. J Periodontol 2024. [Google Scholar] [CrossRef]
  58. Collins, J.R.; Rivas-Tumanyan, S.; Santosh, A.B.R.; Boneta, A.E. Periodontal Health Knowledge and Oral Health-Related Quality of Life in Caribbean Adults. Oral Health Prev Dent 2024, 22, 9–22. [Google Scholar]
  59. Noh, M.; Kim, E.; Sakong, J.; Park, E.Y. Effects of Professional Toothbrushing among Patients with Gingivitis. Int J Dent Hyg 2023, 21, 611–617. [Google Scholar] [CrossRef]
  60. Salari, A.; Alavi, F.N.; Aliaghazadeh, K.; Nikkhah, M. Effect of Milk as a Mouthwash on Dentin Hypersensitivity after Non-Surgical Periodontal Treatment. Journal of Advanced Periodontology & Implant Dentistry 2022, 14, 104. [Google Scholar]
  61. Bhuyan, R.; Pati, T.; Panda, N.R.; Mohanty, J.N.; Bhuyan, S.K. A Six-Month Single-Center Study in 2021 on Oral Manifestations during Pregnancy in Bhubaneswar, India. Iran J Med Sci 2023, 48, 350. [Google Scholar]
  62. Kamalabadi, Y.M.; Campbell, M.K.; Zitoun, N.M.; Jessani, A. Unfavourable Beliefs about Oral Health and Safety of Dental Care during Pregnancy: A Systematic Review. BMC Oral Health 2023, 23, 762. [Google Scholar] [CrossRef] [PubMed]
  63. Carrouel, F.; Kanoute, A.; Lvovschi, V.-E.; Bourgeois, D. Periodontal Pathogens of the Interdental Microbiota in a 3 Months Pregnant Population with an Intact Periodontium. Front Microbiol 2023, 14. [Google Scholar] [CrossRef] [PubMed]
  64. Zhu, Y.; Lu, H.; Yang, S.; Liu, Y.; Zhu, P.; Li, P.; De Waal, Y.C.M.; Visser, A.; Tjakkes, G.-H.E.; Li, A. Predictive Factors for the Treatment Success of Peri-Implantitis: A Protocol for a Prospective Cohort Study. BMJ Open 2024, 14, e072443. [Google Scholar] [CrossRef] [PubMed]
  65. AlHelal, A.A.; Alzaid, A.A.; Almujel, S.H.; Alsaloum, M.; Alanazi, K.K.; Althubaitiy, R.O.; Al-Aali, K.A. Evaluation of Peri-Implant Parameters and Functional Outcome of Immediately Placed and Loaded Mandibular Overdentures: A 5-Year Follow-up Study. Oral Health Prev Dent 2024, 22, 23–30. [Google Scholar] [PubMed]
  66. Chang, S.-W.; Shin, S.-Y.; Hong, J.-R.; Yang, S.-M.; Yoo, H.-M.; Park, D.-S.; Oh, T.-S.; Kye, S.-B. Immediate Implant Placement into Infected and Noninfected Extraction Sockets: A Pilot Study. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology 2009, 107, 197–203. [Google Scholar] [CrossRef]
  67. Malkoc, M.A.; Sevimay, M.; Yaprak, E. The Use of Zirconium and Feldspathic Porcelain in the Management of the Severely Worn Dentition: A Case Report. Eur J Dent 2009, 3, 75–78. [Google Scholar] [CrossRef]
  68. Lee, C.-G.; Jin, G.; Lim, J.-H.; Liu, Y.; Afrashtehfar, K.I.; Kim, J.-E. Influence of Hydrothermal Aging on the Shear Bond Strength of 3D Printed Denture-Base Resin to Different Relining Materials. J Mech Behav Biomed Mater 2024, 149, 106221. [Google Scholar] [CrossRef]
  69. Ventura, J.V.L.; Vogel, J. de O.; Cortezzi, E.B. de A.; de Arruda, J.A.A.; Cunha, J.L.S.; Andrade, B.A.B. de; Tenório, J.R. Diagnosis and Management of Exuberant Palatal Pyogenic Granuloma in a Systemically Compromised Patient–Case Report. Special Care in Dentistry.
  70. Rathi, N.; Reche, A.; Agrawal, S.; Agrawal, S.R. Radicular Cyst: A Cystic Lesion Involving the Hard Palate. Cureus 2023, 15. [Google Scholar] [CrossRef]
  71. Sandhu, A.; Jyoti, D.; Malhotra, R.; Phull, T.; Sidhu, H.S.; Nayak, S. Management of Chronic Inflammatory Gingival Enlargement: A Short Review and Case Report. Cureus 2023, 15. [Google Scholar] [CrossRef]
  72. Krieger, M.; AbdelRahman, Y.M.; Choi, D.; Palmer, E.A.; Yoo, A.; McGuire, S.; Kreth, J.; Merritt, J. The Prevalence of Fusobacterium Nucleatum Subspecies in the Oral Cavity Stratifies by Local Health Status. bioRxiv 2023, 2010–2023. [Google Scholar]
  73. Molli, V.L.P.; Kissa, J.; Baraniya, D.; Gharibi, A.; Chen, T.; Al-Hebshi, N.N.; Albandar, J.M. Bacteriome Analysis of Aggregatibacter Actinomycetemcomitans-JP2 Genotype-Associated Grade C Periodontitis in Moroccan Adolescents. Frontiers in Oral Health 2023, 4. [Google Scholar] [CrossRef] [PubMed]
  74. Demirel, K.J.; Wu, R.; Guimaraes, A.N.; Demirel, I. The Role of NLRP3 in Regulating Gingival Epithelial Cell Responses Evoked by Aggregatibacter Actinomycetemcomitans. Cytokine 2023, 169, 156316. [Google Scholar] [CrossRef] [PubMed]
  75. Schuster, A.; Nieboga, E.; Kantorowicz, M.; Lipska, W.; Kaczmarzyk, T.; Potempa, J.; Grabiec, A.M. Gingival Fibroblast Activation by Porphyromonas Gingivalis Is Driven by TLR2 and Is Independent of the LPS-TLR4 Axis. Eur J Immunol 2024, 2350776. [Google Scholar] [CrossRef]
  76. Rams, T.E.; Sautter, J.D.; van Winkelhoff, A.J. Emergence of Antibiotic-Resistant Porphyromonas Gingivalis in United States Periodontitis Patients. Antibiotics 2023, 12, 1584. [Google Scholar] [CrossRef]
  77. Kramer, P.R.; Kramer, S.F.; Puri, J.; Grogan, D.; Guan, G. Multipotent Adult Progenitor Cells Acquire Periodontal Ligament Characteristics in Vivo. Stem Cells Dev 2009, 18, 67–76. [Google Scholar] [CrossRef]
  78. Peng, L.; Cheng, X.; Zhuo, R.; Lan, J.; Wang, Y.; Shi, B.; Li, S. Novel Gene-activated Matrix with Embedded Chitosan/Plasmid DNA Nanoparticles Encoding PDGF for Periodontal Tissue Engineering. Journal of Biomedical Materials Research Part A: An Official Journal of The Society for Biomaterials, The Japanese Society for Biomaterials, and The Australian Society for Biomaterials and the Korean Society for Biomaterials 2009, 90, 564–576. [Google Scholar] [CrossRef]
  79. Ripamonti, U.; Petit, J.; Teare, J. Cementogenesis and the Induction of Periodontal Tissue Regeneration by the Osteogenic Proteins of the Transforming Growth Factor-β Superfamily. J Periodontal Res 2009, 44, 141–152. [Google Scholar] [CrossRef]
  80. Shen, Z.; Zhang, R.; Huang, Y.; Chen, J.; Yu, M.; Li, C.; Zhang, Y.; Chen, L.; Huang, X.; Yang, J. The Spatial Transcriptomic Landscape of Human Gingiva in Health and Periodontitis. Sci China Life Sci 2023, 1–13. [Google Scholar] [CrossRef]
  81. Wang, J.; Jing, J.; Zhou, C.; Fan, Y. Emerging Roles of Exosomes in Oral Diseases Progression. Int J Oral Sci 2024, 16, 4. [Google Scholar] [CrossRef]
  82. Jiang, L.; Chen, D.; Cao, Z.; Wu, F.; Zhu, H.; Zhu, F. A Two-Stage Deep Learning Architecture for Radiographic Staging of Periodontal Bone Loss. BMC Oral Health 2022, 22, 106. [Google Scholar] [CrossRef]
  83. Danesh, A.; Pazouki, H.; Danesh, F.; Danesh, A.; Vardar-Sengul, S. Artificial Intelligence in Dental Education: ChatGPT’s Performance on the Periodontic In-service Examination. J Periodontol 2024. [Google Scholar] [CrossRef] [PubMed]
  84. Rossini, G.; Parrini, S.; Castroflorio, T.; Deregibus, A.; Debernardi, C.L. Periodontal Health during Clear Aligners Treatment: A Systematic Review. Eur J Orthod 2014, 37, 539–543. [Google Scholar] [CrossRef] [PubMed]
  85. Boyd, R.L. Periodontal and Restorative Considerations with Clear Aligner Treatment to Establish a More Favorable Restorative Environment. Compendium 2009, 30, 280–291. [Google Scholar] [PubMed]
Figure 1. Histogram of the number of publications per year in the analyzed corpus. Because the database search was conducted at the beginning of 2024, the number of publications for that year is markedly lower than for the preceding years.
Figure 2. A) Low-magnification view of the distribution of topics within the semantic space. Four clusters can be identified by their spatial arrangement. B) High magnification view of the topics that compose cluster #1. The main topic of this cluster is highlighted in red in both A) and B).
Figure 3. A) Low-magnification view of the distribution of topics within the semantic space. Four clusters can be identified by their spatial arrangement. B) High magnification view of the topics that compose cluster #2. The main topic of this cluster is highlighted in red in both A) and B).
Figure 4. A) Low-magnification view of the distribution of topics within the semantic space. Four clusters can be identified by their spatial arrangement. B) High magnification view of the topics that compose cluster #3. The same topic is highlighted in red in both A) and B).
Figure 5. A) Low-magnification view of the distribution of topics within the semantic space. Four clusters can be identified by their spatial arrangement. B) High magnification view of the topics that compose cluster #4. The main topic of this cluster is highlighted in red in both A) and B).
Figure 6. Chord plot representing semantic similarity between topics, calculated as the cosine similarity between the means of the embeddings of the titles in a topic group.
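As an illustration of the similarity measure behind Figure 6, the following minimal Python sketch averages the title embeddings of each topic and computes the cosine similarity between the resulting mean vectors. The function names and the three-dimensional toy vectors are hypothetical and are not taken from the study; real sentence embeddings would have several hundred dimensions, and the sketch assumes that no mean vector is all zeros.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topic_similarity(embeddings, topic_ids):
    """Map each topic pair (i, j) to the cosine similarity of their mean embeddings."""
    groups = defaultdict(list)
    for vector, topic in zip(embeddings, topic_ids):
        groups[topic].append(vector)
    centroids = {topic: mean_vector(vectors) for topic, vectors in groups.items()}
    return {
        (i, j): cosine_similarity(centroids[i], centroids[j])
        for i in centroids
        for j in centroids
    }


# Toy, made-up "title embeddings" assigned to two hypothetical topics
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 1.0, 0.0],
]
toy_topics = [0, 0, 1, 1, 1]
similarity = topic_similarity(toy_embeddings, toy_topics)
```

The resulting dictionary is symmetric, with values of 1 on the diagonal; its off-diagonal entries are the kind of pairwise values that a chord plot can display as link widths.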
Figure 7. A) Histogram displaying the distribution of topics by the year of publication of their oldest paper; B) Histogram displaying the distribution of topics by the year of publication of their last paper.
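The quantities plotted in Figure 7 can be derived from a list of (topic, publication year) pairs, as sketched below. The records and the function name are made up for illustration and do not reproduce the study's data or code.

```python
from collections import Counter


def topic_year_spans(records):
    """Return {topic: (first_year, last_year)} from (topic, year) pairs."""
    spans = {}
    for topic, year in records:
        first, last = spans.get(topic, (year, year))
        spans[topic] = (min(first, year), max(last, year))
    return spans


# Hypothetical (topic ID, publication year) pairs
toy_records = [
    (0, 2009), (0, 2015), (0, 2023),
    (4, 2020), (4, 2022),
    (7, 2011), (7, 2011),
]
spans = topic_year_spans(toy_records)

# Number of topics whose earliest paper (panel A) or
# most recent paper (panel B) falls in each year
first_year_counts = Counter(first for first, _ in spans.values())
last_year_counts = Counter(last for _, last in spans.values())
```

Plotting `first_year_counts` and `last_year_counts` as bar charts would give histograms of the kind shown in panels A and B, respectively.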
Figure 8. Line chart tracking the number of papers published by year in the top 10 topics of our corpus, identified by topic ID.
Figure 9. Scatter plots illustrating publication trends in the 2009-2023 interval (A-C) and in the 2019-2023 interval (D-F). The best-fitting line obtained by linear regression is shown in each graph. Displayed topics are: A) #4 COVID Dentistry; B) #154 Deep Learning Dentistry Diagnosis; C) #102 Stem cell-derived exosomes for bone regeneration; D) #4 COVID Dentistry; E) #174 AI in Dentistry; F) #21 Periodontitis-Alzheimers Disease.
Table 1. List of the top 10 high-granularity topics in the 2009-2023 period, sorted by topic size (Count = number of papers in the cluster). The table includes the default label assigned by BERTopic and the LLM-generated label.
| Topic | Count | BERTopic default label | LLM-generated label |
|---|---|---|---|
| 0 | 1870 | 0_porphyromonas_porphyromonas gingivalis_gingivalis_gingivalis lipopolysaccharide | Periodontitis and Porphyromonas Gingivalis |
| 1 | 899 | 1_sinus_maxillary sinus_sinus floor_floor | Sinus Floor Elevation |
| 2 | 742 | 2_pregnancy_birth_pregnant_preterm | Periodontitis and Pregnancy Outcomes |
| 3 | 512 | 3_diabetes_glycemic_mellitus_diabetes mellitus | Diabetes and Periodontal Disease |
| 4 | 449 | 4_covid_covid 19_19_pandemic | COVID Dentistry |
| 5 | 421 | 5_cleft_cleft lip_lip palate_palate | Cleft Lip Nasoalveolar Molding Treatment |
| 6 | 420 | 6_gingival recession_recession_recessions_gingival recessions | Gingival Recession Treatment |
| 7 | 418 | 7_aggregatibacter_actinomycetemcomitans_aggregatibacter actinomycetemcomitans_jp2 | Aggregatibacter Actinomycetemcomitans |
| 8 | 400 | 8_toothbrush_toothbrushes_manual_powered | Toothbrush Efficacy |
| 9 | 353 | 9_peri implantitis_implantitis_implant diseases_peri | Peri-Implant Diseases Prevalence |
Table 2. List of the research areas that appeared in 2021 and 2020, as identified by BERTopic.
| Topic | Year | LLM-generated label |
|---|---|---|
| 2113 | 2021 | Nanocurcumin Implant Healing |
| 4 | 2020 | COVID Dentistry |
| 167 | 2020 | Periodontitis COVID- Associations |
| 1421 | 2020 | Artificial Intelligence in Dentistry |
| 1438 | 2020 | COVID Periodontitis |
| 1440 | 2020 | Mouse SARS-CoV- Infection Models |
| 1540 | 2020 | Ferroptosis and Periodontitis: Research Progress |
| 1846 | 2020 | Submucosal injection for orthodontic acceleration |
| 2212 | 2020 | Mediterranean Diet and Gingival Health |
| 2216 | 2020 | Photobiomodulation in Orthodontics |
Table 3. List of the top 20 topics in the 2019-2023 period, sorted by the slope of the linear regression of the number of publications per year, with the goodness of fit measured by Pearson’s r.
| Topic | LLM-generated label | Slope | Pearson’s r |
|---|---|---|---|
| 4 | COVID Dentistry | 19.90 | 0.57 |
| 174 | AI in Dentistry | 7.70 | 0.93 |
| 21 | Periodontitis-Alzheimers Disease | 7.70 | 0.90 |
| 23 | Probiotics and periodontal health | 6.30 | 0.92 |
| 154 | Deep Learning Dentistry Diagnosis | 6.10 | 0.92 |
| 1 | Sinus Floor Elevation | 5.70 | 0.92 |
| 26 | Gingival Thickness Assessment | 5.10 | 0.95 |
| 6 | Gingival Recession Treatment | 5.00 | 0.88 |
| 2 | Periodontitis and Pregnancy Outcomes | 4.50 | 0.83 |
| 18 | Subgingival Microbiome | 4.30 | 0.68 |
| 102 | Stem cell-derived exosomes for bone regeneration | 4.20 | 0.99 |
| 1514 | Comparison of mild pericoronitis treatments | 4.00 | 0.99 |
| 53 | Guided Implant Surgery Accuracy | 4.00 | 0.84 |
| 0 | Periodontitis and Porphyromonas Gingivalis | 3.90 | 0.57 |
| 10 | Oral Microbiome Analysis | 3.70 | 0.65 |
| 557 | Peri-Implantitis Biomarkers | 3.60 | 0.97 |
| 519 | Post COVID Mucormycosis | 3.50 | 0.99 |
| 17 | Medicinal Plant Extracts for Oral Care | 3.40 | 0.75 |
| 218 | Clear aligners periodontal health | 3.30 | 0.91 |
| 167 | Periodontitis COVID- Associations | 3.30 | 0.74 |
Table 4. List of the most represented topics by number of publications that appeared in 2024.
| Topic | LLM-generated label | N. publications |
|---|---|---|
| 0 | Periodontitis and Porphyromonas Gingivalis | 9 |
| 21 | Periodontitis-Alzheimers Disease | 5 |
| 102 | Stem cell-derived exosomes for bone regeneration | 5 |
| 3 | Diabetes and Periodontal Disease | 4 |
| 17 | Medicinal Plant Extracts for Oral Care | 3 |
| 336 | 3D printed denture base properties | 3 |
| 5 | Cleft Lip Nasoalveolar Molding | 3 |
| 119 | Pulmonary Periodontitis Association | 3 |
| 23 | Probiotics and periodontal health | 3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.