1. Introduction
Progressive neurological disorders (PNDs) are associated with significant disabilities and decreased life expectancy [
1]. Approximately 15% of the world’s population is affected by PNDs, which are a leading cause of disability and illness [
2]. Common PNDs include Parkinson’s disease (PD), motor neuron disease, multiple sclerosis, amyotrophic lateral sclerosis (ALS), and Huntington’s disease. People with PNDs face profound challenges with significant and progressive impacts on mobility and communication. One of the most prevalent of these conditions is PD. The hallmark pathology of PD is dopamine depletion, which results in tremors, postural instability, and bradykinesia. Communication impairments emerge in the initial stages of PD, and studies report that 78% of people with early-stage PD have some form of vocal impairment [
3] as well as changes in speech production. Similarly, impairments in communication can also emerge as early indicators of other PNDs, highlighting their role in early diagnosis and disease monitoring. Changes in voice and speech characteristics are heterogeneous, depending on the neuropathophysiological features of each PND. For example, in PD, factors such as reduced loudness (hypophonia), diminished voice quality (dysphonia), decreased pitch variation, and limited range of articulatory movements are common. This group of symptoms is collectively known as hypokinetic dysarthria (Orozco-Arroyave, 2016). The heterogeneous nature of voice and speech deficits in PNDs means that their onset, underlying etiology, and progression remain poorly understood. Delayed symptom detection inhibits early diagnosis and treatment and can negatively impact quality of life and social participation.
Speech-language pathologists (SLPs) typically assess speech and voice changes, with ear, nose, and throat specialists contributing to voice assessment. Using both subjective and instrumental measures is considered the best practice for assessing voice [
4]. Instrumental assessments may include endoscopic laryngeal imaging and acoustic and aerodynamic measures [
5]. However, clinicians may not have access to a wide range of instrumental assessments. Thus, subjective auditory-perceptual assessment methods are commonly used because of their diagnostic utility [
6]. Similarly, speech is assessed using a combination of subjective and objective methods to evaluate parameters beyond phonation (voice), such as articulation, resonance, fluency, and prosody. Standardized speech tasks often involve speaking single words containing specific consonants, sentence repetition, reading paragraphs, and spontaneous speech to determine an individual’s precision of articulation, focus of resonance, rate of speech and fluency, intonation patterns, intelligibility, and severity. Instrumental measures obtained through acoustic analysis software and kinematic assessments can provide insights into speech function. However, access to these tools is often limited by resource availability and clinician training.
In recent years, the healthcare field has experienced significant advancements in the use of artificial intelligence (AI) to diagnose and assess complex medical conditions. Studies have highlighted significant insights into AI techniques employed in medicine, specifically in heart disease, brain injury, prostate health, liver conditions, and kidney disease [
7,
8]. However, there remains a need for additional inquiries within the domain of AI-driven diagnostic systems to improve precision, accuracy, and clinical relevance. Various AI models have been developed to reduce the number of hypotheses that a program must evaluate and integrate pathophysiological logic [
9]. This advancement enables the program to assess instances in which one condition impacts the manifestation of another. Prototypes incorporating this logic can justify their decisions by using user-friendly medical terminology. However, additional research and development are needed to improve proficiency. Alowais et al. [
10] offered a thorough and timely examination of the current landscape of AI in clinical settings, detailing its use in disease detection, treatment advice, and patient involvement. Challenges such as ethical and legal implications, as well as the necessity for human input, are also highlighted. This study enhances the understanding of the significance of AI in the healthcare sector and supports healthcare organizations in effectively integrating AI tools. Hussain and Nazir (2024) developed AI-powered predictive analytics models that can evaluate individual risk profiles and provide guidance on personalized treatment decisions, improving patient outcomes and optimizing resource allocation for both acute and chronic neurological conditions.
Furthermore, AI-driven approaches have revolutionized medicine by enhancing risk assessment, disease prognosis, treatment selection, and monitoring. Utilizing machine learning (ML) algorithms, a variety of data sources, such as electrocardiograms, echocardiograms, and wearable sensors, are analyzed to detect abnormalities, predict adverse events, and customize interventions to suit the unique needs of each patient. By leveraging big data and innovative analytics, precision medicine strategies are leading toward more precise and efficacious therapies in the field of cardiology. Nevertheless, there are significant obstacles to overcome in the adoption of AI-driven precision medicine, including issues related to data privacy, regulatory compliance, and the necessity for interdisciplinary collaboration among healthcare professionals. Raghavendra et al. [
11] proposed the notion that using a computer-aided diagnosis (CAD) system, which is educated through an extensive amount of patient data alongside physiological signals and images through the skilled incorporation of advanced signal processing and AI/ML techniques in an automated manner, has the potential to aid neurologists, neurosurgeons, radiologists, and other healthcare professionals in enhancing clinical judgments.
Speech-based AI assessment techniques have been investigated for a range of medical conditions, including dysphonia [
12], pulmonary disease [
13], and emotion recognition [
14]. Speech-based models typically follow a five-stage process involving the acquisition and labelling of speech datasets, signal pre-processing and normalization, feature extraction, model training using ML/deep learning (DL) architectures, and evaluation via standardized performance metrics (i.e., accuracy, sensitivity, and specificity). Given that up to 70% of people with PD present with voice and speech impairment [
15], AI technologies are likely to be useful tools for detecting disease, monitoring progression, and guiding intervention. Incorporating AI into the healthcare sector presents promising opportunities to enhance disease diagnosis, treatment selection, and clinical laboratory testing. AI technology can synthesize vast amounts of data to identify patterns, outperforming human capabilities in multiple facets of health care. The use of AI in allied healthcare is emerging, but recent studies have highlighted its utility in improving diagnostic accuracy and service feasibility [
16].
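The five-stage pipeline described above ends with evaluation via standardized metrics. As a minimal illustration of that final stage (using hypothetical labels, not data from any reviewed study), accuracy, sensitivity, and specificity can all be derived from the binary confusion matrix:

```python
def evaluate_binary(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for a binary
    classifier (1 = disorder present, 0 = healthy control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
    }

# Hypothetical labels: 1 = PD, 0 = healthy control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
m = evaluate_binary(y_true, y_pred)
print(m)  # accuracy 0.75, sensitivity 0.75, specificity 0.75
```

Reporting all three metrics together matters because accuracy alone can mask poor detection of the rarer class in imbalanced clinical datasets.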
The application of AI can lead to improved precision, decreased expenses, increased time efficiency, and reduced human error. The subsequent sections provide a contextual foundation for this research.
Section 2 elaborates on the search strategy and the methodologies employed in this study.
Section 3 presents the findings of this study, with particular emphasis on the performance metrics and ML techniques implemented.
Section 4 includes the discussion and limitations of this review and suggests directions for future research.
Related Work
This section provides an overview of the research conducted on the contribution of AI in the assessment of speech and voice in PNDs.
Patel et al. [
17] investigated the unmonitored facets of ML and explored how they can be leveraged in precision neurology to enhance patient outcomes. This paper discusses various AI options, previous studies, results, advantages and drawbacks of AI, efficient accessibility, and the future of AI while considering the current burden of neurological disorders. Specific examples include a sophisticated system for tracking tremors and identifying their characteristics to improve the effectiveness of deep brain stimulation, applications for assessing fine motor skills, AI-powered electroencephalogram analysis for diagnosing epilepsy and psychogenic non-epileptic seizures, forecasting the results of seizure surgeries, detecting autonomic instability patterns to prevent sudden unexpected death in epilepsy, recognizing intricate patterns in neuroimaging that categorize cognitive deficiencies, distinguishing and categorizing concussion variations, smartwatches that monitor atrial fibrillation to reduce the risk of strokes, and predicting dementia prognosis. These applications represent the groundbreaking use of AI in the field of neurology. Patel et al. [
17] demonstrated the feasibility and effectiveness of implementing systematic data collection procedures to support research efforts and to promote the incorporation of a P3 approach into clinical practice using AI models [
18]. The primary goals of the project included 1) streamlining the data collection process among all participating centers, 2) organizing standardized datasets specific to each disease, and 3) enhancing the understanding of disease progression through the utilization of ML analysis.
Lima et al. [
19] presented a comprehensive review that provided valuable information on neurological diseases and the classification algorithms used in their diagnosis to guide researchers interested in studying neurological diseases and techniques used in this field. This article addressed the challenges faced in detecting various stages of these disorders, including the limited availability of labelled and unlabeled datasets. Measuring neurological disorders through voice analysis is an emerging area of research that shows potential for discreet and widespread monitoring of disorders on a substantial scale. The processes for data recording and analysis are integral components for efficiently extracting pertinent data from the participants. Hecker [
20] conducted a review of practices in various neurological disorders and highlighted emerging patterns using PRISMA-based searches in PubMed, Web of Science, and IEEE Xplore to identify publications containing original datasets. They examined disorders that included psychiatric and neurological conditions, such as bipolar disorder, depression, stress, amyotrophic lateral sclerosis, Alzheimer’s disease, PD, and speech impairments (aphasia, dysarthria, and dysphonia). Among the 43 studies analyzed, PD stood out, with 19 datasets. Free speech and read-speech tasks are commonly employed across conditions. In addition to popular feature extraction tools, custom feature sets have been utilized in several studies. The correlation between the acoustic features and neurological disorders was also explored. Statistical analysis of feature significance and predictive modelling techniques, particularly support vector machines and a few artificial neural networks, are commonly used for analysis. A growing trend and suggestion for future research is to gather data in real-life settings for longitudinal data collection and to capture participant behavior more authentically. Another emerging trend is the incorporation of additional modalities to voice data, potentially enhancing analytical performance.
De la Fuente Garcia et al. [
21] provided an overview of current research findings on the application of AI, speech, and language processing as predictive tools for cognitive decline in Alzheimer’s disease. Similarly, Idrisoglu [
22] conducted an in-depth analysis of research on voice-affecting disorders by utilizing ML techniques to diagnose and monitor voice samples. Of particular focus in this study were systemic conditions, non-laryngeal aerodigestive disorders, and neurological disorders. This systematic literature review revealed significant interest from various countries in utilizing ML techniques for the diagnosis and monitoring of voice-affecting disorders, with PD being the most extensively researched disorder. However, the review also identified several areas for improvement, such as the limited and uneven utilization of datasets in studies, and a predominant focus on diagnostic testing rather than disorder-specific monitoring. Despite the constraints of exclusively including peer-reviewed publications in English, this review offers valuable insights into the current landscape of research on ML-based diagnosis and monitoring of voice-affecting disorders as well as pinpointing areas for future research.
Bechiche and Djezzar [
23] developed an automated diagnostic support system for early identification of PD using speech signals. The primary goal was to differentiate between patients with PD pathology and healthy individuals. The proposed system comprises two key components: feature extraction and ML classification. The selected discriminatory features included the mel-frequency cepstral coefficient (MFCC) and Excitation Source Parameters. The classification process involves three supervised ML algorithms: K-Nearest Neighbor (KNN), Support Vector Machines (SVM), and Decision Tree (DT). Feature extraction was performed using MATLAB MathWorks software, and classification was conducted using the same software. The PD database (MDVR-KCL) was used in this study. Performance metrics such as accuracy, sensitivity, specificity, F1 Score, and the SVM receiver operating curve were used to evaluate the system, and the results were found to be favorable. Following the analysis conducted in the survey by Saravanan [
24], it was determined that numerous ML and DL algorithms show potential for optimization, highlighting the opportunity for further research to improve accuracy and expedite decision-making processes.
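The classifiers used by Bechiche and Djezzar are standard supervised learners. As a minimal sketch of the KNN step alone, using hypothetical 2-D feature vectors in place of real MFCC and excitation-source features (the original system was implemented in MATLAB, and these toy points are purely illustrative):

```python
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    vectors (Euclidean distance). `train` is a list of (features, label)."""
    nearest = sorted(train, key=lambda fl: math.dist(fl[0], query))
    votes = [label for _, label in nearest[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical 2-D feature summaries: label 1 = PD speech, 0 = control.
train = [([0.9, 0.8], 1), ([1.0, 0.7], 1), ([0.8, 0.9], 1),
         ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.0, 0.3], 0)]
print(knn_predict(train, [0.85, 0.75]))  # → 1 (falls in the PD-like region)
print(knn_predict(train, [0.15, 0.20]))  # → 0
```

In practice each training vector would be a full MFCC-based feature set extracted from a speech recording, and k and the distance metric would be tuned on held-out data.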
Our review examines AI-driven speech and voice assessments in PNDs, emphasizing standardized datasets and multimodal data integration for greater precision. Unlike prior reviews on general AI applications [
17] or data collection [
18], we explore AI’s role beyond diagnosis to continuous monitoring. Timely detection of PD can be enhanced by integrating DL-based models with the expertise of medical professionals. In the current healthcare landscape, these sophisticated models have demonstrated significant benefits. It is imperative to refine DL models to achieve heightened diagnostic accuracy for PD. Additionally, we propose the exploration of alternative metrics beyond specificity and sensitivity to further support healthcare professionals in the diagnosis of PD. Implementation of these recommendations has the potential to address existing challenges and enhance the accuracy of PD classification.
Considering the recent progress in the utilization of voice and speech markers for the evaluation of PNDs, there is still a noticeable void in the literature concerning the integration of AI in the precise and timely identification of speech irregularities linked to these conditions.
Figure 1 illustrates the comparison and unique contributions of our review relative to existing published review papers. This study aimed to assess the precision, efficacy, application, and extent of AI in this domain. Three members of the research team systematically examined the abstracts and full manuscripts to determine their relevance to the study.
Figure 1.
A distinctive comparison of our review study with existing review papers.
Figure 2.
Overview of the PRISMA guidelines applied in the article selection process for the systematic review.
2. Materials and Methods
To identify the most applicable articles involving the use of AI in the context of voice and speech assessment in PNDs, we adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [
25].
Search Strategy
We searched PubMed, Scopus, and Web of Science using keywords related to neurological diseases, diagnostic methods, AI, and speech and voice analyses. The main search terms included Amyotrophic Lateral Sclerosis (ALS), Motor Neurone Disease, Huntington’s disease, Multiple Sclerosis, and Parkinson’s disease. Diagnostic-related terms comprised diagnosis, early diagnosis, early detection, early recognition, early identification, detection, recognition, identification, and classification. AI-related keywords included artificial intelligence (AI), machine learning, deep learning, neural networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory networks (LSTMs), bidirectional LSTMs (BLSTMs), transformers, BERT, GPT, support vector machines, decision trees, random forests, naïve Bayes classifiers, and k-nearest neighbor (k-NN). Speech and voice analysis terms included speech, voice disorders, voice assessment, vocal analysis, speech analysis, acoustic analysis, phonatory assessment, voice biomarkers, and speech biomarkers. We excluded conference proceedings, non-English publications, articles that were not peer-reviewed, review studies, book chapters, abstracts, articles published before 2013, and articles not accessible in a full-text format. A total of 102 papers were included in this review (Figure 2).
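For illustration, concept groups of this kind are typically combined into a boolean query, with synonyms ORed within a group and the groups ANDed together. A sketch of how such a query could be assembled (term lists abridged; exact field tags and syntax differ per database):

```python
# Abridged stand-ins for the three concept groups described above.
disease_terms = ["Parkinson's disease", "Amyotrophic Lateral Sclerosis",
                 "Huntington's disease", "Multiple Sclerosis",
                 "Motor Neurone Disease"]
ai_terms = ["artificial intelligence", "machine learning", "deep learning",
            "neural networks", "support vector machines"]
speech_terms = ["speech analysis", "voice assessment", "acoustic analysis",
                "voice biomarkers", "speech biomarkers"]

def or_group(terms):
    """Join one concept group with OR, quoting multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Concept groups are ANDed together, mirroring the strategy above.
query = " AND ".join(or_group(g) for g in (disease_terms, ai_terms, speech_terms))
print(query)
```

Keeping the term lists in code also makes the search reproducible when the review is updated.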
3. Results
In this review, we examined a diverse range of studies that utilized ML and hybrid techniques to analyze speech signals for the detection, classification, and severity assessment of PD and other PNDs. While some studies focused on PD discrimination based on acoustic voice and speech characteristics (e.g., variation in fundamental frequency), others targeted functional voice and speech measures, such as determining dysarthria severity.
These studies encompassed a broad geographical spectrum, including contributions from researchers across multiple countries. In addition, there has been significant growth in the number of papers in this area over the last decade (
Figure 3).
The datasets used recordings of speech and voice samples from patients at various stages of PD (and other PNDs), as well as from healthy controls. Extracted features ranged from basic acoustic measures, such as pitch and intensity, to more complex features derived from advanced signal processing and feature extraction techniques. Traditional ML approaches, such as SVMs and random forests (RF), have been employed alongside more contemporary DL techniques, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). A notable trend was the use of hybrid models combining different features or model architectures to enhance predictive performance.
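As a concrete example of the simplest feature class, fundamental frequency (pitch) can be estimated by locating the autocorrelation peak of a voiced segment. The following is a toy stdlib-only sketch on a synthetic 200 Hz tone, not the method of any specific reviewed study:

```python
import math

def estimate_f0(samples, sr, f_min=75.0, f_max=400.0):
    """Estimate fundamental frequency by finding the autocorrelation
    peak within the plausible lag range for human phonation."""
    lag_min = int(sr / f_max)
    lag_max = int(sr / f_min)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sr / best_lag

# Synthetic 200 Hz tone standing in for a sustained /a/ vowel recording.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(2048)]
f0 = estimate_f0(tone, sr)
print(round(f0))  # → 200
```

Clinical toolkits use far more robust estimators (and derive jitter, shimmer, and related perturbation measures from the pitch track), but the underlying idea is the same.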
Performance metrics across the studies revealed a wide range of accuracy levels, with some models achieving high PD classification accuracy and others demonstrating modest results. Metrics such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC) have been frequently reported, providing insight into the models’ ability to correctly identify and classify PD and related neurological conditions. Additionally, the effectiveness of these models in distinguishing between the various stages of PD or between PD and other PNDs was evaluated.
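The AUC-ROC reported in these studies has a useful probabilistic reading: it is the probability that the model scores a randomly chosen positive case higher than a randomly chosen negative one. A small sketch with hypothetical model scores makes this pairwise definition concrete:

```python
def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for 4 PD (1) and 4 control (0) recordings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auc_roc(y_true, scores))  # → 0.9375
```

Unlike accuracy, this quantity does not depend on a decision threshold, which is why it is favored for comparing models across studies with different class balances.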
The variability in the results can be attributed to several factors, including differences in dataset quality and size, feature selection methods, and model training procedures. Some studies achieved high performance with small, well-curated datasets, whereas others demonstrated robustness with larger, more diverse datasets. This review highlights both the progress made in applying ML and DL to PD detection and the ongoing challenges, such as the need for standardized datasets and validation protocols to ensure the generalizability and reliability of the models.
Overall, this review underscores the potential of ML and DL techniques to advance PD research and diagnosis. Integrating these computational methods into clinical practice holds promise for improving early detection, monitoring disease progression, and tailoring personalized treatment plans for patients with PD and related neurological conditions.
3.1. Accuracy and Performance Metrics
The reviewed studies demonstrated a wide range of accuracies, with reported values ranging from 67.43% to 99%. For instance, Yildirim et al. [
26] achieved a remarkable 98.19% accuracy using voice data to discriminate PD from non-PD subjects with a combination of hybrid techniques (DL and ML). Similarly, Alalayah et al. [
27] reported high classification accuracies for classifying people with and without PD, with 98% accuracy. Conversely, Akbal et al. [
28] reported a lower accuracy of 77.48% in an international cohort study using SVM for speech data, highlighting variability in performance across different datasets and methods. Zhao et al. [
29] used a KNN-based ML approach with speech data for PD diagnosis and achieved 96.50% accuracy. Likewise, Zhang et al. [
30] used different ML classifiers to classify PD or no PD using voice data and achieved 92.59% accuracy. Yao et al. [
31] used convolutional neural networks using speech data to classify PD or no PD and achieved 95.77% classification accuracy.
3.2. Machine Learning Models
SVM was the most frequently used ML model across studies, demonstrating reliable performance in both speech and voice data analyses. For example, Amato et al. [
32] and Benba et al. [
33] utilized SVM to identify the presence or absence of PD in voice measures, achieving accuracies of 98% and 87.50%, respectively. RF models have also achieved promising results. Alalayah et al. [
27] reported 98% accuracy in classifying voice signals for people with PD. Similarly, Elen and Avuclu [
34] obtained their best test results with an RF model (85.81%) for classifying PD using voice data. Zhang et al. [
30] used different ML classifiers, including SVM, RF, and EDF-EMD, on voice data to differentiate PD from non-PD, and achieved an accuracy of 96.54%.
3.3. Deep Learning Models
DL techniques, particularly CNNs, have been employed in several studies to enhance performance. Bhatt et al. [
35] used a deep neural network (DNN) with spectrograms of speech signals to detect PD with up to 96% accuracy. Similarly, Kaur et al. [
7] used a DL model with grid search hyperparameter tuning to achieve a mean classification precision of 91.69% for PD.
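Grid search hyperparameter tuning of the kind used by Kaur et al. exhaustively evaluates candidate parameter settings by cross-validation and keeps the best-scoring one. The sketch below illustrates the procedure with a hypothetical one-feature threshold model standing in for a deep network (model, grid, and folds are all invented for illustration):

```python
from itertools import product

def cross_val_score(model_fn, params, folds):
    """Mean accuracy of model_fn(train, **params) across held-out folds."""
    accs = []
    for i, test in enumerate(folds):
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        predict = model_fn(train, **params)
        accs.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(accs) / len(accs)

def threshold_model(train, threshold=0.5):
    """Toy one-feature classifier standing in for a DL model:
    predict PD (1) when the feature exceeds `threshold`."""
    return lambda x: 1 if x > threshold else 0

grid = {"threshold": [0.3, 0.5, 0.7]}  # the hyperparameter grid to search
folds = [[(0.9, 1), (0.2, 0)], [(0.8, 1), (0.4, 0)], [(0.6, 1), (0.1, 0)]]
best = max(
    (dict(zip(grid, vals)) for vals in product(*grid.values())),
    key=lambda p: cross_val_score(threshold_model, p, folds),
)
print(best)  # the best-scoring hyperparameter setting on this toy data
```

For real DL models the same loop runs over learning rates, layer sizes, and regularization strengths, with each evaluation being a full training run.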
3.4. Hybrid Approaches
Hybrid models combining ML and DL techniques show promise for achieving high accuracy and robustness. For example, Hires et al. [
36] utilized a hybrid approach involving CNNs and ensemble methods to achieve 99% accuracy in PD detection using speech data. Similarly, Jain et al. [
37] employed a hybrid model that integrated RNN and SVM for PD classification, with a promising accuracy of 82.35%. The incorporation of hybrid models, combining both DL and ML approaches, was also noted in studies by Costantini et al. [
38], where a CNN was used for PD classification with an accuracy of 82.25%.
Figure 4 depicts the accuracy of the various AI models.
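One simple form of the hybrid/ensemble combination discussed above is majority voting across heterogeneous models. A minimal sketch with toy stand-in classifiers (none of these correspond to the actual models in the cited studies):

```python
def majority_vote(classifiers, x):
    """Combine heterogeneous models by majority vote: each classifier
    labels x, and the most common label wins."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# Three toy threshold classifiers standing in for, e.g., a CNN, an SVM,
# and an RF, each mapping a feature value to PD (1) or control (0).
clf_a = lambda x: 1 if x > 0.4 else 0
clf_b = lambda x: 1 if x > 0.6 else 0
clf_c = lambda x: 1 if x > 0.5 else 0

print(majority_vote([clf_a, clf_b, clf_c], 0.55))  # → 1 (two of three vote PD)
print(majority_vote([clf_a, clf_b, clf_c], 0.45))  # → 0
```

More sophisticated hybrids weight the votes or feed one model's outputs into another (stacking), but the goal is the same: offsetting the individual models' error patterns.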
3.5. Geographical Distribution
The studies reviewed were conducted across various countries, including Australia, China, Saudi Arabia, and India. This geographical diversity suggests growing global interest; however, generalizability remains limited by dataset heterogeneity.
3.6. Category-Specific Performance
When focusing on dysarthria detection and severity classification, Ali et al. [
39] demonstrated high accuracy. For PD detection, models like those proposed by Bhatt et al. [
35] and Azadi et al. [
40] achieved accuracies above 95%, highlighting the potential of these approaches in clinical applications.
3.7. Public Datasets
Speech and voice analysis has gained increasing prominence in PD research, as the condition disrupts motor control and frequently results in speech and voice changes. Numerous publicly accessible datasets include speech recordings from individuals with PD, enabling the development of ML models for early detection, symptom assessment, and tracking of disease progression.
Table 1 provides an overview of the key datasets available for PD research that use speech and voice data. These datasets face limitations, such as small sample sizes, missing values, and difficulties in training DL models effectively. Pre-processing is necessary to address data gaps, and a limited number of instances restricts model performance. Expanding the availability of larger and more diverse datasets is essential for improving the accuracy and reliability of PD research.
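As a concrete example of the pre-processing these data gaps require, missing feature values can be filled with per-column means before model training. This is a minimal sketch with hypothetical feature rows; real pipelines may prefer median or model-based imputation:

```python
def impute_mean(rows):
    """Fill missing feature values (None) with the column mean,
    a minimal pre-processing step for gappy clinical datasets."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        vals = [r[c] for r in rows if r[c] is not None]
        means.append(sum(vals) / len(vals))  # mean of observed values only
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]

# Hypothetical two-feature rows (e.g., jitter- and shimmer-like measures)
# with gaps in different columns.
rows = [[0.25, 1.0], [None, 3.0], [0.75, None]]
print(impute_mean(rows))  # → [[0.25, 1.0], [0.5, 3.0], [0.75, 2.0]]
```

Imputation should be fit on training data only and then applied to test data, otherwise the evaluation leaks information and inflates reported accuracy.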
4. Discussion
This systematic review provides a comprehensive analysis of research conducted over the past decade regarding the application of AI in assessing speech and voice within the context of PNDs. These findings underscore the substantial promise that AI, particularly in ML and DL, holds in enhancing the diagnosis, monitoring, and treatment of these medical conditions. The reviewed studies employed a wide array of AI methodologies, ranging from conventional ML models, such as SVMs and RFs, to more sophisticated DL frameworks, such as CNNs and advanced RNNs. Notably, many investigations have adopted hybrid approaches that integrate multiple model architectures to bolster predictive accuracy. These methodologies were utilized to analyze the spectrum of speech and voice parameters, encompassing fundamental acoustic metrics, as well as intricate features derived through advanced signal processing techniques. Variability in approaches reflects the continuous refinement and exploration of strategies within this domain.
In the healthcare sector that focuses on speech and voice assessment, particularly within the field of speech-language pathology, AI-driven tools have the potential to assist clinicians in recognizing changes in speech and voice that may signal the early stages of PNDs, with the most evidence currently available for PD. Implementing such tools may facilitate timely diagnoses and interventions, ultimately leading to improved patient outcomes. The precision of the analyzed models exhibited significant variability. The performance of AI models in identifying PD demonstrated an accuracy range of 67.43% to 99%.
Several investigations indicated accuracies exceeding 95%, underscoring the potential of these methodologies as dependable tools in clinical environments. The factors influencing variability in model accuracy encompassed differences in dataset quality and volume, feature selection criteria, and model training protocols. Certain studies achieved high accuracy with limited datasets, whereas others successfully illustrated accurate outcomes by utilizing larger and more diverse datasets. It is expected that the quality and attributes of the training data for the models will significantly influence their performance. The precision of AI models in the evaluation of speech and voice could enhance diagnostic accuracy, thereby reducing the dependence on subjective assessments. This advancement holds the promise of improving the oversight of intervention strategies and empowering clinicians to effectively track and monitor disease progression. Furthermore, AI tools may facilitate the customization of intervention techniques by recognizing nuanced variations in patient presentations that may not be discernible through other existing assessment methods. The incorporation of AI models into the clinical evaluation of voice and speech data presents an opportunity to enhance the efficiency of assessments, thereby optimizing the utilization of clinician time and resources. Furthermore, AI-driven instruments can facilitate improved access to care and ongoing monitoring of patients situated in remote or underserved regions, effectively diminishing geographical barriers to healthcare services. The body of research examined encompasses a variety of geographical regions, including Australia, China, Saudi Arabia, and India, indicating a global interest in the application of AI for detecting PNDs through speech and voice data. Furthermore, there is significant engagement from a diverse array of professionals, including healthcare, engineering, and computer science professionals. 
This interdisciplinary collaboration is essential to ensure that patients exhibiting alterations in their voice and speech are evaluated using methodologies that are clinically appropriate, accurate, and practical.
This study conducted a thorough examination of the application of DL, ML, and hybrid methodologies in the analysis of speech signals related to PNDs. Although the findings offer significant insights into this area, certain limitations warrant consideration. Notably, the scarcity of extensive public databases on PNDs restricts the generalizability of the developed models. Furthermore, the studies included in this review primarily utilized localized datasets, which may not adequately represent the necessary diversity, including ethnicity, background, and variations in disease progression. Additionally, although DL models, such as RNNs and CNNs, have demonstrated potential in effectively identifying intricate patterns in speech signals, their performance is contingent upon access to large datasets to enhance accuracy. Moreover, DL models pose challenges within clinical environments, particularly concerning their explainability and feasibility of integration into standard assessment protocols. The limitations and future directions of this study are as follows:
5. Limitations
In recent years, speech technology, including speech-signal interpretation, has become a vital component of modern healthcare systems. Despite its growing significance, the integration of speech technology into medical applications still encounters numerous challenges, necessitating ongoing advancements. In this section, we highlight the key issues and outline potential directions for future research to effectively address these challenges.
5.1. Lack of Datasets
AI-powered speech solutions hold promise for medical applications; however, their effectiveness in PNDs is hindered by the lack of high-quality datasets. Limited public databases restrict the generalizability of the models, and localized datasets often fail to capture the necessary diversity in ethnicity, background, and disease progression [
41]. Additionally, AI speech technologies encounter challenges with cross-linguistic transferability, which hinders the progress of multilingual research. Data collection is also challenging owing to patient discomfort and logistical barriers, with existing datasets being small, inconsistent, and limited in scope [
42]. Overcoming these challenges is essential for developing reliable and accurate speech-based solutions for PNDs.
5.2. Bias in Data
The effectiveness of speech recognition systems is heavily influenced by the quality and diversity of training data. If the data used for training are overly focused on specific populations or healthcare contexts, the system may struggle to perform well when applied to patients with a wide range of medical conditions. This lack of diversity in training data can lead to inaccurate or less reliable outcomes when the system encounters variations in speech patterns, accents, or unique medical scenarios [
43]. Ensuring a broader, more inclusive data set is crucial for improving the adaptability and performance of the system in diverse clinical settings.
5.3. Challenges of Adaptability and Robustness
Limited adaptability and robustness remain significant challenges for speech-based medical solutions. Much of the current research in this field has been conducted under controlled or ideal conditions. However, real-world medical environments are far more complex, often featuring background noise such as conversations between healthcare professionals or equipment alarms. Additionally, variations in pronunciation among different speakers and overlapping speech from multiple individuals can further compromise the accuracy of speech-recognition systems [
42]. These factors underscore the need for enhanced adaptability and stability to satisfy the stringent demands of medical application scenarios.
5.4. Privacy Issues
With the rise of digital healthcare solutions, the use of speech technology has introduced ethical and legal challenges. Speech data collected in medical contexts are at risk of unauthorized access and exploitation, potentially revealing sensitive information, such as personal identities, emotional states, and other confidential details [
43]. As these technologies become more prevalent, concerns regarding data security and privacy are growing, potentially hindering their adoption in clinical settings. To address these issues, a collaborative effort between technological innovation and legal regulations is essential. Robust privacy frameworks and advanced protective techniques must be developed to ensure the secure collection, transmission, storage, sharing, and usage of speech data in medical applications. Proactively addressing these challenges will help to build trust and promote the use of speech-based healthcare technologies.
6. Future Directions
Advancing speech technology for PNDs requires several key developments. Access to large and diverse multimodal datasets is crucial for building accurate AI models. Combining speech data with other medical information, such as electronic health records (EHRs) and imaging, will enhance diagnostic accuracy. Explainable AI (XAI) will improve transparency and trust, whereas uncertainty quantification (UQ) will help clinicians assess the reliability of AI predictions. Ensuring robust data privacy and security through techniques such as encryption and federated learning is vital for maintaining patient trust. Finally, although remote monitoring using wearable devices shows promise, challenges in data accuracy, patient access, and integration must be addressed for its successful implementation in clinical practice.
6.1. Access to Large and Multimodal Datasets
Access to extensive and diverse datasets is essential for developing accurate, comprehensive, and effective diagnostic models for healthcare. A wide range of data sources ensures the generalizability, reliability, and precision required during the training, validation, and testing of AI applications. Multimodal datasets that include diverse types of information, such as electronic health records, medical imaging, clinical notes, and physiological data, are particularly valuable. Information fusion using multimodal data from different systems can provide vital insights and help identify early indicators of medical conditions [
44]. The integration of multimodal data with advanced AI, ML, DL, or hybrid approaches can significantly enhance the reliability, accuracy, and performance of clinical applications. However, the lack of publicly available multimodal datasets limits the scope of R&D in this area.
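As an illustrative sketch of such information fusion, per-modality model outputs could be combined with a simple weighted average (late fusion). The modality names, probabilities, and weights below are hypothetical and serve only to show the mechanics, not a validated clinical scheme:

```python
def late_fusion(modality_probs, weights):
    """Weighted late fusion of per-modality probability estimates for one class.

    modality_probs: dict mapping modality name -> model's probability estimate
    weights: dict mapping modality name -> fusion weight (need not sum to 1)
    """
    total_weight = sum(weights[m] for m in modality_probs)
    return sum(weights[m] * p for m, p in modality_probs.items()) / total_weight

# Hypothetical per-modality outputs for one patient (illustrative only).
probs = {"speech": 0.80, "imaging": 0.60, "ehr": 0.70}
weights = {"speech": 0.5, "imaging": 0.3, "ehr": 0.2}
fused = late_fusion(probs, weights)
print(round(fused, 3))  # 0.5*0.8 + 0.3*0.6 + 0.2*0.7 = 0.72
```

In practice the weights would be learned or validated on held-out data; more sophisticated schemes fuse intermediate features rather than final probabilities.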
6.2. Explainable AI
Ensuring consistency and confidence is essential when developing DL models for medical diagnostics. In healthcare, understanding how and why a model makes decisions is crucial, especially when these decisions impact life-or-death outcomes. XAI techniques can clarify a model’s decision-making process and enhance its transparency [
45]. Applying advanced XAI methods to multimodal medical imaging will improve our understanding of how these models analyze complex data. To ensure widespread adoption, physicians must trust AI systems, which can be achieved by providing clear explanations of AI decision-making. This transparency will be the key to fostering confidence and safely integrating AI into clinical practice.
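One widely used model-agnostic XAI technique is permutation feature importance: shuffle a single feature column and measure how much accuracy drops. The toy model and data below are hypothetical stand-ins (a rule on a fictitious acoustic feature), not a clinical pipeline:

```python
import random

def permutation_importance(model, X, y, feature_idx, metric, rng):
    """Drop in the metric when one feature column is randomly shuffled."""
    baseline = metric(model, X, y)
    shuffled_col = [row[feature_idx] for row in X]
    rng.shuffle(shuffled_col)
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, shuffled_col)]
    return baseline - metric(model, X_perm, y)

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

# Toy "model": predicts 1 when the first feature (e.g., a jitter measure) is high.
model = lambda x: 1 if x[0] > 0.5 else 0

# Hypothetical data: feature 0 drives the label, feature 1 is pure noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]

imp0 = permutation_importance(model, X, y, 0, accuracy, random.Random(1))
imp1 = permutation_importance(model, X, y, 1, accuracy, random.Random(1))
print(imp0 > imp1)  # the informative feature loses more accuracy when shuffled
```

The same idea extends to clinical models: features whose permutation degrades performance most are the ones the model actually relies on, which clinicians can then sanity-check against domain knowledge.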
6.3. Uncertainty Quantification (UQ)
UQ assesses the unpredictability of ML and DL model outputs, which is crucial when data or models are affected by unknown factors or insufficient information. Uncertainty can arise from factors such as randomness, data noise, measurement errors, and model simplification. UQ helps evaluate the reliability of AI predictions, providing insights into model confidence, particularly in high-risk areas, such as healthcare [
46]. This enables stakeholders to make informed decisions, improve model calibration, and manage risks by forecasting potential outcomes. Integrating UQ also enhances model interpretability and trust in the AI-driven results.
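A simple and common route to UQ is an ensemble: several independently trained models are queried and the spread of their predictions is reported as uncertainty. The sketch below uses hypothetical linear "members" in place of trained networks; only the mechanism (disagreement as an uncertainty signal) is the point:

```python
import statistics

def predictive_uncertainty(models, x):
    """Mean and standard deviation of an ensemble's predictions for input x.

    A high standard deviation signals disagreement between ensemble members,
    i.e. an input on which the system should be treated as uncertain.
    """
    preds = [m(x) for m in models]
    return statistics.mean(preds), statistics.pstdev(preds)

# Hypothetical ensemble members: each scores a fictitious acoustic feature.
ensemble = [
    lambda x: 0.70 + 0.02 * x,
    lambda x: 0.72 + 0.01 * x,
    lambda x: 0.68 + 0.03 * x,
]

mean_in, std_in = predictive_uncertainty(ensemble, 1.0)    # familiar input
mean_out, std_out = predictive_uncertainty(ensemble, 20.0) # far-from-training input
print(std_out > std_in)  # members diverge more on the unfamiliar input
```

In a clinical deployment, predictions whose ensemble spread exceeds a calibrated threshold could be flagged for human review rather than acted on automatically.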
6.4. Data Privacy and Security
As AI technologies expand across various fields, including healthcare, medical robotics, and human-machine interfaces, safeguarding patient and user data has become increasingly critical. Robust privacy protection is paramount to preventing unauthorized access and attacks on sensitive medical information. A comprehensive privacy-preserving system not only ensures data security but also upholds ethical standards and regulatory compliance. Such systems protect data from breaches, enhance cybersecurity, and maintain trust between patients and healthcare providers. Techniques such as federated learning [
47] and differential privacy [
48] are essential for achieving these goals, enabling secure data sharing and maintaining confidentiality without compromising the utility of the data for AI models.
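As a minimal sketch of differential privacy, the Laplace mechanism releases a statistic with noise whose scale is calibrated to sensitivity/epsilon: smaller epsilon means stronger privacy and more noise. The patient count and parameters below are hypothetical:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Differentially private release: add Laplace(0, sensitivity/epsilon) noise."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-transform sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Hypothetical query: a patient count in a speech dataset (sensitivity 1,
# since adding or removing one patient changes the count by at most 1).
rng = random.Random(42)
true_count = 128
releases = [laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
            for _ in range(10_000)]
avg = sum(releases) / len(releases)
print(round(avg))  # the noise is zero-mean, so the average release is near 128
```

Each individual release hides any single patient's contribution, while aggregate utility is preserved; production systems would additionally track the cumulative privacy budget across queries.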
6.5. Developing Accurate AI Architecture
Future advancements in AI for the automated assessment of PNDs using speech signals should focus on developing accurate, robust, and adaptable architectures. Techniques such as Transfer Learning (TL) can address the scarcity of high-quality datasets by adapting pre-trained models [
49], whereas Generative Adversarial Networks (GANs) can support speech enhancement through noise reduction and can augment datasets by synthesizing realistic speech samples [
50]. Advanced transformer models, known for their ability to capture long-term dependencies, hold great potential for analyzing sequential speech data, while Kolmogorov-Arnold Networks (KANs) could provide novel approaches for approximating complex relationships between speech features and neurological conditions [
51]. Hybrid architectures that combine CNNs for feature extraction with transformers for sequence modeling may further enhance diagnostic accuracy. These advancements will drive the development of reliable AI systems, enabling the precise diagnosis and monitoring of PNDs through speech signals.
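The TL idea above can be sketched as freezing a pretrained feature extractor and fitting only a small classification head on the target task. Everything below (the extractor, the toy task, the decision rule) is hypothetical; a real system would reuse an acoustic model pretrained on large speech corpora rather than this stand-in transform:

```python
import math
import random

def pretrained_extractor(raw):
    """Stand-in for a frozen pretrained network mapping a raw sample to features.

    (In practice this would be a pretrained acoustic model such as a CNN;
    here it is a fixed, hypothetical transform so the example is self-contained.)
    """
    return [raw, raw * raw]

def train_head(data, labels, lr=0.5, epochs=200):
    """Fit only a small logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in zip(data, labels):
            f = pretrained_extractor(x)           # frozen: never updated
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t                             # gradient of log-loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = pretrained_extractor(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Hypothetical toy task: label 1 when the raw value exceeds 0.5.
rng = random.Random(0)
X = [rng.random() for _ in range(100)]
y = [1 if x > 0.5 else 0 for x in X]
w, b = train_head(X, y)
acc = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)
print(acc >= 0.9)
```

Because only the small head is trained, far fewer labeled patient recordings are needed than for training a full network from scratch, which is precisely the appeal of TL for scarce PND datasets.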
6.6. Remote Monitoring
In the future, the remote monitoring of PD using wearable devices could play a crucial role in providing continuous, objective data to help clinicians track symptoms and adjust treatment plans more effectively.
Figure 5 shows a cloud-based AI framework designed to evaluate speech and voice in PNDs. However, to fully realize its potential, several challenges must be addressed. Future research should focus on improving patients’ accessibility to technology and ensuring that they are comfortable using it. Additionally, concerns regarding the accuracy and privacy of the data must be carefully evaluated and resolved. Integrating remote monitoring into clinical practice has the potential to enable real-time data processing, analysis, and personalized feedback, improving diagnosis and monitoring across diverse locations, while ensuring data security and scalability. To achieve this, it is essential to develop strategies for seamlessly implementing these systems within the existing healthcare frameworks. Exploring these challenges and opportunities is vital for advancing the use of wearable devices for managing PD.
References
- Pakpoor, J.; Goldacre, M. Neuroepidemiology: The increasing burden of mortality from neurological diseases. Nat. Rev. Neurol. 2017, 13, 518–519. [CrossRef]
- Van Schependom, J.; D’haeseleer, M. Advances in neurodegenerative diseases. J. Clin. Med. 2023, 12, 1709. [CrossRef]
- Rusz, J.; Cmejla, R.; Ruzickova, H.; Ruzicka, E. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. J. Acoust. Soc. Am. 2011, 129, 350–367. [CrossRef]
- American Speech-Language-Hearing Association. Voice disorders. ASHA Pract. Portal. n.d., Retrieved February 11, 2025. https://www.asha.org/practice-portal/clinical-topics/voice-disorders/.
- Mehta, D.D.; Hillman, R.E. Voice assessment: Updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods. Curr. Opin. Otolaryngol. Head Neck Surg. 2008, 16, 211–215. [CrossRef]
- Salgado, S.; Schils, S.A.; Childes, J.M.; Crino, C.; Palmer, A.D. Current practices in the assessment of voice: A comparison of providers across different clinical settings. J. Voice 2024, in press. [CrossRef]
- Kaur, S.; et al. Medical diagnostic systems using artificial intelligence (AI) algorithms: Principles and perspectives. IEEE Access 2020, 8, 228049–228069. [CrossRef]
- Tariq, M.; et al. Principles and perspectives in medical diagnostic systems employing artificial intelligence (AI) algorithms. Int. Res. J. Econ. Manag. Stud. 2024, 3, 1. [CrossRef]
- Szolovits, P.; et al. Artificial intelligence in medical diagnosis. Ann. Intern. Med. 1988, 108, 80–87.
- Alowais, S.A.; et al. Revolutionizing healthcare: The role of artificial intelligence in clinical practice. BMC Med. Educ. 2023, 23, 689. [CrossRef]
- Raghavendra, U.; et al. Artificial intelligence techniques for automated diagnosis of neurological disorders. Eur. Neurol. 2020, 82, 41–64. [CrossRef]
- Ishikawa, K.; Rao, M.B.; MacAuslan, J.; Boyce, S. Application of a landmark-based method for acoustic analysis of dysphonic speech. J. Voice 2020, 34, 645.e11–645.e18. [CrossRef]
- Alam, M.Z.; Simonetti, A.; Brillantino, R.; Tayler, N.; Grainge, C.; Siribaddana, P.; Nouraei, S.A.R.; Batchelor, J.; Rahman, M.S.; Mancuzo, E.V.; et al. Predicting pulmonary function from the analysis of voice: A machine learning approach. Front. Digit. Health 2022, 4, 750226. [CrossRef]
- Aftab, A.; Morsali, A.; Ghaemmaghami, S.; Champagne, B. Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition. arXiv 2021, arXiv:2110.03435. [CrossRef]
- Skodda, S.; Grönheit, W.; Mancinelli, N.; Schlegel, U. Progression of voice and speech impairment in the course of Parkinson’s disease: A longitudinal study. Parkinson’s Dis. 2013, 2013, 389195. [CrossRef]
- Girardi, A.M.; Cardell, E.A.; Bird, S.P. Artificial intelligence in the interpretation of videofluoroscopic swallow studies: Implications and advances for speech–language pathologists. Big Data Cogn. Comput. 2023, 7, 178. [CrossRef]
- Patel, U.K.; et al. Artificial intelligence as an emerging technology in the current care of neurological disorders. J. Neurol. 2021, 268, 1623–1642. [CrossRef]
- Malaguti, M.C.; et al. Artificial intelligence of imaging and clinical neurological data for predictive, preventive, and personalized (P3) medicine for Parkinson’s disease: The NeuroArtP3 protocol for a multi-center research study. PLoS ONE 2024, 19, e0300127. [CrossRef]
- Lima, A.A.; et al. A comprehensive survey on the detection, classification, and challenges of neurological disorders. Biology 2022, 11, 469. [CrossRef]
- Hecker, P.; et al. Voice analysis for neurological disorder recognition: A systematic review and perspective on emerging trends. Front. Digit. Health 2022, 4, 842301. [CrossRef]
- de la Fuente Garcia, S.; Ritchie, C.W.; Luz, S. Artificial intelligence, speech, and language processing approaches to monitoring Alzheimer’s disease: A systematic review. J. Alzheimer’s Dis. 2020, 78, 1547–1574. [CrossRef]
- Idrisoglu, A.; et al. Applied machine learning techniques to diagnose voice-affecting conditions and disorders: Systematic literature review. J. Med. Internet Res. 2023, 25, e46105. [CrossRef]
- Bechiche, A.; Djezzar, M.N. Automatic detection of Parkinson’s disease using human voice and artificial intelligence. University of Kasdi Merbah Ouargla, 2022.
- Saravanan, S.; et al. A systematic review of artificial intelligence (AI) based approaches for the diagnosis of Parkinson’s disease. Arch. Comput. Methods Eng. 2022, 29, 3639–3653. [CrossRef]
- Page, M.J.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [CrossRef]
- Yildirim, M.; Kiziloluk, S.; Aslan, S.; et al. A new hybrid approach based on AOA, CNN and feature fusion that can automatically diagnose Parkinson’s disease from sound signals: PDD-AOA-CNN. SIViP 2024, 18, 1227–1240. [CrossRef]
- Alalayah, K.M.; Senan, E.M.; Atlam, H.F.; Ahmed, I.A.; Shatnawi, H.S.A. Automatic and early detection of Parkinson’s disease by analyzing acoustic signals using classification algorithms based on recursive feature elimination method. Diagnostics 2023, 13, 1924. [CrossRef]
- Akbal, E.; Barua, P.D.; Tuncer, T.; Dogan, S.; Acharya, U.R. Development of novel automated language classification model using pyramid pattern technique with speech signals. Neural Comput. Appl. 2022, 34, 21319–21333. [CrossRef]
- Zhao, A.; Wang, N.; Niu, X.; Chen, M.; Wu, H. A triplet multimodel transfer learning network for speech disorder screening of Parkinson’s disease. Int. J. Intell. Syst. 2024, 8890592. [CrossRef]
- Zhang, T.; Zhang, Y.; Sun, H.; Shan, H. Parkinson disease detection using energy direction features based on EMD from voice signal. Biocybern. Biomed. Eng. 2021, 41, 127–141. [CrossRef]
- Yao, D.; Chi, W.; Khishe, M. Parkinson’s disease and cleft lip and palate of pathological speech diagnosis using deep convolutional neural networks evolved by IPWOA. Appl. Acoust. 2022, 199, 109003. [CrossRef]
- Amato, F.; Saggio, G.; Cesarini, V.; Olmo, G.; Costantini, G. Machine learning- and statistical-based voice analysis of Parkinson’s disease patients: A survey. Expert Syst. Appl. 2023, 219, 119651. [CrossRef]
- Benba, A.; Jilbab, A.; Sandabad, S.; Hammouch, A. Voice signal processing for detecting possible early signs of Parkinson’s disease in patients with rapid eye movement sleep behavior disorder. Int. J. Speech Technol. 2019, 22, 121–129. [CrossRef]
- Elen, A.; Avuclu, E. A comparison of classification methods for diagnosis of Parkinson’s. Int. J. Intell. Syst. Appl. Eng. 2020, 8, 164–170. [CrossRef]
- Bhatt, K.; Jayanthi, N.; Kumar, M. High-resolution superlet transform based techniques for Parkinson’s disease detection using speech signal. Appl. Acoust. 2023, 214, 109657. [CrossRef]
- Hireš, M.; Gazda, M.; Drotár, P.; Pah, N.D.; Motin, M.A.; Kumar, D.K. Convolutional neural network ensemble for Parkinson’s disease detection from voice recordings. Comput. Biol. Med. 2022, 141, 105021. [CrossRef]
- Jain, A.; Abedinpour, K.; Polat, O.; Çalışkan, M.M.; Asaei, A.; Pfister, F.M.J.; Fietzek, U.M.; Cernak, M. Voice analysis to differentiate the dopaminergic response in people with Parkinson’s disease. Front. Hum. Neurosci. 2021, 15, 667997. [CrossRef]
- Costantini, G.; Cesarini, V.; Di Leo, P.; Amato, F.; Suppa, A.; Asci, F.; Pisani, A.; Calculli, A.; Saggio, G. Artificial intelligence-based voice assessment of patients with Parkinson’s disease off and on treatment: Machine vs. deep-learning comparison. Sensors 2023, 23, 2293. [CrossRef]
- Ali, L.; Zhu, C.; Zhang, Z.; Liu, Y. Automated detection of Parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE J. Transl. Eng. Health Med. 2019, 7, 2000410. [CrossRef]
- Azadi, H.; Akbarzadeh-T., M.-R.; Shoeibi, A.; Kobravi, H.R. Evaluating the effect of Parkinson’s disease on jitter and shimmer speech features. Adv. Biomed. Res. 2021, 10, 54. [CrossRef]
- Kumar, Y. A comprehensive analysis of speech recognition systems in healthcare: Current research challenges and future prospects. SN Comput. Sci. 2024, 5, 137. [CrossRef]
- Zhang, J.; et al. Intelligent speech technologies for transcription, disease diagnosis, and medical equipment interactive control in smart hospitals: A review. Comput. Biol. Med. 2023, 153, 106517. [CrossRef]
- Feng, T.; et al. A review of speech-centric trustworthy machine learning: Privacy, safety, and fairness. APSIPA Trans. Signal Inf. Process. 2023, 12, 3. [CrossRef]
- Pahar, M.; et al. COVID-19 detection in cough, breath, and speech using deep transfer learning and bottleneck features. Comput. Biol. Med. 2022, 141, 105153. [CrossRef]
- Loh, H.W.; et al. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Comput. Methods Programs Biomed. 2022, 226, 107161. [CrossRef]
- Khare, V.; et al. Application of data fusion for automated detection of children with developmental and mental disorders: A systematic review of the last decade. Inf. Fusion 2023, 101, 898.
- Ducange, P.; et al. Federated learning of XAI models in healthcare: A case study on Parkinson’s disease. Cogn. Comput. 2024, 16, 3051–3076. [CrossRef]
- Kaissis, G.; et al. End-to-end privacy-preserving deep learning on multi-institutional medical imaging. Nat. Mach. Intell. 2021, 3, 473–484. [CrossRef]
- Karaman, S.; et al. Robust automated Parkinson disease detection based on voice signals with transfer learning. Expert Syst. Appl. 2021, 178, 115013.
- Wali, A.; et al. Generative adversarial networks for speech processing: A review. Comput. Speech Lang. 2022, 71, 101308. [CrossRef]
- Liu, X.; et al. KAN: Kolmogorov-Arnold networks. arXiv 2024, arXiv:2404.19756.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).