Submitted:
27 May 2026
Posted:
29 May 2026
Abstract
Keywords:
1. Introduction
- It introduces a new dataset of healthy young adults recorded in non-clinical, naturalistic settings—an age group largely underrepresented in existing voice-biomarker research.
- The utility of these healthy recordings is tested as an external control benchmark against voices with pathology.
1.1. Core Acoustic Measurements
1.2. Healthy Voice Baseline as an External Control
1.3. Sustained Vowel Phonation and Connected Speech Tasks
1.4. Source-Filter Theory for Voice Acoustics Analysis
1.5. Laryngitis Case Study
1.6. Research Scope
- What are the normative acoustic characteristics of sustained vowel phonations and short phrase productions in healthy young adults?
- How do acoustic profiles from healthy young adults diverge from those of speakers with acute laryngitis in the Saarbrücken Voice Database?
- How can paired vowel and phrase data enhance the robustness of voice outcome measures collected in non-clinical settings?
2. Materials and Methods
2.1. Participants and Ethical Approval
2.2. Data Collection
2.3. Signal Processing and Feature Extraction
2.4. Multivariate and Univariate Statistics
2.5. Intensity Profiles
2.6. Spectrograph
2.7. Saarbrücken Voice Database for Laryngitis Case Study
3. Results
3.1. Healthy Subject Results: Vowel Pronunciation
| Statistic | CVS Male vs CVS Female | CVS Male vs SVD Male | CVS Female vs SVD Male |
|---|---|---|---|
| Wilks’ Λ | 0.1152 | 0.0672 | 0.1591 |
| F (6,9) | 11.516 | 27.752 | 10.576 |
| Pr>F (p-value) | 0.0009 | < 0.0001 | 0.0003 |
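For a two-group comparison, Wilks' Λ converts exactly to an F statistic, which gives a quick consistency check on the table above. The sketch below assumes the degrees of freedom printed in the row label, F (6,9), and applies them to the first column only. The function name is illustrative and is not part of the original analysis.

```python
def wilks_to_f(wilks_lambda: float, df_num: int, df_den: int) -> float:
    """Exact F statistic for a two-group MANOVA given Wilks' lambda.

    df_num is the number of dependent variables (p); df_den is N - p - 1,
    where N is the total sample size across both groups.
    """
    if not 0.0 < wilks_lambda <= 1.0:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    return (1.0 - wilks_lambda) / wilks_lambda * df_den / df_num


# First column of the table (CVS Male vs CVS Female).
# Degrees of freedom are taken from the row label and are an assumption here.
f_stat = wilks_to_f(0.1152, df_num=6, df_den=9)
print(f"F(6, 9) = {f_stat:.2f}")
```

Under these assumptions the computed value is about 11.52, which agrees with the tabulated 11.516 to within the rounding of Λ.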
3.2. Healthy Subject Results: Phrase Pronunciation
3.3. Comparison of Healthy Subjects and SVD Subject with Laryngitis
3.4. F0 Contour Plots
3.5. Intensity Profiles
3.6. Spectrographs
3.7. MFCC
4. Discussion
- What are the normative acoustic characteristics of sustained vowel phonations and short phrase productions in healthy young adults?
- How do acoustic profiles from healthy young adults diverge from those of speakers with acute laryngitis in the Saarbrücken Voice Database?
- How can paired vowel and phrase data enhance the robustness of voice outcome measures collected in non-clinical settings?
4.1. Acoustical Outcomes and Significance
- Vocal Profile Distinctiveness: The MANOVA results for CVS Male vs. SVD Male yielded a Wilks’ Λ of 0.0672 (p < 0.0001), indicating that the vocal profiles are significantly different.
- F0 Variations: In vowel pronunciation, the SVD cohort (20.3 years old) exhibited a mean F0 of 131.78 Hz, which is notably higher than the CVS Male mean of 116.93 Hz.
- HNR and Stability: The SVD cohort showed a higher Harmonic-to-Noise Ratio (HNR) of 19.409 compared to the CVS Male (10.477) and CVS Female (13.581).
- Duration and Sustention: A marked difference was observed in Maximum Phonation Time (MPT); the SVD cohort averaged 1.353 seconds for vowel pronunciation, while CVS cohorts remained at or below 0.500 seconds.
- Phrase Production: Similar trends persisted in phrase productions, where the SVD cohort maintained a distinct profile from the CVS cohorts (p < 0.0001), suggesting that geographic or linguistic factors may influence normative "healthy" values.
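The cohort differences listed above can be put on a common scale with a standardized mean difference. The following is a minimal sketch that computes Cohen's d from the reported vowel-task means and standard deviations; the group size of 21 per male cohort is an assumption taken from the cohort table, and the helper name is hypothetical rather than part of the study's workflow.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Vowel-task summary statistics: SVD healthy male vs CVS healthy male.
# n = 21 per group is an assumption based on the cohort table.
d_f0 = cohens_d(131.78, 32.52, 21, 116.93, 26.80, 21)
d_hnr = cohens_d(19.409, 3.53, 21, 10.477, 3.52, 21)

print(f"F0  effect size d = {d_f0:.2f}")
print(f"HNR effect size d = {d_hnr:.2f}")
```

With equal group sizes the pooled variance is simply the average of the two variances, so the result does not depend on the assumed n. Under these assumptions the HNR difference is several times larger, in pooled-SD units, than the F0 difference.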
4.2. Comparative Healthy vs. Pathological Vocal Characteristics
4.3. Paired Vowel and Phrase Data
4.4. Limitations and Challenges
- Recording Environment: In the current study, data were collected in a university conference room adjacent to a high-traffic public area, so the acoustic environment was not clinically isolated. Background noise adds random energy that Praat interprets as perturbation; consistent with prior literature, this ambient noise may have artificially inflated the calculated jitter (to 0.8%) and shimmer (to 5%) values.
- Discriminatory Power of Acoustics: As noted in recent literature [83], high variability in vowel acoustics (even for standard vowels such as /a/) suggests that acoustic measures alone may not be fully adequate for discriminating between healthy and disordered speech without supplementary modalities. A significantly larger dataset is required to accurately estimate the potentially large number of parameters.
- Hardware and Environmental Variance: The reference dataset is subject to unknown variability in recording hardware and acoustic environments. These inconsistencies complicate direct comparisons with the locally collected healthy controls.
- Linguistic Mismatch: There is a linguistic divergence between the pathological dataset (German) and the control group (English). While the control group included diverse accents, the fundamental phonetic differences between languages may introduce confounding variables in formant and prosodic feature extraction. Within reported results, it can also be difficult to separate the effect of cultural accent from the influence of emotional state [72].
- Class Imbalance: The laryngitis subset comprises a relatively small fraction of the total pathological data, potentially limiting the statistical robustness of the analysis for this specific condition.
- Lack of Deep Phenotyping: The utility of the dataset for supervised machine learning is constrained by a lack of detailed clinical annotation. The data lacks metadata regarding severity, duration, or clinician-confirmed perceptual scores, preventing a deeper analysis of how acoustic features correlate with disease progression.
- Algorithm Performance: Our selection of the commonly used Praat software provided convenience in streamlining the analytic workflow, but research has shown that other software packages may not provide equivalent outcomes [35].
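The recording-environment limitation above can be illustrated with a small simulation. The sketch below uses the common "local" definition of perturbation (mean absolute difference between consecutive cycles divided by the mean) and models background noise, as an assumption, simply as extra random variability in the measured glottal periods. The F0 of 117 Hz and the two noise levels are illustrative values, not measurements from the study.

```python
import random
import statistics


def local_perturbation(values):
    """Mean absolute difference between consecutive values over the mean value.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(diffs) / statistics.fmean(values)


def simulate_periods(f0_hz, n_cycles, relative_sd, rng):
    """Glottal periods (seconds) with Gaussian cycle-to-cycle variability."""
    base = 1.0 / f0_hz
    return [base * (1.0 + rng.gauss(0.0, relative_sd)) for _ in range(n_cycles)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
quiet = simulate_periods(117.0, 200, 0.002, rng)  # hypothetical quiet booth
noisy = simulate_periods(117.0, 200, 0.007, rng)  # hypothetical noisy room

print(f"jitter, quiet: {100 * local_perturbation(quiet):.2f}%")
print(f"jitter, noisy: {100 * local_perturbation(noisy):.2f}%")
```

Because the measure is driven entirely by cycle-to-cycle differences, any noise that disturbs period or amplitude detection raises it even when the underlying phonation is unchanged.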
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| F0 | Fundamental frequency |
| F3 | Third formant frequency |
| Jitter | Cycle-to-cycle variation in fundamental frequency |
| Shimmer | Cycle to cycle variation in voice amplitude |
| HNR | Harmonic-to-noise ratio |
| Intensity | Energy transmitted by vocal vibrations |
| CPP | Cepstral Peak Prominence |
| MFCCS | Mel-Frequency Cepstral Coefficients |
| RSPL | Root-Mean-Square Sound Pressure Level |
| MPT | Maximum Phonation Time |
Appendix A
| Acoustic Parameter | Definition | Significance |
|---|---|---|
| Fundamental Frequency (F0) | Instantaneous vibration rate of the vocal folds (Hz) | Because human speech is dynamic, F0 changes constantly during speech to create intonation, emphasis, and emotion. |
| Mean Fundamental Frequency | Average rate of vocal fold vibration across a sustained sound or speech sample. | It reflects overall pitch control and can shift with inflammation, strain, or other abnormalities. |
| Minimum Fundamental Frequency | The lowest pitch produced in a sample. | Indicates the lower limit of vocal fold vibration and can drop with edema or impaired vibratory control. |
| Jitter Variance | Fundamental frequency variation over a period of time. | Measure of pitch stability; relevant to vowels, not phrases. Measured as a fraction or as a percentage. |
| Harmonics-to-Noise Ratio (HNR) | HNR quantifies how much harmonic (periodic) energy exists compared to noise. | Low HNR signals increased vocal irregularity and potential pathology. |
| Formant Frequencies (F1, F2, F3) | Formants are the resonant frequency peaks shaped by the vocal tract during speech; F1, F2, and F3 are the first three peaks, which shape vowel sounds. | Reflect articulatory configuration and placement, are key for distinguishing vowels, and can shift in predictable ways when the vocal tract is disrupted by pathology. |
| Intensity | A physical measure of the energy transmitted by vocal vibrations. | It reflects the amplitude of vocal fold oscillations and is significant because variations in vocal loudness can indicate pathology. |
| Cepstral Peak Prominence (CPP) | An acoustic measure that quantifies the strength and clarity of the harmonic structure in the voice signal | It reflects vocal quality and stability, with lower CPP often linked to dysphonia or voice disorders. |
| Mel-Frequency Cepstral Coefficients (MFCC) | Features derived from the short-term power spectrum of speech, scaled to human perception of sound frequency. | MFCCs represent how humans perceive sound frequencies and are typically expressed as 13 coefficients, each capturing distinct vocal tract characteristics. |
| Root-Mean-Square Sound Pressure Level (RSPL) | Average acoustic energy of a voice signal, reflecting vocal loudness and stability over time | RSPL is significant as a biomarker because abnormal variations can indicate vocal fatigue, respiratory issues, or neurological disorders. |
| Maximum Phonation Time (MPT) | The longest duration a person can sustain a vowel sound on one breath. | Reflects respiratory support, vocal fold efficiency, and phonatory control, significant for assessing vocal function and detecting respiratory or laryngeal disorders. |
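The HNR definition in the table above (harmonic energy relative to noise energy) can be made concrete with the relation used in autocorrelation-based harmonicity analysis, where r is the fraction of signal energy in the periodic part. This is a minimal sketch; the function names are illustrative, and whether the study's Praat settings correspond exactly to this formulation is an assumption.

```python
import math


def hnr_db(periodic_fraction: float) -> float:
    """HNR in dB when a fraction r of the signal energy is periodic."""
    if not 0.0 < periodic_fraction < 1.0:
        raise ValueError("periodic fraction must lie strictly between 0 and 1")
    return 10.0 * math.log10(periodic_fraction / (1.0 - periodic_fraction))


def periodic_fraction_from_hnr(hnr: float) -> float:
    """Inverse relation: the energy fraction that is periodic for a given HNR (dB)."""
    return 1.0 / (1.0 + 10.0 ** (-hnr / 10.0))


# 99% periodic energy corresponds to roughly 20 dB.
print(f"r = 0.99  ->  HNR = {hnr_db(0.99):.1f} dB")

# Reported vowel-task means, converted back to a periodic energy fraction.
for label, value in [("CVS Male", 10.477), ("CVS Female", 13.581), ("SVD Healthy", 19.409)]:
    print(f"{label}: HNR {value} dB  ->  r = {periodic_fraction_from_hnr(value):.3f}")
```

Reading HNR as an energy fraction makes the scale easier to interpret: a few dB of difference near the top of the range corresponds to a small change in the proportion of noise.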
| Feature | Sustained Vowels | Connected Speech |
|---|---|---|
| Type of Vocal Task | Isolated phonation | Dynamic speech production |
| Primary Info Captured | Vocal fold vibratory stability, laryngeal function | Articulatory coordination, prosody, natural voice use |
| Acoustic Parameters Typically Measured | F0, cepstral peak prominence (CPP), Jitter, Shimmer, HNR, SPL, and MPT | Fundamental frequency F0 variability, SPL range, CPP in context. |
| Advantages | Controlled, repeatable, less articulatory influence, useful for multilingual analysis | Reflects real-life communication, captures dynamic vocal attributes, may be more reliable for qualities like hoarseness |
| Limitations/Considerations | May not reflect natural voice use, less dynamic information, many measures easy to extract. | More complex analysis, influenced by speaking rate, intonation, and articulation, thus feature extraction can be more difficult |





References
- Bensoussan, Y.; Sigaras, A.; Rameau, A.; Elemento, O.; Powell, M.; Dorr, D.; Payne, P.; Ravitsky, V.; Bélisle-Pipon, J.-C.; Johnson, A.; et al. Bridge2AI-Voice: An Ethically-Sourced, Diverse Voice Dataset Linked to Health Information.
- Lyberg-Åhlander, V.; Rydell, R.; Fredlund, P.; Magnusson, C.; Wilén, S. Prevalence of Voice Disorders in the General Population, Based on the Stockholm Public Health Cohort. J Voice 2019, 33, 900–905. [CrossRef]
- Skodda, S.; Grönheit, W.; Mancinelli, N.; Schlegel, U. Progression of Voice and Speech Impairment in the Course of Parkinson’s Disease: A Longitudinal Study. Parkinsons Dis 2013, 2013, 389195. [CrossRef]
- Solomon, C.; Valstar, M.; Morriss, R.; Crowe, J. Objective Methods for Reliable Detection of Concealed Depression. Frontiers in ICT 2015, 2. [CrossRef]
- Fagherazzi, G.; Fischer, A.; Ismael, M.; Despotovic, V. Voice for Health: The Use of Vocal Biomarkers from Research to Clinical Practice. Digit Biomark 2021, 5, 78–88. [CrossRef]
- Cordella, F. The Sounds of Health: Harnessing Vocal Biomarkers for Scalable Health Tracking Available online: https://www.eitdigital.eu/newsroom/grow-digital-insights/the-sounds-of-health-harnessing-vocal-biomarkers-for-scalable-health-tracking/ (accessed on 12 July 2025).
- O’Connell, K. 5 Vocal Biomarker Trends to Watch in 2025. Canary Speech 2025.
- Available online: https://stimmdb.coli.uni-saarland.de/ (accessed on 26 November 2025).
- Koreman, J. A German Database of Patterns of Pathological Vocal Fold Vibration. Phonus. Saarbrücken, Institut für … 1997.
- Pützer, M.; Barry, W.J. Saarbruecken Voice Database 2008.
- Brockmann-Bauser, M.; de Paula Soares, M.F. Do We Get What We Need from Clinical Acoustic Voice Measurements? Applied Sciences 2023, 13, 941. [CrossRef]
- Patel, R.R.; Awan, S.N.; Barkmeier-Kraemer, J.; Courey, M.; Deliyski, D.; Eadie, T.; Paul, D.; Švec, J.G.; Hillman, R. Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. American Journal of Speech-Language Pathology 2018, 27, 887–905. [CrossRef]
- Saggio, G.; Costantini, G. Worldwide Healthy Adult Voice Baseline Parameters: A Comprehensive Review. Journal of Voice 2022, 36, 637–649. [CrossRef]
- Guimarães, I.; Abberton, E. Health and Voice Quality in Smokers: An Exploratory Investigation. Logopedics Phoniatrics Vocology 2005, 30, 185–191. [CrossRef]
- Mizuta, M.; Abe, C.; Taguchi, E.; Takeue, T.; Tamaki, H.; Haji, T. Validation of Cepstral Acoustic Analysis for Normal and Pathological Voice in the Japanese Language. Journal of Voice 2022, 36, 770–776. [CrossRef]
- Jetté, M. Toward an Understanding of the Pathophysiology of Chronic Laryngitis. Perspect ASHA Spec Interest Groups 2016, 1, 14–25. [CrossRef]
- O’Connell, N.S.; Dai, L.; Jiang, Y.; Speiser, J.L.; Ward, R.; Wei, W.; Carroll, R.; Gebregziabher, M. Methods for Analysis of Pre-Post Data in Clinical Research: A Comparison of Five Common Methods. J Biom Biostat 2017, 8, 1–8. [CrossRef]
- Thorlund, K.; Dron, L.; Park, J.J.H.; Mills, E.J. Synthetic and External Controls in Clinical Trials – A Primer for Researchers. Clin Epidemiol 2020, 12, 457–467. [CrossRef]
- De Los Reyes, A.; Kazdin, A. When the Evidence Says, “Yes, No, and Maybe So.” Current directions in psychological science 2008, 17, 47–51. [CrossRef]
- Gerratt, B.R.; Kreiman, J.; Garellek, M. Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech. J Speech Lang Hear Res 2016, 59, 994–1001. [CrossRef]
- Behlau, M.; Madazio, G.; Yamasaki, R. Dynamic Vocal Analysis: Vocal Functionality Evaluation. Codas 35, e20210083. [CrossRef]
- Goy, H.; Fernandes, D.N.; Pichora-Fuller, M.K.; Lieshout, P. van Normative Voice Data for Younger and Older Adults. Journal of Voice 2013, 27, 545–555. [CrossRef]
- Rodrigo, I.; Duñabeitia, J.A. Listening to the Mind: Integrating Vocal Biomarkers into Digital Health. Brain Sciences 2025, 15, 762. [CrossRef]
- Glaspey, A.M.; Wilson, J.J.; Reeder, J.D.; Tseng, W.-C.; MacLeod, A.A.N. Moving Beyond Single Word Acquisition of Speech Sounds to Connected Speech Development With Dynamic Assessment. Journal of Speech, Language, and Hearing Research 2022, 65, 508–524. [CrossRef]
- Lowie, W.; Verspoor, M. A Dynamic Systems Theory Approach to Second Language Acquisition. Bilingualism: Language and Cognition 2007, 10, 7–21. [CrossRef]
- Kent, R.D. Vocal Tract Acoustics. Journal of Voice 1993, 7, 97–117. [CrossRef]
- Jongman, A. Acoustic Phonetics II: Source-Filter Theory of Speech Production. Speech Prosody Studies Group 2023.
- Available online: https://www.wiley.com/en-us/The+Handbook+of+Phonetic+Sciences%2C+2nd+Edition-p-9781405145909 (accessed on 27 April 2026).
- Available online: https://www.izharishaksa.com/blog/harvard-sentences-complete-guide (accessed on 30 April 2026).
- Vogel, A.P.; Maruff, P. Comparison of Voice Acquisition Methodologies in Speech Research. Behav Res Methods 2008, 40, 982–987. [CrossRef]
- Awan, S.N.; Shaikh, M.A.; Awan, J.A.; Abdalla, I.; Lim, K.O.; Misono, S. Smartphone Recordings Are Comparable to “Gold Standard” Recordings for Acoustic Measurements of Voice. Journal of Voice 2025, 39, 1019–1032. [CrossRef]
- Acad. Transcr. Serv. 2022.
- Available online: https://www.fon.hum.uva.nl/praat/ (accessed on 27 November 2025).
- Burris, C.; Vorperian, H.; Fourakis, M.; Kent, R.; Bolt, D. Quantitative and Descriptive Comparison of Four Acoustic Analysis Systems: Vowel Measurements. Journal of Speech, Language, and Hearing Research 2014, 57, 26–45. [CrossRef]
- Amir, O.; Wolf, M.; Amir, N. A Clinical Comparison between Two Acoustic Analysis Softwares: MDVP and Praat. Biomedical Signal Processing and Control 2009, 4, 202–205. [CrossRef]
- Parsa, V.; Jamieson, D.G. A Comparison of High Precision F0 Extraction Algorithms for Sustained Vowels. J Speech Lang Hear Res 1999, 42, 112–126. [CrossRef]
- Ramadhina, D.; Magdalena, R.; Saidah, S. Individual Identification Through Voice Using Mel-Frequency Cepstrum Coefficient (MFCC) and Hidden Markov Models (HMM) Method. Journal of Measurements, Electronics, Communications, and Systems 2020, 7, 26. [CrossRef]
- Banuroopa, K.; Shanmuga Priyaa, D. MFCC Based Hybrid Fingerprinting Method for Audio Classification through LSTM. International Journal of Nonlinear Analysis and Applications 2021, 12, 2125–2136. [CrossRef]
- Alkhatib, B.; Eddin, M. Voice Identification Using MFCC and Vector Quantization. Baghdad Science Journal 2020, 17, 1019. [CrossRef]
- Tracey, B.; Volfson, D.; Glass, J.; Haulcy, R.; Kostrzebski, M.; Adams, J.; Kangarloo, T.; Brodtmann, A.; Dorsey, E.R.; Vogel, A. Towards Interpretable Speech Biomarkers: Exploring MFCCs. Sci Rep 2023, 13, 22787. [CrossRef]
- Vasquez-Serrano, P.; Reyes-Moreno, J.; Guido, R.C.; Sepúlveda-Sepúlveda, A. MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation. Journal of Voice 2025, 39, 1431–1439. [CrossRef]
- Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, E215-220. [CrossRef]
- Cesari, U.; De Pietro, G.; Marciano, E.; Niri, C.; Sannino, G.; Verde, L. A New Database of Healthy and Pathological Voices. Computers & Electrical Engineering 2018, 68, 310–321. [CrossRef]
- de Felippe, A.C.N.; Grillo, M.H.M.M.; Grechi, T.H. Standardization of Acoustic Measures for Normal Voice Patterns. Brazilian Journal of Otorhinolaryngology 2006, 72, 659–664. [CrossRef]
- Hippargekar, P.; Bhise, S.; Kothule, S.; Shelke, S. Acoustic Voice Analysis of Normal and Pathological Voices in Indian Population Using Praat Software. Indian J Otolaryngol Head Neck Surg 2022, 74, 5069–5074. [CrossRef]
- Demirhan, E.; Unsal, E.M.; Yilmaz, C.; Ertan, E. Acoustic Voice Analysis of Young Turkish Speakers. J Voice 2016, 30, 378.e21-25. [CrossRef]
- Vreča, J.; Pilipović, R.; Biasizzo, A. Hardware–Software Co-Design of an Audio Feature Extraction Pipeline for Machine Learning Applications. Electronics 2024, 13, 875. [CrossRef]
- Ma, E.P.-M.; Love, A.L. Electroglottographic Evaluation of Age and Gender Effects during Sustained Phonation and Connected Speech. J Voice 2010, 24, 146–152. [CrossRef]
- Biever, D.M.; Bless, D.M. Vibratory Characteristics of the Vocal Folds in Young Adult and Geriatric Women. Journal of Voice 1989, 3, 120–131. [CrossRef]
- Brown, W.S.; Morris, R.J.; Michel, J.F. Vocal Jitter in Young Adult and Aged Female Voices. Journal of Voice 1989, 3, 113–119. [CrossRef]
- Ferrand, C.T. Harmonics-to-Noise Ratio: An Index of Vocal Aging. J Voice 2002, 16, 480–487. [CrossRef]
- Banh, J.; Naumenko, K.; Goy, H.; Van Lieshout, P.; Fernandes, D.; Pichora-Fuller, K. Establishing Normative Voice Characteristics of Younger and Older Adults. Canadian Acoustics - Acoustique Canadienne 2009, 37, 190–191.
- Dwire, A.; McCauley, R. Repeated Measures of Vocal Fundamental Frequency Perturbation Obtained Using the Visi-Pitch. J Voice 1995, 9, 156–162. [CrossRef]
- Brockmann, M.; Drinnan, M.J.; Storck, C.; Carding, P.N. Reliable Jitter and Shimmer Measurements in Voice Clinics: The Relevance of Vowel, Gender, Vocal Intensity, and Fundamental Frequency Effects in a Typical Clinical Task. J Voice 2011, 25, 44–53. [CrossRef]
- Teixeira, J.P.; Oliveira, C.; Lopes, C. Vocal Acoustic Analysis – Jitter, Shimmer and HNR Parameters. Procedia Technology 2013, 9, 1112–1122. [CrossRef]
- Lovato, A.; Colle, W.D.; Giacomelli, L.; Piacente, A.; Righetto, L.; Marioni, G.; Filippis, C. de Multi-Dimensional Voice Program (MDVP) vs Praat for Assessing Euphonic Subjects: A Preliminary Study on the Gender-Discriminating Power of Acoustic Analysis Software. Journal of Voice 2016, 30, 765.e1-765.e5. [CrossRef]
- Fernandes, J.; Teixeira, F.; Guedes, V.; Junior, A.; Teixeira, J. Harmonic to Noise Ratio Measurement - Selection of Window and Length. Procedia Computer Science 2018, 138, 280–285. [CrossRef]
- Sheena; Mary, B.B.; Aswin, V.A.; Suprent, A. Variation of Harmonics to Noise Ratio from the Age Range of 9–18 Years Old in Both the Genders. Indian J Otolaryngol Head Neck Surg 2022, 74, 5518–5523. [CrossRef]
- Orlikoff, R.F.; Kahane, J.C. Influence of Mean Sound Pressure Level on Jitter and Shimmer Measures. Journal of Voice 1991, 5, 113–119. [CrossRef]
- Bele, I.V. The Speaker’s Formant. Journal of Voice 2006, 20, 555–578. [CrossRef]
- Kent, R.D.; Vorperian, H.K. Static Measurements of Vowel Formant Frequencies and Bandwidths: A Review. J Commun Disord 2018, 74, 74–97. [CrossRef]
- Maurer, D. Acoustics of the Vowel; 2016; ISBN 978-3-0343-2391-8.
- Aalto, D.; Aaltonen, O.; Happonen, R.-P.; Jääsaari, P.; Kivelä, A.; Kuortti, J.; Luukinen, J.-M.; Malinen, J.; Murtola, T.; Parkkola, R.; et al. Large Scale Data Acquisition of Simultaneous MRI and Speech. Applied Acoustics 2014, 83, 64–75. [CrossRef]
- Tang, D.; Niziolek, C.A.; Parrell, B. Formant Variability Is Related to Vowel Duration across Speakers. JASA Express Lett. 2025, 5, 115202. [CrossRef]
- Buckley, D.P.; Abur, D.; Stepp, C.E. Normative Values of Cepstral Peak Prominence Measures in Typical Speakers by Sex, Speech Stimuli, and Software Type Across the Life Span. Am J Speech Lang Pathol 2023, 32, 1565–1577. [CrossRef]
- Murton, O.; Hillman, R.; Mehta, D. Cepstral Peak Prominence Values for Clinical Voice Evaluation. American Journal of Speech-Language Pathology 2020, 29, 1596–1607. [CrossRef]
- Anand, S.; Kopf, L.M.; Shrivastav, R.; Eddins, D.A. Using Pitch Height and Pitch Strength to Characterize Type 1, 2, and 3 Voice Signals. J Voice 2021, 35, 181–193. [CrossRef]
- Brewer, C. Norms For Voice, Motor Speech, & Resonance Assessments Available online: https://theadultspeechtherapyworkbook.com/norms-for-voice/ (accessed on 14 July 2025).
- Stathopoulos, E.T.; Huber, J.E.; Sussman, J.E. Changes in Acoustic Characteristics of the Voice Across the Life Span: Measures From Individuals 4–93 Years of Age. Journal of Speech, Language, and Hearing Research 2011, 54, 1011–1021. [CrossRef]
- Abraham, E.A.; Geetha, A. Acoustical and Perceptual Analysis of Voice in Individuals with Parkinson’s Disease. Indian J Otolaryngol Head Neck Surg 2023, 75, 427–432. [CrossRef]
- Burridge, J.; Vaux, B. Brownian Dynamics for the Vowel Sounds of Human Language. Phys. Rev. Research 2020, 2, 013274. [CrossRef]
- Rabiei, M.; Gasparetto, A. A Methodology for Recognition of Emotions Based on Speech Analysis, for Applications to Human-Robot Interaction. An Exploratory Study. Paladyn, Journal of Behavioral Robotics 2014, 5. [CrossRef]
- Patel, R.R.; Awan, S.N.; Barkmeier-Kraemer, J.; Courey, M.; Deliyski, D.; Eadie, T.; Paul, D.; Švec, J.G.; Hillman, R. Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. American Journal of Speech-Language Pathology 2018, 27, 887–905. [CrossRef]
- Titze, I.R.; Baken, R.J.; Bozeman, K.W.; Granqvist, S.; Henrich, N.; Herbst, C.T.; Howard, D.M.; Hunter, E.J.; Kaelin, D.; Kent, R.D.; et al. Toward a Consensus on Symbolic Notation of Harmonics, Resonances, and Formants in Vocalization. J Acoust Soc Am 2015, 137, 3005–3007. [CrossRef]
- Kent, R. The MIT Encyclopedia of Communication Disorders; 2003; ISBN 978-0-262-27702-0.
- Anand, S.; Kopf, L.M.; Shrivastav, R.; Eddins, D.A. Using Pitch Height and Pitch Strength to Characterize Type 1, 2, and 3 Voice Signals. J Voice 2021, 35, 181–193. [CrossRef]
- Brinca, L.F.; Batista, A.P.F.; Tavares, A.I.; Gonçalves, I.C.; Moreno, M.L. Use of Cepstral Analyses for Differentiating Normal from Dysphonic Voices: A Comparative Study of Connected Speech versus Sustained Vowel in European Portuguese Female Speakers. J Voice 2014, 28, 282–286. [CrossRef]
- Tracey, B.; Volfson, D.; Glass, J.; Haulcy, R.; Kostrzebski, M.; Adams, J.; Kangarloo, T.; Brodtmann, A.; Dorsey, E.R.; Vogel, A. Towards Interpretable Speech Biomarkers: Exploring MFCCs. Sci Rep 2023, 13, 22787. [CrossRef]
| Cohort | Status | Sex (M/F) | n (%) | Age, mean (SD) | Age range |
|---|---|---|---|---|---|
| CWRU (N=32) | Healthy | M | 21 (65.6%) | 20.1 (1.5) | 18-24 |
| CWRU (N=32) | Healthy | F | 11 (34.4%) | 20.5 (0.7) | 19-21 |
| Saarbrücken | Healthy | M | N=21 | 20.3 (0.7) | 20-21 |
| Saarbrücken | Laryngitis | M | N=21 | 55.0 (4.0) | 50-60 |
| Statistic | CVS Female Healthy Age: 20.5 (0.7) | | CVS Male Healthy Age: 20.1 (1.5) | | SVD Healthy Age: 20.3 (0.7) | |
|---|---|---|---|---|---|---|
| F0 | 215.78 | (26.67) | 116.93 | (26.80) | 131.78 | (32.52) |
| F3 | 2784.38 | (423.12) | 2662.57 | (167.25) | 2482.14 | (330.84) |
| Jitter | 0.00567 | (< 0.001) | 0.00742 | (<0.001) | 0.00421 | (<1e-5) |
| Shimmer | 0.04550 | (<0.001) | 0.04291 | (<0.001) | 0.03200 | (0.02) |
| HNR | 13.581 | (2.87) | 10.477 | (3.52) | 19.409 | (3.53) |
| Intensity | 78.760 | (3.01) | 81.491 | (2.45) | 76.947 | (2.99) |
| CPP | 26.403 | (2.84) | 28.811 | (3.51) | 29.470 | (3.81) |
| MFCCS-1 | 167.67 | (28.98) | 112.329 | (35.95) | 216.196 | (42.07) |
| RSPL | 14.724 | (3.00) | 12.406 | (2.37) | 16.911 | (3.01) |
| MPT | 0.473 | (0.13) | 0.500 | (0.19) | 1.353 | (0.37) |
| Healthy General | | Female Healthy | | Male Healthy | |
|---|---|---|---|---|---|
| Biever (1989) | 193.7 | Ferrand (1997) | 209.7 | Gray (2008) | 125.0 |
| Brown (1989) | 211.0 | Banh (2009) | 222.9 | Banh (2009) | 177.8 |
| Felippe (2006) | 162.8 | Hippargekar (2022) | 226.0 | Hippargekar (2022) | 131.6 |
| | | Ma (2010) | 224.1 | Davies (2015) | 125.0 |
| Average | 189.2 | | 220.7 | | 124.8 |
| Current work | | | 215.7 | | 116.9 |
| Statistic | CVS Female Healthy Age: 20.5 (0.7) | | CVS Male Healthy Age: 20.1 (1.5) | | SVD Healthy Age: 20.3 (0.7) | |
|---|---|---|---|---|---|---|
| F0 | 184.308 | (18.17) | 116.871 | (22.32) | 136.775 | (28.35) |
| Jitter | 0.0224 | (0.0039) | 0.10224 | (0.0118) | 0.02522 | (0.0062) |
| Shimmer | 0.0940 | (0.0012) | 4.8537 | (1.744) | 0.91726 | (0.0205) |
| HNR | 7.1518 | (1.540) | 76.071 | (2.055) | 10.173 | (2.35) |
| Intensity | 74.373 | (1.725) | 17.110 | (5.375) | 73.875 | (2.22) |
| CPP | 16.239 | (4.683) | 172.388 | (28.439) | 19.360 | (5.72) |
| MFCCS-1 | 215.626 | (25.010) | 17.499 | (2.212) | 298.069 | (31.10) |
| RSPL | 18.739 | (1.795) | 2.391 | (0.464) | 20.196 | (2.33) |
| MPT | 2.517 | (0.304) | 0.10224 | (0.0118) | 1.628 | (0.22) |
| Statistic | CVS Male vs CVS Female | CVS Male vs SVD Male | CVS Female vs SVD Male |
|---|---|---|---|
| Wilks’ Λ | 0.1617 | 0.2088 | 0.1308 |
| F (6, 16) | 13.829 | 11.9982 | 17.7171 |
| Pr>F (p-value) | < 0.0001 | < 0.0001 | < 0.0001 |
| Statistic | Healthy CVS Male vs SVD Male w/Laryngitis | Healthy CVS Female vs SVD Male w/Laryngitis | Healthy SVD Male vs SVD Male w/Laryngitis |
|---|---|---|---|
| Wilks’ Λ | 0.1106 | 0.1304 | 0.6109 |
| F | 16.089 | 13.338 | 1.592 |
| Pr>F (p-value) | < 0.0001 | 0.0001 | 0.2170 |
| Statistic | Healthy CVS Male vs SVD Male w/Laryngitis | Healthy CVS Female vs SVD Male w/Laryngitis | Healthy SVD Male vs SVD Male w/Laryngitis |
|---|---|---|---|
| Wilks’ Λ | 0.2437 | 0.1152 | 0.8160 |
| F | 12.414 | 30.707 | 1.015 |
| Pr>F (p-value) | < 0.0001 | < 0.0001 | 0.4366 |
| Cohort | F1 | F2 | F3 |
|---|---|---|---|
| Theatre Group (age varies) [60] | 496 | 1368 | 2506 |
| Female Youth Healthy [61] | 625 | 2050 | 3050 |
| Female Adult Healthy [62] | 717 | 2501 | 3289 |
| CVS Healthy Female | 905 | 1393 | 2784 |
| Male Adult Healthy [63] | 269 | 2143 | 3182 |
| Male Adult Healthy [62] | 588 | 1952 | 2601 |
| Healthy Male [8] | 626 | 1145 | 2482 |
| Laryngitis Male [8] | 599 | 1114 | 2602 |
| CVS Healthy Male | 736 | 1243 | 2662 |
| Coefficient of variation (overall) | 0.28 | 0.31 | 0.11 |
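The coefficient of variation in the last row can be reproduced directly from the nine cohort rows above it. The sketch below assumes the sample standard deviation (n - 1 denominator) is used; the variable and function names are illustrative.

```python
import statistics

# Formant values (Hz) for the nine cohorts, in table order.
formants_hz = {
    "F1": [496, 625, 717, 905, 269, 588, 626, 599, 736],
    "F2": [1368, 2050, 2501, 1393, 2143, 1952, 1145, 1114, 1243],
    "F3": [2506, 3050, 3289, 2784, 3182, 2601, 2482, 2602, 2662],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


for name, values in formants_hz.items():
    print(f"{name}: CV = {coefficient_of_variation(values):.2f}")
```

With the sample standard deviation, the computed values round to the tabulated 0.28, 0.31, and 0.11, showing that F3 varies far less across cohorts than F1 or F2.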
| Cohort | Vowel | | Phrase | |
|---|---|---|---|---|
| CVS Female Healthy | 26.40 | (2.84) | 16.24 | (4.68) |
| CVS Male Healthy | 28.81 | (3.51) | 17.11 | (5.38) |
| SVD Male Healthy | 29.41 | (3.81) | 19.36 | (5.72) |
| SVD Male Laryngitis | 25.64 | (4.08) | 18.13 | (6.05) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.