Submitted:
16 September 2024
Posted:
16 September 2024
Abstract
Keywords:
1. Introduction
2. Related Work
3. Materials and Methods
4. Results
5. Discussion and Conclusions
7. Future work
- Increase the number of samples for languages with smaller datasets.
- Collect more samples from mobile and landline communications to assess whether the impact of face masks is also language-dependent in these channels.
- Add more languages to the FMVD.
- Study the impact of face masks on more acoustic parameters.
- Complete the ongoing study of the impact of face masks on the FASR approach.
- Develop a tool that detects face masks from speech signals.
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Country | Recording device | Sample rate (kHz) |
|---|---|---|
| ES - Spain | Neewer NW-800 | 48 |
| HR - Croatia | Zoom ZDM-1 / Zoom H4n PRO | 8 |
| HU - Hungary | Audio-Technica AT897 | 8 |
| KA - Georgia | Stagg MD-1500 / Philips DVT6000 | 44.1 |
| LT - Lithuania | Marantz PMD660 | 8 |
| PT - Portugal | Behringer B-1 / Neewer NW-800 / Tascam DR-40X / Tascam DR-40 | 44.1 |
| RO - Romania | Olympus ME52W / Behringer B-1 | 8 |
| TR - Türkiye | König K-CM700 / Shure SM48 | 8 |
| UA - Ukraine | Zoom F1-LP | 8 |

| Language | 18-30 M | 18-30 F | 31-40 M | 31-40 F | 41-50 M | 41-50 F | 51+ M | 51+ F | Total |
|---|---|---|---|---|---|---|---|---|---|
| DE - German | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 3 |
| ES - Spanish | 0 | 0 | 1 | 1 | 1 | 3 | 3 | 2 | 11 |
| HR - Croatian | 9 | 10 | 11 | 10 | 10 | 12 | 11 | 10 | 83 |
| HU - Hungarian | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 80 |
| KA - Georgian | 10 | 10 | 11 | 9 | 10 | 10 | 8 | 10 | 78 |
| LT - Lithuanian | 15 | 18 | 9 | 8 | 5 | 9 | 6 | 11 | 81 |
| PL - Polish | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 5 |
| PT - Portuguese | 25 | 32 | 25 | 26 | 27 | 28 | 24 | 25 | 212 |
| RO - Romanian | 13 | 11 | 12 | 16 | 21 | 21 | 24 | 16 | 134 |
| RU - Russian | 0 | 1 | 0 | 1 | 2 | 2 | 3 | 4 | 14 |
| TR - Turkish | 5 | 26 | 21 | 11 | 14 | 3 | 0 | 0 | 80 |
| UA - Ukrainian | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 80 |
| ES - Spanish MB | 0 | 0 | 1 | 1 | 1 | 2 | 3 | 2 | 10 |
| HR - Croatian MB | 4 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| LT - Lithuanian MB | 3 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 7 |
| PT - Portuguese MB | 8 | 10 | 16 | 13 | 11 | 11 | 14 | 8 | 91 |
| LT - Lithuanian LL | 17 | 15 | 9 | 8 | 4 | 8 | 6 | 11 | 78 |
| PL - Polish LL | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 5 |
| RU - Russian LL | 0 | 1 | 0 | 1 | 1 | 2 | 1 | 3 | 9 |

Linear mixed-effects model results on the acoustic parameters

| Language | Sex | Mask type | F0 | H1-H2 | Intensity | Speech rate | HNR | Jitter | Shimmer |
|---|---|---|---|---|---|---|---|---|---|
| ES - Spanish | F | NM vs SU | 1.87 | 1.94 | 0.35 | 1.82 | 8.76* | 0.06 | 0.41 |
| ES - Spanish | F | NM vs FP | 1.97 | 2.26 | 0.04 | 0.02 | 27.22** | 0.17 | 4.27 |
| ES - Spanish | M | NM vs SU | 0.33 | 0.29 | 7.33 | 0.5 | 0.55 | 0.65 | 1.26 |
| ES - Spanish | M | NM vs FP | 0.05 | 0.94 | 7.09 | 0.29 | 0.48 | 3.17 | 0.12 |
| HR - Croatian | F | NM vs SU | 7.89** | 0.31 | 3.11 | 1.29 | 0.35 | 0.49 | 2.15 |
| HR - Croatian | F | NM vs FP | 3.96 | 4.53* | 7.95** | 0.06 | 6.99* | 1.87 | 0.85 |
| HR - Croatian | M | NM vs SU | 4.20* | 0.23 | 3.06 | 1.53 | 0.10 | 0.99 | 0.17 |
| HR - Croatian | M | NM vs FP | 17.34*** | 0.58 | 9.90** | 0.01 | 4.64* | 1.66 | 0.34 |
| HU - Hungarian | F | NM vs SU | 8.28** | 4.54* | 0.30 | 1.58 | 20.09*** | 2.40 | 23.25*** |
| HU - Hungarian | F | NM vs FP | 9.54** | 0.01 | 6.23* | 0.85 | 18.22*** | 2.94 | 37.76*** |
| HU - Hungarian | M | NM vs SU | 3.49 | 2.51 | 0.28 | 0.01 | 7.92** | 2.84 | 14.89*** |
| HU - Hungarian | M | NM vs FP | 7.78** | 0.82 | 1.65 | 3.31 | 18.49*** | 5.42* | 9.27** |
| KA - Georgian | F | NM vs SU | 0.09 | 3.76 | 15.73*** | 5.77* | 8.34** | 8.96** | 10.50** |
| KA - Georgian | F | NM vs FP | 0.02 | 2.47 | 14.35** | 0.11 | 0.48 | 7.86** | 3.26 |
| KA - Georgian | M | NM vs SU | 6.64* | 0.65 | 0.12 | 0.20 | 1.09 | 0.56 | 0.06 |
| KA - Georgian | M | NM vs FP | 8.06** | 0.54 | 0.52 | 0.75 | 4.13* | 0.00 | 3.67 |
| LT - Lithuanian | F | NM vs SU | 4.65* | 0.47 | 4.38* | 6.46* | 11.53** | 0.01 | 0.24 |
| LT - Lithuanian | F | NM vs FP | 2.77 | 0.25 | 5.43* | 2.84 | 9.41** | 0.00 | 0.12 |
| LT - Lithuanian | M | NM vs SU | 6.07* | 0.66 | 7.70** | 0.01 | 5.14* | 0.80 | 2.00 |
| LT - Lithuanian | M | NM vs FP | 5.44* | 5.42* | 5.96* | 3.11 | 8.34** | 1.36 | 1.75 |
| PT - Portuguese | F | NM vs SU | 61.00*** | 0.62 | 1.69 | 0.45 | 79.72*** | 2.78 | 10.01** |
| PT - Portuguese | F | NM vs FP | 52.66*** | 0.11 | 0.17 | 21.3*** | 188.27*** | 1.73 | 11.65** |
| PT - Portuguese | M | NM vs SU | 103.03*** | 0.16 | 8.15** | 3.80 | 77.54*** | 0.00 | 7.79** |
| PT - Portuguese | M | NM vs FP | 95.96*** | 5.56* | 2.55 | 22.67*** | 153.01*** | 2.08 | 11.69** |
| RO - Romanian | F | NM vs SU | 2.64 | 3.98 | 0.18 | 11.44** | 1.77 | 3.29 | 7.38** |
| RO - Romanian | F | NM vs FP | 1.06 | 11.43** | 6.71* | 32.4*** | 0.15 | 0.72 | 1.1 |
| RO - Romanian | M | NM vs SU | 18.44*** | 0.21 | 0.01 | 7.78** | 0.04 | 0.99 | 8.71** |
| RO - Romanian | M | NM vs FP | 17.89*** | 0.06 | 1.46 | 7.66** | 5.42* | 0.04 | 1.28 |
| RU - Russian | F | NM vs SU | 2.43 | 0.03 | 4.71 | 2.20 | 0.14 | 0.84 | 0.33 |
| RU - Russian | F | NM vs FP | 3.62 | 0.06 | 0.21 | 1.71 | 2.04 | 1.45 | 0.56 |
| RU - Russian | M | NM vs SU | 0.13 | 0.02 | 2.93 | 0.74 | 1.72 | 0.74 | 3.94 |
| RU - Russian | M | NM vs FP | 0.06 | 0.19 | 1.56 | 0.85 | 1.75 | 0.08 | 1.62 |
| TR - Turkish | F | NM vs SU | 0.01 | 0.16 | 0.05 | 9.4** | 14.46*** | 0.01 | 9.17** |
| TR - Turkish | F | NM vs FP | 1.89 | 1.60 | 0.76 | 15.56*** | 21.6*** | 0.60 | 7.15* |
| TR - Turkish | M | NM vs SU | 22.92*** | 0.71 | 3.48 | 7.20* | 23.29*** | 0.81 | 20.25*** |
| TR - Turkish | M | NM vs FP | 21.12*** | 1.46 | 0.92 | 1.61 | 55.17*** | 5.47* | 27.93*** |
| UA - Ukrainian | F | NM vs SU | 2.63 | 5.39* | 2.25 | 10.96** | 0.01 | 0.15 | 0.77 |
| UA - Ukrainian | F | NM vs FP | 0.88 | 5.73* | 6.28* | 3.42 | 4.01 | 0.04 | 0.19 |
| UA - Ukrainian | M | NM vs SU | 28.4*** | 0.17 | 12.73** | 13.87** | 0.00 | 0.47 | 10.62** |
| UA - Ukrainian | M | NM vs FP | 18.16*** | 1.06 | 10.85** | 4.59* | 2.06 | 0.05 | 10.38** |

Linear mixed-effects model results on the acoustic parameters

| Language | Sex | Mask type | F0 | H1-H2 | Intensity | Speech rate | HNR | Jitter | Shimmer |
|---|---|---|---|---|---|---|---|---|---|
| ES - Spanish MB | F | NM vs SU | 1.61 | 0.16 | 3.40 | 4.00 | 13.53* | 4.66 | 0.77 |
| ES - Spanish MB | F | NM vs FP | 1.09 | 0.10 | 9.99* | 8.04* | 9.03* | 0.61 | 8.31* |
| ES - Spanish MB | M | NM vs SU | 0.20 | 9.41* | 67.07** | 1.77 | 0.06 | 4.69 | 0.01 |
| ES - Spanish MB | M | NM vs FP | 0.13 | 8.79* | 52.97** | 0.00 | 0.18 | 8.96* | 0.04 |
| PT - Portuguese MB | F | NM vs SU | 24.75*** | 4.17* | 2.20 | 0.23 | 7.18* | 5.19* | 21.36*** |
| PT - Portuguese MB | F | NM vs FP | 14.91*** | 6.29* | 6.04** | 2.84 | 17.93*** | 3.84 | 32.94*** |
| PT - Portuguese MB | M | NM vs SU | 44.39*** | 0.14 | 28.62*** | 1.94 | 13.27** | 10.43** | 40.78*** |
| PT - Portuguese MB | M | NM vs FP | 44.48*** | 0.00 | 12.26** | 6.70* | 23.07*** | 3.71 | 37.21*** |
| LT - Lithuanian LL | F | NM vs SU | 1.13 | 0.05 | 8.90** | 3.03 | 0.62 | 2.88 | 0.09 |
| LT - Lithuanian LL | F | NM vs FP | 0.52 | 0.13 | 7.71** | 1.29 | 3.30 | 2.09 | 0.45 |
| LT - Lithuanian LL | M | NM vs SU | 5.96* | 1.33 | 3.48 | 0.37 | 3.07 | 1.09 | 0.43 |
| LT - Lithuanian LL | M | NM vs FP | 7.95** | 0.06 | 4.62* | 1.13 | 2.60 | 0.59 | 0.07 |
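
The starred cells in the results tables can be tallied per acoustic parameter to see where mask effects concentrate. The following is a minimal sketch, not the authors' analysis: it assumes that the asterisks mark increasing significance levels (the thresholds are not stated in this section, so stars are treated only as an ordinal flag), and the names `parse_cell` and `count_flagged` are hypothetical helpers introduced for illustration. The four rows are the Portuguese MB rows copied from the table above.

```python
import re

# Column order of the acoustic parameters, as in the results tables.
PARAMS = ["F0", "H1-H2", "Intensity", "Speech rate", "HNR", "Jitter", "Shimmer"]

# Portuguese MB rows copied from the table above: (sex, comparison, cells).
ROWS = [
    ("F", "NM vs SU", "24.75*** 4.17* 2.20 0.23 7.18* 5.19* 21.36***"),
    ("F", "NM vs FP", "14.91*** 6.29* 6.04** 2.84 17.93*** 3.84 32.94***"),
    ("M", "NM vs SU", "44.39*** 0.14 28.62*** 1.94 13.27** 10.43** 40.78***"),
    ("M", "NM vs FP", "44.48*** 0.00 12.26** 6.70* 23.07*** 3.71 37.21***"),
]

# A cell is a non-negative decimal followed by zero to three asterisks.
CELL = re.compile(r"^(\d+(?:\.\d+)?)(\*{0,3})$")


def parse_cell(cell):
    """Split a cell such as '8.76*' into (statistic, number_of_stars)."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    return float(match.group(1)), len(match.group(2))


def count_flagged(rows, min_stars=1):
    """Count, per parameter, the comparisons carrying at least min_stars stars."""
    counts = {param: 0 for param in PARAMS}
    for _sex, _comparison, cells in rows:
        for param, cell in zip(PARAMS, cells.split()):
            _statistic, stars = parse_cell(cell)
            if stars >= min_stars:
                counts[param] += 1
    return counts


if __name__ == "__main__":
    for param, n in count_flagged(ROWS).items():
        print(f"{param}: {n} of {len(ROWS)} comparisons flagged")
```

Raising `min_stars` restricts the tally to the more strongly flagged comparisons; the same helper can be applied to rows from any other language in the tables.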

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).