Submitted: 01 October 2024
Posted: 01 October 2024
Abstract
Keywords:
1. Introduction
- Contribute to improving proficiency tests and collaborative exercises within the FSAAWG.
- Address the lack of suitable reference populations, given the variety of languages spoken in Europe.
- Evaluate the impact of face masks, widely used during the COVID-19 pandemic, on forensic speaker recognition.
2. Related Work
- when face masks were present, the five Machine Learning-based ASR classifiers under comparison, and an additional ensemble-based classifier, severely misclassified speech samples recorded with cloth masks;
- the ASR system's performance degrades as the distance between the mouth and the detector increases, with or without face masks;
- the type of microphone can adversely affect ASR system performance, and when subjects wearing different types of masks were tested, the equal error rate (EER) increased even further.
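The equal error rate referred to above is the operating point at which the miss rate (same-speaker comparisons rejected) equals the false-alarm rate (different-speaker comparisons accepted). The following is a minimal NumPy sketch, assuming that higher scores favour the same-speaker hypothesis. The function name `compute_eer` is hypothetical, and this simple threshold sweep without interpolation is an illustration only, not necessarily how the cited systems compute EER.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Approximate equal error rate (EER) as a fraction in [0, 1].

    target_scores: scores from same-speaker comparisons.
    nontarget_scores: scores from different-speaker comparisons.
    Assumes higher scores support the same-speaker hypothesis.
    """
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is tried as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([tar, non]))
    # Miss rate: same-speaker scores falling below the threshold.
    p_miss = np.array([(tar < t).mean() for t in thresholds])
    # False-alarm rate: different-speaker scores at or above the threshold.
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    # The EER lies where the two error curves cross: pick the threshold
    # with the smallest gap between them and average the two rates there.
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0
```

For perfectly separated scores such as `compute_eer([2, 3], [0, 1])` the sketch returns 0.0, and for identical score distributions it returns 0.5. Production toolkits typically interpolate along the ROC curve instead of picking the nearest observed threshold.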
3. Materials and Methods
- Text reading samples: each person was recorded reading a selected text in their native language without a mask (NM), wearing a surgical mask (SU), and wearing an FFP2 mask (FP). Each volunteer was also asked to read the same text, without a mask, in any non-native language(s) he or she speaks.
- Dialogue speech samples collected from the same individuals in their native language and in their non-native language(s), always without face masks. If an individual does not speak a non-native language proficiently, the corresponding sample(s) could be omitted.
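The recording protocol above makes it possible to compare masked and unmasked speech from the same speakers. One plausible way to enumerate such comparisons is sketched below. The sample-ID scheme (`<speaker>_<condition>`), the function `build_trials`, and the choice of NM as the default reference condition are hypothetical assumptions, not the authors' stated design.

```python
from itertools import product

# Condition labels from the text-reading protocol:
# NM = no mask, SU = surgical mask, FP = FFP2 mask.
CONDITIONS = ("NM", "SU", "FP")

def build_trials(speakers, questioned_cond, reference_cond="NM"):
    """Return (target, non_target) lists of (questioned_id, reference_id) pairs.

    Hypothetical helper: sample IDs are formed as "<speaker>_<condition>".
    A target (same-speaker) trial pairs a speaker's questioned sample with
    that speaker's reference sample; a non-target (different-speaker) trial
    pairs it with every other speaker's reference sample.
    """
    if questioned_cond not in CONDITIONS or reference_cond not in CONDITIONS:
        raise ValueError("unknown condition label")
    speakers = list(speakers)
    target, non_target = [], []
    for q, r in product(speakers, speakers):
        pair = (f"{q}_{questioned_cond}", f"{r}_{reference_cond}")
        (target if q == r else non_target).append(pair)
    return target, non_target
```

With three speakers and `questioned_cond="FP"`, this yields 3 same-speaker and 6 different-speaker comparisons of FFP2-masked speech against unmasked references.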
4. Results
4.1. EER Results by Language and Sex
4.2. EER Results between Languages
4.3. Cllr Results by Language and Sex
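Cllr (log-likelihood-ratio cost) evaluates likelihood-ratio outputs rather than hard decisions. In its standard form, Cllr = ½ [ (1/N_tar) Σ log2(1 + 1/LR) + (1/N_non) Σ log2(1 + LR) ], where the first sum runs over same-speaker comparisons and the second over different-speaker comparisons. Lower values are better, and a system that always outputs LR = 1 scores exactly 1 bit. A minimal NumPy sketch of that formula follows, assuming natural-log likelihood ratios as input. The function name `cllr` is hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np

def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost in bits.

    target_llrs: natural-log LRs, ln(LR), from same-speaker comparisons.
    nontarget_llrs: natural-log LRs from different-speaker comparisons.
    """
    tar = np.asarray(target_llrs, dtype=float)
    non = np.asarray(nontarget_llrs, dtype=float)
    # log2(1 + 1/LR) = ln(1 + e^(-llr)) / ln 2, computed stably with logaddexp.
    c_tar = np.mean(np.logaddexp(0.0, -tar)) / np.log(2.0)
    # log2(1 + LR) = ln(1 + e^(llr)) / ln 2 for the different-speaker trials.
    c_non = np.mean(np.logaddexp(0.0, non)) / np.log(2.0)
    return 0.5 * (c_tar + c_non)
```

If a system reports log10 likelihood ratios, multiply them by ln 10 (about 2.3026) before passing them to this sketch.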
4.4. Cllr (CV) Results between Languages
5. Discussion/Conclusions
6. Future Work
- Test whether system performance is affected when comparisons are made with samples obtained via mobile and landline communications.
- Add more languages to the FMVD.
- Develop a tool that detects mask use from speech signals.
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Country | Recording device(s) | Sample rate (kHz) |
|---|---|---|
| Croatia (HR) | Zoom ZDM-1 / Zoom H4n PRO | 8 |
| Georgia (KA) | Stagg MD-1500 / Philips DVT6000 | 44.1 |
| Hungary (HU) | Audio-Technica AT897 / Steinberg UR28M | 8 |
| Lithuania (LT) | Marantz PMD660 | 8 |
| Portugal (PT) | Behringer B-1 / Neewer NW-800 / Tascam DR-40X / Tascam DR-40 | 44.1 |
| Romania (RO) | Olympus ME52W / Behringer B-1 | 8 |
| Türkiye (TR) | König K-CM700 / Shure SM48 | 8 |
| Ukraine (UA) | Zoom F1-LP | 8 |

| Language | [18–30] Male | [18–30] Female | [31–40] Male | [31–40] Female | [41–50] Male | [41–50] Female | [51–+∞) Male | [51–+∞) Female | Total |
|---|---|---|---|---|---|---|---|---|---|
| Croatian (HR) | 9 | 10 | 11 | 10 | 10 | 10 | 10 | 10 | 80 |
| Georgian (KA) | 10 | 10 | 11 | 9 | 10 | 10 | 8 | 10 | 78 |
| Hungarian (HU) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 80 |
| Lithuanian (LT) | 15 | 17 | 9 | 8 | 5 | 9 | 6 | 11 | 80 |
| Portuguese (PT) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 80 |
| Romanian (RO) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 80 |
| Turkish (TR) | 5 | 26 | 21 | 11 | 14 | 3 | 0 | 0 | 80 |
| Ukrainian (UA) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 80 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).