Preprint Article, Version 1 (preserved in Portico; this version is not peer-reviewed)

What All Do Audio Transformer Models Hear? Probing Acoustic Representations for Language Delivery and Its Structure

Version 1: Received: 2 January 2021 / Approved: 5 January 2021 / Online: 5 January 2021 (11:20:22 CET)

How to cite: Kumar, Y.; Shah, J.; Shah, R.R.; Chen, C. What All Do Audio Transformer Models Hear? Probing Acoustic Representations for Language Delivery and Its Structure. Preprints 2021, 2021010081. https://doi.org/10.20944/preprints202101.0081.v1

Abstract

In recent times, BERT-based transformer models have become an inseparable part of the 'tech stack' of text processing models. Similar progress is being observed in the speech domain, with a multitude of models achieving state-of-the-art results by using audio transformer models to encode speech. This raises the question of what these audio transformer models are learning. Moreover, although the standard methodology is to choose the last-layer embedding for any downstream task, is it the optimal choice? We try to answer these questions for two recent audio transformer models, Mockingjay and wav2vec 2.0. We compare them on a comprehensive set of language delivery and structure features, including audio, fluency, and pronunciation features. Additionally, we probe the audio models' understanding of textual surface, syntax, and semantic features and compare them to BERT. We do this over exhaustive settings for native, non-native, synthetic, read, and spontaneous speech datasets.
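The page carries no code, but the layer-wise probing setup the abstract describes can be illustrated with a short sketch. The following is a minimal, hypothetical example, assuming the Hugging Face `transformers` implementation of wav2vec 2.0 and a scikit-learn logistic-regression probe; the checkpoint name, mean pooling, and the per-utterance labels `y` are illustrative assumptions, not the authors' actual pipeline. It trains one probe per layer so that layers can be compared rather than defaulting to the last one.

```python
# Minimal sketch of layer-wise probing of an audio transformer.
# Assumptions (not from the preprint): HF wav2vec 2.0 base checkpoint,
# mean pooling over time, logistic-regression probes, 5-fold CV.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

def layer_embeddings(waveform: torch.Tensor):
    """Return one mean-pooled utterance vector per hidden-state layer."""
    with torch.no_grad():
        out = model(waveform.unsqueeze(0), output_hidden_states=True)
    # out.hidden_states: tuple of (1, time, dim) tensors, one per layer
    # (plus the feature-encoder projection as the first entry).
    return [h.mean(dim=1).squeeze(0).numpy() for h in out.hidden_states]

def probe_all_layers(X_audio, y):
    """X_audio: list of 1-D float tensors (16 kHz mono); y: hypothetical
    per-utterance labels for some delivery feature (e.g. a fluency class)."""
    per_layer = zip(*(layer_embeddings(x) for x in X_audio))
    scores = []
    for layer_idx, feats in enumerate(per_layer):
        clf = LogisticRegression(max_iter=1000)
        acc = cross_val_score(clf, np.stack(feats), np.asarray(y), cv=5).mean()
        scores.append((layer_idx, acc))
    return scores  # compare layers instead of assuming the last is best
```

Ranking the `(layer, accuracy)` pairs this returns is one simple way to check whether the last-layer embedding is actually the optimal choice for a given probing task.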

Keywords

transformers; wav2vec; BERT; Mockingjay; interpretability

Subject

Computer Science and Mathematics, Algebra and Number Theory
