ARTICLE | doi:10.20944/preprints202101.0081.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Transformers; wav2vec; BERT; Mockingjay; interpretability
Online: 5 January 2021 (11:20:22 CET)
In recent times, BERT-based transformer models have become an inseparable part of the 'tech stack' of text processing models. Similar progress is being observed in the speech domain, where a multitude of models achieve state-of-the-art results by using audio transformer models to encode speech. This raises the question of what these audio transformer models are learning. Moreover, although the standard methodology is to use the last-layer embedding for any downstream task, is it the optimal choice? We try to answer these questions for two recent audio transformer models, Mockingjay and wav2vec 2.0. We compare them on a comprehensive set of language delivery and structure features, including audio, fluency, and pronunciation features. Additionally, we probe the audio models' understanding of textual surface, syntactic, and semantic features and compare them to BERT. We do this over exhaustive settings for native, non-native, synthetic, read, and spontaneous speech datasets.
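A minimal sketch of the layer-wise probing idea described above, assuming a HuggingFace wav2vec 2.0 checkpoint and placeholder probing labels (e.g. fluency classes); it is illustrative only, not the authors' setup:

```python
# Minimal layer-wise probing sketch (not the authors' exact pipeline):
# mean-pool every hidden layer of wav2vec 2.0 and fit a linear probe per layer.
import torch
from transformers import Wav2Vec2Model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

model = Wav2Vec2Model.from_pretrained(
    "facebook/wav2vec2-base", output_hidden_states=True).eval()

def layer_embeddings(waveform_batch):
    """Mean-pool each hidden layer -> list of (batch, hidden_dim) arrays."""
    with torch.no_grad():
        out = model(waveform_batch)          # raw 16 kHz audio, shape (batch, samples)
    return [h.mean(dim=1).numpy() for h in out.hidden_states]

# `waveforms` and `labels` are placeholders for a probing dataset,
# e.g. fluency or pronunciation classes per utterance.
def probe_all_layers(waveforms, labels):
    scores = []
    for layer_idx, feats in enumerate(layer_embeddings(waveforms)):
        probe = LogisticRegression(max_iter=1000)
        acc = cross_val_score(probe, feats, labels, cv=5).mean()
        scores.append((layer_idx, acc))
    return scores   # reveals whether the last layer is actually the best choice
```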
TECHNICAL NOTE | doi:10.20944/preprints202101.0053.v1
Subject: Earth Sciences, Other Keywords: climate; disasters; interpretability; relief; satellite imagery
Online: 4 January 2021 (15:58:21 CET)
Natural disasters ravage the world's cities, valleys, and shores on a monthly basis. Having precise and efficient mechanisms for assessing infrastructure damage is essential to channel resources and minimize the loss of life. Using a dataset that includes labeled pre- and post-disaster satellite imagery, we train multiple convolutional neural networks to assess building damage on a per-building basis. To investigate how to best classify building damage, we present a highly interpretable deep-learning methodology that seeks to explicitly convey the most useful information required to train an accurate classification model. We also examine which loss functions best optimize these models. We find that ordinal cross-entropy is the best-performing loss function and that combining the type of disaster that caused the damage with pre- and post-disaster images best predicts the level of damage. Our research seeks to contribute computationally to addressing this ongoing and growing humanitarian crisis, heightened by climate change.
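One common formulation of an ordinal cross-entropy loss (cumulative binary decomposition) is sketched below; the paper's exact loss and damage taxonomy may differ, so the class names and tensor shapes are assumptions:

```python
# Illustrative ordinal cross-entropy: K ordered damage levels become
# K-1 cumulative binary problems ("is damage greater than level i?").
import torch
import torch.nn as nn

class OrdinalCrossEntropy(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, targets):
        # logits: (batch, K-1) cumulative logits; targets: (batch,) integer levels
        thresholds = torch.arange(self.num_classes - 1, device=targets.device)
        cumulative = (targets.unsqueeze(1) > thresholds).float()  # (batch, K-1)
        return self.bce(logits, cumulative)

# Usage with 4 assumed damage levels (e.g. none / minor / major / destroyed):
loss_fn = OrdinalCrossEntropy(num_classes=4)
logits = torch.randn(8, 3)                 # from a CNN head with K-1 outputs
labels = torch.randint(0, 4, (8,))
loss = loss_fn(logits, labels)
```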
ARTICLE | doi:10.20944/preprints202104.0395.v1
Subject: Life Sciences, Biochemistry Keywords: Deep-Learning; Interpretability; Omics; Biophysics; Drug Synergy; Cancer
Online: 14 April 2021 (17:44:40 CEST)
High-throughput screening technologies continue to produce large amounts of multi-omics data from different populations and cell types for various diseases, such as cancer. However, analysis of such data encounters difficulties due to cancer heterogeneity, further exacerbated by human biological complexity and genomic variability. There is a need to redefine the drug discovery and development pipeline, bringing an Artificial Intelligence (AI)-powered informational view that integrates relevant biological information and explores new ways to develop effective anticancer approaches. Here, we present SynPred, an interdisciplinary approach that leverages specifically designed ensembles of AI algorithms and links omics and biophysical traits to predict synergistic anticancer drug combinations. SynPred exhibits state-of-the-art performance on an independent test set: accuracy of 0.85, precision of 0.77, recall of 0.75, AUROC of 0.82, and F1-score of 0.76. Moreover, interpretability was achieved by deploying the most current and robust feature-importance approaches. A simple web-based application, available online at http://www.moreiralab.com/resources/synpred/, predicts synergistic anticancer drug combinations and requires only the SMILES of the two drugs to be tested, allowing easy access for non-expert researchers.
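An illustrative sketch of a stacking ensemble over drug-pair features; SynPred's actual feature set (omics and biophysical traits) and ensemble design are described in the paper, so the Morgan-fingerprint features and estimators below are placeholders only:

```python
# Placeholder drug-pair featurization + generic stacking ensemble
# (not SynPred's actual features or models).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

def pair_features(smiles_a: str, smiles_b: str) -> np.ndarray:
    """Concatenate Morgan fingerprints of the two drugs (stand-in features)."""
    fps = []
    for smi in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
        fps.append(np.array(fp))
    return np.concatenate(fps)

# `pairs` is a list of (smiles_a, smiles_b) tuples, `synergy` the binary labels.
def train_ensemble(pairs, synergy):
    X = np.stack([pair_features(a, b) for a, b in pairs])
    ensemble = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=200)),
                    ("gb", GradientBoostingClassifier())],
        final_estimator=LogisticRegression(max_iter=1000))
    return ensemble.fit(X, synergy)
```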
ARTICLE | doi:10.20944/preprints202101.0302.v1
Online: 15 January 2021 (16:01:11 CET)
Sarcasm is a linguistic expression often used to communicate the opposite of what is said, usually something very unpleasant, with the intention to insult or ridicule. The inherent ambiguity in sarcastic expressions makes sarcasm detection very difficult. In this work, we focus on detecting sarcasm in textual conversations from various social networking platforms and online media. To this end, we develop an interpretable deep learning model using multi-head self-attention and gated recurrent units. The multi-head self-attention module aids in identifying crucial sarcastic cue-words from the input, and the recurrent units learn long-range dependencies between these cue-words to better classify the input text. We show the effectiveness of our approach by achieving state-of-the-art results on multiple datasets from social networking platforms and online media. Models trained using our proposed approach are easily interpretable and enable identifying the sarcastic cues in the input text that contribute to the final classification score. We visualize the learned attention weights on a few sample input texts to showcase the effectiveness and interpretability of our model.
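A minimal PyTorch sketch of the described architecture (embedding, multi-head self-attention, gated recurrent units, classifier); hyperparameters and vocabulary size are illustrative, not the paper's:

```python
# Sketch of a multi-head self-attention + GRU sarcasm classifier;
# the returned attention weights can be visualized to locate cue-words.
import torch
import torch.nn as nn

class SarcasmClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, heads=4, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, token_ids):
        x = self.embed(token_ids)                  # (batch, seq, embed)
        attended, weights = self.attn(x, x, x)     # weights highlight cue-words
        _, h = self.gru(attended)                  # long-range dependencies
        h = torch.cat([h[-2], h[-1]], dim=-1)      # both GRU directions
        return self.out(h).squeeze(-1), weights    # logits + attention map

model = SarcasmClassifier(vocab_size=30000)
logits, attn_weights = model(torch.randint(1, 30000, (2, 20)))
```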
ARTICLE | doi:10.20944/preprints202201.0072.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Deep Learning; Black-box; Interpretability; Explainability; Model introspection; MRA segmentation
Online: 6 January 2022 (10:39:37 CET)
Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason is the black-box nature of these approaches and the lack of insight into the automatically derived decisions. To increase trust in these methods, this paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas that most influence the algorithm's decision. Moreover, this research presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques to deep learning models and generating visual interpretations and explanations that clinicians can use to corroborate their clinical findings and gain confidence in such methods. The framework builds on existing interpretability and explainability techniques, which currently focus on classification models, and extends them to segmentation tasks. In addition, these methods have been adapted to 3D models for volumetric analysis. The proposed framework provides methods to quantitatively compare visual explanations using infidelity and sensitivity metrics. Data scientists can use this framework to perform post-hoc interpretations and explanations of their models, develop more explainable tools, and present the findings to clinicians to increase their trust in such models. The proposed framework was evaluated on a use case of vessel segmentation models trained on Time-of-Flight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain. Quantitative and qualitative results of a comparative study of different models and interpretability methods are presented. Furthermore, this paper provides an extensive overview of several existing interpretability and explainability methods.
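A hedged sketch of how such a quantitative comparison could look using Captum's infidelity and sensitivity metrics, wrapping a segmentation output in a scalar forward function; `seg_model`, the perturbation, and the attribution method below are placeholders, not TorchEsegeta's API:

```python
# Sketch: score a visual explanation of a 3D segmentation model with the
# infidelity and sensitivity metrics mentioned above (Captum implementations).
import torch
from captum.attr import Saliency
from captum.metrics import infidelity, sensitivity_max

# Stand-in for a trained 3D vessel-segmentation network (placeholder).
seg_model = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1)

def vessel_score(volume):
    """Scalar forward function: mean predicted vessel probability per volume."""
    return seg_model(volume).sigmoid().mean(dim=(1, 2, 3, 4))

def perturb(inputs):
    noise = 0.03 * torch.randn_like(inputs)
    return noise, inputs - noise

saliency = Saliency(vessel_score)

def explain(volume):
    return saliency.attribute(volume)

def compare_explanation(volume):
    attributions = explain(volume)
    infid = infidelity(vessel_score, perturb, volume, attributions)
    sens = sensitivity_max(explain, volume)
    return infid, sens

volume = torch.randn(1, 1, 16, 32, 32)      # toy TOF-MRA patch
infid_score, sens_score = compare_explanation(volume)
```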
ARTICLE | doi:10.20944/preprints202212.0051.v1
Subject: Medicine & Pharmacology, Clinical Neurology Keywords: XAI; segmentation; detection; aspiration; glottis; vocal cords; endoscopy; FEES; interpretability; meaningful sequences; key frames
Online: 2 December 2022 (12:22:54 CET)
Disorders of swallowing often lead to pneumonia when material enters the airways (aspiration). Flexible Endoscopic Evaluation of Swallowing (FEES) plays a key role in diagnosing aspiration but is prone to human error. An AI-based tool could facilitate this process. Recent non-endoscopic/non-radiologic attempts to detect aspiration using machine-learning approaches have led to unsatisfying accuracy and show black-box characteristics, making it hard for clinical users to trust these model decisions. Our aim is to introduce an explainable artificial intelligence (XAI) approach to detect aspiration in FEES. Our approach is to teach the AI about the relevant anatomical structures, such as the vocal cords and the glottis, based on 92 annotated FEES videos. Simultaneously, it is trained to detect bolus that passes the glottis and becomes aspirated. During testing, the AI successfully recognized the glottis and vocal cords but could not yet achieve satisfying aspiration detection quality. Although detection performance still has to be optimized, our architecture results in a final model that explains its assessment by locating meaningful frames with relevant aspiration events and by highlighting the suspected bolus. In contrast to comparable AI tools, our framework is verifiable, interpretable and therefore accountable for clinical users.
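A schematic of the frame-flagging logic only, under the assumption that per-frame detections of the glottis and a bolus are available as bounding boxes; the detectors, classes and thresholds used in the paper differ:

```python
# Toy key-frame selection: flag a frame as a suspected aspiration event when
# a detected bolus overlaps the detected glottis region (assumed bounding boxes).
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter + 1e-9)

def key_frames(per_frame_detections, overlap_threshold=0.1):
    """per_frame_detections: list of dicts like {"glottis": box, "bolus": box or None}."""
    flagged = []
    for idx, det in enumerate(per_frame_detections):
        if det.get("bolus") and det.get("glottis"):
            if iou(det["bolus"], det["glottis"]) > overlap_threshold:
                flagged.append(idx)        # meaningful frame to show the clinician
    return flagged
```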
ARTICLE | doi:10.20944/preprints202212.0062.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: graph neural network; motif-based representation; molecular property prediction; graph matching; interpretability; GPU-enabled acceleration.
Online: 5 December 2022 (06:57:41 CET)
This work considers the task of representation learning on attributed relational graphs (ARGs). Both the nodes and edges in an ARG are associated with attributes/features, allowing ARGs to encode the rich structural information widely observed in real applications. Existing graph neural networks offer limited ability to capture complex interactions within local structural contexts, which hinders them from taking advantage of the expressive power of ARGs. We propose the Motif Convolution Module (MCM), a new motif-based graph representation learning technique that better utilizes local structural information. The ability to handle continuous edge and node features is one of MCM's advantages over existing motif-based models. MCM builds a motif vocabulary in an unsupervised way and deploys a novel motif convolution operation to extract the local structural context of individual nodes, which is then used to learn higher-level node representations via multilayer perceptrons and/or message passing in graph neural networks. When compared with other graph learning approaches to classifying synthetic graphs, our approach is substantially better at capturing structural context. We also demonstrate the performance and explainability advantages of our approach by applying it to several molecular benchmarks.
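A toy sketch of the motif-convolution data flow (build a motif vocabulary without supervision, then score each node's local context against it); the actual MCM matches attributed subgraphs via graph matching rather than the pooled vectors used here:

```python
# Toy motif-convolution data flow: cluster local contexts into a vocabulary,
# then use node-to-motif similarities as structural features.
import numpy as np
from sklearn.cluster import KMeans

def local_context(node, adjacency, node_feats):
    """Pool a node's features with its neighbors' (crude stand-in for a subgraph)."""
    neighbors = np.nonzero(adjacency[node])[0]
    block = node_feats[np.concatenate(([node], neighbors))]
    return block.mean(axis=0)

def build_motif_vocabulary(contexts, n_motifs=8):
    """Unsupervised vocabulary: cluster local contexts and keep the centroids."""
    return KMeans(n_clusters=n_motifs, n_init=10).fit(contexts).cluster_centers_

def motif_convolution(contexts, vocabulary, gamma=1.0):
    """Similarity of every node context to every motif -> (n_nodes, n_motifs)."""
    d2 = ((contexts[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)   # fed to an MLP / message passing downstream
```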