REVIEW | doi:10.20944/preprints202002.0239.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: interpretable machine learning; deep learning; predictive biology
Online: 17 February 2020 (04:12:20 CET)
Machine learning (ML) has emerged as a critical tool for making sense of the growing amount of genetic and genomic data available because of its ability to find complex patterns in high-dimensional and heterogeneous data. While the complexity of ML models is what makes them powerful, it also makes them difficult to interpret. Fortunately, recent efforts to develop approaches that make the inner workings of ML models understandable to humans have improved our ability to derive novel biological insights using ML. Here we discuss the importance of interpretable ML, different strategies for interpreting ML models, and examples of how these strategies have been applied. Finally, we identify challenges and promising future directions for interpretable ML in genetics and genomics.
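As a minimal sketch of one model-agnostic interpretation strategy of the kind this review surveys, the snippet below applies permutation feature importance to a hypothetical gene-expression classifier; the data, gene indices, and model choice are placeholders, not taken from the review.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))                    # 200 samples x 50 hypothetical genes
    y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(int)     # phenotype driven by two genes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    # Shuffle each feature in turn and measure the drop in held-out accuracy.
    result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
    top = np.argsort(result.importances_mean)[::-1][:5]
    print("Most influential hypothetical genes:", top)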
ARTICLE | doi:10.20944/preprints202206.0149.v1
Subject: Medicine And Pharmacology, Pulmonary And Respiratory Medicine Keywords: critical care; artificial intelligence; predictive analytics; VAP; interpretable models
Online: 10 June 2022 (04:43:07 CEST)
(1) Background: Ventilator-associated pneumonia (VAP) causes high mortality among patients with respiratory disease and imposes major burdens on healthcare infrastructure. Models that use electronic health record data to predict the onset of VAP may spur earlier treatment and improve patient outcomes. We developed and studied the performance of interpretable machine learning (ML) models that predict the onset of VAP from electronic health records (EHRs); (2) Methods: We trained Logistic Regression (LR), full-feature Explainable Boosting Machine (fEBM), and eXtreme Gradient Boosting (XGBoost) ML models on data from the MIMIC-III (v1.3) database. Model performance was measured by area under the receiver operating characteristic curves (AUCs). We trained a minimal-feature EBM model (mEBM) with features derived from white blood cell (WBC) counts, duration of ventilation, and Glasgow Coma Scale (GCS). Finally, model robustness was evaluated on randomly sparsified EHR datasets; (3) Results: The fEBM model outperformed the XGBoost and LR models at 24 hours post-intubation. The mEBM model maintained an AUC of 0.893. The fEBM model performance remained robust on sparsified datasets; (4) Conclusions: Our novel interpretable ML algorithm reliably predicts the onset of VAP in intubated patients. Integration of this EBM-based model into clinical practice may enable clinicians to better anticipate and prevent VAP.
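The following is an illustrative sketch only, assuming the interpret library's Explainable Boosting Machine implementation and synthetic placeholder features (WBC count, hours on ventilation, GCS); MIMIC-III data and the authors' preprocessing are not reproduced here.

    import numpy as np
    from interpret.glassbox import ExplainableBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = np.column_stack([
        rng.normal(10, 3, 500),     # WBC count (hypothetical values)
        rng.uniform(0, 240, 500),   # hours on ventilation (hypothetical values)
        rng.integers(3, 16, 500),   # Glasgow Coma Scale (hypothetical values)
    ])
    y = rng.integers(0, 2, 500)     # placeholder VAP labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    ebm = ExplainableBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, ebm.predict_proba(X_te)[:, 1])
    print(f"Held-out AUC: {auc:.3f}")

    # The per-feature shape functions learned by the EBM can be inspected for interpretation.
    explanation = ebm.explain_global()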
ARTICLE | doi:10.20944/preprints202305.0737.v1
Subject: Engineering, Civil Engineering Keywords: seismic sequence; interpretable machine learning; successive earthquakes; seismic damage prediction; seismic damage accumulation; machine learning; explainable machine learning
Online: 10 May 2023 (10:35:55 CEST)
This study investigates the interpretability of machine learning (ML) models applied to cumulative damage prediction during a sequence of earthquakes, emphasizing techniques such as SHapley Additive exPlanations (SHAP), Partial Dependence Plots (PDPs), Local Interpretable Model-agnostic Explanations (LIME), Accumulated Local Effects (ALE), and permutation- and impurity-based feature importance. The research explores cumulative damage during seismic sequences, aiming to identify critical predictors and assess their influence on the cumulative damage. Moreover, the contribution of the predictors is evaluated with respect to the range of final damage. Nonlinear time history analyses are applied to extract the seismic response of an eight-story Reinforced Concrete (RC) frame. The regression problem’s input variables are divided into two distinct physical classes: pre-existing damage from the initial seismic event, expressed by the Park and Ang damage index (DIPA), and seismic parameters representing the intensity of the subsequent earthquake, expressed by Intensity Measures (IMs). The study offers a comprehensive review of cutting-edge ML methods, hyperparameter tuning, and ML method comparisons. Among the 15 ML methods examined, a LightGBM model emerges as the most efficient, with the critical predictors for final damage being the initial damage caused by the first shock and the IMs of the subsequent shock, IFVF and SIH. The importance of these predictors is supported by feature importance analysis and local/global explanation methods, enhancing the interpretability and practical utility of the developed model.
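As a hedged sketch of the LightGBM-plus-SHAP pairing named above, the snippet below fits a LightGBM regressor on synthetic stand-in features (an initial damage index plus intensity measures of the second shock) and computes SHAP contributions; the feature layout and data are invented, not the study's.

    import numpy as np
    import lightgbm as lgb
    import shap

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 4))                   # [initial DI, IM_1, IM_2, IM_3] (placeholders)
    y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=300)   # synthetic final damage

    model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05).fit(X, y)

    # TreeExplainer gives per-sample, per-feature contributions to each prediction.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    print("Mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))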
ARTICLE | doi:10.20944/preprints201908.0100.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: cancer biomarker discovery; gene expression data; Ingenuity Knowledge Base (IKB); transfer learning; interpretable classification rules
Online: 8 August 2019 (05:28:26 CEST)
Background: Ongoing molecular profiling studies enabled by advances in biomedical technologies are producing vast amounts of ‘omic’ data for early detection, monitoring, and prognosis of diverse diseases. A major common limitation is the scarcity of biological samples, necessitating integrative modeling frameworks that can make optimal use of available data for disease classification tasks. Related data sets are often available from different studies, but may have been generated using different technology platforms. Thus, there is a critical need for flexible modeling methods that can handle data from diverse sources to facilitate the discovery of robust biomarkers that underlie disease regulatory processes. Results: In this paper, we introduce a novel framework called Knowledge Augmented Rule Learning (KARL), which incorporates two sources of knowledge, domain and data, for pattern discovery from small and high-dimensional datasets, such as transcriptomic data. We propose KARL as a transfer rule learning framework in which domain knowledge is transferred to the learning process on data in order to 1) improve the reliability of the discovered patterns, and 2) study the utility of domain knowledge when used along with data for modeling. In this work, we generated KARL models on gene expression datasets for five types of cancer: brain, breast, colon, lung, and prostate. As domain knowledge, we used the Ingenuity Knowledge Base (IKB) to extract genes related to hallmarks of cancer and annotated these prior relationships before learning classifiers from these datasets. Conclusions: Our results show that KARL produces, on average, rule models that are more robust classifiers than the baseline without such background knowledge, for our tasks of cancer prediction using 25 publicly available gene expression datasets. Moreover, KARL helped us learn insights about previously known relationships in these gene expression datasets, along with new relationships not input as known, enabling informed biomarker discovery for cancer prediction tasks. KARL can be applied to modeling similar data from any other domain and classification task. Future work would involve extending KARL to handle hierarchical knowledge, deriving more general hypotheses to drive biomedical discovery.
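KARL itself is not reproduced here; the following is only a hypothetical illustration of the general idea of biasing rule learning toward domain knowledge, by restricting candidate features to a prior gene set before extracting readable classification rules. All gene names, labels, and the use of a shallow decision tree are assumptions for the sketch.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(3)
    genes = [f"gene_{i}" for i in range(100)]
    prior_set = ["gene_3", "gene_12", "gene_40"]      # hypothetical knowledge-base genes
    X = rng.normal(size=(120, 100))
    y = (X[:, 3] - X[:, 40] > 0).astype(int)          # synthetic cancer labels

    # Learn shallow, human-readable rules only over the prior-knowledge features.
    idx = [genes.index(g) for g in prior_set]
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[:, idx], y)
    print(export_text(tree, feature_names=prior_set))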
REVIEW | doi:10.20944/preprints202309.0581.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: artificial intelligence; medicine; explainable AI; interpretable AI
Online: 8 September 2023 (09:53:03 CEST)
Due to the success of artificial intelligence (AI) applications in the medical field over the past decade, concerns about the explainability of these systems have increased. The reliability requirements of black-box algorithms that make decisions affecting patients pose a challenge that goes beyond their accuracy. Recent advances in AI increasingly underscore the need to incorporate explainability into these systems. While most traditional AI methods and expert systems are inherently interpretable, recent literature has focused primarily on explainability techniques for more complex models such as deep learning. This scoping review analyzes the existing literature on the explainability and interpretability of AI methods in the medical and clinical field, providing an overview of past and current research trends, limitations that might impede the development of Explainable Artificial Intelligence (XAI) in medicine, open challenges, and possible research directions. In addition, this review discusses possible alternatives for leveraging medical knowledge to improve interpretability in clinical settings, while taking into account the needs of users.
ARTICLE | doi:10.20944/preprints202107.0040.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: predictive maintenance; transfer learning; interpretable machine learning
Online: 1 July 2021 (22:38:28 CEST)
Using data-driven models to solve predictive maintenance problems has become prevalent among original equipment manufacturers (OEMs). However, such models fail to solve two tasks that OEMs are interested in: (1) making well-built failure prediction models that work on existing scenarios (vehicles, working conditions) adaptive to target scenarios; and (2) identifying failure causes and, furthermore, determining whether a model generates failure predictions based on reasonable causes. This paper investigates a comprehensive architecture for making a predictive maintenance system adaptive and interpretable by proposing (1) an ensemble model for time-series data, consisting of a long short-term memory (LSTM) neural network and a Gaussian threshold, to achieve failure prediction one week in advance; (2) an online transfer learning algorithm and a meta-learning algorithm, which render existing models adaptive to new vehicles with limited data volumes; and (3) the Local Interpretable Model-agnostic Explanations (LIME) interpretation tool and super-feature methods, applied to interpret individual and general failure causes. Vehicle data from Isuzu Motors, Ltd., comprising time-series data and histogram data, are adopted to validate our method. The proposed ensemble model yields predictions with 100% accuracy on our test data for the engine-stalling problem and adapts more rapidly to new vehicles, with smaller error, after applying either online transfer learning or the meta-learning method. The interpretation methods help elucidate the global and individual failure causes, confirming the model's credibility.
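Below is a loose sketch of the "LSTM plus Gaussian threshold" idea described above, applied to a synthetic sensor sequence rather than the Isuzu data: an LSTM forecasts the next sensor reading, and residuals beyond mean + 3*std are flagged. The window size, architecture, and threshold multiplier are assumptions.

    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(4)
    series = np.sin(np.linspace(0, 60, 2000)) + 0.05 * rng.normal(size=2000)   # synthetic sensor signal
    window = 30
    X = np.array([series[i:i + window] for i in range(len(series) - window)])[..., None]
    y = series[window:]

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(window, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=64, verbose=0)

    # Gaussian-style threshold on forecast residuals: flag readings beyond mean + 3*std.
    residuals = np.abs(model.predict(X, verbose=0).ravel() - y)
    threshold = residuals.mean() + 3 * residuals.std()
    print("Flagged time steps:", np.where(residuals > threshold)[0][:10])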
ARTICLE | doi:10.20944/preprints202310.0511.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: interpretable solving; GCN; RetinaNet; formal language; diagram parsing
Online: 10 October 2023 (03:02:38 CEST)
This paper proposes an interpretable geometry problem solver based on formal language sets extracted from problem text and diagrams. Solving geometry problems by machine still poses challenges in natural language processing and computer vision. Although existing methods have made significant progress in extracting geometric formal languages, neglecting the graph structure inherent in the formal language and failing to further refine the extracted language set can degrade theorem prediction and, in turn, problem-solving accuracy. In this paper, a formal language graph is constructed from the extracted formal language set and used for theorem prediction with a graph convolutional network. To better extract the relation set among diagram elements, an improved diagram parser is also proposed. The test results indicate that the improved method solves geometry problems accurately while preserving the interpretability of the solution.
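The following is a hypothetical sketch (not the authors' code) of a two-layer graph convolutional network over a small formal-language graph, predicting a theorem label from pooled node embeddings; it assumes the torch_geometric library, and the node features, edges, and dimensions are placeholders.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv, global_mean_pool

    class TheoremGCN(torch.nn.Module):
        def __init__(self, in_dim=16, hidden=32, n_theorems=10):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden)
            self.conv2 = GCNConv(hidden, hidden)
            self.head = torch.nn.Linear(hidden, n_theorems)

        def forward(self, x, edge_index, batch):
            x = F.relu(self.conv1(x, edge_index))
            x = F.relu(self.conv2(x, edge_index))
            return self.head(global_mean_pool(x, batch))   # one score per candidate theorem

    # Toy graph: 4 formal-language nodes, 3 undirected relations.
    x = torch.randn(4, 16)
    edge_index = torch.tensor([[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]])
    batch = torch.zeros(4, dtype=torch.long)
    logits = TheoremGCN()(x, edge_index, batch)
    print(logits.shape)   # torch.Size([1, 10])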
ARTICLE | doi:10.20944/preprints202311.1540.v1
Subject: Social Sciences, Psychology Keywords: Phishing Susceptibility; Cyber Security; Interpretable Artificial Intelligence; Machine Learning
Online: 24 November 2023 (02:42:26 CET)
As artificial intelligence continues to advance, researchers are increasingly using machine learning algorithms to study the factors that make people more susceptible to phishing scams. Most studies in this area have taken one of two approaches: either they explore statistical associations between various factors and susceptibility, or they use complex models such as deep neural networks to predict phishing behavior. However, these approaches have limitations in terms of providing practical insights for individuals to avoid future phishing attacks and delivering personalized explanations regarding their susceptibility to phishing. In this paper, we propose a machine learning approach that leverages explainable artificial intelligence techniques to examine the influence of human and demographic factors on susceptibility to phishing attacks. Our analysis reveals that psychological factors such as impulsivity and conscientiousness, as well as appropriate online security habits, significantly affect an individual's susceptibility to phishing attacks. Furthermore, our individualized case-by-case approach offers personalized recommendations on mitigating the risk of falling prey to phishing exploits, considering the specific circumstances of each individual.
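As an illustration of the case-by-case explanation approach described above, the sketch below uses LIME to explain a single (hypothetical) participant's predicted phishing susceptibility; the feature names, model, and data are invented placeholders, not the study's survey variables.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    rng = np.random.default_rng(5)
    features = ["impulsivity", "conscientiousness", "security_habits", "age"]   # hypothetical factors
    X = rng.normal(size=(400, 4))
    y = (X[:, 0] - X[:, 2] + 0.3 * rng.normal(size=400) > 0).astype(int)

    model = RandomForestClassifier(random_state=0).fit(X, y)
    explainer = LimeTabularExplainer(X, feature_names=features,
                                     class_names=["not susceptible", "susceptible"])
    # Per-feature contributions for one individual, supporting personalized recommendations.
    explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
    print(explanation.as_list())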
ARTICLE | doi:10.20944/preprints202012.0318.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Interpretable Artificial Intelligence; Cardiovascular disease prediction; Machine Learning in Healthcare
Online: 14 December 2020 (09:49:13 CET)
Learning systems have long focused on creating models capable of obtaining the best results on error metrics. Recently, the focus has shifted toward making their results interpretable and explainable. The need for interpretation is greater when these models are used to support decision making, and in some areas, such as medicine, it becomes an indispensable requirement. This paper focuses on the prediction of cardiovascular disease by analyzing the well-known Statlog (Heart) Data Set from the UCI Machine Learning Repository. The study analyzes the trade-off between making predictions easier to interpret, by reducing the number of features that explain the classification of health status, and the resulting cost in accuracy. The analysis is carried out over a large set of classification techniques and performance metrics, demonstrating that it is possible to build explainable and reliable models that achieve a good compromise in predictive performance.
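A rough sketch of this interpretability-versus-accuracy trade-off is shown below: the number of retained features is varied and cross-validated accuracy is tracked. A synthetic stand-in with 13 features and 270 samples is used here instead of the Statlog (Heart) data, and the classifier choice is an assumption.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # Synthetic stand-in with the same shape as Statlog (Heart): 270 samples, 13 features.
    X, y = make_classification(n_samples=270, n_features=13, n_informative=6, random_state=0)

    for k in (3, 5, 8, 13):
        pipe = make_pipeline(SelectKBest(f_classif, k=k),
                             LogisticRegression(max_iter=1000))
        acc = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{k:>2} features -> accuracy {acc:.3f}")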
ARTICLE | doi:10.20944/preprints202311.0161.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: deep learning; skin cancer; image augmentation; GAN; geometric augmentation; image classification; interpretable technique
Online: 2 November 2023 (10:52:57 CET)
This research paper presents a deep learning approach to early detection of skin cancer using image augmentation techniques. The authors propose a two-stage image augmentation technique that combines geometric augmentation and a generative adversarial network (GAN) to classify skin lesions as either benign or malignant. This research utilized the public HAM10000 dataset to test the proposed model. Several pre-trained CNN models were employed, namely Xception, InceptionV3, ResNet152V2, EfficientNetB7, InceptionResNetV2, and VGG19. Our approach achieved accuracy, precision, recall, and F1-score of 96.90%, 97.07%, 96.87%, and 96.97%, respectively, which are higher than the performance achieved by other state-of-the-art methods. The paper also discusses the use of SHapley Additive exPlanations (SHAP), an interpretable technique for skin cancer diagnosis, which can help clinicians understand the reasoning behind the diagnosis and improve trust in the system. Overall, the proposed method presents a promising approach to automated skin cancer detection that could improve patient outcomes and reduce healthcare costs.
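A minimal sketch of the geometric-augmentation stage paired with a pre-trained backbone (Xception) for binary lesion classification is given below; the GAN stage, the HAM10000 pipeline, and the exact augmentation parameters are not reproduced and the settings shown are assumptions.

    import tensorflow as tf

    # Geometric augmentations: random flips, rotations, and zooms applied on the fly.
    augment = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal_and_vertical"),
        tf.keras.layers.RandomRotation(0.2),
        tf.keras.layers.RandomZoom(0.1),
    ])

    base = tf.keras.applications.Xception(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False   # transfer learning: freeze the pre-trained backbone

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = augment(inputs)
    x = tf.keras.applications.xception.preprocess_input(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(base(x))   # benign vs malignant
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])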
ARTICLE | doi:10.20944/preprints202302.0117.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Synthetic categorical data generation; generative adversarial networks; imbalance learning; CTGAN; interpretable machine learning; cardiovascular disease
Online: 7 February 2023 (03:43:16 CET)
Machine Learning (ML) methods have become important for enhancing the performance of decision-support predictive models. However, class imbalance is one of the main challenges in developing ML models, because it limits their generalization and biases the learning algorithms. In this paper, we consider oversampling methods for generating synthetic categorical clinical data, aiming to improve the predictive performance of ML models and the identification of risk factors for cardiovascular diseases (CVDs). We performed a comparative study of several categorical synthetic data generation methods, including Generative Adversarial Networks (GANs). Then, we assessed the impact of combining oversampling strategies with linear and nonlinear supervised ML methods. Lastly, we conducted a post-hoc model interpretability analysis based on the importance of the risk factors. Experimental results show the potential of GAN-based models for generating high-quality categorical synthetic data, yielding probability mass functions that closely match those of the real data, preserving relevant insights, and helping to increase predictive performance. The combination of the GAN-based model and a linear classifier outperforms the other oversampling techniques, improving the area under the curve by 2%. These results demonstrate the capability of synthetic data to help both in determining risk factors and in building models for CVD prediction.
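The sketch below is a hedged illustration of GAN-based categorical oversampling, assuming the ctgan package: a CTGAN is fit on minority-class records and sampled to rebalance the training set. The column names, toy data, and training settings are invented placeholders, not the study's variables or configuration.

    import pandas as pd
    from ctgan import CTGAN

    # Toy imbalanced categorical table (placeholder risk factors and CVD label).
    data = pd.DataFrame({
        "smoker":   ["yes", "no", "no", "yes"] * 50,
        "diabetes": ["no", "no", "yes", "yes"] * 50,
        "cvd":      ["yes", "no", "no", "no"] * 50,
    })
    minority = data[data["cvd"] == "yes"].drop(columns="cvd")

    # Fit the GAN on minority-class features only, then sample enough rows to balance.
    model = CTGAN(epochs=100, batch_size=50)
    model.fit(minority, discrete_columns=list(minority.columns))
    n_needed = (data["cvd"] == "no").sum() - (data["cvd"] == "yes").sum()
    synthetic = model.sample(n_needed).assign(cvd="yes")

    balanced = pd.concat([data, synthetic], ignore_index=True)
    print(balanced["cvd"].value_counts())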