REVIEW | doi:10.20944/preprints202002.0239.v1
Online: 17 February 2020 (04:12:20 CET)
Machine learning (ML) has emerged as a critical tool for making sense of the growing amount of genetic and genomic data available because of its ability to find complex patterns in high-dimensional and heterogeneous data. While the complexity of ML models is what makes them powerful, it also makes them difficult to interpret. Fortunately, recent efforts to develop approaches that make the inner workings of ML models understandable to humans have improved our ability to gain novel biological insights using ML. Here we discuss the importance of interpretable ML, different strategies for interpreting ML models, and examples of how these strategies have been applied. Finally, we identify challenges and promising future directions for interpretable ML in genetics and genomics.
ARTICLE | doi:10.20944/preprints202206.0149.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: critical care; artificial intelligence; predictive analytics; VAP; interpretable models
Online: 10 June 2022 (04:43:07 CEST)
(1) Background: Ventilator-associated pneumonia (VAP) causes high mortality among patients with respiratory disease and imposes major burdens on healthcare infrastructure. Models that use electronic health record (EHR) data to predict the onset of VAP may spur earlier treatment and improve patient outcomes. We developed and studied the performance of interpretable machine learning (ML) models that predict the onset of VAP from EHRs; (2) Methods: We trained Logistic Regression (LR), full-feature Explainable Boosting Machine (fEBM), and eXtreme Gradient Boosting (XGBoost) ML models on data from the MIMIC-III (v1.3) database. Model performance was measured by area under the receiver operating characteristic curve (AUC). We trained a minimal-feature EBM model (mEBM) with features derived from white blood cell (WBC) counts, duration of ventilation, and Glasgow Coma Scale (GCS). Finally, model robustness was evaluated on randomly sparsified EHR datasets; (3) Results: The fEBM model outperformed the XGBoost and LR models at 24 hours post-intubation. The mEBM model maintained an AUC of 0.893. The fEBM model performance remained robust on sparsified datasets; (4) Conclusions: Our novel interpretable ML algorithm reliably predicts the onset of VAP in intubated patients. Integration of this EBM-based model into clinical practice may enable clinicians to better anticipate and prevent VAP.
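A minimal-feature pipeline like the mEBM described above can be sketched as follows. This is not the authors' code: it uses scikit-learn's LogisticRegression (the study's LR baseline) rather than the interpret package's EBM, and it runs on synthetic stand-in data, since MIMIC-III requires credentialed access. The feature distributions and the risk relationship are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for the three mEBM features (assumed distributions):
wbc = rng.normal(9.0, 3.0, n)          # white blood cell count, 10^3 cells/uL
vent_hours = rng.exponential(48.0, n)  # hours of mechanical ventilation
gcs = rng.integers(3, 16, n).astype(float)  # Glasgow Coma Scale

# Hypothetical risk relationship, for illustration only.
logit = 0.15 * (wbc - 9) + 0.01 * vent_hours - 0.2 * (gcs - 10)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([wbc, vent_hours, gcs])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
# Coefficients are directly interpretable as log-odds contributions.
for name, coef in zip(["WBC", "vent_hours", "GCS"], clf.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```

Swapping in `interpret.glassbox.ExplainableBoostingClassifier` here would add per-feature shape functions while keeping the same fit/predict interface.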
ARTICLE | doi:10.20944/preprints201908.0100.v1
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: cancer biomarker discovery; gene expression data; Ingenuity Knowledge Base (IKB); transfer learning; interpretable classification rules
Online: 8 August 2019 (05:28:26 CEST)
Background: Ongoing molecular profiling studies enabled by advances in biomedical technologies are producing vast amounts of ‘omic’ data for early detection, monitoring, and prognosis of diverse diseases. A major common limitation is the scarcity of biological samples, necessitating integrative modeling frameworks that can make optimal use of available data for disease classification tasks. Related data sets are often available from different studies, but may have been generated using different technology platforms. Thus, there is a critical need for flexible modeling methods that can handle data from diverse sources to facilitate the discovery of robust biomarkers that underlie disease regulatory processes. Results: In this paper, we introduce a novel framework called Knowledge Augmented Rule Learning (KARL), which incorporates two sources of knowledge, domain and data, for pattern discovery from small and high-dimensional datasets, such as transcriptomic data. We propose KARL as a transfer rule learning framework in which domain knowledge is transferred to the learning process on data in order to 1) improve the reliability of the discovered patterns, and 2) study how domain knowledge behaves when used along with data for modeling. In this work, we generated KARL models on gene expression datasets for five types of cancer: brain, breast, colon, lung, and prostate. As domain knowledge, we used the Ingenuity Knowledge Base (IKB) to extract genes related to hallmarks of cancer, annotating these prior relationships before learning classifiers from the datasets. Conclusions: Our results show that KARL produces, on average, rule models that are more robust classifiers than the baseline without such background knowledge, for our tasks of cancer prediction using 25 publicly available gene expression datasets.
Moreover, KARL helped us learn insights about previously known relationships in these gene expression datasets, along with new relationships not input as known, enabling informed biomarker discovery for cancer prediction tasks. KARL can be applied to modeling similar data from any other domain and classification task. Future work would involve extending KARL to handle hierarchical knowledge and derive more general hypotheses for biomedicine.
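The interpretable IF-THEN rule models at the heart of this approach can be illustrated with a minimal sketch. This is not KARL itself (which transfers IKB-derived prior relationships into rule learning); it is a shallow decision tree on synthetic expression-like data, used only to show what human-readable classification rules look like. The gene labels and class structure are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n_samples, n_genes = 200, 10
gene_names = [f"GENE{i}" for i in range(n_genes)]  # hypothetical gene labels
X = rng.normal(0, 1, (n_samples, n_genes))
# Assume the phenotype depends on two "hallmark" genes; purely illustrative.
y = ((X[:, 0] > 0.5) | (X[:, 3] < -0.5)).astype(int)

# A shallow tree yields IF-THEN rules similar in spirit to learned rule models.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=gene_names)
print(rules)
```

The printed rules read as nested IF-THEN conditions on individual genes, which is the kind of output a domain expert can audit directly.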
ARTICLE | doi:10.20944/preprints202107.0040.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: predictive maintenance; transfer learning; interpretable machine learning
Online: 1 July 2021 (22:38:28 CEST)
Data-driven models are widely used by original equipment manufacturers (OEMs) to solve predictive maintenance problems. However, such models fail at two tasks that OEMs are interested in: (1) making well-built failure prediction models, trained on existing scenarios (vehicles, working conditions), adaptive to target scenarios; and (2) identifying failure causes and, furthermore, determining whether a model generates failure predictions based on reasonable causes. This paper investigates a comprehensive architecture for making a predictive maintenance system adaptive and interpretable by proposing (1) an ensemble model for time-series data, consisting of a long short-term memory (LSTM) neural network and a Gaussian threshold, that achieves failure prediction one week in advance; (2) an online transfer learning algorithm and a meta-learning algorithm, which render existing models adaptive to new vehicles with limited data volumes; and (3) the Local Interpretable Model-agnostic Explanations (LIME) interpretation tool and super-feature methods, applied to interpret individual and general failure causes. Vehicle data from Isuzu Motors, Ltd., comprising time-series data and histogram data, are adopted to validate our method. The proposed ensemble model yields predictions with 100% accuracy on our test data for an engine-stalling problem and adapts more rapidly to new vehicles, with smaller error, following application of either online transfer learning or the meta-learning method. The interpretation methods help elucidate the global and individual failure causes, confirming the model's credibility.
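The LIME-style interpretation step above rests on a simple idea: perturb the inputs around one instance, query the black-box predictor, and fit a locally weighted linear surrogate whose coefficients serve as feature attributions. The sketch below implements that core idea by hand rather than calling the lime package, and the black-box predictor is a hypothetical stand-in for the LSTM ensemble, not the authors' model.

```python
import numpy as np
from sklearn.linear_model import Ridge

def black_box(X):
    # Hypothetical stand-in for the failure predictor:
    # failure probability driven mostly by feature 0.
    return 1 / (1 + np.exp(-(3 * X[:, 0] - 0.5 * X[:, 1])))

def lime_style_explanation(predict, x, n_samples=500, width=1.0, seed=0):
    """Simplified sketch of LIME's core idea: perturb around x,
    weight samples by proximity, fit a weighted linear surrogate."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0, 0.5, (n_samples, x.size))
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, predict(Z), sample_weight=weights)
    return surrogate.coef_  # local feature attributions

x = np.array([0.2, -0.1, 0.4])
coefs = lime_style_explanation(black_box, x)
print(coefs)  # feature 0 should dominate the local explanation
```

The real LIME tool adds discretization, interpretable input representations, and feature selection on top of this weighted surrogate fit.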
ARTICLE | doi:10.20944/preprints202012.0318.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Interpretable Artificial Intelligence; Cardiovascular disease prediction; Machine Learning in Healthcare
Online: 14 December 2020 (09:49:13 CET)
Work on learning systems has long focused on creating models that obtain the best results on error metrics. Recently, the focus has shifted toward interpreting and explaining model results. The need for interpretation is greater when these models are used to support decision making, and in some areas, such as medicine, it becomes an indispensable requirement. This paper focuses on the prediction of cardiovascular disease by analyzing the well-known Statlog (Heart) Data Set from the UCI Machine Learning Repository. The study analyzes the trade-off between making predictions easier to interpret, by reducing the number of features that explain the classification of health status, and the resulting cost in accuracy, across a large set of classification techniques and performance metrics. It demonstrates that it is possible to build explainable and reliable models that retain good predictive performance.
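The interpretability-versus-accuracy trade-off studied above can be sketched with a single feature-selection experiment. This sketch uses synthetic data generated to mimic the 13-feature shape of the Statlog (Heart) data (270 samples), not the dataset itself, and a single classifier rather than the paper's full battery of techniques and metrics; the choice of k = 5 retained features is an arbitrary assumption.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in matching the 270 x 13 shape of Statlog (Heart).
X, y = make_classification(n_samples=270, n_features=13, n_informative=5,
                           random_state=0)

# Baseline: all 13 features.
full = LogisticRegression(max_iter=1000)
acc_full = cross_val_score(full, X, y, cv=5).mean()

# Interpretable variant: keep only the k most discriminative features.
reduced = make_pipeline(SelectKBest(f_classif, k=5),
                        LogisticRegression(max_iter=1000))
acc_reduced = cross_val_score(reduced, X, y, cv=5).mean()

print(f"all 13 features: {acc_full:.3f}")
print(f"top 5 features:  {acc_reduced:.3f}  "
      f"(accuracy cost: {acc_full - acc_reduced:+.3f})")
```

Repeating this comparison over several classifiers and values of k gives exactly the kind of cost curve the paper examines.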