ARTICLE | doi:10.20944/preprints202007.0450.v1
Subject: Mathematics & Computer Science, Computational Mathematics Keywords: Apache Spark; distributed computing; distributed matrix algebra; deep learning; matrix primitives
Online: 19 July 2020 (21:22:01 CEST)
The new barrier mode in Apache Spark allows embedding distributed deep learning training as a Spark stage to simplify the distributed training workflow. In Spark, a task in a stage doesn’t depend on any other tasks in the same stage, and hence it can be scheduled independently. However, several algorithms require more sophisticated inter-task communication, similar to the MPI paradigm. By combining distributed message passing (using asynchronous network IO), OpenJDK’s new auto-vectorization, and Spark’s barrier execution mode, we can add algorithms that are not based on map/reduce, such as Cannon’s distributed matrix multiplication, to Spark. We document an efficient distributed matrix multiplication using Cannon’s algorithm, which improves significantly on the performance of the existing MLlib implementation. Used within a barrier task, the algorithm described herein yields a performance increase of up to 24% on a 10,000 × 10,000 square matrix, with a significantly lower memory footprint. Applications of efficient matrix multiplication include, among others, accelerating the training and implementation of deep convolutional neural network based workloads, and thus such efficient algorithms can play a ground-breaking role in faster, more efficient execution of even the most complicated machine learning tasks.
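To illustrate the block-shifting pattern behind Cannon's algorithm, here is a minimal single-process sketch in plain Python. It is not the authors' implementation: it involves no Spark, barrier tasks, or network IO, and the function names (`cannon`, `split`, `join`, `matmul`, `add_into`) are hypothetical. Each entry of the q × q block grid stands in for one task; the list rotations stand in for the messages that neighbouring tasks would exchange.

```python
def matmul(a, b):
    """Plain triple-loop product of two small dense blocks (lists of rows)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add_into(c, d):
    """Accumulate block d into block c in place."""
    for i, row in enumerate(d):
        for j, value in enumerate(row):
            c[i][j] += value


def split(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[bj * s:(bj + 1) * s] for row in m[bi * s:(bi + 1) * s]]
             for bj in range(q)] for bi in range(q)]


def join(blocks):
    """Reassemble a q x q grid of square blocks into one matrix."""
    q, s = len(blocks), len(blocks[0][0])
    return [[v for bj in range(q) for v in blocks[bi][bj][r]]
            for bi in range(q) for r in range(s)]


def cannon(a, b, q):
    """Multiply square matrices a and b on a simulated q x q task grid."""
    n = len(a)
    if n % q != 0:
        raise ValueError("matrix size must be divisible by the grid size q")
    s = n // q
    A, B = split(a, q), split(b, q)

    # Initial alignment: shift block row i of A left by i positions,
    # and block column j of B up by j positions.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]

    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every "task" (i, j) multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                add_into(C[i][j], matmul(A[i][j], B[i][j]))
        # Then A blocks move one step left and B blocks one step up;
        # in a distributed setting this is the neighbour-to-neighbour exchange.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return join(C)
```

In this sketch each simulated task holds only one block of each operand and one result block at any time. The abstract reports a lower memory footprint for the authors' implementation but does not explain its cause, so the sketch should not be read as an account of that result.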
ARTICLE | doi:10.20944/preprints202111.0200.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: Diabetes; Diagnosis; Machine Learning; Wireless Body Area Networks; Apache Spark; Feature Selection
Online: 10 November 2021 (09:00:39 CET)
Disease-related data and information collected by physicians, patients, and researchers seem insignificant at first glance. Still, the same unorganized data contain valuable information that is often hidden. The task of data mining techniques is to extract patterns to classify the data accurately. Data mining methods have often been used to diagnose various diseases. In this study, a machine learning (ML) technique based on distributed computing in the Apache Spark computing environment is used to diagnose diabetes, or to uncover hidden patterns of the illness, using a large dataset in real time. Implementation results for three ML techniques, Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), in the Apache Spark computing environment using the Scala programming language and WEKA show that RF is more efficient and faster at diagnosing diabetes in big data.
ARTICLE | doi:10.20944/preprints202209.0326.v1
Subject: Medicine & Pharmacology, Other Keywords: multidrug resistance organism; sepsis; adequate empirical antibiotics; source of infection; APACHE II; ICU length stay; predictors; risk factors; mortality
Online: 21 September 2022 (10:45:23 CEST)
Background: Multidrug-resistant organisms (MDRO) often cause increased morbidity, mortality, and length of stay (LOS). However, it is uncertain whether infection with MDRO increases morbidity, mortality, and ICU length of stay (ICU-LOS). Objective: This study was performed to determine the prevalence of MDRO in the ICU, the site of infection, and the association of MDRO or site of infection with mortality. The secondary outcome was the association of MDRO or site of infection with ICU-LOS. Methods: A retrospective cohort study was performed with adult sepsis patients in the ICU. Univariate and multivariate (MVA) logistic regression with Cox regression modeling were performed to compute the association of MDRO with ICU mortality. MVA modeling was performed for ICU-LOS predictors. Results: Out of 228 patients, MDRO were isolated in 97 (42.5%), of which 78% were gram-negative bacteria. The mortality rate among those with MDRO was 85 (37.3%). Hospital-acquired infection (HAI) was a significant predictor of ICU-LOS in univariate linear regression (R² = 0.034, P = 0.005). In MVA linear regression, both Enterococcus faecalis infection and Acinetobacter baumannii (AC)-MDRO were predictors of ICU-LOS (R² = 0.478, P < 0.05). In the univariate Cox regression, only infection with AC-MDRO was a risk factor for ICU mortality (HR = 1.802; 95% CI: 1.2–2.706; P = 0.005). Conclusions: Identifying risk factors for MDRO highlights the appropriate administration of empirical antibiotics and effective control of the source of infection, which would reduce mortality and ICU-LOS. The use of broad-spectrum antibiotics should be limited to those with substantial risk factors for acquiring MDRO.