Deep learning model for detection of hypoalbuminemia using electrocardiography

Medical research team, Medical AI, co. Seoul, South Korea; Artificial Intelligence and Big Data Research Center, Sejong Medical Research Institute, Bucheon, South Korea; Department of Critical Care and Emergency Medicine, Mediplex Sejong Hospital, Incheon, South Korea; Medical R&D center, Body friend, co. Seoul, South Korea; Division of Cardiology Cardiovascular Center, Mediplex Sejong Hospital, Incheon, South Korea; All authors take responsibility for all aspects of the reliability and freedom from bias of the data presented and their discussed interpretation.


Introduction
As albumin provides 80% of the total colloid osmotic pressure of plasma and accounts for 50% of its protein content, it has a pivotal role in the maintenance of homeostasis. [1] Serum albumin transports several fat-soluble hormones and drugs, serves as a plasma buffer that maintains physiological pH levels, and acts as an antioxidant involved in the scavenging of oxygen free radicals. [2,3] Hypoalbuminemia is common in numerous diseases, including liver cirrhosis, malnutrition, nephrotic syndrome, heart failure, and sepsis. [4,5] The evaluation of the albumin level is a cornerstone of diagnosis and proper treatment. Monitoring of the albumin level is crucial for patients who have diseases that impair the retention and excretion of fluid, such as heart failure and liver cirrhosis, and for patients who take medications transported by albumin, such as diuretics. [6][7][8] As the symptoms of hypoalbuminemia are nonspecific, it is challenging to diagnose the condition from the medical history and physical examination alone until it becomes uncompensated and complications occur. [9] The gold standard for diagnosing hypoalbuminemia is a laboratory examination that measures the concentration of albumin. However, laboratory tests are invasive and costly, and they require specialized equipment and infrastructure, such as trained medical staff for blood sampling and hematology analysis devices for assessment with biochemical reagents. Detecting hypoalbuminemia in daily life is important for monitoring health status and detecting deterioration. However, a laboratory exam cannot be used for this purpose. Moreover, laboratory tests are too expensive to use for hypoalbuminemia screening in low-income countries.
Albumin represents a circulating endogenous reservoir of nitric oxide (NO) and acts as a donor of NO, which has diverse cardiovascular effects. [10] Albumin is associated with right ventricular free wall longitudinal strain and with the development and prognosis of cardiovascular diseases, which can be correlated with electrocardiography (ECG). [11][12][13][14][15][16] However, it is difficult to build diagnostic tools from such subtle ECG changes with conventional statistical methods. Recent studies have shown that deep learning models using ECG can detect diverse diseases, including anemia, valvular heart disease, heart failure, myocardial infarction, and electrolyte imbalances. [17][18][19][20][21][22][23][24][25] In this study, we developed and validated a deep-learning-based model (DLM) to detect hypoalbuminemia using ECG.

Study design and population
We carried out a retrospective multi-center diagnostic study in which a DLM was developed using ECGs and then internally and externally validated. We excluded individuals with missing demographic, electrocardiographic, or albumin laboratory exam information. As the purpose of the validation data was to assess the accuracy of the DLM, we used only one ECG from each patient for the internal and external validation datasets, namely the ECG recorded closest in time to the albumin laboratory exam in the study period.
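The one-ECG-per-patient rule for the validation datasets can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the record layout and the key names (`patient_id`, `ecg_time`, `albumin_time`) are hypothetical, and each record is assumed to pair one ECG with one albumin exam.

```python
from datetime import datetime


def closest_ecg_per_patient(records):
    """Keep, for each patient, the ECG recorded closest in time to the albumin exam.

    `records` is a list of dicts with the hypothetical keys
    'patient_id', 'ecg_time', and 'albumin_time' (datetime objects).
    """
    best = {}
    for rec in records:
        # Absolute time gap between the ECG and the albumin exam, in seconds.
        gap = abs((rec["ecg_time"] - rec["albumin_time"]).total_seconds())
        current = best.get(rec["patient_id"])
        if current is None or gap < current[0]:
            best[rec["patient_id"]] = (gap, rec)
    return [rec for _, rec in best.values()]


# Toy records: patient A has two candidate ECGs, patient B has one.
records = [
    {"patient_id": "A", "ecg_time": datetime(2020, 1, 1, 8),
     "albumin_time": datetime(2020, 1, 1, 12)},
    {"patient_id": "A", "ecg_time": datetime(2020, 1, 1, 11),
     "albumin_time": datetime(2020, 1, 1, 12)},
    {"patient_id": "B", "ecg_time": datetime(2020, 2, 1, 9),
     "albumin_time": datetime(2020, 2, 1, 10)},
]
selected = closest_ecg_per_patient(records)
```

Restricting validation to one ECG per patient keeps repeated recordings from the same individual from being counted more than once in the accuracy estimates.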
This study was approved by the institutional review boards of SGH and MSH.
Clinical data, including digitally stored ECGs, albumin laboratory exam values, age, and sex, were obtained from both hospitals. Both institutional review boards waived the need for informed consent because of the retrospective nature of the study using fully anonymized ECG and health data and minimal harm.

Procedures
The predictor variables were ECG, age, and sex. Digitally stored 12-lead ECG data were recorded over 10 s at 500 Hz, giving 5000 numbers for each lead. We removed the 1-s intervals at the beginning and end of each ECG because they contained more artifacts than the other parts. Thus, the length of each ECG was 8 s (4000 numbers). We created a dataset using the entire 12-lead ECG data. We also used partial datasets from the 12-lead ECG data, namely six-limb-lead and single-lead (lead I) datasets. We selected these sets of leads because they can easily be recorded by wearable and pad devices in contact with the hands and legs. Consequently, when we developed and validated the DLM using 12-lead ECGs, we used a dataset of two-dimensional (2D) data of 12 × 4000 numbers. When we developed and validated an algorithm using six-lead ECGs, we used datasets of 6 × 4000 numbers, while for the single-lead ECGs, we used datasets of 1 × 4000 numbers. The endpoint of this research was hypoalbuminemia, defined as a serum albumin concentration < 3.5 g/dL.
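The preprocessing steps above (trimming 1 s from each end, selecting lead subsets, and labeling the endpoint) can be sketched in plain Python. The lead order in `LEAD_NAMES` and all function names are assumptions made for illustration; the text does not state how the leads were stored.

```python
SAMPLING_RATE_HZ = 500
TRIM_SECONDS = 1
ALBUMIN_THRESHOLD_G_DL = 3.5

# Conventional 12-lead order (an assumption; the stored order is not given in the text).
LEAD_NAMES = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]
LIMB_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF"]


def trim_edges(ecg):
    """Drop 1 s from both ends of every lead (5000 -> 4000 samples at 500 Hz)."""
    cut = SAMPLING_RATE_HZ * TRIM_SECONDS
    return [lead[cut:len(lead) - cut] for lead in ecg]


def select_leads(ecg, wanted):
    """Return the rows of a 12-lead ECG that correspond to the requested lead names."""
    return [ecg[LEAD_NAMES.index(name)] for name in wanted]


def is_hypoalbuminemia(albumin_g_dl):
    """Endpoint label: serum albumin strictly below 3.5 g/dL."""
    return albumin_g_dl < ALBUMIN_THRESHOLD_G_DL


# Placeholder 10-s recording: 12 leads x 5000 samples of zeros.
raw = [[0.0] * 5000 for _ in LEAD_NAMES]
twelve_lead = trim_edges(raw)                      # 12 x 4000
six_lead = select_leads(twelve_lead, LIMB_LEADS)   # 6 x 4000
single_lead = select_leads(twelve_lead, ["I"])     # 1 x 4000
```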
As shown in Supplementary material 2, the DLM was developed using several hidden layers of neurons to learn complex hierarchical nonlinear representations from the data. A residual block with five stages included two convolution layers, two batch normalizations, one max-pooling, and one dropout layer (repeated). We used 6 1 × 4 max-pooling layers between blocks 1 and 4 and 2 × 4 max-pooling layers between blocks 4 and 5. The last convolutional layer of the residual block was connected to a flattened layer, which was fully connected to a one-dimensional (1D) layer composed of 128 nodes. The input layer of epidemiology data (age and sex) was concatenated with the 1D layer. Two fully connected 1D layers then led to the output layer, which consisted of a single node. The output node used a softmax function as an activation function because the output of the softmax function was between 0 and 1. The architecture of the DLM was evaluated and verified using a grid search. We developed additional DLMs using the six-limb-lead and single-lead ECGs.
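The final stage of the network, where age and sex are concatenated with the 128-node ECG layer and passed through two fully connected layers to one output node, can be sketched as below. This is only an illustration under stated assumptions: the weights are random rather than trained, the hidden widths (64 and 32), the ReLU activations, and the age and sex encoding are hypothetical, and a logistic sigmoid stands in for the output activation named in the text. The convolutional residual blocks are not reproduced.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one weight row and one bias per output node."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_layer(rng, n_in, n_out):
    """Untrained placeholder parameters for a layer with n_in inputs and n_out nodes."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)

# Stand-in for the output of the 128-node layer that follows the flattened ECG features.
ecg_features = [rng.random() for _ in range(128)]

# Hypothetical encoding of the demographic inputs (scaled age, binary sex).
age_years, sex = 63.0, 1.0
fused = ecg_features + [age_years / 100.0, sex]   # concatenation: 128 + 2 values

w1, b1 = random_layer(rng, len(fused), 64)        # hidden widths are assumptions
w2, b2 = random_layer(rng, 64, 32)
w_out, b_out = random_layer(rng, 32, 1)

hidden1 = dense(fused, w1, b1, relu)
hidden2 = dense(hidden1, w2, b2, relu)
risk_score = dense(hidden2, w_out, b_out, sigmoid)[0]   # single value in (0, 1)
```

Concatenating the demographic inputs late, after the convolutional feature extractor, lets the fully connected layers weigh age and sex against the learned ECG features directly.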

Statistical analysis
Continuous variables are presented as mean values (standard deviations (SDs)) and source software library as the backend and Python (version 3.6) for the analysis.

Visualization of the developed DLM for interpretation
To understand the developed model and compare it to existing medical knowledge, it was necessary to identify the regions that had a significant effect on the decision of the developed DLM. We employed a sensitivity map using a saliency method. [27,28] The map was computed using the first-order gradients of the classifier probabilities with respect to the input signals. If the probability of the classifier was sensitive to a specific region of the signal, the region was considered significant in the model. In other words, we used the sensitivity map to identify the region of the ECG that was associated with hypoalbuminemia. We used a gradient class activation map as the sensitivity map, together with the guided gradient backpropagation method. We verified the variable importance values of ECG features, age, and sex in logistic regression, random forest, and deep learning using the deviance difference, mean decreased Gini, and relative importance based on Garson's algorithm, respectively. [29]

Verification of the DLM performance to predict hypoalbuminemia by a subgroup analysis
We hypothesized that the ECGs would display subtle abnormal patterns in the pre-hypoalbuminemia phase and that the developed DLM would classify such cases as abnormal, yielding a false positive (a study subject classified as having hypoalbuminemia but considered non-hypoalbuminemic) as the initial result. We carried out a subgroup analysis of patients who underwent follow-up laboratory examinations in the internal and external validation datasets. The interval between the initial and follow-up laboratory examinations was more than 14 d. Among those patients, we verified the development of hypoalbuminemia in patients who were initially considered non-hypoalbuminemic, that is, whose serum albumin concentration was 3.5 g/dL or higher.
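The idea behind the sensitivity map described in the visualization section, namely the first-order gradient of the classifier probability with respect to each input sample, can be illustrated with a toy example. This sketch is not the gradient class activation map or guided backpropagation used in the study: the classifier is a hypothetical logistic model, and the gradient is approximated with central finite differences so that the example stays self-contained, whereas a deep learning framework would compute it by automatic differentiation.

```python
import math


def toy_classifier(signal, weights):
    """Hypothetical stand-in for the DLM: logistic output of a weighted sum."""
    z = sum(w * x for w, x in zip(weights, signal))
    return 1.0 / (1.0 + math.exp(-z))


def saliency_map(predict, signal, eps=1e-4):
    """Absolute first-order gradient of the predicted probability for each sample,
    approximated with central finite differences."""
    saliency = []
    for i in range(len(signal)):
        up = list(signal)
        up[i] += eps
        down = list(signal)
        down[i] -= eps
        grad = (predict(up) - predict(down)) / (2 * eps)
        saliency.append(abs(grad))
    return saliency


# Toy signal of six samples; only positions 2 and 4 influence the toy classifier.
weights = [0.0, 0.0, 2.0, 0.0, -0.5, 0.0]
signal = [0.1, 0.3, 0.2, -0.1, 0.4, 0.0]

sal = saliency_map(lambda s: toy_classifier(s, weights), signal)
most_sensitive = max(range(len(sal)), key=sal.__getitem__)
```

Samples with a large absolute gradient are those where a small change in the signal shifts the predicted probability most, which is the sense in which a region is considered significant for the model.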
The DLM data were categorized into high- and low-risk groups based on the risk score using cutoff values, which were determined using Youden's J statistic with the development dataset. [26] We used the Kaplan-Meier method to analyze the development of hypoalbuminemia over 24 months.
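Choosing a cutoff with Youden's J statistic (J = sensitivity + specificity - 1) and splitting patients into risk groups can be sketched as follows. The scores and labels are invented toy values, and the convention that a score at or above the cutoff counts as high risk is an assumption.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A score >= cutoff counts as a positive (high-risk) prediction.
    Ties are resolved in favor of the lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy development data: DLM risk scores and true labels (1 = hypoalbuminemia).
dev_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
dev_labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff, j = youden_cutoff(dev_scores, dev_labels)
risk_groups = ["high" if s >= cutoff else "low" for s in dev_scores]
```

Fixing the cutoff on the development dataset and applying it unchanged to the validation datasets avoids tuning the threshold on the data used to report performance.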

Results
The

Discussion
We developed and validated a DLM based on an ensemble network for hypoalbuminemia detection using 12-, six-, and single-lead ECGs and demonstrated reasonable performance. Subsequently, we visualized the DLM to determine the regions and characteristics of the ECG that were used for hypoalbuminemia detection, and we verified the variables that were important for the decision with diverse methods, namely logistic regression, random forest, and the DLM. We carried out a subgroup analysis of patients who were non-hypoalbuminemic (normal) at the initial laboratory examination. The DLM could predict the development of hypoalbuminemia. To the best of our knowledge, this study is the first to develop a DLM for the detection and prediction of hypoalbuminemia using ECG and to demonstrate interpretable patterns of decision making using the DLM.
The development of a reliable tool for the detection and prediction of hypoalbuminemia is the cornerstone of monitoring the albumin status and of early management to prevent irreversible disease progression. [6][7][8] Hypoalbuminemia is also associated with the nutritional status of patients. As one of the pathophysiologies of hypoalbuminemia is associated with an increase in capillary permeability and altered kinetics of serum albumin in inflammatory states, hypoalbuminemia reflects the extent of physiologic stress from disease- and trauma-related inflammation. [30] Hypoalbuminemia is associated with liver diseases, kidney diseases, and malnutrition or malabsorption based on albumin synthesis, loss, and intake, respectively. [4,5] Therefore, the detection of hypoalbuminemia is important not only for monitoring nutritional status and general homeostasis, but also for the early diagnosis of other diverse diseases, using hypoalbuminemia as a surrogate factor, and for detecting deterioration of the patient. [11][12][13][14][15][16] Although laboratory tests for albumin are the diagnostic tests for hypoalbuminemia, they require blood sampling and infrastructure for blood analysis. Therefore, detection of hypoalbuminemia based on laboratory tests cannot be used in daily life or in low-income countries.
As stated above, albumin is an endogenous reservoir of NO and acts as a donor of NO. [10] The cardioprotective roles of NO include regulation of blood pressure and vascular tone, inhibition of platelet aggregation and leukocyte adhesion, and prevention of smooth muscle cell proliferation. Reduced bioavailability of NO is considered one of the central factors common to endothelial dysfunction. The underlying pathology for most cardiovascular diseases is atherosclerosis, which is associated with endothelial dysfunction. [10] The albumin level has been significantly negatively correlated with right ventricular free wall longitudinal strain. [11] Albumin is a strong prognostic factor for cardiovascular diseases. As the albumin level affects diverse cardiovascular statuses, we hypothesized that we could detect hypoalbuminemia based on ECG. [11][12][13][14][15][16]
The most important aspect of deep learning is its ability to extract features and develop an algorithm using various types of data, such as images, 2D data, and waveforms. [31] Attia et al. and our study group developed DLMs to screen for heart failure, arrhythmia, valvular heart disease, left ventricular hypertrophy, electrolyte imbalance, and anemia. [17][18][19][20][21][22][23][24][25] Deep learning is criticized for unreliable outcomes because of the low transparency of its process, the so-called black box. Therefore, we used a sensitivity map to describe the abnormal findings that affected the decision of the DLM for the detection of hypoalbuminemia and described the variable importance of ECG features. Using this method, we verified the ECG region and features associated with hypoalbuminemia. Conventional methods are based on the hypotheses of researchers. For deep learning methods, such as the DLM and sensitivity mapping in this study, the findings are based not on previous human medical knowledge but on data. Therefore, new knowledge can be derived from the data alone, without human prejudice. Deep learning can discover complex hierarchical nonlinear representations that cannot be discovered using conventional statistical methods, such as logistic regression. In this study, we verified the important ECG region for the detection of hypoalbuminemia from waveform data. We verified that hypoalbuminemia could be detected and predicted using ECG based on the DLM.
Previous studies showed that the amplitude of the QRS complex was associated with albumin levels. [32,33] Toma et al. and Wu et al. showed that prolonged QT intervals and accentuated deceleration of the T wave correlated with hypoalbuminemia. [13,34] In this study, the DLM could predict the development of hypoalbuminemia. The DLM may detect structural and physiological changes that affect the vulnerability to hypoalbuminemia. These findings suggest that the DLM could be used not only to diagnose hypoalbuminemia but also to screen patients who have characteristics of hypoalbuminemia risk.
The reliable performances of the six- and single-lead ECG-based DLMs indicate that hypoalbuminemia could be screened with both conventional 12-lead ECG and life-type or