Machine learning model for diagnosing the stage of liver fibrosis in patients with chronic viral hepatitis C

Aim. The aim of this work was to develop a machine learning model for diagnosing the stage of liver fibrosis in patients with chronic viral hepatitis C from routine clinical examination data. Materials and methods. A total of 1240 patients with chronic viral hepatitis C were examined. A data set obtained from 689 patients, balanced by stage of liver fibrosis, was used to develop and test the machine learning models. Nine routine clinical parameters were selected as the most important predictors of the likelihood of stage 3–4 liver fibrosis: the patient's age, height, weight and body mass index; the platelet count in the clinical blood test; and the levels of alanine transaminase, aspartate transaminase, gamma-glutamyltransferase and total bilirubin in the biochemical blood test. Results. Compared with the "gold standard" of diagnosis (liver biopsy), the developed method identified stage 3–4 liver fibrosis in patients with chronic viral hepatitis C with an accuracy of 80.56% (95% CI: 69.53–88.94%), a sensitivity of 66.67% and a specificity of 94.44%. Conclusion. The developed method is an alternative to more expensive and geographically inaccessible tests. When used in real clinical practice, it requires no additional equipment, software or laboratory tests. Introducing the method into clinical practice can help address the low financial and geographic availability of diagnostic tests and allow the stage of liver fibrosis to be determined in patients with chronic viral hepatitis C.


Introduction
There are 71 million people in the world infected with the hepatitis C virus (HCV), approximately 1% of the world's population [1]. Chronic infection is known to develop in 60-85% of cases after acute viral hepatitis C [2]. Patients with chronic viral hepatitis C (CHC) are at risk of developing life-threatening conditions such as liver cirrhosis and hepatocellular carcinoma [1,3]. Worldwide mortality associated with CHC is currently increasing because of the growing population of long-term patients with severe liver fibrosis [4].
According to experts from the World Health Organization (WHO), approximately 399,000 people worldwide died from hepatitis C in 2016, mainly as a result of liver cirrhosis and hepatocellular carcinoma [1].
Liver damage develops in the absolute majority of patients with CHC and is an independent risk factor for premature death (RR 12.48; 95% CI 9.34-16.66) [5]. Liver cirrhosis and hepatocellular carcinoma are the leading causes of death in patients with CHC [1,2].
In this regard, the stage of liver fibrosis is, in the absolute majority of cases, the main indicator of disease progression and determines further patient management and the priorities of therapy [6]. Thus, the financial and geographic availability of methods for diagnosing the stage of liver fibrosis is a prerequisite for improving the quality of medical care for patients with CHC and for reducing the mortality associated with this disease.
The "gold standard" for determining the stage of liver fibrosis in patients with CHC is liver biopsy. However, due to its inherent drawbacks, less invasive and faster methods such as transient elastography (Fibroscan®), B-mode ultrasound and Fibrotest® are most commonly used in real clinical practice. In most cases, such methods make it possible to determine the stage of liver fibrosis and avoid a liver biopsy, but their accuracy is lower.
More accessible diagnostic methods are calculated indices such as APRI, FIB-4, MDA and FORNS, which are based on indirect routine clinical and laboratory markers. Such indices are calculated with simple mathematical formulas derived using classical statistical data analysis. The APRI and FIB-4 indices are recommended by experts of the European Association for the Study of the Liver (EASL) for determining stage 4 liver fibrosis in patients with CHC in routine clinical practice [6,8]. However, in accuracy, sensitivity and specificity such methods are inferior to their invasive and instrumental counterparts [9-16].
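To illustrate how simple these calculated indices are, the sketch below implements the commonly published APRI and FIB-4 formulas in Python. The formulas are the standard ones from the literature, not taken from this paper. The function and parameter names are hypothetical, and the upper limit of normal for AST is laboratory-specific, so the value used here is only an assumed example.

```python
import math


def apri(ast_u_l: float, ast_upper_limit_u_l: float, platelets_10e9_l: float) -> float:
    """AST-to-platelet ratio index: (AST / upper limit of normal) * 100 / platelets."""
    return (ast_u_l / ast_upper_limit_u_l) * 100.0 / platelets_10e9_l


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_10e9_l: float) -> float:
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


# Hypothetical patient values, chosen only to make the arithmetic easy to follow.
# The AST upper limit of normal (40 U/L) is an assumption; laboratories differ.
print(apri(ast_u_l=80, ast_upper_limit_u_l=40, platelets_10e9_l=100))       # 2.0
print(fib4(age_years=50, ast_u_l=80, alt_u_l=64, platelets_10e9_l=100))     # 5.0
```

Both indices use a fixed functional form over two to four markers, which is why they are easy to compute at the bedside but cannot capture more complex interactions between predictors.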
Mathematical models developed with modern machine learning algorithms are generally more effective than simple calculated indices. As with simple calculated indices, the financial and geographic availability of methods based on machine learning models can be achieved by using a limited set of indirect routine clinical and laboratory markers. In turn, diagnostic informativeness beyond that of the APRI and FIB-4 tests can be achieved by using modern machine learning algorithms that analyze and take into account the complex relationships among all the predictors used.
The aim of this study was to develop a machine learning model for determining stage 3-4 liver fibrosis in patients with CHC from routine clinical examination data.

Materials and methods
[...] Epstein-Barr viral infections, hemochromatosis, autoimmune hepatitis, alcoholic and non-alcoholic fatty liver disease, biliary tract disease and toxic hepatitis. Also, none of the patients whose information was entered into the database had hepatocellular carcinoma or had abused alcohol, injected drugs or inhaled narcotic drugs during the previous 6 months.
The following information was entered into the study database: gender, age, height and weight of the patient; levels of platelets, hemoglobin, erythrocytes and leukocytes and the absolute counts of neutrophils and lymphocytes in a clinical blood test; erythrocyte sedimentation rate (ESR); levels of alanine transaminase (ALT), aspartate transaminase (AST), gamma-glutamyltransferase (GGT), total bilirubin, total protein, albumin, creatinine, amylase, triglycerides, cholesterol, alkaline phosphatase (ALP), iron and glucose in a biochemical blood test; and prothrombin index (PTI), HCV genotype, viral load and liver fibrosis stage on the METAVIR scale.
Data obtained from patients whose primary medical records contained information on fewer than 80% of the parameters evaluated in the study were excluded [...]
Only the data of the "training sample" were used, with 5-fold cross-validation, at the stage [...]
Mathematical and statistical processing of the data was carried out in the "R" software environment for statistical analysis. To balance the data and split it into groups, [...] were calculated using the corresponding functions from the "caret" and "ROCR" packages.
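The 5-fold cross-validation on the training sample can be pictured with the following minimal Python sketch. It is not the authors' R code: the function name is hypothetical, and stratifying the folds by class is an assumption made here so that each fold keeps the class balance of the training data.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (training_indices, validation_indices) for k stratified folds.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every fold receives a near-equal share of each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for held_out in range(k):
        validation = sorted(folds[held_out])
        training = sorted(
            idx for f, fold in enumerate(folds) if f != held_out for idx in fold
        )
        yield training, validation


# Hypothetical balanced labels: 0 = fibrosis stage 0-2, 1 = fibrosis stage 3-4.
labels = [0] * 10 + [1] * 10
for training, validation in stratified_k_fold(labels, k=5, seed=1):
    positives = sum(labels[i] for i in validation)
    print(len(training), len(validation), positives)
```

Each observation of the training sample is used for validation exactly once, and the separate test data are never touched during model selection.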
Gradient boosting machine learning models were developed in R using the classic "gbm" package. The AdaBoost exponential loss was used as the loss function. The "scale" function was used to normalize the data to the range from 0 to 1.
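Two ingredients of this setup, rescaling a predictor to the 0 to 1 range and the AdaBoost exponential loss, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code and not a full boosting loop. The function names are hypothetical, and the loss follows the usual formulation for a 0/1 outcome y and model score f, namely exp(-(2y - 1) f). Note that base R's `scale` standardizes to zero mean and unit standard deviation by default; a 0 to 1 range requires passing the column minimum as `center` and the column range as `scale`, which the text does not spell out.

```python
import math


def min_max_scale(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def exponential_loss(y, f):
    """AdaBoost exponential loss for a label y in {0, 1} and a real-valued score f."""
    sign = 2 * y - 1
    return math.exp(-sign * f)


def pseudo_residual(y, f):
    """Negative gradient of the exponential loss with respect to f.

    In gradient boosting, each new tree is fitted to these values.
    """
    sign = 2 * y - 1
    return sign * math.exp(-sign * f)


def score_to_probability(f):
    """Map a score fitted under exponential loss to a class-1 probability."""
    return 1.0 / (1.0 + math.exp(-2.0 * f))


# Hypothetical platelet counts (10^9/L), used only to show the rescaling.
print(min_max_scale([50, 150, 250]))  # [0.0, 0.5, 1.0]
print(exponential_loss(1, 0.0), pseudo_residual(0, 0.0), score_to_probability(0.0))
```

Because the exponential loss grows quickly for confidently wrong scores, misclassified patients dominate the pseudo-residuals and receive the most attention in the next boosting iteration.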

Results
Using machine learning algorithms, we developed several models to determine the likelihood of stage 3-4 liver fibrosis in patients with CHC (Table 2).
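Accuracy, sensitivity, specificity and the predictive values used to compare such models all derive from a 2×2 confusion matrix against liver biopsy. The Python sketch below shows the arithmetic. The counts are hypothetical: they were chosen here only because they reproduce the accuracy, sensitivity and specificity quoted in the abstract for a balanced sample, and the actual counts are not given in this text.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fn: biopsy-confirmed stage 3-4 cases that the model detected/missed.
    tn/fp: stage 0-2 cases that the model correctly cleared/wrongly flagged.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts (assumption, not reported data): 36 cases per class.
metrics = diagnostic_metrics(tp=24, fn=12, tn=34, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

With high specificity and moderate sensitivity, as in this example, a positive result is strong evidence of advanced fibrosis, whereas a negative result does not rule it out.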

Discussion
Although our study did not use unrepresentative or unbalanced samples, did not exclude data without justification, and used two independent test samples to estimate accuracy, the developed machine learning model requires external validation. It also seems appropriate to conduct further research to assess the clinical and economic efficiency of the method.

Funding: The study was carried out as part of research work under a state assignment (НИОКТР АААА-А18-118022790087-7).
Data sharing: The datasets analyzed during this study are available from the authors on reasonable request.
Conflicts of Interest: All authors declared no conflict of interest.