Early Fetal Weight Estimation with Expectation Maximization Algorithm

Fetal weight estimation before delivery is important in obstetrics, which assists doctors diagnose abnormal or diseased cases. Linear regression based on ultrasound measures such as bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), and fetal length (fl) is common statistical method for weight estimation but the regression model requires that time points of collecting such measures must not be too far from last ultrasound scans. Therefore this research proposes a method of early weight estimation based on expectation maximization (EM) algorithm so that ultrasound measures can be taken at any time points in gestational period. In other words, gestational sample can lack some or many fetus weights, which gives facilities to practitioners because practitioners need not concern fetus weights when taking ultrasound examinations. The proposed method is called dual regression expectation maximization (DREM) algorithm. Experimental results indicate that accuracy of DREM decreases insignificantly when completion of ultrasound sample decreases significantly. So it is proved that DREM withstands missing values in incomplete sample or sparse sample.


Introduction
According to the regression approach of fetal weight estimation, without loss of generality, an estimation formula is a linear regression function Z = α0 + α1X1 + α2X2 + … + αnXn where Z is estimated fetal weight whereas Xi (s) are gestational ultrasound measures such as bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), fetal length (fl). Variable Z is called response variable or dependent variable. Each Xi is called regression variable, regressor, predictor, regression variable, or independent variable. Each αi is called regression coefficient. Researches focusing on the regression approach aim to estimate coefficients from gestational sample of ultrasound measures. For example, Hadlock et al. (Hadlock, Harrist, Sharman, Deter, & Park, 1985) proposed regression models for weight estimation based on head size, abdominal size, and femur length, which is better than those based on measurements of head and body. Error means in percentage of their models are 1.3%, 1.5%, 0.4%, 1.4%, 2.3%, and -0.7% whereas error standard deviations are 10. 1%, 9.8%, 7.7%, 7.3%, 7.4%, 7.3%.
Deter, Rossavik, and Harrist (Deter, Rossavik, & Harrist, 1988) reassessed the weight estimation procedure of Rossavik (regression analysis) with particular emphasis on parameter estimation and performance over a wide weight range. As results, Deter, Rossavik, and Harrist assured that "there is no systematic errors over a 250 gram to 4750 gram weight range and random errors (± l standard deviation) of 10% to 13% below 200 gram and 6% to 8% above 2000 gram. The weights of small-and large-for-gestational age fetuses were systematically overestimated (4.1%) and underestimated (-3.0%), respectively, but systematic errors were not found in average-for-gestational age fetuses".
Varol et al. (Varol, Saltik, Kaplan, Kilic, & Yardim, 2001) evaluated the growth curve of well-functioned regression models (Hadlock formulas, for example). Their purpose is to contribute to develop national standard growth curve of gestational age and birth weight. Percentile values and correlation coefficients were calculated and well-functioned regression models were produced for growth curve. As a result, the regression model for gestational age age = 4.945 + 0.606*ac + 0.105*bpd + 0.286*fl with adjusted R 2 = 0.937 is optimal.
Salomon, Bernard, and Ville (Salomon, Bernard, & Ville, 2007) used polynomial regression approach to compute a new reference chart for weight estimation. Their resulted birth-weight chart showed that the weight estimation was noticeably larger at 25 -36 weeks. At 28 -32 weeks, the 50 th centile of actual birth weight is approximated to the 50 th centile of estimated weight.
A. R. Akinola, I. O. Akinola, and O. O. Oyekan (Akinola, Akinola, & Oyekan, 2009) evaluated many regression estimation models. Their results showed that models with hc and ac are not as good as those with ac and bpd. The combination of fl and ac did not improve accuracy. The uses of multiple measures gives most accurate estimation.
Lee et al. (Lee, et al., 2009) used multiple linear regression model with standard measures (bpd, fl, ac) and their proposed biometrics such as fractional arm volume (fav) and fractional thigh volume (ftv). They produced six weight estimation models such as model 1, model 2, model 3, model 4, model 5, and model 6. The model 3 which is log(weight) = 0.5046 + 1.9665*log(bpd) -0.3040*log(bpd)*log(bpd) + 0.9675*log(ac) + 0.3557*log(fav) and model 6 which is log(weight) = -0.8297 + 4.0344*log(bpd) -0.7820*log(bpd)*log(bpd) + 0.7853*log(ac) + 0.0528*log(ftv)*log(ftv) gain highest accuracy. Model 5 classified an additional 9.1% and 8.3% of fetuses within 5% and 10% of birth weight. Model 6 classified an additional 7.3% and 4.1% of infants within 5% and 10% of birth weight. Bennini et al. (Bennini, Marussi, Barini, Faro, & Peralta, 2009) created a total of 210 pregnant women in their research into a formula-generating group (150 women) and prospective validation group (60 women). Polynomial regression is used to generate one formula based on two-dimension measures, one formula based on fetal thigh volume by multiplanar technique, and one formula based on fetal thigh volume by Virtual Organ Computeraided Analysis. The experimental results showed that their models are significant good and there is no significant difference between two-dimension model and three-dimension models.
Cohen et al. (Cohen, et al., 2010) used linear regression model to compare estimated weights for births after 6 days after last ultrasound scan and actual weights. Their results indicate that the mean ± standard deviation percentage among deliveries within 1 day of last ultrasound scan is 0.2 ± 9%.
Siggelkow et al. (Siggelkow, et al., 2010) proposed a new algorithm of isotonic regression to construct a birth weight prediction function that increases monotonically with each of input variables (ultrasound measures) and minimizes empirical quadratic loss. As a result, their isotonic regression function gains a small mean absolute error (312 gram).
Pinette et al. (Pinette, et al., 1999) used mean weight value from multiple formulas in order to improve the estimation. For instance, Pinette et al. calculated four estimated weight values w1, w2, w3, and w4 from formulas of Shepard, Hadlock, and Combs and then, they computed the mean w = (w1 + w2 + w3 + w4) / 4 as the optimal estimated value of birth weight.
When fetal weight is estimated based on gestational age, the weight-for-gestational chart is used. In such chart, if gestational age falls below 10 th percentile then, it is impossible to estimate respective weight and so such problem is called small-for-gestational-age which often occurs because of missing data. Hutcheon and Platt (Hutcheon & Platt, 2008) applied standard epidemiologic approaches to correct the missing data problem. However such approaches does not use regression model. When gestational age is incompletely recorded, Eberg, Platt, and Filion (Eberg, Platt, & Filion, 2017) proposed four approaches to estimating missing gestational age: (1) generalized estimating equations for longitudinal data; (2) multiple imputation; (3) estimation based on fetal birth weight and sex; and (4) conventional approaches that assigned a fixed value (39 weeks for all or 39 weeks for full term and 35 weeks for preterm).
In general, most of researches required that the time point to take ultrasound measures is not too far from last ultrasound scan so that bias of actual birth weight at delivery time is small enough. If ultrasound examinations are taken soon, the gestational sample will lack weights because ultrasound measures do not conform to actual birth weight at delivery time. In the next section, we propose an algorithm which is an application of expectation maximization (EM) algorithm for estimating fetal weight in case of incomplete data. Here we should survey some researches related to how to apply EM algorithm into regression model. In literature of EM algorithm, missing values of data sample are estimated in expectation step and coefficients of regression model are re-estimated in maximization step. For example, Zhang, Deng, and Su (Zhang, Deng, & Su, 2016) used EM algorithm to build up linear regression model for studying glycosylated hemoglobin from partial missing data. In other words, they aim to discover relationship between independent variables (predictors) and diabetes. Therefore, the ideology of applying EM algorithm into regression model is not new but we propose a special technique to construct mutually two dual regression models at the same time from incomplete gestational sample. The algorithm that implements such technique is called dual regression expectation maximization (DREM) algorithm. DREM simplifies EM algorithm with assumption of normal distribution and moreover only weights in gestational sample are missing. DREM is described in next section.

Methodology
This research continues our previous research (Nguyen & Ho, A framework of fetal age and weight estimation, 2014) in which the used ultrasound samples are collected in fetal age from 28 weeks to 42 weeks because delivery time is not over 48 hours since last ultrasound scan. Hence, accuracy of weight estimation is only ensured when ultrasound examinations are performed after 28-week old fetal age. In the chapter "Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight" of the book "E-Health" by Thomas F. Heston, we proposed an early weight estimation in which ultrasound measures can be taken before 28week old fetal age. In this research, we implement such proposal as a so-called dual regression expectation maximization (DREM) algorithm and then make experiments on DREM. With DREM, the gestational sample can be totally collected at any appropriate time points in gestational period. In other words, the sample can lack fetal weights. This is a convenience for practitioners because they do not need to concern fetal weights when taking ultrasound examinations. Consequently, early weight estimation is achieved. As a convention, vectors are column vectors if there is no additional information.
Suppose we estimate two linear regression models Z = α0 + α1X1 + α2X2 + … + αnXn and Z = β0 + β1Y where Z is fetal weight and Y is fetal age whereas Xi (s) are gestational ultrasound measures such as bpd, hc, ac, and fl. Suppose both random variables Y and Z conform normal distribution, according to equation 1 (Lindsten, Schön, Svensson, & Wahlström, 2017, pp. 8-9). Note, only Z is random variable whereas X and Y are data.
Where, α = (α0, α1,…, αn) T and β = (β0, β1) T are parameter vectors where X = (1, X1, X2,…, Xn) T and (1, Y) T are data vectors. As a convention, linear function Z = α0 + α1X1 + α2X2 + … + αnXn is called first function, first model, or first formula whereas linear function Z = β0 + β1Y is called second function, second model, or second formula. Such two models are estimated in duality at the same time. The means of Z with regard to the first model and the second model are α T X and β T (1, Y) T , respectively whereas the variances of Z with regard to the first model and the second model are σ1 2 and σ2 2 , respectively. Note that the superscript " T " denotes transposition operator in vector and matrix. Let D = (X, y, z) be collected sample in which X is a set of sample measures, y is a set of sample fetal ages, and z is a set of fetal weights with note that z is empty or incomplete. If z is empty, there is no zi in z. If z is incomplete, z has some values but there are also some missing values in z. However, the constraint is that X is complete. Now we focus on estimating α and β based on D. As a convention let α * and β * be estimates of α and β, respectively (Lindsten, Schön, Svensson, & Wahlström, 2017, p. 8).
Conditional expectation of sufficient statistic Z given X with regard to P(Z | X, α) is specified by equation 3.
There is a heuristic assumption that the explicit dependence and the implicit dependence be the expectation of Z with regard to the entire probability of Z specified by equation 2. If the heuristic assumption is satisfied, Thus, from equations 3 and 4, we have equation 5 to specify the heuristic assumption.
Please pay attention to equation 5 because Z will be estimated by such expectation later. For convenience, let Θ = (α, β) T be the compound parameter. The entire probability of Z given X and Y specified by equation 2 is re-written as follows: (Due to all observations are independently and identically distributed) )) The log-likelihood function is logarithm of the entire joint probability according to equation 6.
The notation 0 = (0, 0,…, 0) T denotes zero vector. All equations in the system 8 are linear, whose unknowns are Θ = (α, β) T . We apply expectation maximization (EM) algorithm into estimating Θ = (α, β) T with lack of fetal weights. Note that the entire probability P(Z | X, Y, α, β) specified by equation 2 is product of regular exponential distributions. EM algorithm has many iterations and each iteration has expectation step (E-step) and maximization step (M-step) for estimating parameters. Given current parameter Θ t = (α t , β t ) T at the t th iteration, the two steps are shown in table 1 (Dempster, Laird, & Rubin, 1977, p. 4).
1. E-step: Missing values zi (s) are estimated as expectations of themselves based on the current parameter Θ t , according to equation 5.
= + (1, ) 2 2. M-step: The next parameter Θ t+1 is a maximizer of L(Θ), which is the solution of equation system 8. The equation system 8 is solvable because missing values zi (s) were estimated in E-step. According to (Montgomery & Runger, 2010, pp. 456-459), the solution of system 8 at current iteration is specified by equation 9. Where, Note, Θ t+1 becomes current parameter for the next iteration. Table 1. E-step and M-step of EM algorithm The EM algorithm stops if at some t th iteration, we have Θ t = Θ t+1 = Θ * . At that time, Θ * = (α * , β * ) T is the optimal estimate of EM algorithm and hence the first model and the second model are determined with α * and β * . In practice, the algorithm can stop if the deviation between Θ t and Θ t+1 is smaller than a small enough terminated threshold. In this research such terminated threshold is ε = 0.001. The smaller the terminated threshold is, the more accurate the algorithm is.
The two steps shown in table, which is application of EM algorithm into linear regression model, is the proposed algorithm which is called dual regression expectation maximization (DREM) algorithm. The duality is implied by equation 5 which is used to estimate missing values zi in the E-step. If equation 3 is used to estimate zi for the first model and equation 4 is used to estimate zi for the second model then, dual option is turned off. If dual option is turned off but yi is missing then, equation 3 is used to estimate zi for both the first model and the second model. In general, DREM has two options such as dual and non-dual. As usual, dual option is default option of DREM. In other words, equation 5 is used to estimate missing values zi by default. Next section focuses on experimental results of DREM.

Experimental results
We uses the gestational sample of 127 cases in which each case includes ultrasound measures, fetus age, and fetus weight. Ultrasound measures are bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), and fetal length (fl). The unit of bpd, hc, ac, and fl is millimeter. The units of fetal age and fetal weight are week and gram, respectively. Ho and Phan ,  collected the sample of pregnant women at Vinh Long General Hospital -Vietnam with obeying strictly all medical ethical criteria. These women and their husbands are Vietnamese. Their periods are regular and their last periods are determined. Each of them has only one alive fetus. Fetal age is from 28 weeks to 42 weeks. Delivery time is not over 48 hours since ultrasound scan.
Such sample of 127 cases is used as dataset to make experiment in this research. The dataset is split into two folders and each folder owns one training dataset and one testing dataset. Training dataset and testing dataset in the same folder are separated, which means that they do not have common cases. Training dataset is named sample{i}.base and testing dataset is named sample{i}.test. There are two training datasets and two testing datasets such as sample1.base, sample1.test, sample2.base, and sample2.base.
-Folder 1 has sample1.base of 513 cases and sample1.test of 514 cases.
-Folder 2 has sample2.base of 514 cases and sample1.test of 513 cases. We do not have ultrasound sample which has examination cases whose delivery time is over 48 hours since ultrasound scan so as to evaluate early weight estimation in real time. Therefore, we make training datasets sparse in order to test DREM algorithm. Each training dataset is made sparse with sparse ratios (0.2, 0.4, 0.6, 0.8) into incomplete (missing) Table 2. Ten testing pairs Note, training datasets in the 1 st and 2 nd pairs are complete so that we can simulate DREM in early weight estimation. The 1 st and 2 nd pairs are called completed pairs whereas 3 rd , 4 th , 5 th , 6 th , 7 th , 8 th , 9 th , and 10 th are called incomplete pairs. Experimental results from incomplete pairs are compared together and are aligned with experimental results from complete pairs so as to evaluate DREM with subject to both early weight estimation and withstanding of DREM for missing values. The terminated condition in DREM is that the deviation between estimated coefficients and current coefficients is smaller than ε = 0.001.
DREM algorithm always results out two regression models on weight estimation. The first model estimates fetal weight based on ultrasound measures whereas the second model estimates fetal weight based on age. DREM algorithm supports two options such as dual and non-dual. With dual option, the first model and the second model are mutually dependent and so they are mutually improved. In other words, by dual option, missing weights zi (s) are estimated by equation 5 for both the first model and the second model. With non-dual option, the first model and second model are independent. By non-dual option, missing weights zi (s) are estimated separately by equation 3 for the first model and by equation 4 for the second model. When sample is complete with the 1 st pair and the 2 nd pair, DREM results the same models as least squares method (Montgomery & Runger, 2010, pp. 452-459) does. As a convention, let f{i}{d/n} and g{i}{d/n} be the first model and the second model with regard to the testing pair i th and dual/non-dual option. For example, f1d is the first model derived from the 1 st testing pair with dual option whereas g2n is the second model derived from the 2 nd testing pair with nondual option. By looking up The smaller the MAE is, the more accurate the DREM is. Where, = − = {0.1925,0.3674,0.5284,0.6748} Note that ̅ = 0.4408 and sD = 0.2078 are sample mean and sample standard deviation of D. Because the t0 is larger than the percentage point t0.05, 3 = 2.353, difference between the percentage of missing values and the percentage of decrease in accuracy of DREM is significant within test 1, odd pairs (3 rd , 5 th , 7 th , 9 th ), and dual option. Table 5 shows paired ttests for test 1 and test 2 with dual / non-dual options, given MAE metric and significant level 95%. We use odd pairs (even pairs) in a same group which is compared with the 1 st pair (2 nd pair) because odd pairs (even pairs) share the same testing dataset sample1.test (sample2.test).  Table 5. Paired t-tests given MAE metric where t0.05, 3 = 2.353 From paired t-tests in table 5, it is asserted that the withstanding of DREM for missing values with regard to MAE metric is significant because the bias ratios with regard to MAE metric are much smaller than percentages of missing values.
We compare the mean (average) of MAE metric with regard to dual option and non-dual option. In both test 1 and test 2, DREM with dual option decreases MAE metric very little (-5.70 ≈ 175.2604 -169.5578 in test 1 and 2.34 ≈ 249.0563 -246.7210 in test 2). In other words, dual DREM does not improve both the first model and the second model with subject to MAE metric. However, the deviation of MAE between test 1 and test 2 with dual option (73.7959 = 249.0563 -175.2604) is smaller than the deviation of MAE between test 1 and test 2 with nondual option (77.1632 = 246.7210 -169.5578). This implies that dual option makes trade-off between the first model and the second model. Moreover we will know later that dual option will also increase convergence of DREM by decreasing the number of iterations.
The smaller the RMSE is, the more accurate the DREM is. We evaluate DREM with error ranges. Equation 11 specifies error range metric.
The smaller both error mean μ and error standard deviation σ are, the more accurate the DREM is. Because error range is not scalar, it is incorrect to make paired t-tests and so we cannot assert the withstanding of DREM for missing values with regard to error range. It is also incorrect to calculate the mean (average) of error ranges but we assert that dual DREM does not improve the first model and the second model with subject to error range because error means and error standard deviations of dual option are often larger than those of non-dual option.
We evaluate DREM with correlation coefficient (R). Equation 12 specifies R metric.
The correlation coefficient R reflects adequacy of a given formula. The larger the R is, the better the formula is.  Table 10. Paired t-tests given R metric where t0.05, 3 = 2.353 From paired t-tests in table 10, it is asserted that the withstanding of DREM for missing values with regard to R metric is significant because the bias ratios with regard to R metric are much smaller than percentages of missing values.
We compare the mean (average) of R metric with regard to dual option and non-dual option. In both test 1 and test 2, DREM with dual option does not increase R. Even though, DREM with dual option almost does not changes R (-0.001 ≈ 0.9573 -0.9584 in test 1 and 0 = 0.9198 -0.9198 in test 2). In other words, dual DREM does not improve both the first model and the second model with subject to R metric. However, dual option makes trade-off between the first model and the second model because the deviation of R between test 1 and test 2 with dual option (0.0375 = 0.9573 -0.9198) is smaller than the deviation of R between test 1 and test 2 with non-dual option (0.0386 = 0.9584 -0.9198).
With regard to metrics such as MAE, RMSE, and R then, DREM surely withstands incomplete data. Its accuracy decreases insignificantly when the percentages of missing values increases significantly. However dual option does not improve DREM in accuracy for both the first model and the second model although duality is a feature of DREM. However, there is an interesting discovery under duality of DREM. Table 11 lists the number of iterations of DREM with regard to dual option and non-dual option.
Pair Dual Non-dual  1  2  2  2  2  2  3  12  12  4  12  13  5  17  19  6  19  20  7  30  33  8  31  35  9  57  100  10  62  72  Table 11. The number of iterations of DREM with dual / non-dual option From table 11, the number of iterations in dual option is smaller, which means that the convergence of DREM is improved with dual option. The reason is that DREM with dual option takes advantages prior information from both the first model and the second model in order to speed up the convergence. Especially, the more the sparse ratio (percentage of missing values) increases, the more dual option brings into play. The sample in this research is not huge and so such improvement is insignificant.
An alternative technique to improve the convergence of DREM is to initialize the parameter Θ 1 = (α 1 , β 1 ) T at the first iteration of EM process in proper way instead of initializing Θ 1 in arbitrary way. Note, by default, Θ 1 is initialized as zero vector. Let X' be the complete matrix of ultrasound measures, which is created by removing rows whose respective weights zi (s) are missing from X. Similarly, let Y' be the complete matrix of fetal ages, which is created by removing rows whose respective weights zi (s) are missing from Y. Let z' be the complete vector of non-missing weights. The advanced Θ 1 is initialized by equation 13.
Equation 13, which is solution of the equation system 8, is also a variant of equation 9 where X, Y, and z are replaced by X', Y', and z'. Note, X, X', Y', and z' are complete whereas Y and z can be incomplete.  table 12 with table 11, it is asserted that the advanced initialized parameter Θ 1 improves the convergence of DREM. The interesting result is that non-dual option seems to be preeminent in speeding up DREM with advanced Θ 1 whereas it is worse option with arbitrary Θ 1 . Now we test DREM with advanced Θ 1 and very small terminated threshold ε = 10 -10 .
Pair Dual Non-dual  1  1  1  2  1  1  3  18  3  4  19  6  5  31  8  6  34  8  7  53  13  8  58  15  9  107  40  10  110  27  Table 13. The number of iterations of DREM with advanced initialized parameter and very small terminated threshold From table 13, there is no doubt that non-dual option is preeminent in speeding up DREM with advanced Θ 1 . In general, dual option is only useful if researchers want to build up two acceptable (mutual) regression models because dual option makes trade-off between the first model and the second model. Recall that the essence of DREM is to build up two mutual regression models in duality but scientists can turn off such dual option.

Conclusions
The analysis of experiments proves efficiency of DREM in withstanding sparse dataset but this good result is derived from preeminence of EM algorithm when EM algorithm estimates missing values by sufficient statistic. The duality of DREM currently is not an excellent feature although it improves the convergence of DREM. Note, in literature of EM, the convergence of EM will be improved with support of additional information like prior probability. So, the first model of DREM gives additional information to the second model and vice versa. In general, the ideology of DREM does not go beyond EM algorithm but DREM solves effectively the problem of incomplete sample, which in turn results out the early weight estimation in obstetrics. In this research, only weight values are missing. In the future, we will improve DREM to solve a hazard problem in which fetal weight, fetal ages, and ultrasound measures can be missing. We may also introduce another algorithm different from DREM which is also another implementation of the proposal mentioned in the chapter "Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight" of the book "E-Health" by Thomas F. Heston. If such hazard problem is solved successfully, practitioners will have a lot of benefits when they will not be stressful in taking ultrasound examinations because some measures are allowed to be missing. In other words, it is acceptable for practitioners to make unintentional mistakes when taking ultrasound examinations. Moreover researchers also get benefits because they can receive estimation models from incomplete sample. In literature of EM algorithm, there are methods to estimate regression model with lack of some independent variables and so the improvement of DREM is feasible.