Predicting Earned Value Indexes in Residential Complexes’ Construction Projects Using an Artificial Neural Network Model

The importance of this study lies in applying smart techniques to the earned value indicators of residential building projects in the Republic of Iraq. A single intelligent forecasting model was developed to predict the Schedule Performance Index (SPI), Cost Performance Index (CPI), and To Complete Cost Performance Indicator (TCPI), which are defined as the dependent variables. The approach relies principally on determining the numerous factors that affect earned value management, using Iraqi historical data. In addition, six independent variables (F1: BAC, Budget at Completion; F2: AC, Actual Cost; F3: A%, Actual Percentage; F4: EV, Earned Value; F5: P%, Planning Percentage; and F6: PV, Planning Value) were arbitrarily designated and satisfactorily described for each construction project. It was found that the ANN is able to predict the earned value indexes with high accuracy. The correlation coefficient (R) was 90.00%, and the average accuracy percentage was 89.00%.


INTRODUCTION
Proper estimation at an early stage is a key factor in the successful completion of any construction project. However, estimation is quite complex during the planning stage, when the documentation and drawings are not yet complete. Several techniques have therefore been used to achieve an accurate estimate at an early stage, while information about the intended project is still limited. [1]
One of the most powerful and popular network types is the multilayer feed-forward network trained with back-propagation. Training with the back-propagation algorithm comprises three stages: the feed forward of the input training patterns, the calculation and back-propagation of the associated error, and the adjustment of the weights. [15-16-17]
The foremost goal of this paper is to construct an artificial neural network model to predict the earned value indicators of residential building projects in the Republic of Iraq. To achieve this, it is necessary to identify the factors that affect the performance of residential building projects. Consequently, the author seeks to improve the earned value model through the following steps:
1. Choose the appropriate neural network software.
2. Identify the ANN model variables that affect the earned value indexes in Iraqi residential building projects.
3. Develop and investigate the proposed ANN models to predict the earned value indexes.
4. Examine the verification and validation of the developed mathematical models.

CHOOSING THE APPROPRIATE NEURAL NETWORK SOFTWARE
Today, neural networks are used to solve many business problems, such as earned value forecasting. The researcher studied many neural network programs, looking for one that is easy to use, suits both simple and complex problems, and accepts all types of variables and factors. Numerous applications provide a statistical analysis environment, such as Microsoft Excel, STATISTICA, MINITAB, and MATLAB; however, this paper adopts SPSS. SPSS stands for Statistical Package for the Social Sciences and is a leading statistical analysis environment. SPSS offers a visual, easy-to-use, object-oriented approach to problem solving by means of intelligent technologies, and it is used by many kinds of researchers for complex statistical data analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. The current versions are named IBM SPSS Statistics.

IDENTIFICATION OF ANN MODEL VARIABLES
This study uses historical data analysis as its methodological foundation. Historical data help to establish the relationship among the key factors influencing the earned value parameters of residential building projects, so that estimates can be made for new projects. ANN models require a large amount of data. Consequently, data were collected on many past residential building projects carried out between 2012 and 2016 in the Mosel residential complex project in Iraq. The project data were collected from cities, the Ministry of Construction and Housing, consultants, and contractors, and were subsequently analyzed. The data collection method used in this study is direct and indirect data gathering from local engineering firms acting as sub-contractors. A noteworthy obstacle to this method was the unstable security situation in Iraq and the lack of documentation. Despite this difficulty, the researchers were able to gather reliable and useful data for more than forty-eight residential construction projects. This was done by visiting some companies, which provided the relevant documents and reports for residential building projects. The factors affecting the earned value indexes that were adopted in the ANN models are shown in Table (1).

Table 1. Variables of ANN Models
Two types of variables affect the earned value in residential complex projects in the Republic of Iraq: dependent variables and independent variables.

Dependent variables
The Cost Performance Index (CPI), Schedule Performance Index (SPI), and To Complete Cost Performance Indicator (TCPI) were defined as the dependent variables, and each individual engineering project was used as the basic unit of observation.

Independent variables
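After stating the dependent variables to be predicted by the ANN model, it was essential to define independent variables that explain the variation in the earned value indexes. The six independent variables adopted are F1: Budget at Completion (BAC), F2: Actual Cost (AC), F3: Actual Percentage (A%), F4: Earned Value (EV), F5: Planning Percentage (P%), and F6: Planning Value (PV).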

The mathematical models are divided into three parts, which are described in detail in the following sections. The variables specified at the data identification stage were used to develop the ANN models. Three mathematical models were established in this section, in which the features of each project were used to estimate the earned value indexes. SPSS version 24 was used as the tool for building the three models. The researcher followed five stages in building each model, as follows: 1) Development of Model Inputs and Outputs 2) Data Division 3) Model Architecture 4) ANNs Model Equation 5) ANNs Model Validity

Development of Model Inputs and Outputs
Selecting the model input variables is a significant step in developing ANN models, because the chosen inputs have the greatest influence on model performance. A large number of input variables typically increases the network size, which reduces both processing speed and network efficiency. Different methods have been recommended for selecting input variables. One is the method of prior knowledge, in which suitable input variables are chosen on the basis of prior knowledge of the problem. This method is commonly used in the field of project management and is adopted in this study.
As an initial stage of neural network modeling, the data at hand must be identified and tagged as either input or output. SPSS v.24 works with a Microsoft Excel sheet, which was used in this step. The independent factors influencing the problem were identified and treated as (N) input parameters, which are represented by nodes at the input buffer of the neural network. For the SPI model, the output is the Schedule Performance Index (SPI) and the inputs are Earned Value (EV) and Planned Value (PV).
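For reference, the three dependent variables can be computed directly from the independent variables with the conventional earned value management ratios. The sketch below is illustrative only: the class and field names are assumptions, the TCPI form shown is the common variant measured against the original budget (BAC), and the numbers are invented rather than taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class ProjectSnapshot:
    """One project observation (field names are illustrative assumptions)."""
    bac: float  # F1: Budget at Completion
    ac: float   # F2: Actual Cost
    ev: float   # F4: Earned Value
    pv: float   # F6: Planned Value


def spi(p: ProjectSnapshot) -> float:
    """Schedule Performance Index = EV / PV."""
    return p.ev / p.pv


def cpi(p: ProjectSnapshot) -> float:
    """Cost Performance Index = EV / AC."""
    return p.ev / p.ac


def tcpi(p: ProjectSnapshot) -> float:
    """To-complete index against the original budget: (BAC - EV) / (BAC - AC)."""
    return (p.bac - p.ev) / (p.bac - p.ac)


# Invented figures for illustration; not part of the study's data set.
example = ProjectSnapshot(bac=1000.0, ac=500.0, ev=400.0, pv=450.0)
print(spi(example), cpi(example), tcpi(example))
```

Values of SPI or CPI below 1 indicate a project that is behind schedule or over cost, which is the kind of target the ANN models are trained to reproduce.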

Data Division ANN Model
Data pre-processing is highly important for using neural networks successfully. It determines what information is presented to build the model during the training phase. Consequently, the next step in the development of the ANN model was to divide the available data into three subsets: training, testing, and validation sets. Learning is performed on the training set, which is used to estimate the weights, while the cross-validation set is used for generalization, that is, to produce better output for unseen instances. The test set, in turn, is used to measure the generalization ability of the network and to evaluate network performance.
At this step of ANN model development, the available data were divided into three sets: training, testing, and validation. The division chosen is the one that gives the lowest testing error and the highest correlation coefficient. This division may be achieved using Neuframe software. In this study, the network with the best performance in terms of testing error was used (the training error and the correlation of the validation set serve as further criteria for assessing prediction performance). Using the default software parameters, different networks with a number of divisions were developed. The results are summarized in Table (2).

Table 2. Effect of Data Division on ANN Model Behaviour
It can be seen from Table (2) that the best division is 80% for the training set, 10% for the testing set, and 10% for the validation set, which gives a testing error of 5.300% and a correlation coefficient (r) of 90.950%. Therefore, this division was adopted in the ANN model. The influence of different division methods (i.e. blocked, striped, and random) was also considered and is presented in Table (3). The performance of the ANN model was fairly insensitive to the division method, with the best performance achieved when the striped division was applied.
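The three division methods can be sketched as follows for an 80/10/10 split. This is a minimal illustration, not the procedure implemented by the software used in the study: the function names are hypothetical, the count of 48 records is only an example, and "striped" is assumed here to mean that records are dealt to the three sets in a repeating pattern.

```python
import random


def split_counts(n, train=0.8, test=0.1):
    """Return (train, test, validation) sizes; validation takes the remainder."""
    n_train = round(n * train)
    n_test = round(n * test)
    return n_train, n_test, n - n_train - n_test


def blocked_division(n):
    """Consecutive blocks: first 80% training, next 10% testing, rest validation."""
    n_train, n_test, _ = split_counts(n)
    idx = list(range(n))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def random_division(n, seed=0):
    """Shuffle the record indices, then cut them into the same three blocks."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_test, _ = split_counts(n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def striped_division(n):
    """Assumed meaning of 'striped': within every run of 10 records, the first
    8 go to training, the 9th to testing and the 10th to validation."""
    train, test, validation = [], [], []
    for i in range(n):
        position = i % 10
        if position < 8:
            train.append(i)
        elif position == 8:
            test.append(i)
        else:
            validation.append(i)
    return train, test, validation


# Example with 48 project records (an illustrative count).
for divide in (blocked_division, random_division, striped_division):
    train_idx, test_idx, val_idx = divide(48)
    print(divide.__name__, len(train_idx), len(test_idx), len(val_idx))
```

Under this reading, a striped division spreads records from every part of the data set across all three subsets, whereas a blocked division can leave the testing and validation sets dominated by the last projects collected.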

Model Architecture
One of the essential and difficult parts of developing ANN models is establishing the model architecture. There is usually no direct and exact method for determining the proper number of nodes to include in each hidden layer, and the problem becomes more complex as the number of hidden layers in the network increases.
The ANN model network was set to a single hidden layer using the default parameters of the software (learning rate of 0.2, momentum term of 0.8, and sigmoid transfer functions in the hidden and output layer nodes). Many networks with different numbers of hidden layer nodes were developed, and the results are shown in Table ( ). As a result, a single hidden node, which gave the lowest testing error (5.300%), was considered optimal and was selected for the present model.
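The selected architecture, one hidden node with sigmoid transfer functions in both layers, corresponds to a very small forward computation. The sketch below shows that computation under stated assumptions: the function and parameter names are hypothetical, the weights are placeholders rather than the fitted values, and inputs and targets are assumed to be scaled to the range 0 to 1 because a sigmoid output cannot exceed 1.

```python
import math


def sigmoid(z):
    """Logistic transfer function used in the hidden and output nodes."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a network with a single hidden node.

    inputs   : scaled input values, e.g. [EV, PV] for the SPI model
    w_hidden : one connection weight per input
    b_hidden : threshold (bias) of the hidden node
    w_out    : weight from the hidden node to the output node
    b_out    : threshold (bias) of the output node
    """
    hidden = sigmoid(sum(w * x for w, x in zip(w_hidden, inputs)) + b_hidden)
    return sigmoid(w_out * hidden + b_out)


# Placeholder weights for illustration only; not the values fitted in the study.
prediction = forward([0.4, 0.45], w_hidden=[1.5, -1.0], b_hidden=0.1,
                     w_out=2.0, b_out=-0.5)
print(prediction)
```

With only one hidden node the model has N + 3 adjustable parameters for N inputs, which is why the trained network can later be written out as a compact closed-form equation.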
The influence of the momentum term on the model results was studied for the model with one hidden node (learning rate equal to 0.20). The results are presented in Table (5). The optimal value of the momentum term was 0.8, with a minimum testing error of 5.30%. Beyond that value, within the 0.8-0.95 range, the test errors start to decrease to some extent. The optimal momentum term of 0.800 gave a training error of 5.300%, the minimum testing error of 5.300%, and the highest correlation coefficient (r) of 90.950%. Therefore, it was adopted in this model.

Effect of Parameters
Furthermore, the effect of the learning rate on model performance was examined (momentum term equal to 0.80) for the ANN model. The results are presented in Table (6). The optimal value of the learning rate (LR) was 0.2, which gave the lowest prediction error (5.3%) together with the highest correlation coefficient (90.950%); hence, it was adopted in this model. The impact of adopting different transfer functions (i.e. tanh and sigmoid) was also examined, as shown in Table (7). The performance of the ANN model was fairly insensitive to the type of transfer function. The best performance was obtained when the sigmoid transfer function was used for both the hidden and output layers, with the smallest prediction error (5.300%) and the highest correlation coefficient (r) (90.950%).
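To make the roles of the learning rate and the momentum term concrete, the sketch below trains a one-hidden-node sigmoid network with back-propagation, using the reported settings (learning rate 0.2, momentum 0.8) as defaults. It is a hand-written illustration and not the algorithm as implemented in the software used in the study; the function names, the squared-error loss, the per-sample update scheme, and the toy data are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    """Parameter layout: [w_1..w_n, b_hidden, w_out, b_out]."""
    n = len(x)
    hidden = sigmoid(sum(w * xi for w, xi in zip(params[:n], x)) + params[n])
    return sigmoid(params[n + 1] * hidden + params[n + 2])


def train(samples, n_inputs, learning_rate=0.2, momentum=0.8, epochs=2000, seed=0):
    """Per-sample back-propagation with a momentum term."""
    rng = random.Random(seed)
    params = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 3)]
    velocity = [0.0] * len(params)  # previous weight changes
    for _ in range(epochs):
        for x, target in samples:
            w, b_h = params[:n_inputs], params[n_inputs]
            w_out, b_out = params[n_inputs + 1], params[n_inputs + 2]
            # Stage 1: feed forward of the input pattern.
            hidden = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b_h)
            output = sigmoid(w_out * hidden + b_out)
            # Stage 2: back-propagation of the squared-error gradient.
            delta_out = (output - target) * output * (1.0 - output)
            delta_hid = delta_out * w_out * hidden * (1.0 - hidden)
            grads = [delta_hid * xi for xi in x] + [delta_hid, delta_out * hidden, delta_out]
            # Stage 3: weight adjustment; momentum reuses part of the last change.
            for i, g in enumerate(grads):
                velocity[i] = momentum * velocity[i] - learning_rate * g
                params[i] += velocity[i]
    return params


def mean_squared_error(params, samples):
    return sum((predict(params, x) - t) ** 2 for x, t in samples) / len(samples)


# Toy, pre-scaled data (invented): two inputs and one target per record.
toy = [([0.1, 0.2], 0.3), ([0.4, 0.5], 0.5), ([0.8, 0.9], 0.7), ([0.9, 0.3], 0.6)]
untrained = train(toy, n_inputs=2, epochs=0)
trained = train(toy, n_inputs=2)
print(mean_squared_error(untrained, toy), mean_squared_error(trained, toy))
```

A sensitivity study like the one summarized in Tables (5) and (6) amounts to repeating such a training run while varying one of the two parameters and holding the other fixed, then comparing the resulting testing errors.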

EVM Indicators Model Equation
Only a small number of connection weights were obtained by SPSS for the optimal SPI, CPI and TCPI models, so the neural network can be converted into a relatively simple formula. The structure of the ANN model is illustrated in Figure (1), while the connection weights and threshold levels (biases) are given in Table (8).  Based on the connection weights and the threshold levels presented in Table (

VERIFICATION AND VALIDATION OF THE ANN MODEL
The computation of the Cost Performance Index (CPI), Schedule Performance Index (SPI), and To Complete Cost Performance Indicator (TCPI) by the ANN, carried out to verify the estimating models, is summarized in Table (9). Column (2) contains the actual index obtained from residential building projects under construction in Iraq, and column (3) contains the index estimated by applying the ANN equation, which was obtained through SPSS. The estimated and actual indexes are then compared. The correlation coefficient between the two columns (actual index and index estimated by the ANN) is 98.03%, 80.45% and 63.38% for the Schedule Performance Index (SPI), Cost Performance Index (CPI), and To Complete Cost Performance Indicator (TCPI) respectively; therefore, it can be concluded that this model is in excellent agreement with the actual measured results, as presented in Figure (

Figure ( ). Relationship between Observed and Predicted TCPI for Validation Data
The description of four observations of residential building projects (variables) is shown in Table (10) below. Performance measures are important in evaluating models; two measures were used to assess network performance on a given data set. [18]  The results of the comparative study are given in Table (14). The MAPE and average accuracy percentage (AA%) produced by the ANN model for SPI were 21.90% and 78.10% respectively; consequently, the SPI model shows good agreement with the actual measured results. The MAPE and AA% produced by the ANN model for CPI were 15.60% and 84.40% respectively; therefore, the CPI model shows strong agreement with the actual measurements. The MAPE and AA% produced by the ANN model for TCPI were 161.72% and 83.82% respectively; for that reason, the TCPI model also shows strong agreement with the actual measured results.
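The performance measures can be computed as in the sketch below. It assumes the usual definition of MAPE and takes the average accuracy as AA% = 100 - MAPE, which matches the SPI and CPI figures reported above (21.90 + 78.10 = 100 and 15.60 + 84.40 = 100); the function names and the sample values are illustrative and are not taken from Table (14).

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def average_accuracy(actual, predicted):
    """Average accuracy percentage, assumed here to be 100 - MAPE."""
    return 100.0 - mape(actual, predicted)


def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Invented index values for four validation projects.
actual_index = [1.0, 0.8, 1.25, 0.5]
predicted_index = [0.9, 0.88, 1.0, 0.5]
print(mape(actual_index, predicted_index),
      average_accuracy(actual_index, predicted_index),
      correlation(actual_index, predicted_index))
```

Because MAPE divides by the actual value, it becomes very large when an actual index is close to zero, so it is worth checking the range of the observed indexes before interpreting it.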