A Novel Integrated Machine & Business Intelligence Framework for Sensor Data Analysis

Abstract: The growing number of smart devices across industries is placing numerous sensors on each piece of equipment, prompting the need for methods and models for sensor data. The current research proposes a systematic approach to analyzing the data generated by sensors attached to industrial equipment. The methodology involves data cleaning, preprocessing, basic statistics, and outlier and anomaly detection. The study predicts remaining useful life (RUL) using machine learning models such as Linear Regression, Polynomial Regression, Random Forest, Decision Tree, and XGBoost. Hyperparameter optimization is performed to find the optimal parameters for each model. For each model, the RMSE and MAE of the RUL prediction are compared. The outcome of the RUL prediction should help the decision maker drive business decisions; hence binary classification and a business case analysis are performed. The business case analysis includes the cost of maintaining and the cost of not maintaining a particular asset. The current research aims to integrate machine intelligence and business intelligence so that industrial operations are optimized in both resources and profit.


Introduction
Advances in the Internet of Things have made it possible to connect numerous machines and devices to the internet, so that information about machine operation and health is available for analysis. With increased network access it has become easy to capture data using sensors, and there is a need for models and methods for analyzing the captured data. Products in industries such as aerospace, defense, and automotive are being developed with embedded sensors that are miniature and self-powered. These sensors are self-powered and self-intelligent: they capture information and also process as much of it as possible, so that a reaction mechanism can be activated using local processing. Fig. 1 depicts the growth in the number of devices connected to the internet from 2012 to 2020, and it is predicted that more machines will be connected in the near future. Hence algorithms need to be developed that can be deployed for both cloud and edge processing of the data. These algorithms differ from the usual data analysis models because sensor data reflects the challenges of the real operating environment of the system. The sensor is self-powered and exposed to harsh working environments, hence the captured data can be faulty and erroneous. One of the challenges of current IoT systems is generating business insights from IoT platforms. Referring to Fig. 1, each device connected to the internet adds more data. The huge volume of data generated by these sensors needs to be integrated with an artificial intelligence system for timely action that optimizes business resources. There is a need for systems that are intelligent and self-improving. With the advent of edge and cloud computing, it is becoming important to increase the speed and accuracy of predictions.

Literature Review
Data-driven modelling has gained momentum, with numerous publications in the field. NASA has been encouraging researchers to use the data available through C-MAPSS [1]. A neural network approach was used in [3] for predicting the condition of industrial equipment such as refrigerators. Pranav Kumar et al. [6] used a genetic-algorithm-based approach for optimizing industrial design parameters; the research is aimed at using machine learning algorithms for the optimization of various parameters. The research so far has focused on initial approaches for modelling data related to the design and operation of industrial components. There is a need for better data-driven models that are useful for industrial IoT applications. There are studies focusing on anomaly detection using time series models, but there is a need to work on real-time anomaly detection for edge computing and safety-critical equipment. Alexendria et al. [3] used machine learning models for sensor analysis using a sensor node approach. Temperature (°C), humidity (%), light (lux), and pressure (hPa) are sensed by the node and processed by an internet gateway. The experiment was conducted in a lab environment with human presence, to include additional data on the number of people. The approach is limited in terms of the variables considered. Taha et al. [4] explored machine-learning-based methods and models for sensor data generated from an aircraft engine. The engine consists of multiple subsystems with parameters such as pressure, temperature, and flow rate. The proposed methodology uses only a subset of the total number of sensors for predicting the remaining useful life (RUL). RUL prediction can be improved if the number of parameters considered is maximized. The model is further developed by excluding the extreme values of each sensor. Okoh et al. [5] presented a review of existing computational techniques for remaining useful life estimation.
Ellefsen et al. [6] proposed a deep-learning-based architecture; their paper investigates the effect of unsupervised pre-training on RUL predictions using a semi-supervised setup. Additionally, a genetic algorithm (GA) approach is applied to tune the diverse hyperparameters of the training procedure. Zhou et al. [7] proposed residual lifetime estimation based on functional data. Hanachi et al. [8] modelled RUL using performance-based sensor data of engineering systems. S.K. Singh [9] proposed a novel soft computing method for RUL prediction. Al-Dahidi et al. [10] proposed a methodology for RUL prediction for a heterogeneous fleet operating under normal and severe operating conditions. Saxena [11] proposed an RUL prediction model that includes run-to-failure data. The model covers RUL prediction as well as binary and multi-class classification. Zhang et al. [12] proposed RUL estimation for a rotating element such as a bearing; this method is significant because it is useful for many kinds of rotating equipment. Kim et al. [13] proposed RUL prediction for a multi-sensor data set. Wang et al. [14] proposed a generic probabilistic framework for structural health prognostics and uncertainty management. Song et al. [15] proposed statistical degradation modeling and prognostics of multiple sensor signals via data fusion, using a composite health index approach. The method uses most data analytics techniques. Liu [16] integrated a data fusion methodology with the degradation modeling process to improve prognostics. Many approaches to RUL prediction have been proposed in the literature, but all of these methods focus on machine intelligence. As a result, they are limited in their direct application to industrial needs. Hence there is a need for integrating machine intelligence and business intelligence.

Problem Statement
Data generated in finance, insurance, and other allied industries are commonly analyzed with methods and models developed over the past few decades. Usually these models cannot be directly used for sensor data, because the way sensor data is captured makes it different from other data. The sensors operate under various healthy and unhealthy conditions. In order to apply such models to industrial equipment a new framework is needed, as the data is time dependent in nature, and an integrated approach is needed that combines all available forms of information, such as asset life, asset make, and operation data. Initially the frequency of data collection is fixed at device level, and the data is then stored in a cloud database for further processing. The data available in the central cloud can be broadly viewed as either a time series collection or a cyclic series collection. Context awareness includes understanding the operator behavior. As shown in Fig. 2, each machine is connected to the other machines using a unique ID, and each machine carries numerous self-powered sensors, each with its own unique identity. All the sensors collect data at various time stamps, starting from seconds. During operation, data is captured from a healthy state through to a degraded state. Hence it is important to connect the data at both the sensor level and the time stamp level.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 29 July 2020 doi:10.20944/preprints202007.0697.v1

Figure 2. Sensor Data Analysis Framework
• Time series collection: In this approach the data is collected at intervals of seconds or microseconds. These values are then averaged by minute, hour, day, and week. A machine with multiple sensors produces a huge amount of data, which needs to be averaged over a suitable time window for further analysis. Time series data needs to be checked for capture frequency and for healthy data capture.
• Cyclic series collection: Many machines are operated at a regular frequency. Engineering systems such as aircraft typically consider each flight as a cycle of operation, with a specific flight time and a specific load for each cycle. For example, the health condition of an engine may vary over tens of hours, whereas its dynamic response is on the order of seconds. Identification of the current state of engine health at slow-time epochs is very important for maintenance engineers, because necessary repairs must be carried out well before the engine becomes seriously damaged or permanently inoperable. Thus, it is essential to monitor slow-time-scale anomalies in gas turbine engines from time series data of the engine observables on a fast time scale. To this end, a generic gas turbine engine simulation test bed has been used to validate the anomaly detection method. The features of the engine simulation model are like those of the engine model
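
As a minimal sketch of the time-window averaging described for time series collection, the following Python snippet groups raw readings into fixed windows, averages each window, and flags gaps in capture frequency. The function names, the `(timestamp, value)` tuple layout, and the sample readings are assumptions made for illustration; they are not taken from the framework itself.

```python
from collections import defaultdict
from statistics import mean


def window_average(readings, window_seconds):
    """Average (timestamp_seconds, value) readings over fixed-width windows.

    Returns a dict mapping each window's start time to the mean of the
    readings that fall inside that window.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Integer division assigns every reading to the window it belongs to.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def capture_gaps(readings, expected_interval):
    """Return (previous, current) timestamp pairs spaced wider than expected."""
    timestamps = sorted(t for t, _ in readings)
    return [(a, b) for a, b in zip(timestamps, timestamps[1:])
            if b - a > expected_interval]


# Hypothetical temperature readings, nominally sampled every 20 seconds.
readings = [(0, 70.0), (20, 71.0), (40, 72.0), (60, 80.0), (100, 82.0)]
per_minute = window_average(readings, window_seconds=60)
gaps = capture_gaps(readings, expected_interval=20)
print(per_minute)
print(gaps)
```

The same grouping idea extends to hourly, daily, or weekly windows by changing `window_seconds`; for cyclic series the window would be one operating cycle rather than a fixed duration.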

Exploratory Data Analysis
Data-driven models need data generated by sensors that can capture physics-related failure data of the engine. The data set contains the operational history of 100 aircraft engines, recorded over a period of operation. The data is available in CSV and text format and needs to be smoothed to make it suitable for model consumption. The data is processed using the Python analysis package pandas. The data is converted from standard CSV format into a pandas DataFrame, which helps in understanding the data nomenclature, missing values, null entries, etc. The data contains around 100 cycles on average for each engine. Fig. 3 depicts the overall data analysis procedure adopted in the current research.
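
The checks for missing values and cycles per engine can be illustrated with a short sketch. The study performs this step with pandas; the version below uses only the Python standard library so that it runs without extra dependencies. The column names (`engine_id`, `cycle`, `sensor_1`, `sensor_2`) and the sample rows are hypothetical and do not reflect the actual data set layout.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: engine id, cycle number, and two sensor columns.
RAW = """engine_id,cycle,sensor_1,sensor_2
1,1,518.67,641.82
1,2,518.67,
1,3,518.67,642.35
2,1,518.67,642.10
2,2,,642.48
"""


def summarize(csv_text):
    """Count missing entries per column and recorded cycles per engine."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing = Counter()
    cycles = Counter()
    for row in rows:
        cycles[row["engine_id"]] += 1
        for column, value in row.items():
            # An empty field is treated as a missing (null) entry.
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing), dict(cycles)


missing, cycles = summarize(RAW)
print("missing entries per column:", missing)
print("cycles per engine:", cycles)
```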

Figure 3. Analytics framework for Sensor Data Analysis
Usually the data is available in a standard enterprise database system that collects sensor data and stores it in frameworks such as SAP. These data files are then saved as Excel, CSV, or text files to ease data analysis on specific modules with smaller data samples.
One of the key parameters of the above statistics is the standard deviation, the standard deviation

Solution Approach
Predicting the future value of a given variable relies on numerous statistical and machine learning models. These models can predict from a single variable or from multiple variables, and they range from simple linear models to complex neural networks. They start from a fundamental mathematical relation between one or more independent variables and a dependent variable. Linear regression is a supervised machine learning algorithm that performs a regression task: it models a target prediction value based on independent variables. It is mostly used for finding the relationship between variables and for forecasting. Regression models differ in the kind of relationship they assume between the dependent and independent variables and in the number of independent variables used. Linear regression models are more suitable for parameters that are linearly dependent on each other. The current research aims to find the time to failure as accurately as possible; hence the time to failure is first computed for all the sensor values of all engines, and it is treated as the dependent variable in the current problem. For any predictive analysis, a linear regression model is a basic approach for obtaining an initial estimate, and it is used as such in the current study. The training and test data are segmented into a features data frame and a labels series. In the linear model (3), matrix A represents the coefficients corresponding to each of the sensors. These coefficients indicate the effect of each variable on the dependent variable. Matrix B represents the intercept, or constant value, of the linear model. This constant contributes to the dependent variable.
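
The roles of the time-to-failure label, the coefficients A, and the intercept B can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's implementation: it assumes each engine is run to failure, so that the label is the last recorded cycle minus the current cycle, and the coefficient and intercept values are made up rather than fitted. The RMSE and MAE helpers correspond to the error measures used to compare models.

```python
import math


def rul_labels(cycles_by_engine):
    """Label each recorded cycle with the cycles remaining until the last one.

    Assumes every engine was run to failure, so its last recorded cycle
    is treated as the failure point (label 0).
    """
    labels = {}
    for engine_id, cycles in cycles_by_engine.items():
        last_cycle = max(cycles)
        labels[engine_id] = [last_cycle - c for c in cycles]
    return labels


def predict_rul(sensor_values, coefficients, intercept):
    """Linear model: one coefficient per sensor (A) plus a constant (B)."""
    return sum(a * x for a, x in zip(coefficients, sensor_values)) + intercept


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical engine with four recorded cycles and two sensors per cycle.
labels = rul_labels({"engine_1": [1, 2, 3, 4]})["engine_1"]
sensors = [[10.0, 1.0], [9.0, 2.0], [8.0, 3.0], [7.0, 5.0]]
A, B = [1.0, -0.5], -6.5  # illustrative values, not fitted coefficients
predictions = [predict_rul(row, A, B) for row in sensors]
print(labels, predictions)
print("RMSE:", rmse(labels, predictions), "MAE:", mae(labels, predictions))
```

In practice the coefficients and intercept would be estimated from the training split by least squares rather than chosen by hand.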

Prediction of RUL (Remaining Useful Life)
The remaining useful life of the component is predicted using the following models, which include several regression and neural network models. There are combinations of regression and combined models such as Linear, LASSO

Integrated Machine & Business algorithm
One of the challenges to widespread application of IIoT systems is the lack of business intelligence models alongside data intelligence models. Any deployment of a large number of sensors and cloud-based data storage systems needs to be considered in view of a cost-benefit analysis. The proposed framework is unique and novel in that it includes both machine intelligence and business intelligence to generate immediate alerts and autonomously plan operations and maintenance. For a transportation industry such as air transportation, the framework can automatically plan the operations and maintenance of each engine, so that there is no cost associated with manual error in choosing the machine to be sent for maintenance. The framework also generates alerts for delayed maintenance.
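
One way such automatic planning and alerting could look is sketched below. The decision rule, the function name `plan_maintenance`, the action strings, and the sample values are all assumptions made for illustration; the framework's actual planning logic is not specified in the text.

```python
def plan_maintenance(predicted_rul, threshold_cycles, cycles_since_flagged):
    """Return a per-engine action based on predicted remaining useful life.

    predicted_rul: engine id -> predicted remaining cycles
    threshold_cycles: engines at or below this RUL need maintenance
    cycles_since_flagged: engine id -> cycles operated since the engine
        was first flagged for maintenance (absent if never flagged)
    """
    plan = {}
    for engine_id, rul in predicted_rul.items():
        if rul > threshold_cycles:
            plan[engine_id] = "operate"
        elif cycles_since_flagged.get(engine_id, 0) > 0:
            # The engine kept operating after being flagged: raise an alert.
            plan[engine_id] = "alert: maintenance delayed"
        else:
            plan[engine_id] = "schedule maintenance"
    return plan


# Hypothetical fleet of three engines with a 30-cycle threshold.
predicted = {"E1": 120, "E2": 25, "E3": 10}
plan = plan_maintenance(predicted, threshold_cycles=30,
                        cycles_since_flagged={"E3": 4})
print(plan)
```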

Binary Classification
Binary classification is a key aspect of remaining life estimation. In this approach a binary classifier is modelled to classify the engines that will fail within a stipulated period, say 30 days or cycles. Binary classification helps the decision maker determine which engines will fail within that period. In the current analysis a period of 30 cycles is used. Binary classification, when coupled with a business decision-making algorithm, provides an integrated platform. The following models are considered for binary classification, and key metrics are reported below.

Figure 8. Comparison of Confusion Matrix
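
The link between the 30-cycle failure label, the confusion matrix, and a business cost can be sketched as follows. This is a simplified illustration, not the study's classifier: predicted labels are obtained here by thresholding hypothetical predicted RUL values, and the cost rule (maintenance cost for every flagged engine, failure cost for every missed one) and the cost figures are assumptions.

```python
def failure_labels(rul_values, horizon=30):
    """1 if the engine is expected to fail within `horizon` cycles, else 0."""
    return [1 if rul <= horizon else 0 for rul in rul_values]


def confusion_counts(actual, predicted):
    """Return (tn, fp, fn, tp) for two equal-length lists of 0/1 labels."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def business_cost(fp, fn, tp, maintenance_cost, failure_cost):
    """Assumed cost rule: flagged engines are maintained, missed ones fail."""
    return (tp + fp) * maintenance_cost + fn * failure_cost


# Hypothetical true and predicted RUL (in cycles) for six engines.
true_rul = [12, 45, 28, 90, 5, 60]
predicted_rul = [20, 25, 40, 80, 8, 70]
actual = failure_labels(true_rul)
predicted = failure_labels(predicted_rul)
tn, fp, fn, tp = confusion_counts(actual, predicted)
cost = business_cost(fp, fn, tp, maintenance_cost=10_000, failure_cost=100_000)
print((tn, fp, fn, tp), cost)
```

Computing this cost for each candidate classifier gives a single business figure on which the models can be compared alongside their accuracy metrics.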

Business Intelligence Analysis
Decision making is a critical aspect of any industrial IoT system. It is important that the algorithm gives its outcome in terms of the profit the business can reap. In the current analysis the KNN model gave the maximum business benefit. This is based on an assumed cost and profit for maintenance and

Conclusion & Future Work
Industrial IoT is an emerging technology that can significantly help in improving the safe and intelligent operation of many machines. Successful application of such technology requires integration with machine learning models and business benefit. The current research aims to develop an integrated framework for industrial IoT. In this study, machine learning models such as KNN, Decision Tree, Logistic Regression, and SVC are explored to develop a framework that combines machine intelligence and business intelligence. Many current industrial problems lack a unified and integrated decision support system for applying the industrial internet. The current research can be extended to integrate the multiple needs of the industrial plans.