Machine Learning-based Anomaly Detection with Magnetic Data

Pipeline integrity is an important concern for the oil and gas, refining, chemical, hydrogen, carbon sequestration, and electric-power industries because of the safety risks associated with pipeline failures. Regular monitoring, inspection, and maintenance of these facilities are therefore required for safe operation. Large standoff magnetometry (LSM) is a non-intrusive, passive magnetometer-based measurement technology that has shown promise in detecting defects (anomalies) in regions of elevated mechanical stress. However, analyzing the noisy multi-sensor LSM data to clearly identify anomalous regions is a significant challenge, mainly because of the high frequency of data collection, the misalignment between consecutive inspections and sensors, and the number of sensor measurements recorded. In this paper we present an LSM defect identification approach based on machine learning (ML). We show that this approach successfully detects anomalous readings using a series of methods of increasing model complexity and capacity. The methods start with unsupervised learning with "point" methods and progress to supervised learning with sequence methods and multi-output predictions. We observe data leakage for some methods under randomized train/test splitting and resolve it with a specific non-randomized split of training and validation data. We also achieve a 200x acceleration of the support-vector classifier (SVC) method by porting computations from CPU to GPU with the RAPIDS AI cuML library. For sequence methods, we develop a customized Convolutional Neural Network (CNN) architecture based on 1D convolutional filters to identify and characterize multiple properties of these defects. Finally, we report the scalability of the best-performing methods and compare them for viability in field trials.

∗Work conducted during internship at TOTAL EP Research & Technology USA. He is affiliated with the Department of Mechanical and Industrial Engineering, University of Massachusetts.
Workshop on machine learning for engineering modeling, simulation and design @ NeurIPS 2020


Introduction
Pipeline integrity is required for safe operation and is ensured by regular monitoring, inspection, and maintenance of pipeline facilities. Undetected defects can cause significant damage to day-to-day operations and to the environment. Traditionally, intrusive inspection devices are used to detect defects in the pipe material. These devices measure a magnetic or ultrasonic signal at the surface and can disrupt or block production. Hence, a non-intrusive external inspection is preferred to an internal inspection.
Induced mechanical or residual stresses, e.g. around defects, locally change the magnetic behavior of the pipeline material through the Villari magnetostrictive effect [1], [2]. Non-intrusive Large Stand-Off Magnetometry (LSM) technology has emerged as a promising tool for detecting small changes in the magnetic readings around defects associated with regions of elevated stress [1], [2], [3]. In addition to localizing the effects spatially, LSM's stress quantification has been shown to provide insights into the properties of the defect in some cases [3]. A simplified schematic of LSM is shown in Figure 1 with a multi-axial, multi-sensor alignment. The LSM process uses an array of sensors installed on an instrument to perform periodic observations of changes in the asset's magnetic signature [4], and gathers data in all three spatial dimensions. When the magnetometer passes over a damaged or defective region, it records a different magnetic flux compared to the other sections of the pipeline. Empirical correlation functions are used to differentiate the defect signals from normal regions. However, these functions have a number of limitations, such as the requirement for multiple scans, reduced sensitivity to smaller defects, low localization precision, and the neglect of some defect properties. In addition, interference can make the signals noisy, so robustly identifying anomalies becomes a challenge.
The use of Machine Learning (ML) for anomaly detection has shown great promise (e.g. [5,6,7]), mainly for sequential data and time-series applications (e.g. [8,9]). Some of these methods are purely data-driven techniques that identify occasional outliers; others are hybrid methods that incorporate feature engineering and prior information into the system to robustly characterize the anomalous behavior [8]. Scalable ML frameworks for anomaly detection [10,11] have grown in importance in recent years, for instance the Python library pyod [12]. ML-based structural health monitoring for asset integrity is growing in usage and has been successfully applied to civil and aerospace engineering applications [13,14].

Our Contribution
In this work, we explore the effectiveness of classical machine learning methods from the scikit-learn [15] and pyod [12] Python libraries, and of customized Convolutional Neural Network (CNN) architectures built with the Keras [16] library, for identifying and characterizing anomalous magnetic field readings in the LSM experimental dataset. The challenges in using multi-modal, multi-sensor information, including sequence alignment, are discussed and remedies are suggested. Multi-task classification is successfully achieved using a customized CNN architecture with a Conv1D filter. The scalability of these methods from lab experiments to field trials is discussed, and slow-performing methods are accelerated using algorithmic advancements.

Dataset
The datasets were obtained from controlled experiments conducted by a research facility. These experiments were conducted using multiple instruments and multiple sensor inputs. For each sensor, the instruments collected data in three axial directions (X, Y and Z), thereby providing multiple data channels. The pipes were scanned with 1 mm resolution. Defects and welds were manufactured to mimic realistic field scenarios. The datasets were obtained at various circumferential positions relative to the defect. The details of two representative defects that we studied are summarized in Table 1.
Because the data were collected using spatially distant sensors, alignment of the data channels was required. We used the Dynamic Time Warping (DTW) algorithm to re-align the datasets so as to reduce the alignment error that arises from position shifts between consecutive, non-aligned sensor readings, before searching for anomalies. A typical DTW algorithm has quadratic space and time complexity [17,18], as it measures the similarity between the two sequences in matrix form and calculates an optimal match between them. As the warping is scaled to the data volumes relevant to field application of this method, the computational cost grows non-linearly and becomes intractable. Therefore, we use an approximation of the DTW algorithm known as FastDTW [19], which has linear space and time complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution, thereby providing the necessary speed-up [19]. More details are given in the Appendix section. Once the different data channels are aligned, we apply ML techniques for anomaly detection.
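To make the alignment step concrete, the following is a minimal sketch of the exact, quadratic-cost DTW recurrence that FastDTW approximates. It is not the FastDTW algorithm and not the authors' implementation: the function names `dtw_align` and `warp_to_reference`, the absolute-difference cost, and the toy scans are assumptions chosen for illustration.

```python
def dtw_align(a, b):
    """Exact DTW between two 1-D sequences (O(len(a) * len(b)) time and memory).

    Returns the accumulated cost and the warping path as (index_a, index_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def warp_to_reference(b, path, n_ref):
    """Resample b onto the index grid of the reference scan using the warping path."""
    aligned = [None] * n_ref
    for i, j in path:
        aligned[i] = b[j]  # if several b samples map to one index, the last one wins
    return aligned


# Toy example: the second scan is the first one shifted by one sample.
first_scan = [0, 0, 1, 2, 1, 0, 0]
second_scan = [0, 1, 2, 1, 0, 0, 0]
total_cost, path = dtw_align(first_scan, second_scan)
aligned_second = warp_to_reference(second_scan, path, len(first_scan))
```

The full cost matrix is what makes exact DTW quadratic; on profiles with millions of points this table no longer fits in memory, which is the motivation for the multilevel approximation cited in the text.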

Machine Learning Results
Anomaly detection for sequential datasets using Machine Learning (ML) has been shown to perform well for a variety of applications. In this section, we start by applying off-the-shelf packages such as pyod and scikit-learn to our datasets, pre-processed with the FastDTW alignment method. We first review the "point-based" unsupervised methods, then discuss the challenges with data leakage for supervised learning methods and provide solutions. Finally, the customized CNN architecture is introduced for multi-task classification.

Unsupervised Point-based Learning
For unsupervised point-based methods we use the pyod v0.8 [12] package. Within pyod, we apply several anomaly detection models. The models are first trained on an original pipe scan and return a baseline score. The trained models are then tested on the actual LSM data to determine their performance. Of the methods we tried, only the k-nearest neighbor (k-NN) detector works reasonably well. The results in Figure 2 show regions of higher outlier scores in the vicinity of the defects, which are marked by the vertical dotted lines.
The unsupervised point-based methods were computationally cheap to investigate, so we used them not only to determine their applicability to our data but also to better understand how to leverage the multi-modal aspect of the data recording (namely, feature engineering). Data configuration includes combining or splitting multiple parameters such as pipe sections, sensor channels, sensor axes, and sensor circumferential positions on the pipe. We obtained the outlier score plots shown in Figure 2 by studying each sensor input individually. To determine the sensitivity to the sensor inputs, we performed multiple experiments: combining all data channels, studying multiple angles together, and transforming the data using normalization and spectral techniques. We observe that, while the k-NN model is very robust in identifying anomalies with higher outlier scores, studying multiple angles together outperforms the other feature engineering options, clearly distinguishing regions of anomaly with few false positives, as seen in Figure 2. This can be attributed to two reasons: 1. the orientation of the LSM instrument when positioned at different angles with respect to the defect, and 2. the aggregation of the data from all these channels, which identifies anomalous magnetic field readings more robustly than a channel studied in isolation. Therefore, in the following sections we present our results based on aggregating multiple angles together as a single input to the ML model.
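The idea behind a k-NN outlier score can be sketched in a few lines of plain Python. This is an illustrative stand-in, not pyod's implementation: the function name `knn_outlier_scores`, the choice of the distance to the k-th nearest baseline point as the score, the value k=5, and the synthetic two-feature data are all assumptions. Each point is a tuple of readings, for example one reading per angle, which is how several angles could be aggregated into a single input.

```python
import math


def knn_outlier_scores(baseline, query, k=5):
    """Score each query point by its distance to the k-th nearest baseline point.

    A large score means the point lies far from anything seen in the baseline scan.
    """
    scores = []
    for q in query:
        dists = sorted(math.dist(q, t) for t in baseline)
        scores.append(dists[k - 1])
    return scores


# Synthetic baseline: two-feature readings lying on a smooth curve.
baseline_scan = [(math.sin(0.1 * i), math.cos(0.1 * i)) for i in range(200)]

# Query scan: one reading consistent with the baseline, one far away from it.
normal_point = (math.sin(0.5), math.cos(0.5))
anomalous_point = (3.0, 3.0)
scores = knn_outlier_scores(baseline_scan, [normal_point, anomalous_point], k=5)
```

This brute-force version compares every query point with every baseline point, so it is only meant to show the scoring rule; a library implementation would normally use a spatial index.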

Supervised Point-based Learning
The dataset contained a very limited number of defects, which creates a major class imbalance and is a challenge for supervised learning. The labelling was achieved by creating a binary mask over the regions with defects. Based on the instrument sensitivity, the defect and the nearby region of magnetic field distortion extend about 3 feet from the actual defect location in either direction. With spatial-location-based masking, we observed unintentional and non-intuitive data leakage during randomized train/test splitting [refer to Figure 8 in the Appendix section]. To avoid this, we held out an entire defect region as our test dataset and used the rest for training. As a result, our training/test set includes 35000/1000 samples with the non-defect label and 10000/1000 samples with the defect label. The class imbalance was addressed by adding class weights, in proportion to the number of samples in each class, to our learning models.
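The region-based hold-out and the class weighting can be sketched as follows. The helper names, the toy mask, and the held-out interval are hypothetical. The inverse-frequency weighting follows the common "balanced" heuristic, n_samples / (n_classes * class_count), and is an assumption, since the exact weighting formula is not given in the text.

```python
from collections import Counter


def split_by_region(n_points, test_region):
    """Hold out one contiguous stretch of the profile instead of random points.

    Neighbouring points are strongly correlated, so a random split would place
    near-duplicates of test points in the training set (data leakage).
    """
    start, stop = test_region
    test_idx = list(range(start, stop))
    train_idx = [i for i in range(n_points) if i < start or i >= stop]
    return train_idx, test_idx


def balanced_class_weights(labels):
    """Assumed weighting: n_samples / (n_classes * count), larger for rare classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Toy binary mask over 100 profile points with two defect regions.
mask = [0] * 100
for i in list(range(20, 30)) + list(range(60, 70)):
    mask[i] = 1

# Hold out the stretch containing the second defect and its surroundings.
train_idx, test_idx = split_by_region(len(mask), (50, 80))
train_labels = [mask[i] for i in train_idx]
weights = balanced_class_weights(train_labels)
```

In scikit-learn, a dictionary of this form can be passed through the `class_weight` argument of classifiers such as `SVC`.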
To quantify the model performance, we report the confusion matrix (in the Appendix section, Figure 7), accuracy scores, and the Matthews Correlation Coefficient (MCC). The MCC is a robust correlation coefficient for binary classification problems with imbalanced datasets [20]. A score of +1 indicates perfect correlation, -1 indicates total disagreement, and 0 indicates no better than random predictions [20]. Mathematically, MCC is defined as

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where $TP$ is true positive, $TN$ is true negative, $FP$ is false positive, and $FN$ is false negative.
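As a worked illustration of the score ranges quoted above, the standard MCC definition can be evaluated directly from the four confusion-matrix counts. The function name `mcc` and the example counts are illustrative only and are not results from this study; returning 0 when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


perfect = mcc(tp=50, tn=50, fp=0, fn=0)        # every prediction correct
inverted = mcc(tp=0, tn=0, fp=50, fn=50)       # every prediction wrong
chance_like = mcc(tp=25, tn=25, fp=25, fn=25)  # no better than random
```

scikit-learn provides the same metric as `sklearn.metrics.matthews_corrcoef`, which takes the true and predicted label arrays rather than the counts.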
We used the scikit-learn package [15] to build a pipeline with various ML methods, including Support Vector Classification (SVC), Logistic Regression, Decision Tree, k-NN, MLP-Classifier, and Gaussian Process Regression, and we report the best-performing models in Table 2. Among all models, SVC, Decision Tree, and MLP-Classifier are the three best-performing methods in terms of high accuracy on the test and train datasets with few false positives.

Supervised Sequence-based Learning
The "point-based" methods, in both the unsupervised and the supervised learning formulations, have shown promising results in identifying local defects or the defect labels. However, these models cannot characterize defect properties, such as predicting the volume or the depth of the anomaly, among other features. We identify the Convolutional Neural Network (CNN) as a good candidate for sequence-based learning and for multi-task classification. CNN-based architectures have previously been used successfully for such multi-task classification [21,22]. We propose using a sequence-based CNN to detect the spatial variation of the magnetic signal along the defect. CNN-based networks can extract spatial features [23,24] from the magnetic signal. Once this sequence-based learning model has been built successfully, it is extended to multiple outputs to characterize the defect region [21].
The challenge in using a CNN for this binary classification task is that the method is restricted to 1D convolutional filters (Conv1D), since the only sufficiently long sequence lies along the profile dimension of the data. The other dimensions are limited by acquisition constraints to a small number of consecutive measurements, which prevents applying the convolution operator along them. The dataset has the dimension M x N x P, where M is the number of points in the profile (based on the sensor resolution), along which we apply the Conv1D filter, N is the number of channels, which we concatenate in the network at the first level, and P is the total number of angles, which we concatenate in the network at the second level. To use the current dataset as a sequence, we break it into sub-sequences of 100 profile points, which is long enough to capture a defect region. The schematic of the CNN-based architecture is shown in Figure 3, where we use concatenation operations at the spatial levels to first study the effects of the multi-channel inputs separately and, after concatenation, to study them together. Each Conv layer is composed of Conv1D filter -> Batch Normalization layer -> Dropout layer [p varied between 0.2 and 0.4]. The total number of trainable parameters was 1,249,091. We used the Nesterov-Adam optimizer [25] with an initial learning rate of 0.001 and a decay when learning reaches a plateau. Binary cross-entropy was used as the loss function. One of the defect regions was held out as a test case, and the rest of the dataset was used for training the model. Figure 4 shows the probability scores from the loss layer overlaid on the actual score (mask) along the profile. It can be observed that the Conv1D network accurately represents all defects.
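The sub-sequence step can be sketched as a simple windowing routine over the profile dimension. This is a minimal illustration under stated assumptions: the function name `make_windows`, the `stride` parameter, the choice to drop an incomplete trailing window, and the toy two-channel profile are all hypothetical, and only the window length of 100 profile points comes from the text.

```python
def make_windows(profile, mask, length=100, stride=100):
    """Cut a profile (a list of per-point feature vectors) into fixed-length windows.

    Returns the windows and the matching slices of the binary defect mask.
    An incomplete trailing window is dropped (an assumption of this sketch).
    """
    windows, targets = [], []
    for start in range(0, len(profile) - length + 1, stride):
        stop = start + length
        windows.append(profile[start:stop])
        targets.append(mask[start:stop])
    return windows, targets


# Toy profile: 1000 points with two channels each, one defect between points 420 and 480.
profile = [[float(i), float(-i)] for i in range(1000)]
mask = [1 if 420 <= i < 480 else 0 for i in range(1000)]

windows, targets = make_windows(profile, mask)                      # non-overlapping
overlapping, _ = make_windows(profile, mask, length=100, stride=50) # 50% overlap
```

Each window has shape (100, number of channels), which is the layout a Conv1D layer expects when the convolution runs along the profile dimension. A stride smaller than the window length yields overlapping windows, which is one way to obtain more training sequences from a short lab scan.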

1D-Convolutional Neural Network for Multi-Output Predictions
To characterize defect properties such as volume and depth [see Table 1], the previously described CNN architecture is extended to include multiple outputs, for depth and volume prediction as well as the mask (binary classification). The defect characteristics are predicted with reasonable accuracy, close to 90%, for the depth and volume as well as the mask [refer to Figure 5]. These results can be used to build an ML model that maximizes the information extracted from datasets such as the one in this study, which has limited defect labels, by using other defect properties to improve predictions (a similar outcome to [21]).
Figure 4: The probability score from the cross-entropy loss, plotted as a function of the number of points and overlaid with the real score, shows spikes in probability at the defect locations for all of the defects, including the hold-out test dataset. The few false positives can be addressed by training on more data.

Scalability
A limited Proof-of-Concept (PoC) for using Machine Learning for LSM anomaly detection was developed on a multi-modal, lab experimental dataset. However, the biggest obstacle to industrial adoption of this ML approach would be scalability to the large datasets coming from the field. For example, the lab experiments have about 10000 profile points, while 1-100 million profile points are captured in field acquisitions. It is therefore important to address the time complexity of all these methods. A comparison of the different successful methods described in the previous section (shown in Table 4 in the Appendix section) identifies the slow-performing methods, mainly SVC.

Algorithmic enhancement of slower methods
SVC is a robust method, especially for binary classification tasks [15]; however, the current design of the algorithm in the scikit-learn library is optimized for CPU without parallelization. Therefore, as can be seen from Table 3, SVC performs poorly on large datasets. The RAPIDS AI cuML library [26] is open-source software that enables efficient use of GPUs via CUDA and numba [27] for end-to-end machine learning pipelines. For large field datasets (more than 100000 data points), a 200x speedup over the scikit-learn implementation of SVC is achieved, as seen in Table 3.

Conclusions and Future Work
Large Stand-Off Magnetometry (LSM) is a promising approach that records magnetic fluxes for nondestructive and non-intrusive inspection of pipelines. We developed a proof-of-concept, with limited lab experimental data, that leverages state-of-the-art machine learning algorithms to identify anomalous magnetic flux readings associated with defect regions. These datasets are multi-modal, multi-sensor sequential data that need several pre-processing steps before the ML methods can be applied. An approximate multi-sequence alignment technique based on Dynamic Time Warping (DTW), scalable to real-world applications, was implemented. A sequence of models with increasing levels of complexity was studied to refine the defect detection capabilities. "Point-based" methods in unsupervised and supervised learning settings performed well, with high accuracy on test/train data and few false positives. A customized Convolutional Neural Network (CNN) with 1D convolutional filters was built that reflects the dimensionality of the magnetic measurements. In addition to identifying anomalies, defect properties were characterized with reasonable success using a multi-output CNN. The scalability study, from research lab experiments to field data, revealed some slow-performing methods, and a 200x speedup of RBF-kernel SVC was achieved using the RAPIDS AI cuML library on 32GB V100 GPUs. One way to improve the model performance would be to augment the datasets using the principles of magnetostriction.

Appendix
In this section, we present the supporting results referred to in the main document.
Figure 6: The magnetic field readings are obtained using an instrument with sensors and channels. The first scan refers to the pipe scan with an initial configuration; the second scan refers to the scan with a shifted configuration. (a) Differences between both scans taken from the same sensor. We clearly see some phase shift between the peaks, and regions of major misalignment have been highlighted. (b) Upon applying the FastDTW algorithm as a preprocessing step, we are able to reduce the phase shift and thereby align the two scans appropriately.