Anomaly detection in particulate matter sensor using hypothesis pruning generative adversarial network

The World Health Organization provides guidelines for managing the particulate matter (PM) level because a higher PM level represents a threat to human health. To manage the PM level, a procedure for measuring the PM value is first needed. We use a PM sensor that collects the PM level by the laser-based light scattering (LLS) method because it is more cost-effective than a beta attenuation monitor (BAM)-based sensor or a tapered element oscillating microbalance (TEOM)-based sensor. However, an LLS-based sensor has a higher probability of malfunctioning than the higher-cost sensors. In this paper, we regard all malfunctions, including abnormal value collection and missing collection data, as anomalies, and we aim to detect anomalies for the maintenance of PM measuring sensors. We propose a novel architecture to achieve this aim, which we call the hypothesis pruning generative adversarial network (HP-GAN). Through comparative experiments, we achieve AUROC and AUPRC values of 0.948 and 0.967, respectively, in the detection of anomalies in LLS-based PM measuring sensors. We conclude that our HP-GAN is a cutting-edge model for anomaly detection.


The World Health Organization (WHO) recommends managing the particulate matter (PM) level and provides the guideline shown in Table 1, because PM can infiltrate deep sites of the human body via the respiratory organs [1]. In detail, PM can trigger not only respiratory diseases [2] but also cardiovascular disease [3], lung cancer [4], and other diseases. The unit of each value in Table 1 is µg/m³. When the diameter of a particle is at most 2.5 µm, it is called PM 2.5, and when the diameter is between 2.5 µm and 10 µm, it is called PM 10.
Table 1. WHO guideline values for PM (unit: µg/m³).

            Annual mean    24-hour mean
PM 2.5      10             25
PM 10       20             50

An LLS-based sensor has a more error-prone measuring process than a BAM-based sensor because it is more affected by the external environment; we refer to the overall malfunctions as anomalies. Thus, we propose a novel architecture for anomaly detection to maintain an LLS-based sensor, because if an anomaly detection solution is provided with the LLS-based sensor, PM level monitoring can be made more cost-efficient.

The organization of this paper is as follows. In Section 2, we summarize previous studies that have addressed anomaly detection. We present the proposed architecture and experimental results in Sections 3 and 4, respectively, and conclude in the final section.

Figure 1. Examples of generating the output via 1D and 2D convolution. In the 2D convolution case, the zero-padding method is needed for aggregating time information while maintaining the feature dimension.
To minimize the information loss for each channel, our purpose is to aggregate only the time information via a convolutional layer, similar to a recurrent neural network [30]. In the case of 1D convolution, time information can be summarized while naturally maintaining channel information.

However, if the neural network is constructed with 2D convolution, the channel information of the input data is reduced unintentionally. To avoid this problem, zero-padding can be used to maintain the channel dimension, as shown in Figure 1, but the last channel of the generated output probably has less information than the front channels. Thus, we adopt the 1D convolutional layer for constructing the neural network.

We have already summarized related works in Section 2. Using the variational bound with a distribution assumption causes blurred samples to be generated. Also, an AnoGAN-like architecture has lower throughput because it needs a matching procedure to find the generated sample closest to the input data. A helpful approach is the multiple hypothesis method, which can produce more relevant results and work more robustly [31][32][33].

We construct the neural network with reference to GANomaly as a backbone architecture because it can ease several of the above-mentioned limitations of previous anomaly detection models. We also apply the multiple hypothesis method in the last layer of the generator, because multiple hypotheses can be utilized to maintain the quality of the output consistently by selecting the best output. We name this procedure of selecting the best output hypothesis pruning.
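As a minimal sketch of the channel-preserving 1D convolution discussed above, the following toy example convolves over the time axis only; the shapes, kernel size, and random data are illustrative and not taken from the paper:

```python
import numpy as np

def conv1d_time(x, kernels):
    """1D convolution over the time axis.

    x: input of shape (C_in, T), e.g. two channels (PM 2.5, PM 10) over T time steps.
    kernels: weights of shape (C_out, C_in, K); the kernel slides along time only,
    so channel information is mixed explicitly instead of being padded away.
    """
    c_out, c_in, k = kernels.shape
    _, t = x.shape
    out = np.zeros((c_out, t - k + 1))
    for o in range(c_out):
        for i in range(t - k + 1):
            out[o, i] = np.sum(kernels[o] * x[:, i:i + k])
    return out

x = np.random.randn(2, 16)    # 2 channels, 16 time steps (hypothetical data)
w = np.random.randn(4, 2, 3)  # 4 output channels, kernel size 3 over time
y = conv1d_time(x, w)
print(y.shape)  # (4, 14): time is summarized while channels remain explicit
```

The time dimension shrinks by K - 1 per layer while the channel dimension is set by the number of kernels, which is the behavior the text contrasts with 2D convolution.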

The multiple hypothesis method generates several samples directly from the input data and generates one additional sample from random noise. Each generated sample is used as a hypothesis, and the samples that are not the best case are used as a regularization term. The overall architecture of the proposed neural network is shown in Figure 2.
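The hypothesis-pruning selection can be sketched as follows; the function name `prune_hypotheses`, the toy data, and the use of mean squared error as the selection criterion are illustrative assumptions, not the authors' code:

```python
import numpy as np

def prune_hypotheses(x, hypotheses):
    """Keep the hypothesis closest to the input (by MSE) as the best output;
    the remaining hypotheses can feed a regularization term."""
    errs = [float(np.mean((x - h) ** 2)) for h in hypotheses]
    best = int(np.argmin(errs))
    rest = [h for j, h in enumerate(hypotheses) if j != best]
    return hypotheses[best], rest, errs[best]

x = np.array([1.0, 2.0, 3.0])          # hypothetical input window
hyps = [x + 0.1, x + 1.0, x - 2.0]     # toy generated samples (hypotheses)
best, rest, err = prune_hypotheses(x, hyps)
```

Selecting the minimum-error hypothesis per input is what keeps the output quality consistent, while the pruned hypotheses still contribute gradient signal through the regularization term.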
To optimize the parameters at once, we sum the three losses L_enc, L_gen, and L_adv with the weighting coefficients w_enc, w_gen, and w_adv, which are set to 1, 50, and 1, respectively. These coefficients are referenced from GANomaly [20], and they can be modified by a hyperparameter tuning process. We conduct hyperparameter tuning in Section 4.4, but finding the best weighting coefficients is not covered in this paper; we only search for the best kernel size, number of convolutional blocks, and learning rate.

We describe the anomaly detection procedure in this section. The abnormality decision method is simple, as shown in Algorithm 2. To use the algorithm, the input data must first be measured via the PM sensor and contain two channels: PM 2.5 and PM 10. Then, the generation procedure is conducted by feeding the data to the neural network. We set the decision boundary θ using the µ and σ of the training data, as shown in Equation 9. Here, µ and σ represent the mean and standard deviation of the mean squared error between the input and the best hypothesis among the generated samples. For reference, if the user wants to change the sensitivity or specificity of the anomaly detection, θ can be adjusted.

The data are collected at a library in Korea. For collecting data, we use the LLS-based sensor shown in Figure 3, and the collected dataset is summarized in Table 2. Max-pooling is applied at the end of each convolutional block, and we use two fully connected layers.
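A sketch of the abnormality decision described above. Since Equation 9 is not reproduced here, we assume the common θ = µ + kσ form over the training reconstruction errors, with k as a hypothetical sensitivity factor:

```python
import numpy as np

def decision_boundary(train_errors, k=3.0):
    """theta = mu + k * sigma over training MSEs (k = 3.0 is an assumed factor,
    not specified in the text; adjusting it trades sensitivity for specificity)."""
    mu = float(np.mean(train_errors))
    sigma = float(np.std(train_errors))
    return mu + k * sigma

def is_anomaly(mse, theta):
    """Flag a sample whose reconstruction error exceeds the boundary."""
    return mse > theta

train = [0.10, 0.12, 0.11, 0.09, 0.13]  # hypothetical training MSEs
theta = decision_boundary(train, k=3.0)
```

Raising k makes the detector more specific (fewer false alarms); lowering it makes it more sensitive, which matches the note that θ can be adjusted by the user.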

We construct HP-GAN by referring to previous studies. Since HP-GAN is a novel, unproven architecture, we need to confirm that it can work better than other architectures. Table 4 quantitatively shows that VBMH-GAN achieves better AUROC and AUPRC than the other architectures.
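For reference, AUROC values such as those in Table 4 can be computed from anomaly scores with the rank-based (Mann-Whitney) estimator; this minimal sketch is generic and not the paper's evaluation code:

```python
def auroc(pos, neg):
    """AUROC as P(score_pos > score_neg), with ties counted as 0.5.

    pos: anomaly scores of true anomalies; neg: scores of normal samples.
    Equivalent to the normalized Mann-Whitney U statistic.
    """
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores: 3 of 4 anomaly/normal pairs are ranked correctly.
print(auroc([0.9, 0.4], [0.3, 0.7]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.948 should be read.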

However, Figure 5 shows the qualitative result, including the sample generated via LM-GAN (GANomaly). In this experiment, we confirm that the quantitative result indicates VBMH-GAN is the best architecture, but its qualitative result is not good. Thus, we need to conduct more experiments to verify the above architectures using varied hyperparameters, because each model may need specific hyperparameters to perform much better.

We use the kernel size, the number of convolutional blocks, and the learning rate as hyperparameters, with the layer configuration the same as described in Section 4.2. We compose 108 hyperparameter sets for the experiment by combining the three hyperparameters, as shown in Table 5.
Table 5. The hyperparameters for the experiment. We use the grid search method for hyperparameter tuning [?]. The number of convolutional blocks is abbreviated as # of conv-block.

Hyperparameter     Values
Kernel size        3, 5, 7, 9, 11, and 13
# of conv-block    2, 3, and 4
Learning rate      5e-3, 1e-3, 5e-4, 1e-4, 5e-5, and 1e-5

In the case of large kernel sizes such as 13, the time-axis dimension of the feature vector can be smaller than the kernel of the last layer. However, the amount of feature information and the balance between feature vectors are not changed by this situation, so we also use the large kernels. We present the measured performance for each hyperparameter in surface form in Figures 6 and 7. The height of the surface represents the anomaly detection ability; a higher surface means higher performance. A flat surface indicates that the neural network responds stably to hyperparameter changes.
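The 108 hyperparameter sets follow directly from the grid in Table 5 (6 kernel sizes × 3 block counts × 6 learning rates); a minimal enumeration sketch of that grid search space:

```python
from itertools import product

# Values taken from Table 5.
kernel_sizes = [3, 5, 7, 9, 11, 13]
num_blocks = [2, 3, 4]
learning_rates = [5e-3, 1e-3, 5e-4, 1e-4, 5e-5, 1e-5]

# Grid search enumerates every combination; each tuple would configure one
# training run of the model under comparison.
grid = list(product(kernel_sizes, num_blocks, learning_rates))
print(len(grid))  # 108 combinations, matching the text
```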

When the number of convolutional blocks is two or three, the surfaces for each architecture look similar and relatively flatter than in the four-block case. However, HP-GAN shows the highest and flattest surface in the four-block case. We also confirm that HP-GAN shows a generally flat surface in the AUPRC plots. Thus, we can conclude that HP-GAN may perform more stably for anomaly detection than the other architectures.

However, we additionally conduct an experiment to confirm the average performance with the best hyperparameters of each model. The best hyperparameter sets are summarized in Table 6.
In Table 7, a higher mean represents better performance. On the other hand, a lower standard deviation means higher stability. The HP-GAN proposed in this paper shows higher performance on every indicator. Moreover, HP-GAN has higher stability (a lower standard deviation) on every indicator.