A machine learning approach for detecting wind farm noise amplitude modulation

Duc Phuc Nguyen, Kristy Hansen, Bastien Lechat, Peter Catcheside, and Branko Zajamsek

College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
Adelaide Institute for Sleep Health, Flinders University, Adelaide, SA 5042, Australia

Amplitude modulation (AM) is a characteristic feature of wind farm noise and has the potential to contribute to annoyance and sleep disturbance. This study aimed to develop an AM detection method using a random forest approach. The method was developed and validated on 6,000 10-second samples of wind farm noise manually classified by a scorer via a listening experiment. Comparison between the random forest method and other widely used methods showed that the proposed method consistently demonstrated superior performance. This study also found that a combination of low-frequency content features and other unique characteristics of wind farm noise plays an important role in enhancing AM detection performance. Taken together, these findings indicate that machine learning-based detection of AM is well suited to, and effective for, in-depth exploration of large wind farm noise data sets for potential legislative and research purposes.

AM detection method

of peak and masking level, as a predictor of AM occurrence. The main advantage of these engineering methods is the ease of their implementation and computational speed, which makes them suitable for automated analysis of large data sets (Conrady et

forest classification algorithm (Breiman, 2001). This new method was trained and tested on human-scored data sets (hereafter referred to as the benchmark data

The data set used for development and validation of the AM detection method contained WFN measured at four residences (H1-H4) located between 980 m and 3.5 km from the nearest wind turbine of South Australian wind farms (Supplementary Fig. S1). Residence H4 was unoccupied and located approximately 30 km from the nearest wind farm, and thus it was assumed that AM WFN did not exist at this location. Noise data were measured for one year at locations H1 and H2 and two weeks and five months at locations H3 and H4, respectively. The H3 data set also contained approximately three days of measurements of background noise when the wind farm was not operating. This data set, together with the H4 data set, was used for false positive rate evaluations.
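As a rough illustration of the false positive rate evaluation described above: if every sample in a recording is assumed to contain no wind farm AM (as for H4 and the H3 shutdown period), then any sample that a detector flags counts as a false positive. The sketch below is a minimal Python example under that assumption. The function name `false_positive_rate` and the example flags are hypothetical and are not taken from the study.

```python
from typing import Sequence


def false_positive_rate(detections: Sequence[bool]) -> float:
    """Fraction of samples flagged as AM in a data set assumed to contain no AM.

    Each element is a detector's decision for one 10-second sample
    (True = AM detected). Because the data set is assumed to be AM-free,
    every True is counted as a false positive.
    """
    if len(detections) == 0:
        raise ValueError("at least one sample is required")
    return sum(bool(d) for d in detections) / len(detections)


# Hypothetical detector output for eight samples (illustrative only, not study data).
example_flags = [False, True, False, False, False, True, False, False]
rate = false_positive_rate(example_flags)
```

The same function could be applied to the output of any detector, which would allow false positive rates to be compared on identical AM-free recordings.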

At all measurement locations, acoustic data were acquired using a Brüel & Kjær LAN-

These data sets were selected randomly from recorded data (Supplementary Fig. S2). The
WFN benchmark data set was primarily scored by a single scorer using a validated rating experiment procedure based on detection theory (Macmillan and Creelman, 2004

The AMfactor, the maximum spectrum amplitude between 0.6 Hz and 1 Hz, is then used to

Optimisation of hyperparameters, that is parameters which are set before the learning

features extracted from the other automated AM detection methods described in Section C.

The aggregate metric of MCC is a more informative and faithful score of overall classification performance compared to common metrics such as accuracy or F1-score (Chicco and Jurman, 2020). The MCC ranges from -1 (classification is always wrong) to 0 (classification is no better than random guess) to 1 (classification is always correct), and it is calculated as follows

The benchmark data set of 6,000 10-second audio files was unbalanced with around 40% of audio samples containing AM (Fig. 2a)

were also nearly uniform, consistent with ecological validity (Fig. 2b).

predicting AM are AMfactor, SpectralCrest, diffLCLA and PR (Fig. 3d).

The performance of the random forest-based AM detection method was compared to three automated detectors (a1-a3) on precision-recall plots (Fig. 4a). The test set for

performance of a1-a3 was poor with the mean AUPRC ranging from 0.43 to 0.55 (Table II). The performance of a1 was better than a2 and a3 (all P < 0.001), and a2 performed better than a3 (P < 0.001).

The performance of AM detection algorithms has previously been described in terms of
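The MCC discussed above is commonly read as the Matthews correlation coefficient, whose standard definition in terms of confusion-matrix counts is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following minimal Python sketch computes it from those four counts. The function name `mcc_from_counts`, the zero-denominator convention and the example counts are assumptions made for illustration and are not taken from the study.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: samples correctly classified as AM / no AM
    fp/fn: samples wrongly classified as AM / no AM
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0 when a whole row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts chosen to show the three reference points of the scale.
always_correct = mcc_from_counts(tp=40, tn=60, fp=0, fn=0)
always_wrong = mcc_from_counts(tp=0, tn=0, fp=60, fn=40)
chance_level = mcc_from_counts(tp=20, tn=30, fp=30, fn=20)
```

Because all four cells of the confusion matrix enter the formula, a classifier cannot score well by favouring the majority class, which is relevant for an unbalanced benchmark data set.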

In contrast, the optimal thresholds for methods a2 and a3 were lower than original suggested

presented here achieved substantial improvements in performance compared to previous methods.

Very high false positive rates were found for methods a1-a3, which is inconsistent with

where weather conditions and topography near wind farms will inevitably vary. Although the reliability of human scoring has been tested, using a single scorer to classify the AM is not ideal. As suggested by Wendt et al. (2015), two or more scorers and a consensus scoring approach may be preferable to a single scorer to help ensure broader generalisability.

Nevertheless, a single scorer is more practical and avoids potential effects of poor inter-scorer agreement. Also, good inter-scorer agreement was found in a smaller subset of the data, supporting this approach.

Although detector a1 clearly warrants improvements in order to increase accuracy, the

In summary, this study demonstrates that random forest-based AM detection is a good approach for AM classification, and substantially outperforms traditional AM detection methods to achieve classification performance close to that of humans. It was also shown that a simplified classifier based on a single decision tree using the four main features identified through the random forest approach also achieves good classification performance.
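To illustrate why a single decision tree is easy to read and cheap to run, the sketch below evaluates a small tree as a chain of threshold comparisons on the four feature names mentioned in the text (AMfactor, SpectralCrest, diffLCLA and PR). The tree structure, the split order and every threshold value are hypothetical placeholders chosen for illustration. They are not the fitted tree from the study.

```python
from typing import Dict, Union

# A node is either a leaf (bool: is AM present?) or a split written as
# (feature_name, threshold, subtree_if_value_<=_threshold, subtree_if_value_>_threshold).
Tree = Union[bool, tuple]

# HYPOTHETICAL tree: structure and thresholds are placeholders only.
EXAMPLE_TREE: Tree = (
    "AMfactor", 1.0,
    False,
    (
        "SpectralCrest", 2.0,
        ("diffLCLA", 10.0, False, True),
        ("PR", 0.5, False, True),
    ),
)


def predict(tree: Tree, features: Dict[str, float]) -> bool:
    """Walk the tree from the root until a leaf is reached and return its label."""
    node = tree
    while not isinstance(node, bool):
        name, threshold, low_branch, high_branch = node
        node = low_branch if features[name] <= threshold else high_branch
    return node


# Hypothetical feature values for one 10-second sample (not study data).
sample = {"AMfactor": 2.0, "SpectralCrest": 3.0, "diffLCLA": 0.0, "PR": 0.9}
has_am = predict(EXAMPLE_TREE, sample)
```

Each prediction needs at most three comparisons in this example, and the path taken through the tree can be read directly as the reason for the decision.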

This approach is readily interpretable and easy to implement without the need for extensive computer resources. Finally, it is important to stress that the main aim for developing an improved AM detection algorithm was to better understand the characteristics of this phenomenon, and thus algorithm performance was prioritized above algorithm simplicity and