Submitted:
30 January 2024
Posted:
30 January 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data
2.2. Network and Training
- Robustness in Drone Category Classification: The distinction between sounds from different drone categories is more subtle compared to the distinction between drone and non-drone sounds. By first excluding ’no drone’ sounds, the training of the drone category classifier becomes more robust, concentrating solely on the nuanced differences between the drone classes.
- Efficiency in Operational Deployment: Considering that in a real-world deployment scenario, drone sounds are expected to be less frequent than non-drone sounds, a cascaded approach is more practical. Initially, the system continuously monitors for the presence of drones. Once a drone is detected, the second classifier steps in to determine its category. This method is particularly beneficial when integrating neuromorphic technology like SynSense’s Xylo [18], which offers binary classifications with ultra-low power consumption. In a drone detection system, detecting a drone could trigger a wake-up process for a more power-intensive unit (e.g., an Arduino) to perform the detailed classification and relay the detection to the cloud via a 5G connection.
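The cascade described above can be sketched in a few lines of Python. This is only an illustration of the control flow: the energy-threshold detector, the placeholder category classifier, and all function names are hypothetical stand-ins, not the networks or the Xylo interface used in this work.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]


def energy_detector(frame: Frame, threshold: float = 0.01) -> bool:
    """Hypothetical stage 1: flag a frame whose mean energy exceeds a threshold."""
    return sum(x * x for x in frame) / len(frame) > threshold


def placeholder_category(frame: Frame) -> str:
    """Hypothetical stage 2: a real system would run the category network here."""
    return "C1"


def run_cascade(
    frames: Sequence[Frame],
    detect: Callable[[Frame], bool] = energy_detector,
    categorize: Callable[[Frame], str] = placeholder_category,
) -> Tuple[List[Optional[str]], int]:
    """Classify each frame; stage 2 runs only when stage 1 reports a drone."""
    labels: List[Optional[str]] = []
    stage_two_calls = 0
    for frame in frames:
        if detect(frame):
            stage_two_calls += 1
            labels.append(categorize(frame))
        else:
            labels.append(None)  # no drone: the costly classifier is never woken
    return labels, stage_two_calls


quiet = [0.0] * 100
loud = [0.5] * 100
labels, calls = run_cascade([quiet, quiet, loud, quiet])
print(labels, calls)
```

In this toy run only one of four frames reaches the second stage, which is the property that makes the cascade attractive when drone sounds are rare.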
2.3. Augmentation Techniques and Data Preparation
- Pitch Shifting: The pitch of drone sounds was altered without changing the playback speed, simulating variations in drone motor speeds. In the MATLAB code, the applyCustomPitching function (see Appendix 1) shifts the pitch by a random amount within a specified range, broadening the classifier’s ability to recognize different drone sounds.
- Adding Delay: The applyDelay function (see Appendix 2) introduces a time delay to the original sound. This delay, varied in length and amplitude, mimics echo effects in various environments, enhancing the model’s adaptability to different acoustic settings.
- Environmental Noise: The applyEnvironmentalNoise function (see Appendix 3) was used to mix ambient noises into the drone sounds. Ambient sounds, sourced from diverse environments, were employed to train the model in differentiating drone noises from background sounds in real-world scenarios.
- Harmonic Distortion: The applyHarmonicDistortion function (see Appendix 4) was utilized to add harmonic distortions, simulating the effect of sound traveling through different media. This technique challenges the model to maintain accuracy in complex acoustic landscapes.
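Three of the four augmentations can be illustrated with short, dependency-free Python functions. These are minimal sketches under stated assumptions and not the MATLAB functions referenced above: the function names are hypothetical, the echo is modeled as a single delayed copy, the noise gain is assumed to be drawn uniformly from [0, max_noise], and the distortion is assumed to be a cubic waveshaper. Pitch shifting is omitted because it needs a dedicated signal-processing routine.

```python
import random
from typing import List, Optional, Sequence


def add_delay(signal: Sequence[float], sample_rate: int,
              delay_ms: float, amplitude: float) -> List[float]:
    """Add one echo: y[n] = x[n] + amplitude * x[n - d], same length as the input."""
    d = round(sample_rate * delay_ms / 1000.0)  # delay in samples
    return [
        x + (amplitude * signal[n - d] if n >= d else 0.0)
        for n, x in enumerate(signal)
    ]


def mix_noise(signal: Sequence[float], noise: Sequence[float], max_noise: float,
              rng: Optional[random.Random] = None) -> List[float]:
    """Add ambient noise scaled by a random gain in [0, max_noise] (assumed scheme)."""
    if rng is None:
        rng = random.Random()
    gain = rng.uniform(0.0, max_noise)
    # The noise clip is repeated if it is shorter than the signal.
    return [x + gain * noise[n % len(noise)] for n, x in enumerate(signal)]


def cubic_distortion(signal: Sequence[float], level: float) -> List[float]:
    """Cubic waveshaper y = x + level * x**3; adds a third harmonic to a pure tone."""
    return [x + level * x ** 3 for x in signal]


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(add_delay(impulse, sample_rate=1000, delay_ms=2, amplitude=0.5))
```

Applied to a unit impulse at 1 kHz sampling, a 2 ms delay with amplitude 0.5 places a half-height copy two samples later, which makes the roles of the two delay parameters easy to see.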
3. Results
3.1. ’Drone’ vs. ’no Drone’ classification
3.2. Drone class classification without augmentation
3.2.1. Classifier Variability and Stochastic Processes
3.2.2. Real-World Performance of Non-Augmented Classifiers
3.3. Single augmentation techniques
3.3.1. Harmonic Distortions
3.3.2. Environmental Noise
3.3.3. Pitching
3.3.4. Echoes
3.4. Combined augmentation techniques
- Effectiveness in Lower Drone Categories (C0 and C1):
  - The augmentation techniques, particularly variations in maxPitch, maxNoise, and distLevel, showed promising results in the lower drone categories (C0 and C1).
  - Specific augmentations led to a noticeable improvement in accurately classifying these categories, including when considering the neighboring class hits.
  - The classifiers generally demonstrated an acceptable level of accuracy for C0 and C1 categories, suggesting that the chosen augmentation methods were effective for smaller drones.
- Challenges in Higher Drone Categories (C2 and C3):
  - In contrast, the classification of larger drones (C2, and especially C3) was less successful. The same augmentation techniques that benefited lower categories did not translate well to these higher categories.
  - The accuracy, even when considering hits in the neighboring classes, was not deemed acceptable for C2 and was particularly lacking for C3.
  - This indicates a need for further refinement in the augmentation approach or possibly the development of distinct strategies tailored to larger drone categories.
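To make the notion of "neighboring class hits" concrete, the sketch below computes per-class accuracy from a confusion matrix twice: once strictly and once counting a prediction as a hit when it lands in the true class or an adjacent one. The function name is hypothetical and the counts are invented purely for illustration; they are not results from this study.

```python
from typing import List, Sequence

CLASSES = ["C0", "C1", "C2", "C3"]


def per_class_accuracy(confusion: Sequence[Sequence[int]],
                       tolerance: int = 0) -> List[float]:
    """Fraction of each true class predicted within `tolerance` classes of the truth.

    confusion[i][j] counts clips of true class i that were predicted as class j.
    """
    result = []
    for i, row in enumerate(confusion):
        hits = sum(count for j, count in enumerate(row) if abs(i - j) <= tolerance)
        result.append(hits / sum(row))
    return result


# Invented counts, 10 clips per true class (rows: true, columns: predicted).
example = [
    [8, 2, 0, 0],
    [1, 7, 2, 0],
    [3, 3, 3, 1],
    [4, 3, 2, 1],
]
strict = per_class_accuracy(example)                  # exact class only
tolerant = per_class_accuracy(example, tolerance=1)   # exact or neighboring class
for name, s, t in zip(CLASSES, strict, tolerant):
    print(f"{name}: strict {s:.0%}, with neighbors {t:.0%}")
```

In this invented example the tolerant score lifts C0 and C1 to 100% while C3 stays low, mirroring the qualitative pattern described above in which predictions drift toward the lower classes.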
4. Discussion
4.1. Interpretation of Findings
- Influence of Random Processes in Model Training: This study highlights how strongly random processes during training, such as weight initialization, influence the results. Setting a fixed seed for random number generation markedly reduced the variability of training outcomes, which makes results more reliable and reproducible. Fixed seeds are therefore advocated as a best practice: future studies should fix the seeds of their random number generators so that weight initialization and mini-batch selection are standardized and outcomes can be compared and interpreted consistently.
- Impact of Data Augmentation: The results demonstrate that different data augmentation techniques (specifically pitch shifting, time delay, harmonic distortion, and ambient noise) can improve the classifier’s performance under real-world conditions. Each augmentation method affects specific aspects of drone detection and therefore needs to be tailored carefully to the dataset in question.
- Harmonic Distortion: Introducing harmonic distortions helped the system simulate the effects of sound waves traveling through different mediums, improving accuracy in complex acoustic environments.
- Inclusion of Ambient Noise: Adding ambient noises to the training data helped prepare the model to distinguish drone sounds against the backdrop of everyday noises. However, it was found that too much noise can impair the classifier’s performance.
- Pitch Shifting: Adjusting the pitch within a certain range enabled the system to respond more flexibly to variations in drone motor sounds. The results indicate that slight pitch adjustments can enhance detection accuracy, whereas excessive changes can be counterproductive.
- Time Delay and Echo Effects: Time delays were introduced in the audio signal to mimic echo effects in various environments, resulting in increased adaptability of the model to different acoustic conditions. Recent experiments with random delay augmentation have emphasized the complexity of simulating real-world echo conditions. Although the classifier’s accuracy did not notably improve with this method, its application in real-world scenarios, characterized by widely varying echo characteristics, demonstrated an increased ability to correctly identify drone classes. This indicates the importance of incorporating a broad range of delay parameters to capture the variety of possible real-world echo conditions.
- Combined Application of Techniques: The combination of these augmentation techniques yielded mixed results. In general, drone recordings were disproportionately assigned to the drone classes C0 and C1. No clear trend emerged for how the parameters of the individual techniques should be chosen when they are applied simultaneously.
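The effect of a fixed seed on the two random steps named above (weight initialization and mini-batch selection) can be shown with Python's standard random module. This is a generic illustration rather than the training code of this study; the function name, the Gaussian initialization, and the sizes are assumptions.

```python
import random
from typing import List, Optional, Tuple


def training_randomness(seed: Optional[int], n_weights: int = 4,
                        n_samples: int = 8,
                        batch_size: int = 4) -> Tuple[List[float], List[List[int]]]:
    """Draw initial weights and a mini-batch order from one seeded generator."""
    rng = random.Random(seed)  # seed=None falls back to OS entropy (not reproducible)
    weights = [rng.gauss(0.0, 0.01) for _ in range(n_weights)]
    order = list(range(n_samples))
    rng.shuffle(order)  # the order in which samples are visited
    batches = [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]
    return weights, batches


run_a = training_randomness(seed=1)
run_b = training_randomness(seed=1)
run_c = training_randomness(seed=2)
print(run_a == run_b)  # same seed: identical weights and batches
print(run_a == run_c)  # different seed: a different starting point
```

Using one generator object per run, instead of the global state, keeps the seeded draws isolated from any other code that also consumes random numbers.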
4.2. Theoretical and practical implications
4.3. Limitations and Challenges
4.4. Future Research
- Realistic Validation: Using outdoor measurement data allows for a more realistic and meaningful assessment of classifiers, helping to evaluate the models’ ability to operate accurately under various real conditions.
- Data Quality Improvement: By selectively filtering out irrelevant sections, the validation data can gain in quality, leading to more accurate assessments of model performance.
- Efficiency Enhancement: The use of the ’Drone’ vs. ’No Drone’ classifier for preprocessing can significantly speed up and simplify the data preparation process.
- Insight for Model Improvements: The results from the validation can provide insights into necessary model improvements, particularly in terms of robustness against environmental noises and other variable factors.
5. Conclusions
- Effectiveness of Data Augmentation: Various data augmentation techniques, such as pitch shifting, time delay, harmonic distortion, and ambient noise incorporation, have been demonstrated to significantly enhance the classifier’s accuracy. These techniques have been shown to enable the system to adapt to diverse acoustic environments, effectively identifying and categorizing drone sounds amidst a variety of background noises.
- Optimization of Augmentation Techniques: The study reveals that each augmentation method impacts specific aspects of drone sound detection. For instance, moderate levels of pitch shifting and harmonic distortion were found to be most effective, while excessive application of these techniques could lead to reduced performance or counterproductive results. Additionally, introducing time delays and ambient noises at controlled levels improved the model’s robustness and adaptability.
- Classifier Performance and Reproducibility: The research highlighted the critical role of random processes in ML model training. The variability in classifier performance, even under identical parameter settings, underscores the importance of consistent weight initialization and mini-batch selection. Future research should prioritize standardizing these aspects to achieve more reliable and reproducible outcomes.
- Practical Applicability and Future Directions: While the current ML-based classifier demonstrates significant potential in security and airspace management, complying with EU drone categorization regulations, further refinement is required for optimal performance in classifying drone categories. General drone detection using ML has proven effective, yet precise categorization of drones into specific classes as per EU standards demands additional research. Future studies should focus on exploring advanced optimization algorithms and experimenting with diverse parameter combinations. This exploration will be critical for enhancing the accuracy of drone noise classification systems, particularly in accurately identifying and classifying drones into distinct regulatory categories. Continued research in this direction will not only improve the reliability of drone detection systems but also ensure their compliance with evolving regulatory frameworks, thereby bolstering their practical applicability in various real-world scenarios.
- Contribution to UAV Detection Technology: Significant contributions have been made to the field of UAV detection technology through this research. The establishment of a comprehensive database encompassing 44 different drone models provides a solid foundation for the continued development and training of ML algorithms in this domain. The demonstrated capability to classify drones into distinct categories (C0 to C3) in accordance with EU regulations underlines the practical applicability and relevance of the system in meeting both current and emerging requirements in drone security.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| EU | European Union |
| FFT | Fast Fourier Transform |
| MFCC | Mel-Frequency Cepstral Coefficients |
| ML | Machine Learning |
| UAV | Unmanned Aerial Vehicles |
Appendix A
Appendix A.1. Source code listings




References
1. Gatwick Airport drone attack: Police have ’no lines of inquiry’. Available online: BBC News (accessed on 02 January 2024).
2. Knoedler, B.; Zemmari, R.; Koch, W. On the detection of small UAV using a GSM passive coherent location system. In Proceedings of the 17th International Radar Symposium (IRS), Krakow, Poland, May 2016. https://ieeexplore.ieee.org/document/7497375/.
3. Nguyen, P.; Ravindranatha, M.; Nguyen, A.; Han, R.; Vu, T. Investigating Cost-effective RF-based Detection of Drones. In Proceedings of the 2nd Workshop on Micro Aerial Vehicle Networks, Systems, and Applications for Civilian Use, Singapore, 26 June 2016.
4. Shi, X.; Yang, C.; Xie, W.; Liang, C.; Shi, Z.; Chen, J. Anti-Drone System with Multiple Surveillance Technologies: Architecture, Implementation, and Challenges. IEEE Commun. Mag. 2018, 56, 68–74.
5. Utebayeva, D.; Ilipbayeva, L.; Matson, E.T. Practical Study of Recurrent Neural Networks for Efficient Real-Time Drone Sound Detection: A Review. Drones 2022, 7, 26.
6. Al-Emadi, S.; Al-Ali, A.; Al-Ali, A. Audio-Based Drone Detection and Identification Using Deep Learning Techniques with Dataset Enhancement through Generative Adversarial Networks. Sensors 2021, 21, 4953.
7. Dumitrescu, C.; Minea, M.; Costea, I.M.; Cosmin Chiva, I.; Semenescu, A. Development of an Acoustic System for UAV Detection. Sensors 2020, 20, 4953.
8. Jeon, S.; Shin, J.-W.; Lee, Y.-J.; Kim, W.-H.; Kwon, Y.-H.; Yang, H.-Y. Empirical Study of Drone Sound Detection in Real-Life Environment with Deep Neural Networks. arXiv 2017.
9. Park, S.; Kim, H.-T.; Lee, S.; Joo, H.; Kim, H. Survey on Anti-Drone Systems: Components, Designs, and Challenges. IEEE Access 2021.
10. Kümmritz, S.; Paul, L. Comprehensive Database of Drone Sounds for Machine Learning. In Proceedings of the Forum Acusticum 2023, Turin, Italy, 11 September 2023; pp. 667–674. https://dael.euracoustics.org/confs/fa2023/data/articles/000049.pdf.
11. Easy Access Rules for Unmanned Aircraft Systems (Regulations (EU) 2019/947 and 2019/945). Available online: https://www.easa.europa.eu/en/document-library/easy-access-rules/easy-access-rules-unmanned-aircraft-systems-regulations-eu (accessed on 05 January 2024).
12. Marcus, G. Deep Learning: A Critical Appraisal. arXiv 2018.
13. Nanni, L.; Maguolo, G.; Paci, M. Data augmentation approaches for improving animal audio classification. Ecological Informatics 2020, 57, 101084.
14. Oikarinen, T.; Srinivasan, K.; Meisner, O.; Hyman, J.B.; Parmar, S.; Fanucci-Kiss, A.; Desimone, R.; Landman, R.; Feng, G. Deep convolutional network for animal sound classification and source attribution using dual audio recordings. J. Acoust. Soc. Am. 2019, 145, 654–662.
15. GitHub Repository, H2 Think gGmbH, DroneClassifier. Available online: https://github.com/H2ThinkResearchInstitute/DroneClassifier (accessed on 05 January 2024).
16. GitHub Repository, tensorflow, models, vggish. Available online: https://github.com/tensorflow/models/tree/master/research/audioset/vggish (accessed on 03 January 2024).
17. Shi, L.; Ahmad, I.; He, Y.-J.; Chang, K.-H. Hidden Markov model based drone sound recognition using MFCC technique in practical noisy environments. J. Commun. Netw. 2018, 20, 509–518.
18. Xylo: Ultra-low power neuromorphic chip | SynSense. Available online: https://www.synsense.ai/products/xylo/ (accessed on 05 January 2023).
19. Branding, J.; Von Hörsten, D.; Wegener, J.K.; Böckmann, E.; Hartung, E. Towards noise robust acoustic insect detection: from the lab to the greenhouse. KI - Künstliche Intelligenz 2023.
20. Sharma, G.; Umapathy, K.; Krishnan, S. Trends in audio signal feature extraction methods. Applied Acoustics 2020.








| Category | Description |
|---|---|
| C0 | Drones weighing less than 250 grams, typically for leisure and recreational use. |
| C1 | Small drones weighing less than 900 grams, used for both recreational and commercial purposes, with more features than C0 drones. |
| C2 | Drones weighing less than 4 kilograms, used for complex commercial operations, requiring advanced operational skills. |
| C3 | Larger drones weighing less than 25 kilograms, generally used for specialized commercial tasks demanding specific capabilities. |
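The weight limits in the table above can be written as a small lookup. This sketch encodes only the mass thresholds listed in the table; the function and constant names are hypothetical, and the actual EU class labels depend on further requirements that are not modeled here.

```python
from typing import Optional

# (upper mass limit in grams, class label), taken from the category table
MASS_LIMITS_G = [(250, "C0"), (900, "C1"), (4000, "C2"), (25000, "C3")]


def category_for_mass(mass_g: float) -> Optional[str]:
    """Return the first class whose mass limit is above mass_g, or None if too heavy."""
    for limit_g, label in MASS_LIMITS_G:
        if mass_g < limit_g:
            return label
    return None


for mass in (249, 250, 3999, 25000):
    print(mass, category_for_mass(mass))
```

Because each limit is a strict "less than", a drone of exactly 250 g falls into C1 under this reading of the table.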
| Seed | True C0 | True C1 | True C2 | True C3 | True mean | Predicted C0 | Predicted C1 | Predicted C2 | Predicted C3 | Predicted mean |
|---|---|---|---|---|---|---|---|---|---|---|
| no | 91.4% | 95.3% | 96.7% | 98.0% | | 96.9% | 94.0% | 95.6% | 97.1% | |
| no | 96.6% | 95.0% | 96.7% | 98.4% | | 94.4% | 98.1% | 96.2% | 96.6% | |
| no | 96.6% | 92.7% | 95.6% | 97.8% | | 92.0% | 95.6% | 95.9% | 97.3% | |
| no | 97.3% | 96.7% | 97.5% | 97.2% | | 97.4% | 96.7% | 97.4% | 97.3% | |
| std | 2.4% | 1.4% | 0.7% | 0.4% | 1.2% | 2.2% | 1.5% | 0.7% | 0.3% | 1.2% |
| 1 | 96.5% | 98.2% | 97.7% | 99.2% | | 98.1% | 98.0% | 97.5% | 98.5% | |
| 1 | 98.1% | 98.5% | 97.0% | 98.7% | | 95.3% | 98.3% | 98.3% | 99.0% | |
| 1 | 92.7% | 98.2% | 97.4% | 99.1% | | 96.3% | 97.0% | 97.3% | 97.6% | |
| 1 | 97.1% | 97.6% | 96.5% | 99.2% | | 95.7% | 98.2% | 97.5% | 97.5% | |
| std | 2.0% | 0.3% | 0.5% | 0.2% | 0.8% | 1.1% | 0.5% | 0.4% | 0.6% | 0.6% |
| 2 | 96.8% | 96.0% | 97.7% | 98.1% | | 94.6% | 96.5% | 97.9% | 99.0% | |
| 2 | 95.5% | 98.2% | 97.9% | 99.0% | | 98.5% | 97.2% | 97.6% | 98.3% | |
| 2 | 97.0% | 96.6% | 97.9% | 99.2% | | 96.0% | 97.7% | 97.8% | 98.6% | |
| 2 | 93.3% | 98.3% | 97.0% | 98.0% | | 98.7% | 94.4% | 97.6% | 98.0% | |
| std | 1.5% | 1.0% | 0.4% | 0.5% | 0.8% | 1.7% | 1.3% | 0.1% | 0.4% | 0.9% |
| 3 | 95.5% | 96.7% | 96.2% | 99.1% | | 97.3% | 96.6% | 96.5% | 97.0% | |
| 3 | 94.8% | 97.6% | 97.3% | 98.7% | | 97.5% | 97.1% | 97.1% | 97.2% | |
| 3 | 96.2% | 95.7% | 97.3% | 99.3% | | 97.4% | 97.8% | 96.0% | 97.7% | |
| 3 | 96.4% | 96.4% | 96.5% | 98.4% | | 96.4% | 96.2% | 96.5% | 97.8% | |
| std | 0.6% | 0.7% | 0.5% | 0.3% | 0.5% | 0.4% | 0.6% | 0.4% | 0.3% | 0.4% |
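As a worked check on the seed table above, the first "std" entry can be recomputed from the four unseeded runs. The source does not state which estimator was used; the reported 2.4% matches the population standard deviation (divisor N) rather than the sample standard deviation (divisor N - 1), and the "mean" entry in the std rows matches the average of the four per-class values. Both observations are inferences from the printed numbers.

```python
from statistics import mean, pstdev, stdev

# True-class C0 accuracies (%) of the four runs without a fixed seed
true_c0_no_seed = [91.4, 96.6, 96.6, 97.3]

population_std = round(pstdev(true_c0_no_seed), 1)  # 2.4, as printed in the table
sample_std = round(stdev(true_c0_no_seed), 1)       # 2.7, does not match the table
print(population_std, sample_std)

# Per-class std values of the unseeded runs (true-class half of the first std row)
std_row_no_seed = [2.4, 1.4, 0.7, 0.4]
print(round(mean(std_row_no_seed), 1))  # 1.2, the "mean" cell of that row
```

Read this way, the mean spread drops from 1.2% without a seed to between 0.5% and 0.8% for the three fixed seeds on the true-class side of the table.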
| Distortion-Level | C0 | C1 | C2 | C3 | Mean | Std |
|---|---|---|---|---|---|---|
| 0% | 95.0% | 97.7% | 98.0% | 98.9% | 97.4% | 1.7% |
| 7% | 97.2% | 94.1% | 96.2% | 98.5% | 96.5% | 1.9% |
| 14% | 95.9% | 96.7% | 95.3% | 99.1% | 96.8% | 1.7% |
| 21% | 95.1% | 91.5% | 93.2% | 98.9% | 94.7% | 3.2% |
| 28% | 93.3% | 89.1% | 94.4% | 98.5% | 93.8% | 3.9% |
| 35% | 92.5% | 92.9% | 94.8% | 97.9% | 94.5% | 2.5% |
| 42% | 88.7% | 96.1% | 94.1% | 98.8% | 94.4% | 4.3% |
| 49% | 89.6% | 89.9% | 94.2% | 97.7% | 92.9% | 3.9% |
| 56% | 87.6% | 92.6% | 95.0% | 96.9% | 93.0% | 4.0% |
| 63% | 90.0% | 86.8% | 93.8% | 92.1% | 90.0% | 3.0% |
| maxNoise | C0 | C1 | C2 | C3 | Mean | Std |
|---|---|---|---|---|---|---|
| 0% | 95.3% | 96.9% | 97.6% | 98.0% | 97.0% | 1.2% |
| 8% | 96.7% | 96.2% | 98.3% | 98.8% | 97.5% | 1.2% |
| 16% | 93.3% | 92.1% | 97.4% | 98.6% | 95.4% | 3.1% |
| 24% | 95.4% | 95.9% | 97.3% | 98.8% | 96.9% | 1.5% |
| 32% | 78.3% | 93.6% | 96.9% | 99.5% | 92.1% | 9.5% |
| 40% | 64.7% | 92.4% | 96.3% | 98.7% | 88.0% | 15.8% |
| 48% | 76.2% | 88.0% | 97.5% | 81.0% | 85.7% | 9.3% |
| 56% | 76.1% | 90.5% | 90.9% | 99.4% | 89.2% | 9.7% |
| 64% | 67.1% | 90.3% | 95.9% | 93.4% | 86.7% | 13.2% |
| 72% | 50.9% | 89.2% | 96.9% | 81.4% | 79.6% | 20.2% |
| maxPitch | C0 | C1 | C2 | C3 | Mean | Std |
|---|---|---|---|---|---|---|
| 0 | 96.6% | 98.2% | 97.6% | 99.2% | 97.9% | 1.1% |
| 0.2 | 93.7% | 97.0% | 98.2% | 99.0% | 97.0% | 2.3% |
| 0.4 | 95.4% | 96.9% | 97.3% | 99.4% | 97.3% | 1.7% |
| 0.6 | 94.2% | 95.2% | 95.9% | 97.6% | 95.7% | 1.4% |
| 0.8 | 92.3% | 97.2% | 98.1% | 98.9% | 96.6% | 3.0% |
| 1.1 | 94.1% | 94.2% | 94.9% | 98.0% | 95.3% | 1.8% |
| 1.4 | 92.3% | 95.1% | 96.1% | 98.8% | 95.6% | 2.7% |
| 1.7 | 93.0% | 94.1% | 97.0% | 97.8% | 95.5% | 2.3% |
| 2.1 | 94.6% | 96.8% | 95.5% | 97.4% | 96.1% | 1.3% |
| 2.5 | 92.9% | 93.0% | 96.8% | 98.4% | 95.3% | 2.8% |
| Delay Duration | Delay Amplitude | C0 | C1 | C2 | C3 | Mean | Std |
|---|---|---|---|---|---|---|---|
| 15 ms | 30% | 97.9% | 96.2% | 97.3% | 98.8% | 97.6% | 1.1% |
| 15 ms | 50% | 96.7% | 95.4% | 97.6% | 98.8% | 97.1% | 1.4% |
| 15 ms | 70% | 94.7% | 96.9% | 96.8% | 98.5% | 96.7% | 1.6% |
| 15 ms | 90% | 94.6% | 96.1% | 97.3% | 98.3% | 96.6% | 1.6% |
| 18 ms | 30% | 94.1% | 94.4% | 96.6% | 99.1% | 96.1% | 2.3% |
| 18 ms | 50% | 93.3% | 97.0% | 97.6% | 98.7% | 96.7% | 2.3% |
| 18 ms | 70% | 88.4% | 97.4% | 97.6% | 98.9% | 95.6% | 4.8% |
| 18 ms | 90% | 94.9% | 97.3% | 96.9% | 98.7% | 97.0% | 1.6% |
| 21 ms | 30% | 96.6% | 96.7% | 96.1% | 97.7% | 96.8% | 0.7% |
| 21 ms | 50% | 95.5% | 97.2% | 97.8% | 99.1% | 97.4% | 1.5% |
| 21 ms | 70% | 95.0% | 96.2% | 97.1% | 98.3% | 96.7% | 1.4% |
| 21 ms | 90% | 92.1% | 95.3% | 97.9% | 99.5% | 96.2% | 3.2% |
| 24 ms | 30% | 94.3% | 97.6% | 97.4% | 98.6% | 97.0% | 1.9% |
| 24 ms | 50% | 95.2% | 95.7% | 97.1% | 98.6% | 96.7% | 1.5% |
| 24 ms | 70% | 92.5% | 96.9% | 95.9% | 98.7% | 96.0% | 2.6% |
| 24 ms | 90% | 93.7% | 94.1% | 96.6% | 98.8% | 95.8% | 2.4% |
| 27 ms | 30% | 93.5% | 97.5% | 97.0% | 98.5% | 96.8% | 2.2% |
| 27 ms | 50% | 93.3% | 95.8% | 96.9% | 98.8% | 96.2% | 2.3% |
| 27 ms | 70% | 93.7% | 97.0% | 96.2% | 98.3% | 96.3% | 1.9% |
| 27 ms | 90% | 94.6% | 94.5% | 96.8% | 99.2% | 96.3% | 2.2% |
| random | | 92.9% | 97.4% | 96.7% | 99.3% | 96.6% | 2.7% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).