Submitted: 24 September 2025
Posted: 25 September 2025
Abstract
Keywords:
1. Introduction
2. Problem Statement
2.1. Constraints of Conventional Monopole and Dipole Identification Methods Without Prior Source Type Assumptions
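To make the monopole/dipole distinction concrete, the sketch below writes out the standard free-field Green's functions of a point monopole and a point dipole. The dipole is the spatial derivative of the monopole along the dipole axis. It assumes an e^{jωt} time convention, a vector `r_vec` from source to microphone, and a speed of sound of 343 m/s. The function names are hypothetical, and this is not presented as the paper's formulation. It only shows why a steering vector built for a monopole does not match a dipole: the dipole adds a cos θ directivity with a null broadside to its axis, plus a 1/r near-field term.

```python
import numpy as np


def monopole_green(r_vec, k):
    """Free-field monopole Green's function exp(-jkr) / (4*pi*r).

    r_vec: array of shape (..., 3), vector from source to receiver in metres.
    k: acoustic wavenumber in rad/m.
    """
    r = np.linalg.norm(r_vec, axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)


def dipole_green(r_vec, k, axis):
    """Dipole Green's function: derivative of the monopole along unit vector `axis`.

    d/dr [exp(-jkr) / r] = -(jk + 1/r) exp(-jkr) / r, and the chain rule adds
    the directivity factor cos(theta) = (r_vec . axis) / r.
    """
    r = np.linalg.norm(r_vec, axis=-1)
    cos_theta = (r_vec @ axis) / r
    return -cos_theta * (1j * k + 1 / r) * np.exp(-1j * k * r) / (4 * np.pi * r)


# Example values; the speed of sound of 343 m/s is an assumption.
k = 2 * np.pi * 3000.0 / 343.0
x_axis = np.array([1.0, 0.0, 0.0])
r_vec = np.array([0.3, -0.2, 3.0])

# Check the analytic dipole against a central finite difference of the monopole.
h = 1e-6
fd = (monopole_green(r_vec + h * x_axis, k) - monopole_green(r_vec - h * x_axis, k)) / (2 * h)
print(np.isclose(dipole_green(r_vec, k, x_axis), fd, rtol=1e-5))  # True

# Broadside to the dipole axis the dipole field vanishes, while the monopole does not.
broadside = np.array([0.0, 0.0, 3.0])
print(abs(dipole_green(broadside, k, x_axis)), abs(monopole_green(broadside, k)))
```

The finite-difference check confirms that the closed-form dipole term is the derivative of the monopole. The broadside case shows the directivity null that a monopole-only model cannot represent.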
2.2. Ill-Posedness of the Inverse Problem
2.3. Proposed Framework of This Paper
3. Detailed Architecture of the ViT Algorithm Model
3.1. Input for ViT: CSM with Positional Encoding
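The cross-spectral matrix (CSM) named in this subsection is conventionally estimated by averaging outer products of microphone spectra at the analysis frequency. The sketch below shows that standard estimator for M microphones and K snapshots. It then stacks the real and imaginary parts into a two-channel array, which is one common way to feed a complex matrix to a real-valued network. Several things here are assumptions for illustration: the two-channel layout, the function names, and M = 56 (read from the array file name in the simulation setup). The positional encoding applied in the paper is not sketched.

```python
import numpy as np


def estimate_csm(spectra):
    """Estimate the cross-spectral matrix from snapshots at one frequency bin.

    spectra: complex array of shape (K, M), K snapshots of M microphone spectra.
    Returns the (M, M) Hermitian matrix C = (1/K) * sum_k p_k p_k^H.
    """
    num_snapshots = spectra.shape[0]
    # Element [m, n] is the average of p_k[m] * conj(p_k[n]) over the snapshots.
    return spectra.T @ spectra.conj() / num_snapshots


def csm_to_channels(csm):
    """Hypothetical network input: real and imaginary parts as two channels."""
    return np.stack([csm.real, csm.imag], axis=0)


# Synthetic example: 200 random snapshots for an assumed 56-microphone array.
rng = np.random.default_rng(0)
spectra = rng.standard_normal((200, 56)) + 1j * rng.standard_normal((200, 56))
csm = estimate_csm(spectra)
x = csm_to_channels(csm)
print(csm.shape, x.shape)  # (56, 56) (2, 56, 56)
```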
3.2. Design of the ViT-Based Multi-Label Identification Network
| Layer no. | Layer type | Kernel number | Kernel size | Stride | Activation | Padding | Output size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Patch Embedding | 768 | 9×9 | 9×9 | Linear | No | 6×6×768 |
| 2 | Add CLS Token | - | - | - | - | - | (6×6+1)×768 |
| 3 | Add Positional Encoding | - | - | - | - | - | 37×768 |
| 4 | Dropout | - | - | - | - | - | 37×768 |
| 5-16 | Transformer Block ×12 | 12 heads | - | - | GELU | - | 37×768 |
| 17 | Layer Norm | - | - | - | - | - | 37×768 |
| 18 | CLS Token Extraction | - | - | - | - | - | 1×768 |
| 19-23 | MLP Classifier ×S | 384→3 | Linear layers | - | GELU | - | S×3 |
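The output sizes in the table follow from standard convolution arithmetic. The pure-Python sketch below traces them under four assumptions. The input is a 56 × 56 CSM, taking the 56 in the array file name of the simulation setup as the microphone count. The patches are non-overlapping 9 × 9 patches with no padding. S, the number of classifier outputs, is given a hypothetical value. The function names are illustrative. Under these assumptions, floor((56 − 9)/9) + 1 = 6 patches fit per side, so only the first 54 rows and columns are covered. The 768-dimensional embedding splits into 12 heads of 64 dimensions each.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Output length of a convolution along one axis (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_vit_shapes(n_mics=56, patch=9, embed_dim=768, num_heads=12,
                     num_outputs=2, num_classes=3):
    """Trace tensor shapes through the layers listed in the table.

    num_outputs plays the role of S; its default of 2 is a hypothetical value.
    """
    grid = conv_output_size(n_mics, patch, patch)  # patches per side
    num_tokens = grid * grid + 1                   # patch tokens plus the CLS token
    return {
        "patch_embedding": (grid, grid, embed_dim),
        "tokens_with_cls": (num_tokens, embed_dim),
        "transformer_out": (num_tokens, embed_dim),
        "cls_token": (1, embed_dim),
        "classifier_out": (num_outputs, num_classes),
        "head_dim": embed_dim // num_heads,
        "uncovered_rows_cols": n_mics - grid * patch,
    }


for name, shape in trace_vit_shapes().items():
    print(name, shape)
```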

3.3. Training Details
4. Simulations
4.1. Simulation Setup and Evaluation Metrics


| Parameter Category | Parameter Value |
| --- | --- |
| Source Frequency | 3000 Hz |
| Microphone Array | Acoular_modify_array_56.xml |
| Source-to-Array Distance | 3 m |
| Scanning Plane | 2 m × 2 m |
| Grid Spacing | 0.03125 m |
| Grid Resolution | 64 × 64 |
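The parameters in the table are mutually consistent: 64 points at 0.03125 m span exactly 2 m. The NumPy sketch below builds the scanning grid under stated assumptions. Points sit at cell centres on a plane 3 m from an array centred at the origin, and the speed of sound is 343 m/s. It then reports the wavelength at 3000 Hz and the number of grid points per wavelength. It also compares the 4096 grid points with the 56² = 3136 real degrees of freedom of a 56 × 56 Hermitian CSM, again reading 56 from the array file name. This gives one way to see why the inverse problem of Section 2.2 is underdetermined even with a single unknown per grid point.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (not stated in the table)
FREQUENCY = 3000.0      # Hz
DISTANCE = 3.0          # m, scanning plane to array plane
SPACING = 0.03125       # m
N_POINTS = 64           # points per side
N_MICS = 56             # assumed from the array file name

wavelength = SPEED_OF_SOUND / FREQUENCY

# Cell-centred coordinates: 64 cells of 0.03125 m cover [-1 m, 1 m].
coords = (np.arange(N_POINTS) - (N_POINTS - 1) / 2) * SPACING
X, Y = np.meshgrid(coords, coords, indexing="xy")
grid = np.column_stack([X.ravel(), Y.ravel(), np.full(X.size, DISTANCE)])

print(f"aperture {N_POINTS * SPACING:.3f} m, wavelength {wavelength:.4f} m")
print(f"points per wavelength {wavelength / SPACING:.2f}")
print(f"grid points {grid.shape[0]} vs real CSM degrees of freedom {N_MICS**2}")
```

With these assumptions the wavelength is about 0.114 m, so the grid samples roughly 3.7 points per wavelength.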
4.2. Simulation Analysis of Case 1

4.3. Simulation Analysis of Case 2



4.4. Simulation Analysis of Case 3



4.5. Summary
5. Experimental Verification
5.1. Experimental Environment and Source Configuration


5.2. Experimental Results and Analysis

6. Conclusions
Funding


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).