Cho, H.; Lee, J.S.; Kim, J.S.; Koom, W.S.; Kim, H. Empowering Vision Transformer by Network Hyper-Parameter Selection for Whole Pelvis Prostate Planning Target Volume Auto-Segmentation. Cancers 2023, 15, 5507.
Abstract
U-Net, based on a deep convolutional neural network (CNN), has been used clinically to auto-segment normal organs and, potentially, target volumes. However, because convolutions capture mainly local geometric dependencies, CNNs may limit segmentation accuracy. In addition, CNN performance can vary with the choice of network hyper-parameters, a problem mitigated by the introduction of nnU-Net. We chose a vision transformer architecture, VT U-Net, which relies on self-attention instead of convolution layers, to overcome these limitations of CNNs by exploiting the global geometric information in images. VT U-Net v.2 was made more powerful by incorporating the adaptive hyper-parameter optimizer embedded in nnU-Net. Even with the benefits of nnU-Net, however, VT U-Net v.2 still has additional network hyper-parameters that must be chosen optimally. Accordingly, among the various hyper-parameters, this study searched for the optimal combination of two transformer hyper-parameters: the patch size and the embedding dimension. In 4-fold cross-validation, the modified VT U-Net v.2 showed the highest average performance for planning target volume (PTV) segmentation among the investigated networks. Although nnU-Net is built on convolution layers, its adaptive hyper-parameter optimizer was found to enhance performance. The results also confirmed that network hyper-parameters affect the segmentation accuracy of vision transformers.
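The two transformer hyper-parameters named above can be illustrated with a generic ViT-style patch embedding. The sketch below is a minimal NumPy illustration under stated assumptions, not VT U-Net's actual implementation (VT U-Net uses windowed attention, which this sketch does not model). It splits a 3D volume into non-overlapping cubic patches of side `p` (the patch size), projects each flattened patch to a vector of width `C` (the embedding dimension) using random weights in place of learned ones, and enumerates a hypothetical grid of `(p, C)` candidates alongside 4-fold cross-validation splits. The helper names `patch_embed`, `k_fold_splits`, and `search_grid`, the candidate values, the volume size, and the case count are all illustrative assumptions.

```python
import itertools

import numpy as np


def patch_embed(volume, patch_size, embed_dim, rng):
    """Hypothetical helper: split a (D, H, W) volume into non-overlapping cubic
    patches and linearly project each flattened patch to embed_dim features.
    Random weights stand in for the learned projection of a real network."""
    d, h, w = volume.shape
    p = patch_size
    if d % p or h % p or w % p:
        raise ValueError("volume dimensions must be divisible by patch_size")
    # (D, H, W) -> (D/p, p, H/p, p, W/p, p): separate grid index from in-patch offset
    grid = volume.reshape(d // p, p, h // p, p, w // p, p)
    # Move the three grid axes first, then flatten every p*p*p patch into one row
    patches = grid.transpose(0, 2, 4, 1, 3, 5).reshape(-1, p ** 3)
    weights = rng.standard_normal((p ** 3, embed_dim)) / np.sqrt(p ** 3)
    return patches @ weights  # shape: (num_tokens, embed_dim)


def k_fold_splits(num_cases, k=4, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(num_cases)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def search_grid(volume, patch_sizes, embed_dims, seed=0):
    """For each (patch size, embedding dim) pair, report the token tensor shape
    and the number of entries in a full (non-windowed) attention matrix."""
    rng = np.random.default_rng(seed)
    rows = []
    for p, c in itertools.product(patch_sizes, embed_dims):
        tokens = patch_embed(volume, p, c, rng)
        n = tokens.shape[0]
        rows.append((p, c, tokens.shape, n * n))
    return rows


if __name__ == "__main__":
    volume = np.zeros((32, 32, 32), dtype=np.float32)  # hypothetical image crop
    # Hypothetical candidate values, not the ones evaluated in the study
    for p, c, shape, attn in search_grid(volume, [2, 4], [48, 96]):
        print(f"patch={p} embed_dim={c} tokens={shape} attention_entries={attn}")
    # In a real search, each (p, C) pair would be trained and scored on every
    # fold; training and evaluation are outside the scope of this sketch.
    for fold, (train, val) in enumerate(k_fold_splits(num_cases=20, k=4)):
        print(f"fold {fold}: {len(train)} train / {len(val)} validation cases")
```

With a 32×32×32 input, halving the patch size from 4 to 2 raises the token count eightfold (512 to 4096) and the full attention matrix 64-fold, while the embedding dimension changes only the width of each token. This is one reason the two hyper-parameters trade off accuracy against memory and compute and are worth tuning jointly.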
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.