Submitted: 12 June 2023
Posted: 13 June 2023
Abstract
Keywords:
1. Introduction
- we propose a novel approach for extracting features from opposing color pairs in a neural network, exploiting the strength of the color-opponent principle of human color perception. This approach accelerates neural network learning;
- we propose a strategy for integrating color into patterns in a neural network by extracting features locally and across color channels at the same time in successively grouped feature maps, which reduces the number of parameters and the depth of the network while maintaining good performance;
- we propose a lightweight salient object detection neural network architecture based on the proposed approach for learning opposing color pairs and on the strategy for integrating color into patterns. This network has few parameters while achieving performance comparable to state-of-the-art methods.
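To illustrate why processing feature maps in groups can reduce the number of parameters, the following minimal sketch counts the learnable parameters of a standard 2-D convolution and of a grouped one. It assumes that grouped processing behaves like an ordinary grouped convolution; the function name `conv_params` and the layer sizes are hypothetical and are not taken from the proposed network.

```python
def conv_params(kernel_size, c_in, c_out, groups=1, bias=True):
    """Count learnable parameters of a 2-D convolution layer.

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, which divides the weight count by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = kernel_size * kernel_size * (c_in // groups) * c_out
    return weights + (c_out if bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(3, 64, 64)           # 3*3*64*64 + 64
grouped = conv_params(3, 64, 64, groups=4)  # 3*3*16*64 + 64
print(standard, grouped)
```

With these illustrative sizes, the grouped layer needs roughly a quarter of the weights of the standard layer, since only the bias term is unaffected by grouping.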
2. Related work
2.1. Lightweight Salient Object Detection
2.2. Opponent Color models
3. Materials and Methods
3.1. Introduction
- L - M opponent for Red - Green;
- S - (L + M) opponent for Blue - Yellow.
- the color-opponency encoding in the early stage of the human visual system (HVS);
- the fact that color and pattern are inextricably linked in human color perception.
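The two cone-opponent signals listed above, L - M for Red - Green and S - (L + M) for Blue - Yellow, can be written down directly. The sketch below is only an illustration under simplifying assumptions: the cone responses are given as an array whose last axis holds L, M and S, and the differences are unweighted. The function name `cone_opponent_channels` is hypothetical.

```python
import numpy as np


def cone_opponent_channels(lms):
    """Compute unweighted cone-opponent signals from L, M, S responses.

    lms: array-like of shape (..., 3) holding L, M and S in that order.
    Returns an array of shape (..., 2): [red-green, blue-yellow].
    """
    lms = np.asarray(lms, dtype=np.float64)
    if lms.shape[-1] != 3:
        raise ValueError("last axis must hold the L, M and S responses")
    l, m, s = lms[..., 0], lms[..., 1], lms[..., 2]
    red_green = l - m          # L - M opponent
    blue_yellow = s - (l + m)  # S - (L + M) opponent
    return np.stack([red_green, blue_yellow], axis=-1)


# Works on a single triplet or on a whole image of shape (H, W, 3).
print(cone_opponent_channels([0.6, 0.4, 0.3]))
```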
3.2. CoSOV1: Cone- and Spatial-Opponency Primary Visual Cortex module
3.3. CoSOV1Net neural network model architecture
- The input RGB color channel pairing;
- The encoder;
- The decoder.
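As a rough illustration of the first component, the sketch below groups the channels of an RGB image into two-channel pairs that a later stage could process jointly. The set of pairs is an assumption: by default it uses every unordered pair of distinct channels (R-G, R-B, G-B), and the pairing actually used by the model may differ. The function name `pair_channels` and its `pairs` parameter are hypothetical.

```python
import itertools

import numpy as np


def pair_channels(image, pairs=None):
    """Group the channels of an RGB image into two-channel pairs.

    image: array of shape (H, W, 3).
    pairs: iterable of (i, j) channel indices; defaults to all unordered
           pairs of distinct channels (an assumption for this sketch).
    Returns an array of shape (H, W, n_pairs, 2).
    """
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an image of shape (H, W, 3)")
    if pairs is None:
        pairs = list(itertools.combinations(range(3), 2))
    # Select the two channels of each pair, then stack the pairs.
    return np.stack([image[..., list(p)] for p in pairs], axis=-2)


rgb = np.random.default_rng(0).random((8, 8, 3))
print(pair_channels(rgb).shape)
```

Passing an explicit `pairs` list, for example one that also contains same-channel pairs such as `(0, 0)`, changes the pairing without touching the rest of the code.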
3.3.1. Input RGB color channel pairing
3.3.2. Encoder
3.3.3. Decoder
4. Experimental Results
4.1. Implementation Details
4.2. Datasets
4.3. Model Training Settings
- The first stage uses data augmentation: a random transformation (zoom in, horizontal flip, or vertical flip) is applied to each batch. This stage has 480 epochs: 240 epochs with learning rate = and the following 240 epochs with learning rate = ;
- The second stage is without data augmentation. It has 620 epochs: 240 epochs with learning rate = , followed by 140 epochs with learning rate = and 240 epochs with learning rate = .
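The two-stage schedule above can be summarized as a small configuration, which also makes it easy to check that the phase lengths add up to the stated 480 and 620 epochs. This is a sketch only: the learning-rate values are not available in the text, so they are left as `None`; the names `SCHEDULE`, `stage_epochs` and `random_flip` are hypothetical; and the augmentation helper covers only the two flips, not the zoom.

```python
import numpy as np

# Each phase is (number of epochs, learning rate). The learning rates are
# missing from the text, so they are deliberately left as None.
SCHEDULE = [
    {"stage": 1, "augment": True, "phases": [(240, None), (240, None)]},
    {"stage": 2, "augment": False,
     "phases": [(240, None), (140, None), (240, None)]},
]


def stage_epochs(stage):
    """Total number of epochs in one training stage."""
    return sum(epochs for epochs, _lr in stage["phases"])


def random_flip(images, masks, rng):
    """Apply one random flip to a whole batch and to its saliency masks.

    images: array of shape (N, H, W, C); masks: array of shape (N, H, W).
    The same flip is applied to both so that they stay aligned.
    """
    if rng.random() < 0.5:
        return images[:, :, ::-1], masks[:, :, ::-1]  # horizontal flip
    return images[:, ::-1], masks[:, ::-1]            # vertical flip


for stage in SCHEDULE:
    print(stage["stage"], stage["augment"], stage_epochs(stage))
```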
4.4. Evaluation Metrics
4.4.1. Accuracy
4.4.2. Lightweight measures
4.5. Comparison with state-of-the-art
4.6. Comparison with SAMNet and HVPNet state-of-the-art
5. Discussion
6. Conclusion
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Ndayikengurukiye, D.; Mignotte, M. Salient Object Detection by LTP Texture Characterization on Opposing Color Pairs under SLICO Superpixel Constraint. Journal of Imaging 2022, 8, 110.
- Smeulders, A.W.; Chu, D.M.; Cucchiara, R.; Calderara, S.; Dehghan, A.; Shah, M. Visual tracking: An experimental survey. IEEE transactions on pattern analysis and machine intelligence 2013, 36, 1442–1468.
- Pieters, R.; Wedel, M. Attention capture and transfer in advertising: Brand, pictorial, and text-size effects. Journal of marketing 2004, 68, 36–50.
- Itti, L. Automatic foveation for video compression using a neurobiological model of visual attention. IEEE transactions on image processing 2004, 13, 1304–1318.
- Li, J.; Feng, X.; Fan, H. Saliency-based image correction for colorblind patients. Computational Visual Media 2020, 6, 169–189.
- Pinciroli Vago, N.O.; Milani, F.; Fraternali, P.; da Silva Torres, R. Comparing CAM algorithms for the identification of salient image features in iconography artwork analysis. Journal of Imaging 2021, 7, 106.
- Gao, Y.; Shi, M.; Tao, D.; Xu, C. Database saliency for fast image retrieval. IEEE Transactions on Multimedia 2015, 17, 359–369.
- Wong, L.K.; Low, K.L. Saliency-enhanced image aesthetics class prediction. In Proceedings of the 2009 16th IEEE international conference on image processing (ICIP). IEEE; 2009; pp. 997–1000.
- Liu, H.; Heynderickx, I. Studying the added value of visual attention in objective image quality metrics based on eye movement data. In Proceedings of the 2009 16th IEEE international conference on image processing (ICIP). IEEE; 2009; pp. 3097–3100.
- Chen, L.Q.; Xie, X.; Fan, X.; Ma, W.Y.; Zhang, H.J.; Zhou, H.Q. A visual attention model for adapting images on small displays. Multimedia systems 2003, 9, 353–364.
- Chen, T.; Cheng, M.M.; Tan, P.; Shamir, A.; Hu, S.M. Sketch2photo: Internet image montage. ACM transactions on graphics (TOG) 2009, 28, 1–10.
- Huang, H.; Zhang, L.; Zhang, H.C. Arcimboldo-like collage using internet images. In Proceedings of the 2011 SIGGRAPH Asia Conference, 2011; pp. 1–8.
- Gupta, A.K.; Seal, A.; Prasad, M.; Khanna, P. Salient object detection techniques in computer vision—A survey. Entropy 2020, 22, 1174.
- Wang, W.; Lai, Q.; Fu, H.; Shen, J.; Ling, H.; Yang, R. Salient object detection in the deep learning era: An in-depth survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021, 44, 3239–3259.
- Gao, S.H.; Tan, Y.Q.; Cheng, M.M.; Lu, C.; Chen, Y.; Yan, S. Highly efficient salient object detection with 100k parameters. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part VI. Springer; 2020; pp. 702–721.
- Liu, Y.; Zhang, X.Y.; Bian, J.W.; Zhang, L.; Cheng, M.M. SAMNet: Stereoscopically attentive multi-scale network for lightweight salient object detection. IEEE Transactions on Image Processing 2021, 30, 3804–3814.
- Liu, Y.; Gu, Y.C.; Zhang, X.Y.; Wang, W.; Cheng, M.M. Lightweight salient object detection via hierarchical visual perception learning. IEEE Transactions on Cybernetics 2020, 51, 4439–4449.
- Shapley, R.; Hawken, M.J. Color in the cortex: single-and double-opponent cells. Vision research 2011, 51, 701–717.
- Kruger, N.; Janssen, P.; Kalkan, S.; Lappe, M.; Leonardis, A.; Piater, J.; Rodriguez-Sanchez, A.J.; Wiskott, L. Deep hierarchies in the primate visual cortex: What can we learn for computer vision? IEEE transactions on pattern analysis and machine intelligence 2012, 35, 1847–1871.
- Nunez, V.; Shapley, R.M.; Gordon, J. Cortical double-opponent cells in color perception: perceptual scaling and chromatic visual evoked potentials. i-Perception 2018, 9, 2041669517752715.
- Conway, B.R. Color vision, cones, and color-coding in the cortex. The neuroscientist 2009, 15, 274–290.
- Conway, B.R. Spatial structure of cone inputs to color cells in alert macaque primary visual cortex (V-1). Journal of Neuroscience 2001, 21, 2768–2783.
- Hunt, R.W.G.; Pointer, M.R. Measuring colour; John Wiley & Sons, 2011.
- Engel, S.; Zhang, X.; Wandell, B. Colour tuning in human visual cortex measured with functional magnetic resonance imaging. Nature 1997, 388, 68–71.
- Shapley, R. Physiology of color vision in primates. In Oxford Research Encyclopedia of Neuroscience; 2019.
- Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern recognition 2020, 106, 107404.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9 2015, Proceedings, Part III 18. Springer, 2015; pp. 234–241.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018; pp. 4510–4520.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018; pp. 6848–6856.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European conference on computer vision (ECCV), 2018; pp. 116–131.
- Frintrop, S.; Werner, T.; Martin Garcia, G. Traditional saliency reloaded: A good old model in new shape. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2015; pp. 82–90.
- Mäenpää, T.; Pietikäinen, M. Classification with color and texture: jointly or separately? Pattern recognition 2004, 37, 1629–1640.
- Chan, C.H.; Kittler, J.; Messer, K. Multispectral local binary pattern histogram for component-based color face verification. In Proceedings of the 2007 First IEEE International Conference on Biometrics: Theory, Applications, and Systems. IEEE; 2007; pp. 1–7.
- Faloutsos, C.; Lin, K.I. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets; Vol. 24, ACM, 1995.
- Jain, A.; Healey, G. A multiscale representation including opponent color features for texture recognition. IEEE Transactions on Image Processing 1998, 7, 124–128.
- Yang, K.F.; Gao, S.B.; Guo, C.F.; Li, C.Y.; Li, Y.J. Boundary detection using double-opponency and spatial sparseness constraint. IEEE Transactions on Image Processing 2015, 24, 2565–2578.
- Hurvich, L.M.; Jameson, D. An opponent-process theory of color vision. Psychological review 1957, 64, 384.
- Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Learning hierarchical features for scene labeling. IEEE transactions on pattern analysis and machine intelligence 2012, 35, 1915–1929.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International conference on machine learning. PMLR; 2015; pp. 448–456.
- Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How does batch normalization help optimization? Advances in neural information processing systems 2018, 31.
- Chollet, F. Deep learning with Python; Simon and Schuster, 2021.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017; pp. 1251–1258.
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. Dropblock: A regularization method for convolutional networks. Advances in neural information processing systems 2018, 31.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 2014, 15, 1929–1958.
- Pietikäinen, M.; Hadid, A.; Zhao, G.; Ahonen, T. Computer vision using local binary patterns; Vol. 40, Springer Science & Business Media, 2011.
- Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Computational intelligence and neuroscience 2018, 2018.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016; pp. 770–778.
- Chollet, F.; et al. Keras. https://keras.io, 2015.
- Borji, A.; Cheng, M.M.; Jiang, H.; Li, J. Salient object detection: A benchmark. IEEE transactions on image processing 2015, 24, 5706–5722.
- Shi, J.; Yan, Q.; Xu, L.; Jia, J. Hierarchical image saliency detection on extended CSSD. IEEE transactions on pattern analysis and machine intelligence 2016, 38, 717–729.
- Yang, C.; Zhang, L.; Lu, H.; Ruan, X.; Yang, M.H. Saliency detection via graph-based manifold ranking. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2013; pp. 3166–3173.
- Wang, L.; Lu, H.; Wang, Y.; Feng, M.; Wang, D.; Yin, B.; Ruan, X. Learning to detect salient objects with image-level supervision. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017; pp. 136–145.
- Li, G.; Yu, Y. Visual saliency based on multiscale deep features. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2015; pp. 5455–5463.
- Cheng, M.M.; Mitra, N.J.; Huang, X.; Hu, S.M. Salientshape: group saliency in image collections. The visual computer 2014, 30, 443–453.
- Feng, W.; Li, X.; Gao, G.; Chen, X.; Liu, Q. Multi-scale global contrast CNN for salient object detection. Sensors 2020, 20, 2656.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, 2015; pp. 1026–1034.
- Tieleman, T.; Hinton, G.; et al. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 2012, 4, 26–31.
- Margolin, R.; Zelnik-Manor, L.; Tal, A. How to evaluate foreground maps? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014; pp. 248–255.
- Jiang, H.; Wang, J.; Yuan, Z.; Wu, Y.; Zheng, N.; Li, S. Salient object detection: A discriminative regional feature integration approach. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2013; pp. 2083–2090.
- Li, G.; Yu, Y. Deep contrast learning for salient object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016; pp. 478–487.
- Liu, N.; Han, J. Dhsnet: Deep hierarchical saliency network for salient object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016; pp. 678–686.
- Wei, J.; Zhong, B. Saliency detection using fully convolutional network. In Proceedings of the 2018 Chinese Automation Congress (CAC). IEEE; 2018; pp. 3902–3907.
- Luo, Z.; Mishra, A.; Achkar, A.; Eichel, J.; Li, S.; Jodoin, P.M. Non-local deep features for salient object detection. In Proceedings of the IEEE Conference on computer vision and pattern recognition, 2017; pp. 6609–6617.
- Hou, Q.; Cheng, M.M.; Hu, X.; Borji, A.; Tu, Z.; Torr, P.H. Deeply supervised salient object detection with short connections. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017; pp. 3203–3212.
- Zhang, P.; Wang, D.; Lu, H.; Wang, H.; Ruan, X. Amulet: Aggregating multi-level convolutional features for salient object detection. In Proceedings of the IEEE international conference on computer vision, 2017; pp. 202–211.
- Zhang, P.; Wang, D.; Lu, H.; Wang, H.; Yin, B. Learning uncertain convolutional features for accurate saliency detection. In Proceedings of the IEEE International Conference on computer vision, 2017; pp. 212–221.
- Wang, T.; Borji, A.; Zhang, L.; Zhang, P.; Lu, H. A stagewise refinement model for detecting salient objects in images. In Proceedings of the IEEE international conference on computer vision, 2017; pp. 4019–4028.
- Liu, N.; Han, J.; Yang, M.H. Picanet: Learning pixel-wise contextual attention for saliency detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018; pp. 3089–3098.
- Wang, T.; Zhang, L.; Wang, S.; Lu, H.; Yang, G.; Ruan, X.; Borji, A. Detect globally, refine locally: A novel approach to saliency detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018; pp. 3127–3135.
- Li, X.; Yang, F.; Cheng, H.; Liu, W.; Shen, D. Contour knowledge transfer for salient object detection. In Proceedings of the European conference on computer vision (ECCV), 2018; pp. 355–370.
- Chen, S.; Tan, X.; Wang, B.; Hu, X. Reverse attention for salient object detection. In Proceedings of the European conference on computer vision (ECCV), 2018; pp. 234–250.
- Liu, Y.; Cheng, M.M.; Zhang, X.Y.; Nie, G.Y.; Wang, M. DNA: Deeply supervised nonlinear aggregation for salient object detection. IEEE Transactions on Cybernetics 2021, 52, 6131–6142.
- Wu, Z.; Su, L.; Huang, Q. Cascaded partial decoder for fast and accurate salient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019; pp. 3907–3916.
- Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. Basnet: Boundary-aware salient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019; pp. 7479–7489.
- Feng, M.; Lu, H.; Ding, E. Attentive feedback network for boundary-aware salient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019; pp. 1623–1632.
- Liu, J.J.; Hou, Q.; Cheng, M.M.; Feng, J.; Jiang, J. A simple pooling-based design for real-time salient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019; pp. 3917–3926.
- Zhao, J.X.; Liu, J.J.; Fan, D.P.; Cao, Y.; Yang, J.; Cheng, M.M. EGNet: Edge guidance network for salient object detection. In Proceedings of the IEEE/CVF international conference on computer vision, 2019; pp. 8779–8788.
- Su, J.; Li, J.; Zhang, Y.; Xia, C.; Tian, Y. Selectivity or invariance: Boundary-aware salient object detection. In Proceedings of the IEEE/CVF international conference on computer vision, 2019; pp. 3799–3808.
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. Icnet for real-time semantic segmentation on high-resolution images. In Proceedings of the European conference on computer vision (ECCV), 2018; pp. 405–420.
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European conference on computer vision (ECCV), 2018; pp. 325–341.
- Li, H.; Xiong, P.; Fan, H.; Sun, J. Dfanet: Deep feature aggregation for real-time semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019; pp. 9522–9531.











| Methods | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| DRFI [61] | - | - | 0.1 | 0.777 | 0.652 | 0.649 | 0.774 | 0.670 |
| DCL [62] | 66.24 | 224.9 | 1.4 | 0.895 | 0.733 | 0.785 | 0.892 | 0.747 |
| DHSNet [63] | 94.04 | 15.8 | 10.0 | 0.903 | - | 0.807 | 0.889 | 0.752 |
| RFCN [64] | 134.69 | 102.8 | 0.4 | 0.896 | 0.738 | 0.782 | 0.892 | 0.754 |
| NLDF [65] | 35.49 | 263.9 | 18.5 | 0.902 | 0.753 | 0.806 | 0.902 | 0.762 |
| DSS [66] | 62.23 | 114.6 | 7.0 | 0.915 | 0.774 | 0.827 | 0.913 | 0.770 |
| Amulet [67] | 33.15 | 45.3 | 9.7 | 0.913 | 0.743 | 0.778 | 0.897 | 0.755 |
| UCF [68] | 23.98 | 61.4 | 12.0 | 0.901 | 0.730 | 0.772 | 0.888 | 0.758 |
| SRM [69] | 43.74 | 20.3 | 12.3 | 0.914 | 0.769 | 0.826 | 0.906 | 0.778 |
| PiCANet [70] | 32.85 | 37.1 | 5.6 | 0.923 | 0.766 | 0.837 | 0.916 | 0.783 |
| BRN [71] | 126.35 | 24.1 | 3.6 | 0.919 | 0.774 | 0.827 | 0.910 | 0.769 |
| C2S [72] | 137.03 | 20.5 | 16.7 | 0.907 | 0.759 | 0.811 | 0.898 | 0.775 |
| RAS [73] | 20.13 | 35.6 | 20.4 | 0.916 | 0.785 | 0.831 | 0.913 | 0.772 |
| DNA [74] | 20.06 | 82.5 | 25.0 | 0.935 | 0.799 | 0.865 | 0.930 | 0.793 |
| CPD [75] | 29.23 | 59.5 | 68.0 | 0.930 | 0.794 | 0.861 | 0.924 | 0.795 |
| BASNet [76] | 87.06 | 127.3 | 36.2 | 0.938 | 0.805 | 0.859 | 0.928 | 0.783 |
| AFNet [77] | 37.11 | 38.4 | 21.6 | 0.930 | 0.784 | 0.857 | 0.921 | 0.791 |
| PoolNet [78] | 53.63 | 123.4 | 39.7 | 0.934 | 0.791 | 0.866 | 0.925 | 0.800 |
| EGNet [79] | 108.07 | 270.8 | 12.7 | 0.938 | 0.794 | 0.870 | 0.928 | 0.800 |
| BANet [80] | 55.90 | 121.6 | 12.5 | 0.940 | 0.803 | 0.872 | 0.932 | 0.796 |
| CoSOV1Net (OURS) | 1.14 | 1.4 | 211.2 | 0.931 | 0.789 | 0.833 | 0.912 | 0.773 |

| Methods | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| DRFI [61] | - | - | 0.1 | 0.161 | 0.138 | 0.154 | 0.146 | 0.150 |
| DCL [62] | 66.24 | 224.9 | 1.4 | 0.080 | 0.095 | 0.082 | 0.063 | 0.096 |
| DHSNet [63] | 94.04 | 15.8 | 10.0 | 0.062 | - | 0.066 | 0.053 | 0.082 |
| RFCN [64] | 134.69 | 102.8 | 0.4 | 0.097 | 0.095 | 0.089 | 0.080 | 0.100 |
| NLDF [65] | 35.49 | 263.9 | 18.5 | 0.066 | 0.080 | 0.065 | 0.048 | 0.080 |
| DSS [66] | 62.23 | 114.6 | 7.0 | 0.056 | 0.066 | 0.056 | 0.041 | 0.074 |
| Amulet [67] | 33.15 | 45.3 | 9.7 | 0.061 | 0.098 | 0.085 | 0.051 | 0.094 |
| UCF [68] | 23.98 | 61.4 | 12.0 | 0.071 | 0.120 | 0.112 | 0.062 | 0.112 |
| SRM [69] | 43.74 | 20.3 | 12.3 | 0.056 | 0.069 | 0.059 | 0.046 | 0.077 |
| PiCANet [70] | 32.85 | 37.1 | 5.6 | 0.049 | 0.068 | 0.054 | 0.042 | 0.083 |
| BRN [71] | 126.35 | 24.1 | 3.6 | 0.043 | 0.062 | 0.050 | 0.036 | 0.076 |
| C2S [72] | 137.03 | 20.5 | 16.7 | 0.057 | 0.072 | 0.062 | 0.046 | 0.083 |
| RAS [73] | 20.13 | 35.6 | 20.4 | 0.058 | 0.063 | 0.059 | 0.045 | 0.075 |
| DNA [74] | 20.06 | 82.5 | 25.0 | 0.041 | 0.056 | 0.044 | 0.031 | 0.069 |
| CPD [75] | 29.23 | 59.5 | 68.0 | 0.044 | 0.057 | 0.043 | 0.033 | 0.068 |
| BASNet [76] | 87.06 | 127.3 | 36.2 | 0.040 | 0.056 | 0.048 | 0.032 | 0.073 |
| AFNet [77] | 37.11 | 38.4 | 21.6 | 0.045 | 0.057 | 0.046 | 0.036 | 0.072 |
| PoolNet [78] | 53.63 | 123.4 | 39.7 | 0.048 | 0.057 | 0.043 | 0.037 | 0.068 |
| EGNet [79] | 108.07 | 270.8 | 12.7 | 0.044 | 0.056 | 0.044 | 0.034 | 0.070 |
| BANet [80] | 55.90 | 121.6 | 12.5 | 0.038 | 0.059 | 0.040 | 0.031 | 0.068 |
| CoSOV1Net (OURS) | 1.14 | 1.4 | 211.2 | 0.051 | 0.064 | 0.057 | 0.045 | 0.076 |

| Methods | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| DRFI [61] | - | - | 0.1 | 0.548 | 0.424 | 0.378 | 0.504 | 0.444 |
| DCL [62] | 66.24 | 224.9 | 1.4 | 0.782 | 0.584 | 0.632 | 0.770 | 0.624 |
| DHSNet [63] | 94.04 | 15.8 | 10.0 | 0.837 | - | 0.705 | 0.816 | 0.666 |
| RFCN [64] | 134.69 | 102.8 | 0.4 | 0.725 | 0.562 | 0.586 | 0.707 | 0.591 |
| NLDF [65] | 35.49 | 263.9 | 18.5 | 0.835 | 0.634 | 0.710 | 0.838 | 0.676 |
| DSS [66] | 62.23 | 114.6 | 7.0 | 0.864 | 0.688 | 0.752 | 0.862 | 0.702 |
| Amulet [67] | 33.15 | 45.3 | 9.7 | 0.839 | 0.626 | 0.657 | 0.817 | 0.650 |
| UCF [68] | 23.98 | 61.4 | 12.0 | 0.805 | 0.573 | 0.595 | 0.779 | 0.613 |
| SRM [69] | 43.74 | 20.3 | 12.3 | 0.849 | 0.658 | 0.721 | 0.835 | 0.684 |
| PiCANet [70] | 32.85 | 37.1 | 5.6 | 0.862 | 0.691 | 0.745 | 0.847 | 0.687 |
| BRN [71] | 126.35 | 24.1 | 3.6 | 0.887 | 0.709 | 0.774 | 0.875 | 0.712 |
| C2S [72] | 137.03 | 20.5 | 16.7 | 0.849 | 0.663 | 0.717 | 0.835 | 0.685 |
| RAS [73] | 20.13 | 35.6 | 20.4 | 0.855 | 0.695 | 0.739 | 0.849 | 0.691 |
| DNA [74] | 20.06 | 82.5 | 25.0 | 0.897 | 0.729 | 0.797 | 0.889 | 0.723 |
| CPD [75] | 29.23 | 59.5 | 68.0 | 0.889 | 0.715 | 0.799 | 0.879 | 0.731 |
| BASNet [76] | 87.06 | 127.3 | 36.2 | 0.898 | 0.751 | 0.802 | 0.889 | 0.721 |
| AFNet [77] | 37.11 | 38.4 | 21.6 | 0.880 | 0.717 | 0.784 | 0.869 | 0.719 |
| PoolNet [78] | 53.63 | 123.4 | 39.7 | 0.875 | 0.710 | 0.783 | 0.864 | 0.724 |
| EGNet [79] | 108.07 | 270.8 | 12.7 | 0.886 | 0.727 | 0.796 | 0.876 | 0.727 |
| BANet [80] | 55.90 | 121.6 | 12.5 | 0.901 | 0.736 | 0.810 | 0.889 | 0.730 |
| CoSOV1Net (OURS) | 1.14 | 1.4 | 211.2 | 0.861 | 0.696 | 0.731 | 0.834 | 0.688 |

| Methods | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| MobileNet* [28] | 4.27 | 2.2 | 295.8 | 0.906 | 0.753 | 0.804 | 0.895 | 0.767 |
| MobileNetV2* [29] | 2.37 | 0.8 | 446.2 | 0.905 | 0.758 | 0.798 | 0.890 | 0.766 |
| ShuffleNet* [30] | 1.80 | 0.7 | 406.9 | 0.907 | 0.757 | 0.811 | 0.898 | 0.771 |
| ShuffleNetV2* [31] | 1.60 | 0.5 | 452.5 | 0.901 | 0.746 | 0.789 | 0.884 | 0.755 |
| ICNet [81] | 6.70 | 6.3 | 75.1 | 0.918 | 0.773 | 0.810 | 0.898 | 0.768 |
| BiSeNet R18 [82] | 13.48 | 25.0 | 120.5 | 0.909 | 0.757 | 0.815 | 0.902 | 0.776 |
| BiSeNet X39 [82] | 1.84 | 7.3 | 165.8 | 0.901 | 0.755 | 0.787 | 0.888 | 0.756 |
| DFANet [83] | 1.83 | 1.7 | 91.4 | 0.896 | 0.750 | 0.791 | 0.884 | 0.757 |
| HVPNet [17] | 1.23 | 1.1 | 333.2 | 0.925 | 0.799 | 0.839 | 0.915 | 0.787 |
| SAMNet [16] | 1.33 | 0.5 | 343.2 | 0.925 | 0.797 | 0.835 | 0.915 | 0.785 |
| CoSOV1Net (OURS) | 1.14 | 1.4 | 211.2 | 0.931 | 0.789 | 0.833 | 0.912 | 0.773 |

| Methods | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| MobileNet* [28] | 4.27 | 2.2 | 295.8 | 0.064 | 0.073 | 0.066 | 0.052 | 0.081 |
| MobileNetV2* [29] | 2.37 | 0.8 | 446.2 | 0.066 | 0.075 | 0.070 | 0.056 | 0.085 |
| ShuffleNet* [30] | 1.80 | 0.7 | 406.9 | 0.062 | 0.069 | 0.062 | 0.050 | 0.078 |
| ShuffleNetV2* [31] | 1.60 | 0.5 | 452.5 | 0.069 | 0.076 | 0.071 | 0.059 | 0.086 |
| ICNet [81] | 6.70 | 6.3 | 75.1 | 0.059 | 0.072 | 0.067 | 0.052 | 0.084 |
| BiSeNet R18 [82] | 13.48 | 25.0 | 120.5 | 0.062 | 0.072 | 0.062 | 0.049 | 0.080 |
| BiSeNet X39 [82] | 1.84 | 7.3 | 165.8 | 0.070 | 0.078 | 0.074 | 0.059 | 0.090 |
| DFANet [83] | 1.83 | 1.7 | 91.4 | 0.073 | 0.078 | 0.075 | 0.061 | 0.089 |
| HVPNet [17] | 1.23 | 1.1 | 333.2 | 0.055 | 0.064 | 0.058 | 0.045 | 0.076 |
| SAMNet [16] | 1.33 | 0.5 | 343.2 | 0.053 | 0.065 | 0.058 | 0.045 | 0.077 |
| CoSOV1Net (OURS) | 1.14 | 1.4 | 211.2 | 0.051 | 0.064 | 0.057 | 0.045 | 0.076 |

| Methods | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| MobileNet* [28] | 4.27 | 2.2 | 295.8 | 0.829 | 0.656 | 0.696 | 0.816 | 0.675 |
| MobileNetV2* [29] | 2.37 | 0.8 | 446.2 | 0.820 | 0.651 | 0.676 | 0.799 | 0.660 |
| ShuffleNet* [30] | 1.80 | 0.7 | 406.9 | 0.831 | 0.667 | 0.709 | 0.820 | 0.683 |
| ShuffleNetV2* [31] | 1.60 | 0.5 | 452.5 | 0.812 | 0.637 | 0.665 | 0.788 | 0.652 |
| ICNet [81] | 6.70 | 6.3 | 75.1 | 0.838 | 0.669 | 0.694 | 0.812 | 0.668 |
| BiSeNet R18 [82] | 13.48 | 25.0 | 120.5 | 0.829 | 0.648 | 0.699 | 0.819 | 0.675 |
| BiSeNet X39 [82] | 1.84 | 7.3 | 165.8 | 0.802 | 0.632 | 0.652 | 0.784 | 0.641 |
| DFANet [83] | 1.83 | 1.7 | 91.4 | 0.799 | 0.627 | 0.652 | 0.778 | 0.639 |
| HVPNet [17] | 1.23 | 1.1 | 333.2 | 0.854 | 0.699 | 0.730 | 0.839 | 0.696 |
| SAMNet [16] | 1.33 | 0.5 | 343.2 | 0.855 | 0.699 | 0.729 | 0.837 | 0.693 |
| CoSOV1Net (OURS) | 1.14 | 1.4 | 211.2 | 0.861 | 0.696 | 0.731 | 0.834 | 0.688 |

| Measure | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| $F_\beta$ | 1st | 1st | 1st | 6th | 7th | 9th | 11th | 11th |
| MAE | 1st | 1st | 1st | 10th | 10th | 11th | 11th | 10th |
| $F_\beta^{\omega}$ | 1st | 1st | 1st | 11th | 9th | 11th | 15th | 11th |

| Measure | #Param (M)↓ | FLOPS (G) ↓ | Speed (FPS) ↑ | ECSSD | DUT-OMRON | DUTS-TE | HKU-IS | THUR15K |
|---|---|---|---|---|---|---|---|---|
| $F_\beta$ | 1st | 6th | 7th | 1st | 3rd | 3rd | 3rd | 4th |
| MAE | 1st | 6th | 7th | 1st | 1st | 1st | 1st | 2nd |
| $F_\beta^{\omega}$ | 1st | 6th | 7th | 1st | 3rd | 1st | 3rd | 3rd |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).