Submitted: 13 May 2024
Posted: 15 May 2024
Abstract
Keywords:
1. Introduction
- We propose the Binarization of Hyperbolic Tangent (HTB), which halves the number of epochs needed for training to converge, from 1200 to 600.
- We design a differentiable cross-entropy loss function, so that optimization algorithms such as gradient descent can be used to minimize it.
- We design the Multi-Scale Channel Attention (MSCA) module and the Fused Module with Channel and Spatial (FMCS), which fuse features from different scales along the channel and spatial dimensions. With a ResNet-50 backbone, our method achieves F-measures of 86.0% on Total-Text and 87.5% on MSRA-TD500.
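As a rough illustration of the first two contributions, the sketch below compares the sigmoid-based differentiable binarization of DB [36], B = 1 / (1 + e^{-k(P - T)}), with a tanh-based soft step, and evaluates a binary cross-entropy loss on the result. The exact HTB formula is not reproduced in this section, so `htb_tanh`, its rescaling to (0, 1), and the reuse of DB's amplification factor k = 50 are assumptions made for illustration, not the authors' definition.

```python
import math

def db_sigmoid(p, t, k=50.0):
    """Differentiable binarization as in DB [36]: 1 / (1 + exp(-k (p - t)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))

def htb_tanh(p, t, k=50.0):
    """ASSUMED tanh-based soft step, rescaled from (-1, 1) to (0, 1)."""
    return 0.5 * (1.0 + math.tanh(k * (p - t)))

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for one pixel; pred is clipped to avoid log(0)."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

# p: probability-map value, t: threshold-map value (both hypothetical pixels).
for p, t in [(0.10, 0.30), (0.30, 0.30), (0.35, 0.30), (0.90, 0.30)]:
    b = htb_tanh(p, t)
    print(f"p={p:.2f} t={t:.2f} sigmoid={db_sigmoid(p, t):.4f} "
          f"tanh={b:.4f} bce(text)={bce(b, 1.0):.4f}")
```

Both functions are smooth in p and t, which is what lets the binarization step sit inside the network and be trained by gradient descent. With this particular rescaling, 0.5 (1 + tanh(x)) equals a sigmoid evaluated at 2x, so the assumed form behaves like DB with a doubled slope; the paper's HTB may be defined differently.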
2. Related Work
2.1. Regression-Based Methods
2.2. Component-Based Methods
2.3. Segmentation-Based Methods
3. The Proposed Method
3.1. Overview
3.2. Multi-Scale Channel Attention (MSCA)
3.3. Fused Module with Channel and Spatial (FMCS)
3.4. Binarization of Hyperbolic Tangent (HTB)
3.5. Cross-Entropy Loss Function
4. Experiments and Results Analysis
4.1. Datasets and Evaluation
4.3. Comparisons with Other Advanced Methods
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 2012, 25.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- LeCun, Y.; et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998, 86, 2278–2324.
- Albelwi, S. Survey on Self-Supervised Learning: Auxiliary Pretext Tasks and Contrastive Learning Methods in Imaging. Entropy 2022, 24.
- Lu, C. Reviewing Evolution of Learning Functions and Semantic Information Measures for Understanding Deep Learning. Entropy 2023, 25.
- Mazzaglia, P.; et al. The Free Energy Principle for Perception and Action: A Deep Learning Perspective. Entropy 2022, 24.
- Vinodkumar, P.K.; et al. A Survey on Deep Learning Based Segmentation, Detection and Classification for 3D Point Clouds. Entropy 2023, 25.
- Liu, X.Y.; Meng, G.F.; Pan, C.H. Scene text detection and recognition with advances in deep learning: a survey. International Journal on Document Analysis and Recognition 2019, 22, 143–162.
- Long, S.B.; He, X.; Yao, C. Scene Text Detection and Recognition: The Deep Learning Era. International Journal of Computer Vision 2021, 129, 24.
- Long, Y.; Sun, W.; Pang, Y.; et al. Research on text detection on building surfaces in smart cities based on deep learning. Soft Computing 2022, 26, 10103–10114.
- Naiemi, F.; Ghods, V.; Khalesi, H. Scene text detection and recognition: a survey. Multimedia Tools and Applications 2022, 81, 20255–20290.
- Wang, Q.; et al. LSV-LP: Large-Scale Video-Based License Plate Detection and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023, 45, 752–767.
- Chen, T.Y.; et al. WHUVID: A Large-Scale Stereo-IMU Dataset for Visual-Inertial Odometry and Autonomous Driving in Chinese Urban Scenarios. Remote Sensing 2022, 14.
- Pan, J.P.; et al. A Self-Attentive Hybrid Coding Network for 3D Change Detection in High-Resolution Optical Stereo Images. Remote Sensing 2022, 14.
- Yu, W.; et al. A Systematic Review on Password Guessing Tasks. Entropy 2023, 25.
- Gupta, N.; Jalal, A.S. Traditional to transfer learning progression on scene text detection and recognition: a survey. Artificial Intelligence Review 2022, 55, 3457–3502.
- Khan, T.; Sarkar, R.; Mollah, A.F. Deep learning approaches to scene text detection: a comprehensive review. Artificial Intelligence Review 2021, 54, 3239–3298.
- Liang, T.; et al. A Closer Look at the Joint Training of Object Detection and Re-Identification in Multi-Object Tracking. IEEE Transactions on Image Processing 2023, 32, 267–280.
- Machado, E.M.S.; et al. Visual Attention-Based Object Detection in Cluttered Environments. In 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 2019.
- Wang, Z.C.; et al. AOGC: Anchor-Free Oriented Object Detection Based on Gaussian Centerness. Remote Sensing 2023, 15.
- Wu, F.L.; et al. Improved Oriented Object Detection in Remote Sensing Images Based on a Three-Point Regression Method. Remote Sensing 2021, 13.
- Wu, Z.; et al. Selecting High-Quality Proposals for Weakly Supervised Object Detection With Bottom-Up Aggregated Attention and Phase-Aware Loss. IEEE Transactions on Image Processing 2023, 32, 682–693.
- Zhang, L.Y.; et al. Constraint Loss for Rotated Object Detection in Remote Sensing Images. Remote Sensing 2021, 13.
- Liao, M.H.; Shi, B.G.; Bai, X. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing 2018, 27, 3676–3690.
- Liao, M.H.; et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. In Thirty-First AAAI Conference on Artificial Intelligence, 2017; pp. 4161–4167.
- Liu, Y.L.; Jin, L.W. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection. In 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017; pp. 3454–3461.
- Wang, X.B.; et al. Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019; pp. 6442–6451.
- Xue, C.H.; Lu, S.J.; Zhang, W. MSR: Multi-Scale Shape Regression for Scene Text Detection. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019; pp. 989–995.
- Zhou, X.Y.; et al. EAST: An Efficient and Accurate Scene Text Detector. In 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017; pp. 2642–2651.
- Baek, Y.; et al. Character Region Awareness for Text Detection. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019; pp. 9357–9366.
- Shi, B.G.; Bai, X.; Belongie, S. Detecting Oriented Text in Natural Images by Linking Segments. In 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017; pp. 3482–3490.
- Tang, J.; et al. SegLink++: Detecting Dense and Arbitrary-shaped Scene Text by Instance-aware Component Grouping. Pattern Recognition 2019, 96.
- Tian, Z.; et al. Detecting Text in Natural Image with Connectionist Text Proposal Network. In Computer Vision - ECCV 2016, Part VIII, 2016; Volume 9912, pp. 56–72.
- Zhang, S.X.; et al. Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- Deng, D.; et al. PixelLink: Detecting Scene Text via Instance Segmentation. In Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018; pp. 6773–6780.
- Liao, M.H.; et al. Real-Time Scene Text Detection with Differentiable Binarization. In 34th AAAI Conference on Artificial Intelligence / 32nd Innovative Applications of Artificial Intelligence Conference / 10th AAAI Symposium on Educational Advances in Artificial Intelligence; Association for the Advancement of Artificial Intelligence: New York, NY, 2020.
- Liao, M.H.; et al. Real-Time Scene Text Detection With Differentiable Binarization and Adaptive Scale Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023, 45, 919–931.
- Long, S.B.; et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. In Computer Vision - ECCV 2018, Part II, 2018; Volume 11206, pp. 19–35.
- Tian, Z.T.; et al. Learning Shape-Aware Embedding for Scene Text Detection. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019; pp. 4229–4238.
- Wang, W.H.; et al. Shape Robust Text Detection with Progressive Scale Expansion Network. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019; pp. 9328–9337.
- Wang, W.H.; et al. Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019; pp. 8439–8448.
- Xu, Y.; et al. TextField: Learning a Deep Direction Field for Irregular Scene Text Detection. IEEE Transactions on Image Processing 2019, 28, 5566–5579.
- Graves, A.; Mohamed, A.R.; Hinton, G. Speech Recognition with Deep Recurrent Neural Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013; pp. 6645–6649.
- He, K.M.; et al. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016; pp. 770–778.
- LeCun, Y.; et al. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation 1989, 1, 541–551.
- Ren, S.; et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017, 39, 1137–1149.
- Vaswani, A.; et al. Attention Is All You Need. Advances in Neural Information Processing Systems 2017, 30.
- Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In 18th International Conference on Pattern Recognition, Vol. 3, Proceedings, 2006; p. 850-+.
- Liu, W.; et al. SSD: Single Shot MultiBox Detector. In Computer Vision - ECCV 2016, Part I, 2016; Volume 9905, pp. 21–37.
- Lian, Z.; et al. PCBSNet: A Pure Convolutional Bilateral Segmentation Network for Real-Time Natural Scene Text Detection. Electronics 2023, 12.
- Zhang, S.; et al. Irregular Scene Text Detection Based on a Graph Convolutional Network. Sensors 2023, 23.
- Dinh, M.-T.; Choi, D.-J.; Lee, G.-S. DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection. Sensors 2023, 23.
- Ch'ng, C.K.; Chan, C.S. Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1, 2017; pp. 935–942.
- Yao, C.; et al. Detecting Texts of Arbitrary Orientations in Natural Images. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012; pp. 1083–1090.
- Gupta, A.; Vedaldi, A.; Zisserman, A. Synthetic Data for Text Localisation in Natural Images. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016; pp. 2315–2324.






| Module | HTB | MSCA | FMCS | Total-Text P | Total-Text R | Total-Text F |
|---|---|---|---|---|---|---|
| DB_res18 (baseline) | ╳ | ╳ | ╳ | 88.3 | 77.9 | 82.8 |
| res18 | ✓ | ╳ | ╳ | 90.9 | 77.1 | 83.5 |
| res18 | ╳ | ✓ | ╳ | 89 | 78.9 | 83.7 |
| res18 | ╳ | ╳ | ✓ | 88.5 | 79 | 83.5 |
| HTBNet_res18 (Ours) | ✓ | ✓ | ✓ | 86.8 | 81.6 | 84.1 |

| Module | HTB | MSCA | FMCS | Total-Text P | Total-Text R | Total-Text F |
|---|---|---|---|---|---|---|
| DB_res50 (baseline) | ╳ | ╳ | ╳ | 87.1 | 82.5 | 84.7 |
| res50 | ✓ | ╳ | ╳ | 94.9 | 76.8 | 84.9 |
| res50 | ╳ | ✓ | ╳ | 87.9 | 82.8 | 85.3 |
| res50 | ╳ | ╳ | ✓ | 90.5 | 81.3 | 86 |
| HTBNet_res50 (Ours) | ✓ | ✓ | ✓ | 91.3 | 81.3 | 86 |

| Module | HTB | MSCA | FMCS | MSRA-TD500 P | MSRA-TD500 R | MSRA-TD500 F |
|---|---|---|---|---|---|---|
| DB_res18 (baseline) | ╳ | ╳ | ╳ | 90.4 | 76.3 | 82.8 |
| res18 | ✓ | ╳ | ╳ | 89.3 | 77.7 | 83.1 |
| res18 | ╳ | ✓ | ╳ | 92.3 | 75.9 | 83.3 |
| res18 | ╳ | ╳ | ✓ | 88.8 | 82 | 85.3 |
| HTBNet_res18 (Ours) | ✓ | ✓ | ✓ | 89.8 | 81.4 | 85.4 |

| Module | HTB | MSCA | FMCS | MSRA-TD500 P | MSRA-TD500 R | MSRA-TD500 F |
|---|---|---|---|---|---|---|
| DB_res50 (baseline) | ╳ | ╳ | ╳ | 91.5 | 79.2 | 84.9 |
| res50 | ✓ | ╳ | ╳ | 90.3 | 81.4 | 85.6 |
| res50 | ╳ | ✓ | ╳ | 89.7 | 82.3 | 85.8 |
| res50 | ╳ | ╳ | ✓ | 91.9 | 83.3 | 87.4 |
| HTBNet_res50 (Ours) | ✓ | ✓ | ✓ | 92.2 | 83.3 | 87.5 |
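
To read the ablation tables above at a glance, the F-measure gain of the full HTBNet over the DB baseline can be tabulated per dataset and backbone. The sketch below does only that arithmetic; the values are copied from the tables, while the names `ablation_f` and `f_gain` are ours.

```python
# (dataset, backbone) -> (baseline F, full HTBNet F), copied from the ablation tables.
ablation_f = {
    ("Total-Text", "res18"): (82.8, 84.1),
    ("Total-Text", "res50"): (84.7, 86.0),
    ("MSRA-TD500", "res18"): (82.8, 85.4),
    ("MSRA-TD500", "res50"): (84.9, 87.5),
}

def f_gain(baseline_f, model_f):
    """Absolute F-measure gain in percentage points, rounded to one decimal."""
    return round(model_f - baseline_f, 1)

gains = {key: f_gain(*values) for key, values in ablation_f.items()}
for (dataset, backbone), gain in gains.items():
    print(f"{dataset} / {backbone}: +{gain} F")
```

On these numbers the gain is the same for both backbones within each dataset, and about twice as large on MSRA-TD500 as on Total-Text.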

| Methods (Total-Text) | P | R | F | FPS |
|---|---|---|---|---|
| TextSnake [38] | 82.7 | 74.5 | 78.4 | * |
| PixelLink [35] | 53.5 | 52.7 | 53.1 | * |
| ATTR [27] | 76.2 | 80.9 | 78.5 | * |
| SAE [39] | 82.7 | 77.8 | 80.1 | * |
| PAN [41] | 89.3 | 81 | 85 | 39.6 |
| MSR [28] | 73 | 85.2 | 78.6 | * |
| DRRG [34] | 84.9 | 86.5 | 85.7 | * |
| DenseTextPVT [52] | 89.4 | 80.1 | 84.7 | * |
| DB++_res18 [37] | 87.4 | 79.6 | 83.3 | 48 |
| DB++_res50 [37] | 88.9 | 83.2 | 86 | 28 |
| DB_res18 (baseline) [36] | 88.3 | 77.9 | 82.8 | 50 |
| DB_res50 (baseline) [36] | 87.1 | 82.5 | 84.7 | 32 |
| HTBNet_res18 (Ours) | 86.8 | 81.6 | 84.1 | 49 |
| HTBNet_res50 (Ours) | 91.3 | 81.3 | 86 | 30 |

| Methods (MSRA-TD500) | P | R | F | FPS |
|---|---|---|---|---|
| TextSnake [38] | 83.2 | 73.9 | 78.3 | 1.1 |
| PixelLink [35] | 83 | 73.2 | 77.8 | 3 |
| ATTR [27] | 82.1 | 85.2 | 83.6 | 10 |
| SAE [39] | 84.2 | 81.7 | 82.9 | * |
| PAN [41] | 84.4 | 83.8 | 84.1 | 30.2 |
| MSR [28] | 76.7 | 87.4 | 81.7 | * |
| DRRG [34] | 82.3 | 88.1 | 85.1 | * |
| PCBSNet [50] | 90 | 76.7 | 82.8 | * |
| TDGCN [51] | 89.7 | 85.1 | 87.4 | * |
| DB++_res18 [37] | 87.9 | 82.5 | 85.1 | 55 |
| DB++_res50 [37] | 91.5 | 83.3 | 87.2 | 29 |
| DB_res18 (baseline) [36] | 90.4 | 76.3 | 82.8 | 62 |
| DB_res50 (baseline) [36] | 91.5 | 79.2 | 84.9 | 32 |
| HTBNet_res18 (Ours) | 89.8 | 81.4 | 85.4 | 56 |
| HTBNet_res50 (Ours) | 92.2 | 83.3 | 87.5 | 30 |
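
The F column in the tables above is consistent with the standard F-measure, the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it for a few rows as a sanity check. The function name `f_measure` is ours, and because the published P and R are already rounded to one decimal, a recomputed F can differ from the reported value by about 0.1.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# (row label, P, R, reported F) copied from the Total-Text and MSRA-TD500 tables.
rows = [
    ("DB_res18 (baseline), Total-Text", 88.3, 77.9, 82.8),
    ("HTBNet_res18, Total-Text", 86.8, 81.6, 84.1),
    ("DB_res50 (baseline), MSRA-TD500", 91.5, 79.2, 84.9),
    ("HTBNet_res50, MSRA-TD500", 92.2, 83.3, 87.5),
]
for name, p, r, reported in rows:
    print(f"{name}: recomputed F = {f_measure(p, r):.2f}, reported F = {reported}")
```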

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).