Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

(HTBNet) Arbitrary Shape Scene Text Detection with Binarization of Hyperbolic Tangent and Cross Entropy

Version 1 : Received: 13 May 2024 / Approved: 15 May 2024 / Online: 15 May 2024 (13:19:51 CEST)

How to cite: Chen, Z. (HTBNet)Arbitrary Shape Scene Text Detection with Binarization of Hyperbolic Tangent and Cross Entropy. Preprints 2024, 2024051040. https://doi.org/10.20944/preprints202405.1040.v1

Abstract

Most existing segmentation-based scene text detection methods require complicated post-processing, and because this post-processing is separated from the training process, detection performance is greatly reduced. DBNet, a previous method, successfully simplified post-processing and integrated it into the segmentation network. However, the model required a long training schedule of 1200 epochs and lacked sensitivity to text of various scales, so some text instances were missed. To address these two problems, we design a text detection network with Binarization of Hyperbolic Tangent (HTBNet). First, we propose Binarization of Hyperbolic Tangent (HTB); optimized together with HTB, the segmentation network converges faster in the initial stage, reducing the number of training epochs from 1200 to 600. Because different channels of a feature map at the same scale focus on information from different regions of the image, we devise Multi-Scale Channel Attention (MSCA) to better represent the important features of all objects in the image. Meanwhile, because objects of multiple scales in an image cannot be detected simultaneously, we propose a novel module named Fused Module with Channel and Spatial (FMCS), which fuses multi-scale feature maps along the channel and spatial dimensions. Finally, we adopt cross entropy as the loss function to measure the difference between predicted values and ground truths. Experimental results show that, compared with lightweight models, HTBNet achieves competitive performance and speed on Total-Text (F-measure: 86.0%, FPS: 30) and MSRA-TD500 (F-measure: 87.5%, FPS: 30).
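The abstract presents HTB as an alternative to DBNet's binarization trained with a cross-entropy loss, but it does not give the HTB formula. The minimal sketch below, using only the Python standard library, contrasts DBNet's published differentiable binarization, B = 1 / (1 + e^(-k(P - T))) with amplifying factor k = 50, against a hypothetical tanh-based variant, and scores a binarized map with pixel-wise binary cross entropy. The function names, the assumed rescaling B = 0.5 * (1 + tanh(k(P - T))) (chosen so the output lies in (0, 1) and can be fed to cross entropy), and the toy probability, threshold, and label values are illustrative assumptions, not the author's implementation.

```python
import math
from typing import List


def db_binarize(p: float, t: float, k: float = 50.0) -> float:
    """DBNet-style differentiable binarization of one pixel.

    Approximates a step function of (P - T) with a steep sigmoid;
    k = 50 is the amplifying factor reported for DBNet.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def tanh_binarize(p: float, t: float, k: float = 50.0) -> float:
    """HYPOTHETICAL tanh-based binarization (assumed form, not from the paper).

    tanh(k * (P - T)) lies in (-1, 1); it is rescaled here to (0, 1)
    so that the result can be used as a probability in cross entropy.
    """
    return 0.5 * (1.0 + math.tanh(k * (p - t)))


def binary_cross_entropy(pred: List[float], target: List[float],
                         eps: float = 1e-7) -> float:
    """Mean pixel-wise binary cross entropy between predictions and 0/1 labels."""
    total = 0.0
    for q, y in zip(pred, target):
        q = min(max(q, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))
    return total / len(pred)


# Toy flattened maps (hypothetical values for illustration only).
prob = [0.9, 0.6, 0.2, 0.05]   # probability map P
thresh = [0.3, 0.3, 0.3, 0.3]  # threshold map T
gt = [1.0, 1.0, 0.0, 0.0]      # ground-truth text mask

b_db = [db_binarize(p, t) for p, t in zip(prob, thresh)]
b_tanh = [tanh_binarize(p, t) for p, t in zip(prob, thresh)]
loss = binary_cross_entropy(b_tanh, gt)
print(f"DB map: {b_db}")
print(f"tanh map: {b_tanh}")
print(f"BCE loss on tanh map: {loss:.6f}")
```

Pixels where P exceeds T are pushed toward 1 and the rest toward 0, so the cross-entropy loss is small when the binarized map agrees with the mask. Note that under this particular assumed rescaling, 0.5 * (1 + tanh(x)) equals 1 / (1 + e^(-2x)), so it behaves like the DB sigmoid with twice the steepness; the paper's actual HTB definition may differ.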

Keywords

Scene Text Detection; binarization; hyperbolic tangent; MSCA; FMCS; cross entropy

Subject

Computer Science and Mathematics, Computer Vision and Graphics
