Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

An Efficient Transformer-CNN Network for Document Image Binarization

Version 1 : Received: 22 April 2024 / Approved: 23 April 2024 / Online: 25 April 2024 (15:03:24 CEST)

How to cite: Zhang, L.; Wang, K.; Wan, Y. An Efficient Transformer-CNN Network for Document Image Binarization. Preprints 2024, 2024041594. https://doi.org/10.20944/preprints202404.1594.v1

Abstract

Color image binarization plays a pivotal role in image preprocessing, significantly impacting subsequent tasks, particularly text recognition. This paper concentrates on Document Image Binarization (DIB), which aims to separate an image into foreground (text) and background (non-text content). Through a thorough analysis of conventional and deep learning-based approaches, we conclude that prevailing DIB methods leverage deep learning technology. Furthermore, we examine the receptive fields before and after network training to underscore the Transformer model's advantages. Subsequently, we introduce a lightweight model based on the U-Net structure, enhanced with the Mobile ViT module to better capture global features in document images. Owing to its adeptness at learning both local and global features, the proposed model outperforms state-of-the-art methods on two standard datasets (DIBCO 2012 and DIBCO 2017). Notably, the proposed DIB method is a straightforward end-to-end model that requires no additional image preprocessing or post-processing. Moreover, its parameter count is less than a quarter of that of the HIP'23 model, which achieves the best results on three datasets (DIBCO 2012, DIBCO 2017, and DIBCO 2018). Finally, two sets of ablation experiments verify the effectiveness of the proposed binarization model.
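The conventional approaches the abstract contrasts with deep learning typically reduce binarization to a thresholding decision per pixel. As an illustrative baseline only (not the paper's method), the sketch below implements global Otsu thresholding in NumPy, producing the same foreground-black / background-white convention used by DIBCO ground truth; the function names are my own.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold (0-255) that maximizes the
    between-class variance of the grayscale histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    probs = hist / hist.sum()
    omega = np.cumsum(probs)                # class-0 probability up to t
    mu = np.cumsum(probs * np.arange(256))  # cumulative intensity mean
    mu_t = mu[-1]                           # global mean
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan              # skip degenerate thresholds
    sigma_b = (mu_t * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Foreground (text) -> 0, background -> 255."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

A single global threshold fails on degraded documents with uneven illumination or bleed-through, which is precisely the regime where learned models such as the proposed U-Net with Mobile ViT blocks are expected to help.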

Keywords

document image binarization; U-Net; transformer; mobile ViT

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning


