Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Diffusion Denoising Process with Gated U-Net for High-Quality Document Binarization

Version 1 : Received: 28 August 2023 / Approved: 29 August 2023 / Online: 30 August 2023 (08:23:05 CEST)

A peer-reviewed article of this Preprint also exists.

Han, S.; Ji, S.; Rhee, J. Diffusion-Denoising Process with Gated U-Net for High-Quality Document Binarization. Applied Sciences 2023, 13, 11141, doi:10.3390/app132011141.

Abstract

Binarization of degraded documents is an important preprocessing step for document analysis tasks such as OCR and historical document analysis. Existing studies have applied various convolutional neural network (CNN) models and generative models to document binarization, but these models generalize poorly to noise not seen during training and struggle to extract fine text strokes. To overcome these challenges, we apply a latent diffusion model (LDM), a model known for high-quality image generation, to document binarization for the first time. By running the iterative diffusion-denoising process in latent space, the proposed model generates high-quality, clean binarized images and generalizes well by using both the data distribution and the time step during training. Additionally, we use a gated U-Net as the backbone network to preserve text strokes through trainable gating values. Gated convolution extracts fine text strokes by combining gating values with features, which lets the model focus on text regions. Furthermore, we maximize the effectiveness of the proposed model by training it with a combination of the LDM loss and a pixel-level loss suited to the model structure. Experiments on the H-DIBCO and DIBCO benchmark datasets show that the proposed model outperforms existing methods.
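
To illustrate how a gated convolution combines a gating value with a feature, the following is a minimal single-channel sketch in plain Python. It is not the authors' implementation: the function names (`conv2d_same`, `gated_conv2d`), the tanh feature activation, the sigmoid gate, and the zero padding are all assumptions chosen for illustration, and in a real network both kernels would be learned multi-channel weights.

```python
import math


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps input size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:  # pixels outside the image count as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_conv2d(image, feature_kernel, gate_kernel):
    """Hypothetical gated convolution: tanh(feature) * sigmoid(gate), element-wise.

    The gate lies in (0, 1) and scales each feature value, so positions with a
    low gate (e.g. background noise) are suppressed and positions with a high
    gate (e.g. text strokes) pass through.
    """
    feature = conv2d_same(image, feature_kernel)
    gate = conv2d_same(image, gate_kernel)
    return [
        [math.tanh(f) * sigmoid(g) for f, g in zip(f_row, g_row)]
        for f_row, g_row in zip(feature, gate)
    ]
```

With an all-zero gate kernel every gate equals 0.5, so the output is simply half of the activated feature map; training the gate kernel is what would let the layer weight text regions more heavily than background.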

Keywords

document binarization; deep learning; gated convolution; generative model; latent diffusion models; text stroke

Subject

Computer Science and Mathematics, Computer Vision and Graphics
