Han, S.; Ji, S.; Rhee, J. Diffusion-Denoising Process with Gated U-Net for High-Quality Document Binarization. Applied Sciences 2023, 13, 11141, doi:10.3390/app132011141.
Abstract
Binarization of degraded documents is an important preprocessing step for various document analysis tasks, such as OCR and historical document analysis. Existing studies have applied various convolutional neural network (CNN) models and generative models to document binarization, but these models do not generalize well to noise they have not seen during training, and they struggle to extract fine text strokes. In this paper, to overcome these challenges, we apply the latent diffusion model (LDM), which is known for high-quality image generation, to document binarization for the first time. By using the iterative diffusion-denoising process in latent space, the model generates high-quality, clean binarized images and generalizes well, because both the data distribution and the time step are used during training. Additionally, we use a gated U-Net as the backbone network to preserve text strokes through trainable gating values. Gated convolution extracts fine text strokes by combining gating values with features, which allows the model to focus on text regions. Furthermore, we maximize the effectiveness of the proposed model by training it with a combination of the LDM loss and a pixel-level loss suited to the model structure. Experiments on the H-DIBCO and DIBCO benchmark datasets show that the proposed model outperforms existing methods.
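The gating mechanism described in the abstract can be illustrated with a minimal NumPy sketch of a single-channel gated convolution. It assumes the formulation popularized for free-form image inpainting (Yu et al.), in which a feature branch is multiplied element-wise by a sigmoid gate computed by a second convolution; whether the proposed gated U-Net uses exactly this form, this activation, or this kernel layout is an assumption. The helper names (`conv2d_same`, `gated_conv2d`) and the toy kernels are hypothetical and serve only to show how a gate can pass stroke pixels while suppressing background.

```python
import numpy as np


def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding (output size == input size).

    Assumes an odd-sized kernel so the padding is symmetric.
    """
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding by default
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_conv2d(x, k_feat, k_gate, b_feat=0.0, b_gate=0.0):
    """Hypothetical gated convolution: output = act(feature) * sigmoid(gate).

    Assumption: Yu et al.-style gating; ReLU is used for the feature branch
    purely for simplicity (other activations are common).
    """
    feature = np.maximum(conv2d_same(x, k_feat) + b_feat, 0.0)  # feature branch
    gate = sigmoid(conv2d_same(x, k_gate) + b_gate)              # gating values in (0, 1)
    return feature * gate, gate


# Toy input: a 5x5 "page" with one vertical ink stroke (ink = 1.0, background = 0.0).
x = np.zeros((5, 5))
x[:, 2] = 1.0

# Hypothetical kernels: the feature branch copies the centre pixel; the gate
# branch responds strongly to ink, and a negative bias closes it on background.
k_feat = np.zeros((3, 3))
k_feat[1, 1] = 1.0
k_gate = np.zeros((3, 3))
k_gate[1, 1] = 4.0

out, gate = gated_conv2d(x, k_feat, k_gate, b_gate=-2.0)
print(np.round(gate[0], 3))  # gate is high on the stroke column, low elsewhere
```

In a trained network both kernels (and the bias) are learned, so the gate can learn to emphasize stroke regions rather than being hand-set as in this toy example.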
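For reference, the standard LDM training objective samples a time step $t$, noises the latent $z_0$ of the target image in closed form, and trains a denoiser $\epsilon_\theta$ to predict the added noise. This is how the time step and the data distribution both enter training. The sketch below shows one way a pixel-level term might be added to that objective. The conditioning input $c$ (here, the degraded document), the weight $\lambda$, and the choice of pixel loss $\mathcal{L}_{\text{pix}}$ (for example, binary cross-entropy between the decoded prediction and the ground-truth binary image) are assumptions, not details given in this abstract.

```latex
% Closed-form forward (noising) step, with \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad t \sim \mathcal{U}\{1, \dots, T\}

% Standard LDM noise-prediction loss (c: conditioning input, assumed)
\mathcal{L}_{\text{LDM}} = \mathbb{E}_{z_0, c, \epsilon, t}
  \left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right]

% Assumed combined objective; \lambda and \mathcal{L}_{\text{pix}} are hypothetical
\mathcal{L} = \mathcal{L}_{\text{LDM}} + \lambda \, \mathcal{L}_{\text{pix}}
```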
Keywords
document binarization; deep learning; gated convolution; generative model; latent diffusion models; text stroke
Subject
Computer Science and Mathematics, Computer Vision and Graphics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.