Submitted: 01 August 2024
Posted: 02 August 2024
Abstract
Keywords:
1. Introduction
2. Literature Review
3. Proposed Methodology


- Image Preprocessing: To simplify the input image and eliminate color information, we begin by converting it into a grayscale representation.
- Noise Reduction: A Gaussian blur filter is applied to the grayscale image to reduce noise and improve the quality of the subsequent processing steps.
- Candidate Table Region Identification: Using the Canny edge detection technique on the smoothed image, we identify regions that exhibit prominent edges, which indicate potential table boundaries. These regions are referred to as "candidate table regions."
- Refinement of Candidate Regions: To narrow the search for tables, we filter the candidate regions by size, aspect ratio, and rectangular shape. These geometric checks eliminate false positives and improve the accuracy of table detection.
- Tabular Region Labeling: Using an XML node representation and predefined annotations, we label the identified tabular regions within the document image (Figure 3). These annotations serve as reference points for the subsequent text extraction steps.
- Text Extraction Using Encoder-Dual Decoder (EDD) Architecture: Once the table regions have been identified, we employ an Encoder-Dual Decoder (EDD) architecture to extract the textual content of the individual table cells. This yields machine-readable data in a structured format.
- Structural Evaluation: To evaluate the accuracy of our results, we analyze the spatial relationships between the table cells. By comparing these relationships against a predefined table structure, we assess the alignment and coherence of the extracted tables.
- Output Generation: We then generate the output in HTML format, providing a representation of the recognized tables that can be easily viewed and further processed.
- Language Modeling: Finally, we apply a Vector Space Model (VSM) for language modeling over the HTML dataset of tables. With the VSM, a query consisting of any word returns the tables that match it.

Through the execution of these steps, our proposed methodology offers a comprehensive solution for table detection and recognition in image-format documents. The combination of preprocessing, region identification, text extraction, and structural evaluation supports accurate and reliable table extraction from diverse document images. We further extend the methodology with a language model (VSM) for searching tabular data against any query.
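The refinement of candidate regions described above can be illustrated with a minimal sketch. It assumes each candidate has already been reduced to a bounding box plus the area enclosed by its edge contour; the `Region` type, the function names, and every threshold value are hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A candidate region: bounding box plus the area enclosed by its contour."""
    x: int
    y: int
    width: int
    height: int
    contour_area: float  # pixels inside the detected edge contour


def is_table_candidate(region, page_area, min_area_frac=0.01,
                       min_aspect=0.2, max_aspect=10.0,
                       min_rectangularity=0.8):
    """Apply the three geometric checks: size, aspect ratio, rectangular shape.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    box_area = region.width * region.height
    if box_area == 0:
        return False
    # Size: discard regions that cover too little of the page.
    if box_area < min_area_frac * page_area:
        return False
    # Aspect ratio: discard extremely thin or extremely tall regions.
    aspect = region.width / region.height
    if not (min_aspect <= aspect <= max_aspect):
        return False
    # Rectangular shape: the contour should fill most of its bounding box.
    return region.contour_area / box_area >= min_rectangularity


def refine_candidates(regions, page_width, page_height, **thresholds):
    """Keep only the regions that pass every geometric check."""
    page_area = page_width * page_height
    return [r for r in regions if is_table_candidate(r, page_area, **thresholds)]


# Made-up candidates on a 1000 x 1400 pixel page.
regions = [
    Region(50, 100, 800, 300, 235000.0),  # large, table-like rectangle
    Region(10, 10, 40, 30, 1100.0),       # too small
    Region(0, 700, 990, 20, 19000.0),     # a thin rule line: aspect ratio too high
    Region(100, 900, 400, 400, 80000.0),  # contour fills only half of its box
]
kept = refine_candidates(regions, page_width=1000, page_height=1400)
print(kept)
```

In this sketch only the first region survives. In practice the thresholds would need tuning on the target dataset, since a size limit that suits full-page scans may reject small tables in multi-column layouts.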
3.1. Dataset Details
3.2. Encoder-Dual Decoder (EDD) Architecture
3.3. Vector Space Model
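As a minimal sketch of the retrieval idea behind the Language Modeling step, the example below indexes a few HTML tables as TF-IDF vectors and ranks them by cosine similarity against a query word. The smoothed IDF formula, the tokenizer, the function names, and the sample tables are all assumptions made for illustration; the paper's actual weighting scheme may differ.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def html_to_tokens(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return tokenize(" ".join(parser.chunks))


def build_index(tables):
    """Build TF-IDF vectors for a dict mapping table name to HTML string."""
    term_counts = {name: Counter(html_to_tokens(html))
                   for name, html in tables.items()}
    n_docs = len(term_counts)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    vectors = {name: {t: tf * idf[t] for t, tf in counts.items()}
               for name, counts in term_counts.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors):
    """Return (table name, score) pairs with a nonzero score, best first."""
    q_counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = [(name, cosine(q_vec, vec)) for name, vec in vectors.items()]
    return sorted((s for s in scored if s[1] > 0.0),
                  key=lambda s: s[1], reverse=True)


# Made-up tables standing in for the HTML output of the recognition stage.
tables = {
    "table_fruit": "<table><tr><th>Fruit</th><th>Price</th></tr>"
                   "<tr><td>Apple</td><td>3</td></tr></table>",
    "table_city": "<table><tr><th>City</th><th>Country</th></tr>"
                  "<tr><td>Lahore</td><td>Pakistan</td></tr></table>",
}
idf, vectors = build_index(tables)
print(search("apple", idf, vectors))
```

Here the query "apple" matches only `table_fruit`, and a word that occurs in no table returns an empty list. Treating each table as one document keeps the sketch simple; indexing individual rows or cells instead would be a natural refinement if finer-grained results were needed.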
4. Experimentation and Results
4.1. Qualitative Analysis
4.2. Quantitative Analysis
5. Conclusion
6. Discussion and Future Direction



Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).