Preprint
Concept Paper

This version is not peer-reviewed.

Hyperspectral Ink Classification and Forgery Detection Using Dimensionality Reduction and Clustering

Submitted: 03 July 2025

Posted: 03 July 2025


Abstract
This research presents a pattern recognition framework for detecting and classifying inks in handwritten documents using hyperspectral imaging (HSI). Using the iVision HHID dataset, we apply Principal Component Analysis (PCA) for dimensionality reduction and k-means clustering for ink segmentation. Spectral signatures are extracted from individual text lines to detect ink mismatches, which supports forgery detection and document authentication. Experiments show that PCA-reduced data can separate inks based on spectral variance. The paper also evaluates the strengths and limitations of unsupervised clustering methods in document forensics and suggests future directions involving deep learning and hybrid spectral-spatial models.

1. Introduction

Document forensics plays a crucial role in legal and security domains. Detecting ink mismatches using Hyperspectral Imaging (HSI) has emerged as a reliable approach to identifying forgeries and tampered text. HSI captures reflectance across hundreds of narrow spectral bands, enabling the detection of subtle ink variations invisible to the naked eye. While earlier methods relied on manual inspection or RGB differences, HSI-based systems enable non-invasive and precise analysis [1,2].
Recent literature shows a surge in research leveraging HSI for tasks such as forgery detection, ink classification, and writer identification [3,4]. Key challenges include high data dimensionality, spectral redundancy, and spectral similarity between different inks. This work aims to design an unsupervised pattern recognition approach that combines dimensionality reduction and clustering to classify inks in the iVision HHID dataset.

2. Literature Review

Hyperspectral imaging (HSI) has gained substantial attention in document forensics due to its ability to detect subtle spectral differences between inks that are visually indistinguishable. Khan et al. [1] provided a comprehensive review outlining the major stages of HSI pipelines, including preprocessing, dimensionality reduction (DR), and classification, and emphasized the role each stage plays in effective hyperspectral analysis.
Khan et al. [2] were among the early adopters who demonstrated the feasibility of detecting ink mismatches using hyperspectral reflectance data, highlighting the potential of HSI for non-invasive forensic analysis. Extending this work, Abbas et al. [3] proposed automated band selection methods coupled with classification algorithms to optimize ink mismatch detection performance.
In the realm of unsupervised learning, Khan et al. [4] applied fuzzy clustering to forgery detection in multispectral document images, removing the dependence on labeled data, a crucial advantage when ground-truth annotations are scarce. Moving towards deep learning, Khan et al. [5,6] introduced convolutional neural networks (CNNs) that capture complex spatio-spectral relationships for forgery detection and document authentication, reporting better performance than traditional clustering methods.
Islam et al. [7] applied deep learning to writer identification from hyperspectral images. Hybrid CNN architectures introduced by Islam et al. [8] further improved the robustness of document authentication models by integrating both spatial and spectral features. To support standardized benchmarking in this domain, Islam et al. [9] released the iVision HHID dataset, which has become a valuable resource for evaluating hyperspectral document analysis algorithms.
Finally, Qureshi et al. [10] reviewed the current challenges faced by hyperspectral document image processing, such as high dimensionality, spectral redundancy, and variability introduced by environmental factors, while also identifying promising future directions including hybrid models and real-world deployment scenarios.
These studies collectively establish a solid foundation for hyperspectral ink classification and forgery detection, motivating the development of unsupervised pattern recognition frameworks such as the one proposed in this research.

3. Methodology

We use a hyperspectral cube from the iVision HHID dataset containing handwritten text lines written with multiple pens. The analysis follows these stages:
- Extract the 1st, 30th, 60th, and last bands from the HSI cube as grayscale images for visualization.
- Segment foreground (text) pixels using thresholding and morphological operations.
- Plot the spectral reflectance of each text line across all bands.
- Apply Principal Component Analysis (PCA) for dimensionality reduction.
- Use k-means clustering to group pixels by ink type.
- Repeat the clustering on the PCA output and compare the ink detection accuracy.
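The first three stages can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the cube is assumed to be a `(height, width, bands)` reflectance array, the threshold is assumed to be the global mean of the band-averaged image (the paper does not specify its thresholding rule), the morphological operation is assumed to be a 3×3 binary opening, and the row range of each text line is assumed to be known. The function names and the synthetic cube are hypothetical.

```python
import numpy as np


def band_images(cube, bands=(0, 29, 59, -1)):
    """Grayscale images of selected bands from an (H, W, B) cube.

    0-based indices: 1st -> 0, 30th -> 29, 60th -> 59, last -> -1.
    """
    return [cube[:, :, b] for b in bands]


def _shift_stack(mask):
    """Yield the nine 3x3-neighbourhood shifts of a False-padded boolean mask."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    for dr in range(3):
        for dc in range(3):
            yield padded[dr:dr + h, dc:dc + w]


def binary_opening(mask):
    """3x3 erosion followed by 3x3 dilation; removes isolated noise pixels."""
    eroded = np.logical_and.reduce(list(_shift_stack(mask)))
    return np.logical_or.reduce(list(_shift_stack(eroded)))


def segment_ink(cube, threshold=None):
    """Foreground mask: ink reflects less light than paper, so dark pixels are ink."""
    mean_img = cube.mean(axis=2)
    if threshold is None:
        threshold = mean_img.mean()  # assumed rule: global mean intensity
    return binary_opening(mean_img < threshold)


def line_spectra(cube, mask, line_rows):
    """Mean reflectance spectrum of the ink pixels inside each (row_start, row_end) range."""
    return np.array([cube[r0:r1][mask[r0:r1]].mean(axis=0) for r0, r1 in line_rows])


# Synthetic example: paper at 0.9 reflectance, two lines written with different inks.
rng = np.random.default_rng(0)
H, W, B = 40, 60, 80
wl = np.linspace(0.0, 1.0, B)               # normalised wavelength axis
cube = np.full((H, W, B), 0.9)
cube[5:15, 5:55] = 0.2 + 0.1 * wl           # hypothetical ink A, rising reflectance
cube[25:35, 5:55] = 0.3 - 0.1 * wl          # hypothetical ink B, falling reflectance
cube += rng.normal(0.0, 0.01, cube.shape)

mask = segment_ink(cube)
spectra = line_spectra(cube, mask, [(0, 20), (20, 40)])
print(int(mask.sum()), spectra.shape)       # 1000 (2, 80)
```

Plotting each row of `spectra` against band index (for example with Matplotlib) gives curves of the kind shown in Figure 5; in this synthetic case the two lines have opposite spectral slopes.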
Figure 1. First Band.
Figure 2. Last Band.
Figure 3. 30th Band.
Figure 4. 60th Band.
Figure 5. Spectral Responses of Text Lines.
Figure 6. PCA for dimensionality reduction.

4. Results and Discussion

The grayscale band images show intensity varying with wavelength, indicating that the inks differ spectrally. The spectral response plots show a distinct reflectance curve for each text line, consistent with each line carrying its own ink signature. PCA retained over 98% of the variance in the first three components. Applying k-means to the PCA-reduced data segmented the document into three ink types, and the segmentation matched visual inspection.
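The explained-variance figure quoted above corresponds to the eigenvalues of the band covariance matrix. The sketch below computes PCA through a singular value decomposition of the mean-centred pixel-by-band matrix. It is a minimal NumPy illustration on hypothetical synthetic spectra of three inks, not the authors' code and not the iVision HHID data, so its variance figures will not match the 98% reported here.

```python
import numpy as np


def pca(X, n_components=3):
    """PCA of an (n_pixels, n_bands) matrix via SVD of the mean-centred data.

    Returns the projected scores (n_pixels, n_components), the principal axes
    (n_components, n_bands), and the explained-variance ratio of every component.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)           # eigenvalues of the covariance matrix
    ratio = var / var.sum()                 # fraction of total variance per component
    scores = Xc @ Vt[:n_components].T       # project pixels onto the leading axes
    return scores, Vt[:n_components], ratio


# Hypothetical ink pixels: three mean spectra plus sensor noise, 300 pixels each.
rng = np.random.default_rng(1)
wl = np.linspace(0.0, 1.0, 80)
inks = [0.2 + 0.1 * wl, 0.3 - 0.1 * wl, np.full_like(wl, 0.4)]
X = np.vstack([m + rng.normal(0.0, 0.01, (300, wl.size)) for m in inks])

scores, axes, ratio = pca(X, n_components=3)
print(scores.shape, round(float(ratio[:3].sum()), 3))
```

Because three mean spectra span only a two-dimensional affine subspace, nearly all of the synthetic variance falls in the first two components; the third mostly captures noise.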
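Strengths of the approach include label-free ink detection and efficient dimensionality reduction with PCA. However, k-means assigns each pixel to the nearest centroid under Euclidean distance, which implicitly assumes compact, roughly spherical clusters, and ink spectra do not always form such clusters. Inks with near-identical spectra may also be merged into a single cluster. Deep models could improve separation by learning non-linear spectral embeddings.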
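The spherical-cluster limitation follows from the assignment rule: each pixel goes to the centroid with the smallest Euclidean distance, so the boundary between two clusters is the perpendicular bisector of their centroids. Below is a minimal NumPy sketch of Lloyd's k-means with k-means++ seeding and several restarts, run on hypothetical, well-separated 3-D PCA scores. The paper does not state which k-means implementation, seeding, or number of restarts it used, so these choices are assumptions.

```python
import numpy as np


def _sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def _kmeans_pp(X, k, rng):
    """k-means++ seeding: later centres are drawn with probability proportional to d^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = _sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm; keeps the restart with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = _kmeans_pp(X, k, rng)
        for _ in range(n_iter):
            labels = _sq_dists(X, centers).argmin(axis=1)   # nearest centroid
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = _sq_dists(X, centers).argmin(axis=1)
        inertia = _sq_dists(X, centers)[np.arange(len(X)), labels].sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


# Hypothetical 3-D PCA scores for three inks, 200 pixels each.
rng = np.random.default_rng(2)
means = np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0], [0.0, 20.0, 0.0]])
scores = np.vstack([m + rng.normal(0.0, 1.0, (200, 3)) for m in means])
labels, centers = kmeans(scores, k=3)
print(np.bincount(labels))
```

With elongated or unequal-variance ink clusters, the same nearest-centroid rule can split one ink or merge two, which is the limitation noted above.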

5. Future Work

Future directions include replacing k-means with fuzzy c-means, which assigns graded memberships to overlapping clusters, or DBSCAN, which finds density-based clusters of arbitrary shape. Supervised classifiers such as support vector machines (SVMs) or CNNs could be explored given annotated datasets. Convolutional autoencoders (CAEs) or hybrid CNNs could learn spectral-spatial patterns and potentially improve classification. Evaluating the framework under varying illumination and noise conditions would help establish its robustness.
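As one example of the alternatives listed above, fuzzy c-means replaces the hard assignment of k-means with graded memberships, so a pixel lying between two ink clusters keeps partial membership in both. The sketch below is a minimal NumPy version of the standard update rules with fuzzifier m = 2; the farthest-point initialisation, the parameters, and the synthetic PCA scores are assumptions for illustration, and DBSCAN is not shown.

```python
import numpy as np


def fuzzy_cmeans(X, c, m=2.0, n_iter=200, tol=1e-6):
    """Fuzzy c-means on an (n_samples, n_features) array.

    Returns memberships U (n_samples, c), whose rows sum to 1, and the centres (c, n_features).
    """
    # Farthest-point initialisation: start at X[0], then repeatedly add the point
    # farthest from all centres chosen so far.
    centers = [X[0]]
    for _ in range(1, c):
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)

    p = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # guard against a point sitting on a centre
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** p).sum(axis=2)
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break
    return U, centers


# Hypothetical PCA scores: three ink clusters with 200 pixels each.
rng = np.random.default_rng(3)
means = np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0], [0.0, 20.0, 0.0]])
X = np.vstack([mu + rng.normal(0.0, 1.0, (200, 3)) for mu in means])
U, centers = fuzzy_cmeans(X, c=3)
hard_labels = U.argmax(axis=1)   # defuzzify by taking the highest membership
```

The membership matrix `U` is what distinguishes this from k-means: rows with no dominant membership mark pixels whose spectra sit between clusters.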

6. Conclusions

This paper presented an unsupervised framework for detecting and classifying inks in hyperspectral documents. PCA effectively reduced spectral dimensionality, and k-means clustering successfully separated different ink types. The approach is promising for non-destructive document forensics, with potential for improvement through supervised and deep learning methods.

References

  1. Khan, M.J.; Khan, H.S.; Yousaf, A.; Khurshid, K.; Abbas, A. Modern Trends in Hyperspectral Image Analysis: A Review. IEEE Access 2018, 6, 14118–14119.
  2. Khan, Z.; Shafait, F.; Mian, A. Hyperspectral Imaging for Ink Mismatch Detection. In Proc. 12th Int. Conf. Document Analysis and Recognition (ICDAR), Washington DC, USA, 2013, pp. 877–881.
  3. Abbas, A.; Khurshid, K.; Shafait, F. Towards Automated Ink Mismatch Detection in Hyperspectral Document Images. In Proc. 14th IAPR Int. Conf. Document Analysis and Recognition (ICDAR), Kyoto, Japan, 2017, pp. 1229–1236.
  4. Khan, M.J.; Yousaf, A.; Khurshid, K.; Abbas, A.; Shafait, F. Automated Forgery Detection in Multispectral Document Images Using Fuzzy Clustering. In Proc. 13th IAPR Int. Workshop Document Analysis Systems (DAS), Vienna, Austria, 2018, pp. 393–398.
  5. Khan, M.J.; Yousaf, A.; Abbas, A.; Khurshid, K. Deep learning for automated forgery detection in hyperspectral document images. Journal of Electronic Imaging 2018, 27, 053001.
  6. Khan, M.J.; Khurshid, K.; Shafait, F. A Spatio-Spectral Hybrid Convolutional Architecture for Hyperspectral Document Authentication. In Proc. Int. Conf. Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019, pp. 1097–1102.
  7. Islam, A.U.; Khan, M.J.; Khurshid, K.; Shafait, F. Hyperspectral Image Analysis for Writer Identification using Deep Learning. In Proc. Digital Image Computing: Techniques and Applications (DICTA), Australia, Dec. 2019.
  8. Islam, A.U.; Khan, M.J.; Khurshid, K.; Shafait, F. Hybrid CNNs for HSI-based Document Authentication. In Proc. Int. Conf. Document Analysis and Recognition (ICDAR), 2019.
  9. Islam, A.U.; Khan, M.J.; Asad, M.; Khan, H.A.; Khurshid, K. iVision HHID: Handwritten Hyperspectral Images Dataset for Benchmarking Hyperspectral Imaging-based Document Forensic Analysis. Data in Brief 2022, 42, 108306.
  10. Qureshi, R.; Uzair, M.; Khurshid, K.; Yan, H. Hyperspectral Document Image Processing: Applications, Challenges and Future Prospects. Pattern Recognition 2019, 90, 12–22.
  11. Huang, X.; Zhang, L. Principal Component Analysis for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing 2007, 46, 2656–2666.
  12. Wang, H.; Liu, Y.; Shafait, F. Fuzzy C-Means Clustering for Document Forensics. Elsevier Journal of Forensic Science International 2018, 289, 23–29.
  13. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Convolutional Autoencoders in Hyperspectral Imaging. Sensors 2020, 20, 113.
  14. Li, J.; Liu, W.; Lu, H. Deep Spectral Learning for Ink Classification. In Proc. SPIE Defense + Commercial Sensing 2021, 11734.
  15. Zhang, K.; Sun, M.; Liu, J. Evaluation of Spectral Variability in Ink Analysis. Journal of Forensic Sciences 2022, 67, 145–153.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
