Automatic Ink Mismatch Detection in Hyper Spectral Images Using K-means Clustering

Abstract: Hyperspectral imaging (HSI) is a technique that obtains the spectrum for each pixel in an image. It helps in finding objects and identifying materials, which is very difficult with other imaging techniques, and it allows researchers to investigate documents without any physical contact. HSI-based ink mismatch detection has recently shown great improvement in distinguishing inks. Detecting a mismatch between inks that are present in unequal proportions is an unbalanced clustering problem. This paper uses K-means clustering for ink mismatch detection. K-means clustering finds homogeneous subgroups in the data based on Euclidean distance. This paper demonstrates its performance on unequal ink mismatch detection based on HSI.


I. INTRODUCTION
One material can be differentiated from any other material by its unique spectral signature. Although the human eye can easily distinguish different colors, it is difficult to distinguish between two similarly colored inks because they lie close together in the visible spectrum [1] [2]. However, such inks have different spectral signatures. This spectral property is used to detect forgery in document images: automatic ink mismatch techniques can reveal whether a document is genuine or manipulated.
Ink analysis is important because it addresses significant questions about document images. Hyperspectral images consist of many spectral bands, which are useful for automatic ink mismatch detection and, in turn, for forgery detection. Based on their spectral signatures, inks can be used to establish many facts, such as forgery, ink aging, and document fraud [3]. This rests on the assumption that the forgery was made with a different ink or pen. Ink mismatch detection also plays a vital role in cheque verification in banks, degree verification in universities, and the examination of important papers in government offices. For many years, researchers have proposed techniques for detecting ink mismatch using HSI and multispectral images.
Ink mismatch detection typically follows one of two approaches: destructive or non-destructive analysis [2] [3]. Destructive analysis, such as thin-layer chromatography [4], is a chemical, solution-based analysis that forensic document experts have used to detect ink mismatch. The destructive approach has several drawbacks, including irreversible damage to the document. It is also time-consuming because it needs a large number of measurements [3]. HSI, on the other hand, is an efficient tool for non-destructive and non-contact examination of forensic documents that overcomes these limitations [3] [5]. HSI combines spectroscopy and imaging: each image is acquired in a narrow band of the electromagnetic spectrum to capture detailed spectral data. HSI thus reveals unseen details in a document without direct contact with it, which is why the non-destructive approach is preferred.
Easton et al. [6] presented one of the first works in multispectral imaging. A spectral imaging framework, Eureka Vision, was established by Christens-Barry et al. [7]. Such frameworks help forensic experts in ink investigation. Ink investigation through band-by-band visual evaluation of multispectral images is tedious, because the analyst must inspect the document visually under each wavelength of light. An advanced and complex HSI system for the assessment of forensic documents was built by the National Archives of the Netherlands [8]. It produced images with high spatial and spectral resolution over a spectral range from near-UV to IR. However, capturing a document required nearly fifteen minutes of exposure. The system was very powerful, but its long acquisition time restricts its use [9]. Hedjam et al. [21] proposed a mathematical model for improving the legibility of badly degraded text. Several techniques for ink mismatch detection based on HSI analysis have been proposed in the last decade [19]. These techniques include fuzzy c-means clustering [3], k-means clustering [5], localized hyperspectral image analysis [16], and deep convolutional networks [22].
Abbas et al. [2] used a hyperspectral unmixing scheme for ink mismatch detection. It first employs the hyperspectral subspace identification by minimum error (HySime) algorithm [11] for dimensionality reduction, which estimates the number of spectral signatures present in the HSI document. Second, the minimum volume enclosing simplex (MVES) algorithm [12] is applied to the reduced-dimensional data. It extracts the hidden endmembers and their corresponding abundances from the hyperspectral observations. This paper uses the K-means clustering technique for automatic ink mismatch detection. K-means is a partitional clustering algorithm. It divides the samples into k groups, with the condition that the number of groups cannot exceed the number of samples. It is an unsupervised learning technique that finds homogeneous subgroups in the data and places similar data points in the same cluster on the basis of similarity, intensity, or other features.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 28 June 2020 doi:10.20944/preprints202006.0341.v1

II. RELATED WORK
Hyperspectral images consist of hundreds of bands, and each band is rich in information; hence they are more useful than multispectral images. The recent literature shows the high potential of HSI and spatio-spectral features for document analysis and forensics.
In this context, Morales et al. [11] proposed an approach for ink analysis in pen verification and handwritten documents using least-squares support vector machine (SVM) classification. Silva et al. [14] developed a non-destructive method to detect fraud in documents based on different chemometric techniques. Khan et al. [15] proposed a hyperspectral document analysis technique based on joint sparse band selection to distinguish different metameric inks. Abbas et al. [2] proposed hyperspectral unmixing for ink mismatch detection. Our main focus is to distinguish visually similar inks that are mixed in varying proportions, which forms an unbalanced clustering problem.
Jaleed et al. [3] proposed an efficient automatic ink mismatch detection technique using multispectral image analysis, in which ink pixels are segmented using local thresholding and fuzzy c-means clustering (FCM). Luo et al. [16] proposed a system for localized forgery detection that combines an anomaly detection algorithm with unsupervised learning to handle cases where the pixels belonging to different classes are highly unbalanced. Aythami et al. [20] proposed a system based on ink discrimination to detect forgeries in handwritten documents, particularly bank cheques.
Braun et al. [17] proposed a Fourier-transform-based HSI system to detect forgery. They used fuzzy clustering to group similar ink spectra. Their experiments show that inks can be qualitatively segmented into two clusters. The limitations of the system were its lack of quantitative information and its slow imaging process. Hammond et al. [13] carried out a visual comparison of black inks using multispectral document imaging. George et al. [18] used HSI for visual enhancement of documents by separating text written with two different inks in two directions. Khurshid et al. [19] proposed a CNN-based ink mismatch detection method for hyperspectral document images (HSDIs) that uses a combination of spectral and spatial features of ink pixels for classification.

III. DATASET
The dataset consists of a hyperspectral cube with a spatial resolution of 627 by 81 pixels and a total of 33 bands. Each band represents a grayscale image of handwriting. The HSI is acquired in the spectral range from 400 nm to 720 nm with a step size of 10 nm, which results in 33 bands. The hyperspectral image contains the phrase "The quick brown fox jumps over the lazy dog" written with either a black or a blue ink pen. Furthermore, the pens come from different brands, so that their inks differ in composition even when they appear to have the same color. Fig. 1 shows the 33 bands as 33 grayscale images of the handwritten phrase "The quick brown fox jumps over the lazy dog".
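The band count follows directly from the stated range and step size. The short sketch below lists the band centre wavelengths; the variable names are illustrative and do not come from the dataset.

```python
# Band centre wavelengths implied by the dataset description:
# 400 nm to 720 nm inclusive, sampled every 10 nm.
START_NM, STOP_NM, STEP_NM = 400, 720, 10

wavelengths = list(range(START_NM, STOP_NM + 1, STEP_NM))

print(len(wavelengths))                  # number of bands
print(wavelengths[0], wavelengths[-1])   # first and last band centre
```

Both endpoints are included, which is why the count is (720 - 400) / 10 + 1 = 33 rather than 32.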

IV. METHODOLOGY
A clustering algorithm is an unsupervised algorithm that partitions a data set into clusters (subsets). K-means clustering is one of the most useful unsupervised algorithms. It is simple and computationally inexpensive. It groups the data into k clusters, where k is the number of clusters and must be chosen in advance. Because it is unsupervised, it is used when only unlabelled data is available, that is, data for which no ground truth exists.
The clustering algorithm minimizes the squared error between each cluster centroid and its members. This means that the k-means algorithm tries to optimize the objective function shown in equation 1. Because no iteration can increase the objective, the algorithm always converges, although possibly to a local minimum. The number of clusters varies with the number of mixed inks. Previous work assumes that there are two inks in the image. A consequence of this assumption is that an image with more than two inks would still be grouped into two clusters. Selecting an appropriate number of clusters therefore plays a vital part in correct segmentation.
The proposed methodology is illustrated in Fig. 2. First, since the images are available in grayscale form, we stack them to create a hyperspectral cube. Next, we apply the K-means algorithm to the hyperspectral cube. Using the labels and cluster centres, we generate a color image in which text written with different inks can easily be distinguished.
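For reference, the standard form of the K-means objective can be written as below. The notation is an assumption made here for illustration: x_i is the spectral vector of pixel i, C_j is the set of pixels assigned to cluster j, and \mu_j is the centroid of that cluster.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Reassigning each pixel to its nearest centroid cannot increase J, and neither can recomputing each centroid as the mean of its members. Since there are only finitely many ways to partition the pixels, the iteration must stop, though not necessarily at the global minimum.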
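The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain NumPy version of Lloyd's algorithm, a small synthetic cube in place of the 33 PNG bands, and hypothetical helper names (`stack_bands`, `assign`, `kmeans`, `colorize`). The red, blue and green palette follows the colors named later in the paper.

```python
import numpy as np

def stack_bands(bands):
    """Stack a list of 2-D grayscale band images into an (H, W, B) cube."""
    return np.stack(bands, axis=-1).astype(np.float64)

def assign(pixels, centres):
    """Label each pixel with the index of its nearest centre (Euclidean)."""
    dists = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(pixels, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on an (N, B) array; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct pixels chosen at random.
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(pixels, centres)
        new_centres = centres.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return assign(pixels, centres), centres

# Cluster colors: red, blue, green.
PALETTE = np.array([[255, 0, 0], [0, 0, 255], [0, 255, 0]], dtype=np.uint8)

def colorize(labels, height, width):
    """Turn per-pixel cluster labels into an (H, W, 3) color image."""
    return PALETTE[labels].reshape(height, width, 3)

# Synthetic stand-in for the data: three regions with different spectra.
rng = np.random.default_rng(1)
H, W, B = 8, 12, 33
truth = np.repeat(np.arange(3), W // 3)[None, :].repeat(H, axis=0)
base_spectra = np.array([
    np.linspace(0.2, 0.8, B),
    np.linspace(0.8, 0.2, B),
    np.full(B, 0.5),
])
bands = [base_spectra[truth, b] + 0.01 * rng.standard_normal((H, W))
         for b in range(B)]

cube = stack_bands(bands)                          # step 1: build the cube
labels, centres = kmeans(cube.reshape(-1, B), 3)   # step 2: cluster spectra
image = colorize(labels, H, W)                     # step 3: color by cluster
print(cube.shape, centres.shape, image.shape)
```

Each pixel is treated as one 33-dimensional sample, so the cube is flattened to shape (H*W, B) before clustering and the labels are reshaped back to the image grid afterwards. Because the initial centres are random, the result can depend on the seed; production code would typically run several initialisations and keep the best.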

A. Visualization of Bands
The data consists of a hyperspectral cube of size 81x627x33, stored as 33 grayscale PNG images, one for each of the 33 bands of the hyperspectral image.
Signatures are one of the most widely recognized methods of verifying a record. Financial organizations use signatures to check individual identity in financial and regulatory exchanges. The use of signatures as a means of validation in everyday life justifies the need for a complete authentication system. A spectral signature is the variation of the reflectance of a material with respect to wavelength (i.e., reflectance as a function of wavelength). The spectral response of three pixels of the HSI image is shown in Fig. 4, and the whole spectral response is shown in Fig. 5.
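A spectral response of this kind can be read directly from the cube by fixing a pixel position and taking all band values at that position. The sketch below assumes a cube laid out as (rows, columns, bands) with the 81x627x33 shape given above. The cube values are random placeholders, the pixel coordinates are arbitrary, and the mean over all pixels is only one possible way to summarise the whole response.

```python
import numpy as np

def pixel_spectrum(cube, row, col):
    """Return the spectral response (one value per band) at one pixel."""
    return cube[row, col, :]

# Band centres: 400 nm to 720 nm in 10 nm steps.
wavelengths = np.arange(400, 721, 10)

# Placeholder cube with the stated shape; real data would come from the PNGs.
cube = np.random.default_rng(0).random((81, 627, wavelengths.size))

# Spectral response of three example pixels (coordinates are arbitrary).
for row, col in [(10, 50), (40, 300), (70, 600)]:
    spectrum = pixel_spectrum(cube, row, col)
    print((row, col), spectrum.shape)

# One summary of the whole response: the mean spectrum over all pixels.
mean_response = cube.reshape(-1, cube.shape[2]).mean(axis=0)
```

Plotting `spectrum` against `wavelengths` gives a reflectance-versus-wavelength curve for that pixel.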

C. Applying K-means Clustering
In the next step, K-means clustering is applied with the number of clusters set to 3, i.e. k=3. Each cluster is labelled with a different color: clusters 1, 2 and 3 are shown in red, blue and green respectively. Finally, the result of the K-means clustering is visualized as shown in Fig. 6.
HSI has great potential for determining legitimacy in forensic document examination. We used K-means clustering for automatic ink mismatch detection. This technique showed great potential in unequal ink mismatch detection. It is simple to implement, computationally fast, and guaranteed to converge. Its limitations are that the number of clusters, k, must be chosen manually and that it is sensitive to outliers.

VII. FUTURE WORK
In future, different thresholding techniques, such as Sauvola, Niblack and global thresholding, could improve the results. It is also likely that changing the value of k (the number of clusters) would improve the performance of the proposed technique. The need to choose k manually could also be overcome in future work by using a deep network or a CNN-based end-to-end network. We hope that the results presented in this paper will motivate researchers to explore new and exciting challenges [23] in the study of forensic documents.