A Privacy-Preserving Fully Homomorphic Encryption and Parallel Computation Based Biometric Data Matching

Biometric matching is one of the most reliable authentication methods in use today. Authentication based on biometric information such as fingerprint, iris, and face is used in many application areas, including border gates. However, regulations, most notably the General Data Protection Regulation (GDPR), restrict how such data may be stored and used. The main goal of this work is a practical implementation of fully homomorphic encryption-based biometric matching for border controls. With these restrictions in mind, we propose a biometric authentication system based on hash expansion and fully homomorphic encryption. One of the most significant drawbacks of homomorphic encryption is its long execution time; we address this by executing the matching algorithm in parallel. The proposed scheme is implemented as a proof of concept in the SMILE, and its advantages in privacy preservation have been demonstrated.


Introduction
Biometric matching systems identify persons by their fingerprints, irises, palms, and similar traits. The biometric matching algorithm uses a plain version of the sensitive biometric data to achieve high-performance matching for accurate identification [1,2]. Today, many organizations, especially border control police, use such identification systems to verify travelers [3]. The main drawbacks of these biometric datasets are privacy and security concerns: all of these methods require full access to private and confidential data in order to create a feature template or data matching model [4].
Moreover, even when various anonymization techniques are applied to confidential data sets to protect privacy, an adversary can still retrieve the confidential data via several methods [5][6][7]. Imagine a scenario in which travelers want to prove their identity at border control. To do so, the travelers have to share their sensitive biometric data with the border control police. Vulnerabilities in the identity control system could then expose the travelers' biometric data. It is therefore essential to find a new privacy-preserving matching algorithm for identity matching that can run on confidential biometric data without retrieving the private data.
In biometrics, a matching algorithm consists of two sequential steps: (1) feature template extraction and storage, and (2) matching of a new biometric feature template against the previously stored biometric data [8]. In the enrollment step, the algorithm creates a biometric data representation vector x from a user U. In the validation step, a new biometric data vector y is extracted together with the claimed user identity U, and the system then measures how similar the two vectors x and y are using a distance metric d. Traditional biometric matching algorithms require direct access to plain data, which poses the security and privacy risks mentioned above.
In this research, we propose a privacy-preserving biometric data matching model building method based on a fully homomorphic encryption scheme that constructs the feature template from encrypted biometrics data for a cloud system.
The contributions of our work are as follows:
• A fully homomorphic encryption scheme based biometrics data storing and matching system is proposed, and thus a privacy-preserving matching model is achieved.
• A cloud system does the computation of the distance metrics. Thus the model is very power efficient for mobile devices.
• We applied a hash expansion based secondary privacy-enhancing technique.
• Parallel execution techniques are applied to reduce the execution time of homomorphic encryption-based matching.
Our paper is organized as follows: in Section 2, we briefly introduce some of the related work on privacy-preserving biometric matching. In Section 3, we describe Microsoft SEAL, fully homomorphic encryption, and parallel computation. In Section 4, we describe the proposed secure biometric data storing and data matching method. Section 5 evaluates our proposed model. Section 6 concludes the paper.

Related Work
In this section, we review existing work on privacy-preserving biometric matching methods and highlight the differences between our proposed matching model and the current research. Encryption methods can be employed to address the security and privacy concerns in biometric data matching [9,10].
Gunasinghe et al. [11] proposed a machine learning-based classification technique. The authors mainly focus on privacy-preserving and user-centric authentication to overcome the drawbacks of the existing biometrics-based authentication models such as storing and transmitting the biometrics data, sensitive data sharing between service providers. Their approach preserves privacy by using zero-knowledge proof of knowledge and a secret provided by the user. They state that they must improve performance optimization for mobile devices.
Im et al. [12] proposed a face-feature based authentication system for mobile phones. They use Yao's garbled circuit based secure multi-party computation to secure the biometric data. The main difference from the previous approach is the use of a client-server based ResNet face recognition model. According to their experimental results, their solution has an Equal Error Rate (EER) of 3.04% with a matching duration of 1.3 seconds on two publicly available face datasets. Their approach is more complicated than ours in terms of biometric matching: we use distance metrics to match fingerprint biometrics.
In another study, the authors proposed a new system based on biometric data encryption and matching algorithms, and they introduced perturb items in every biometric data [13]. They converted the matrix multiplications to vector-matrix multiplications.
Zhang et al. [14] proposed scalable, efficient, privacy-preserving fingerprint authentication using a minutiae representation based on Yao's classic Garbled Circuit (GC) protocol. The optimized implementation achieved in their study requires 210 KB for storage.
In another study, Tian et al. [15] proposed a biometric data based remote user authentication system for a cloud server. Their approach can be applied to the honest-but-curious server in an anonymous and unlinkable manner.
Hahn et al. [16] showed how an attacker can enroll fake biometric data and then manipulate them to recover an encrypted identification request in CloudBI. The authors define an attack scenario at the enrollment stage, in which the attacker randomly generates 8-bit integers and sends them 3 of 16 to the cloud server to extract the recorded biometric data. They then offer a security patch for CloudBI that is secure against enrollment-level attackers. The patch introduces more randomness into the encryption of the biometric data used to find the distance between two biometric data items.
Yang et al. [17] proposed several private credential management work models under different trust models between a user and an external party. The main advantages of their approach are feature agnosticism, a privacy-enhanced biometric template, and outsourced authentication.

Preliminaries
In this section, we briefly introduce preliminary information about the fully homomorphic encryption concept, parallel computation methods, and biometric data matching.

Fully Homomorphic Encryption
Homomorphic encryption schemes allow arithmetic operations such as addition and multiplication to be performed on ciphertexts without exposing the corresponding plaintexts, which is useful when data are shared between two or more parties. A public-key encryption scheme such as Paillier is additively homomorphic if, given the encryptions of two values x and y, there exists a public-key summation operation ⊕ that combines them into an encryption of the plaintext x + y. Formally, an encryption scheme is additively homomorphic if, for any key pair (key_priv, key_pub) and plaintext space P = Z_N, for all x, y ∈ Z_N:

$$\mathrm{Dec}\big(\mathrm{Enc}(x + y \bmod N;\ key_{pub});\ key_{priv}\big) = x + y \qquad (1)$$

Similarly, a public-key encryption scheme such as RSA is multiplicatively homomorphic if, given the encryptions of two values x and y, there exists a public-key multiplication operation ⊗ that combines them into an encryption of the plaintext x × y. Formally, an encryption scheme is multiplicatively homomorphic if, for any key pair (key_priv, key_pub) and plaintext space P = Z_N, for all x, y ∈ Z_N:

$$\mathrm{Dec}\big(\mathrm{Enc}(x \times y \bmod N;\ key_{pub});\ key_{priv}\big) = x \times y \qquad (2)$$

Homomorphic encryption schemes operate only on integers. Therefore, the proposed protocols handle only integers, although biometric matching is typically applied to continuous data. When the input data set contains real numbers, we need to map the floating-point input vectors into the discrete domain with a scaling function.
Let ConvertToInteger : R^m → Z^m be the corresponding function, which multiplies its floating-point argument by 2^K for a chosen exponent K and then rounds the result to the nearest integer, thus supporting finite precision. Eq. 3 shows the conversion function:

$$\mathrm{ConvertToInteger}(x) = \left\lfloor 2^{K} \cdot x \right\rceil \qquad (3)$$
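To make the scaling step and the additive property concrete, the following is a minimal sketch that assumes a toy Paillier instance with two small hard-coded primes. The function names (`convert_to_integer`, `keygen`, `encrypt`, `decrypt`, `add_encrypted`) and the choice K = 8 are illustrative assumptions, not part of the proposed system, and the key size is far too small for real use.

```python
import math
import random

def convert_to_integer(values, k):
    """Scale each value by 2**k and round to the nearest integer."""
    return [round(v * (1 << k)) for v in values]

def keygen(p=1789, q=1861):
    """Toy Paillier keys from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this form is valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # (1 + n)^m is congruent to 1 + m*n modulo n^2.
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Recover m from c using the private values (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

# Usage: encode two real numbers with K = 8, add them under encryption.
n, priv = keygen()
a, b = convert_to_integer([0.25, 1.5], 8)
total = decrypt(n, priv, add_encrypted(n, encrypt(n, a), encrypt(n, b)))
print(total / 2 ** 8)
```

The decrypted sum is still scaled by 2^K, so the caller divides by 2^K to return to the real domain. Negative inputs would need an additional encoding convention, which this sketch leaves out.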

Parallel execution
The main drawback of homomorphic encryption algorithms is their time complexity. To reduce the execution time of biometric matching, we applied block-striped decomposition [18] and parallel computation. We used the Euclidean distance to match two biometric data items. Consider the problem of computing the distance between two vectors x^(1), x^(2) ∈ R^n. The standard Euclidean distance metric is defined as:

$$d\big(x^{(1)}, x^{(2)}\big) = \sqrt{\sum_{i=1}^{n} \big(x^{(1)}_i - x^{(2)}_i\big)^2} \qquad (4)$$

We can apply Block-Striped Decomposition [19] to parallelize the Euclidean distance computation. Our solution divides the whole distance computation into T smaller computations, where T is the number of CPU cores available on a single computer. Accordingly, the vectors x^(1) and x^(2) are partitioned into T sub-vectors, and each subtask computes the distance contribution of one pair of sub-vectors. To carry out its calculations, each subtask must hold the corresponding sub-partitions of x^(1) and x^(2). Figure 1 shows the distributed and parallel computation of the distance calculation using several CPU cores. The data fusion step sums the partial values computed on the T CPU cores and takes the square root of the total.
In expanded form, the Euclidean distance subtracts the corresponding elements of the two vectors and squares each difference. Since homomorphic encryption is used in this study, all calculations are done in the encrypted domain. Although homomorphic encryption offers no dedicated squaring operation, the square of a number can be obtained by multiplying the number by itself. The computation carried out by each processor is denoted by d_p, as shown in Equation 5.
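The decomposition described above can be sketched on plaintext vectors as follows. This is only an illustration under stated assumptions: the helper names (`split_blocks`, `partial_sum`, `parallel_distance`) are hypothetical, the vectors are not encrypted, and Python threads are used purely to show the structure; an actual speed-up for CPU-bound work would require processes or a native library.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def split_blocks(n, t):
    """Return t contiguous (start, stop) index ranges that cover range(n)."""
    base, extra = divmod(n, t)
    bounds, start = [], 0
    for p in range(t):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def partial_sum(x1, x2, start, stop):
    """Partial value d_p: sum of squared differences over one block."""
    # The square is written as a product, mirroring the encrypted-domain trick.
    return sum((x1[i] - x2[i]) * (x1[i] - x2[i]) for i in range(start, stop))

def parallel_distance(x1, x2, t=4):
    """Compute the partial sums on t workers, then fuse with a square root."""
    blocks = split_blocks(len(x1), t)
    with ThreadPoolExecutor(max_workers=t) as pool:
        parts = list(pool.map(lambda b: partial_sum(x1, x2, b[0], b[1]), blocks))
    return math.sqrt(sum(parts))

# Usage with two small binary feature vectors and T = 3 workers.
x1 = [1, 0, 1, 1, 0, 0, 1]
x2 = [0, 0, 1, 0, 1, 0, 1]
print(parallel_distance(x1, x2, t=3))
```

Because the partial sums are independent, the only synchronization point is the final fusion step, which is what makes the block-striped layout attractive when each multiplication is expensive.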

System model
The main goal of this work is a practical implementation of fully homomorphic encryption-based biometric matching in border controls, shown in Figures 5 and 6. The biometric matching system consists of two phases: (1) enrollment and (2) matching. During the enrollment stage, the identification system acquires unique fingerprint images and performs the database query. The fingerprint images are recorded in the cloud biometric database using SHA256 based hash expansion and the Brakerski/Fan-Vercauteren (BFV) based somewhat homomorphic encryption scheme [20]. In the matching stage, the model identifies the person from the query fingerprint images by computing distance metrics against the fingerprint features of the enrollment images saved in the biometric template database. Figure 2 shows the proposed privacy-preserving biometric matching system.
As noted above, the main drawback of homomorphic encryption algorithms is their time complexity, which we reduce by applying block-striped decomposition to the biometric matching computation.

Security Model
In this work, the aim is to enable a mobile device to authenticate a traveler by cooperating with a public cloud system to match biometric data, without the cloud system retrieving the traveler's confidential biometric data. Our primary assumption is that the input biometric data is matched with the user's feature template, which has been stored in the cloud at the enrollment stage. Our other assumption is that both the mobile device and the cloud system follow the protocol; this is known as the semi-honest security model [21,22].

Enrollment
Fingerprint enhancement is a necessary step before feature extraction and fingerprint matching. We applied several fingerprint enhancement techniques, including image rotation and edge detection. First, image rotation aligns all fingerprint images in the same direction; then, Harris corner based feature extraction is applied for fingerprint detection. Figure 3 shows the Harris corner detection for an example fingerprint. The original fingerprint images are cropped and resized to 30 × 28 pixels for the actual representation.
The extracted corners are transformed into feature vectors, which are then binarized with a threshold value. The resulting feature vector contains only 0s and 1s and serves as the feature template for the fingerprint image, denoted as x ∈ R^n. Figure 5 shows the general architecture of the enrollment stage. We applied random number generator (RNG) based hash expansion and fully homomorphic encryption to the biometric data in order to enhance privacy.
During the enrollment stage, the user U who wants to register with the border control system provides biometric data, including fingerprint, face, and iris, from which the mobile device generates a feature representation x ∈ R^n. The RNG generates a unique random number r for the user U, transmits it to the mobile device, and stores it in its internal RNG database. To enhance privacy and prevent leakage of the private biometric data of user U, we apply a hash expansion technique: the random number r is expanded to the same size as the feature template x, yielding the final perturbation vector p. Figure 4 shows the proposed hash expansion technique.
Algorithm 1 explains the detailed steps of the RNG-based random vector generation. The mobile device then adds these two vectors to increase the privacy protection before encrypting the input biometric data. We can represent this transformation with a function f and the initial random key for user U as

$$x_{per} = f(x, r_U)$$

where f is the transformation function, x is the feature template, and r_U is the initial random number for user U. This perturbation technique preserves the distance, so biometric template matching on the transformed vectors yields the same value:

$$d(x, y) = d\big(f(x, r_U),\ f(y, r_U)\big)$$

where d is the distance function. The perturbation thus increases the privacy protection of the users' biometrics without changing the matching result.
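The distance-preservation property can be checked with a short sketch. It assumes, as the text states, that the perturbation is a plain element-wise addition of the same vector p to both templates; the helper names and the example vectors are hypothetical, and the random vector here merely stands in for the output of Algorithm 1.

```python
import math
import random

def euclidean(a, b):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def perturb(vec, p):
    """Element-wise addition of the perturbation vector p (assumed form of f)."""
    return [v + s for v, s in zip(vec, p)]

# Hypothetical enrolled template x, query template y, and perturbation p.
x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
rng = random.Random(0)  # stand-in for the per-user perturbation source
p = [rng.randrange(1000) for _ in x]

x_per, y_per = perturb(x, p), perturb(y, p)
print(euclidean(x, y), euclidean(x_per, y_per))
```

The two printed distances coincide because (x_i + p_i) − (y_i + p_i) = x_i − y_i for every component, so the shared perturbation cancels in each difference.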
The second step in the enrollment stage is encryption, which starts with key generation. The mobile device creates a public/private key pair and protects its private key internally: the private key is not accessible to external users or systems and can only be accessed by the device itself. The key pair is used to protect the perturbated version of the feature template vector, x_per. The perturbated template vector x_per is encrypted using the public encryption key and transmitted to the encrypted cloud biometrics database with user id U. The database stores the encrypted template. Figure 5 shows the enrollment stage of the proposed system. As shown in the figure, there are three components in this stage: the mobile device, the RNG, and the encrypted cloud biometrics database.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 July 2020 doi:10.20944/preprints202007.0658.v1

Algorithm 1 RNG seed based perturbation vector creation
Input: RNG seed r, vector size n, mod p
/* create zero vector rng_gen ∈ R^n */
rng_gen = zeros(1, n)
/* the first item is the SHA256 of r */
rng_gen[0] = Hash(r) mod p
LOOP Process
for (i = 1; i < n; i++) do
    rand_total = 0.0   /* temporary variable for previous items addition */
    for (j = 0; j < i; j++) do
        rand_total += rng_gen[j] mod p

Even if an attacker has access to n random numbers, it is impracticable to produce the next, (n + 1)th, random number. As long as the random number generator is protected, the attacker cannot obtain the following number, because doing so requires the initial number r_U, which is assigned and saved in the generator. Moreover, the hashing function prevents the attacker from extracting the previous random number: a cryptographic hash function such as SHA256 is designed to be one-way, so recovering the original data from the hash value is computationally infeasible.
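A runnable sketch of the hash expansion idea is given below. The steps up to the inner accumulation loop follow Algorithm 1 as reproduced here; how each new element is derived from `rand_total` is not shown in this text, so the final assignment (hashing the seed together with the running total) is an assumption made for illustration, as are the function names and the string encoding of the hash input.

```python
import hashlib

def sha256_int(value):
    """SHA-256 of the string form of value, interpreted as a big integer."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def perturbation_vector(r, n, p):
    """Expand seed r into n integers in [0, p), in the spirit of Algorithm 1."""
    rng_gen = [0] * n
    # The first item is the SHA256 of r, reduced modulo p.
    rng_gen[0] = sha256_int(r) % p
    for i in range(1, n):
        rand_total = 0  # accumulates the previously generated items
        for j in range(i):
            rand_total += rng_gen[j] % p
        # ASSUMPTION: this step is not given in the text. The seed is mixed in
        # so that knowing earlier outputs alone does not determine the next one.
        rng_gen[i] = sha256_int(f"{r}:{rand_total}") % p
    return rng_gen

# Usage: the same seed always reproduces the same perturbation vector.
vec = perturbation_vector(r=42, n=8, p=1000003)
print(vec)
```

Determinism matters here because the matching stage must regenerate exactly the same vector from the stored r_U that was used at enrollment.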

Matching
During the matching stage, the mobile device executes the following steps: (1) the user's biometric data is extracted into a representation vector y; (2) the initial random number r_U for user U, stored during the enrollment stage, is obtained from the random number generator; (3) the perturbation vector p_U is generated using Algorithm 1; (4) the biometric template y is perturbated using the perturbation vector p_U; (5) the perturbated biometric data vector y_per is encrypted using the device's own public encryption key key_pub; (6) the encrypted and perturbated biometric data y_per, the claimed user identity U, and key_pub are sent to the cloud biometrics database.
The cloud biometrics database (server) calculates the distance score d(x_per, y_per, key_pub) between the two encrypted biometric data vectors using the Euclidean distance ($\sum_{i=1}^{n} (q_i - p_i)^2$). The result of this calculation is in the encrypted domain, and the cloud server cannot obtain the plain version of the distance value. The decryption key required for biometric matching is only available on the mobile client. Likewise, the server does not fully trust the mobile client. Therefore, instead of sending the distance metric directly to the mobile client, the server adds a client puzzle to it, as shown in Figure 7, and then transmits the result to the mobile client. This protects the distance from manipulation.
To build a client puzzle, two random numbers r_0 and r_1 are created and encrypted using key_pub, and the distance metric is transformed using Equation 10.
The server transmits the encrypted puzzle value to the mobile device. The mobile device decrypts the puzzle value and transmits it to the cloud server. The cloud server obtains the matching score using Equation 11.
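The puzzle exchange can be illustrated with plain integers standing in for ciphertexts. Equations 10 and 11 are not reproduced in this text, so the affine mask r_0 · d + r_1 used below is an assumed stand-in chosen because it needs only one multiplication and one addition, both of which a homomorphic scheme supports; the function names are hypothetical.

```python
import secrets

def make_puzzle(distance, r0, r1):
    """Server side: mask the distance value (assumed affine form)."""
    return r0 * distance + r1

def solve_puzzle(value, r0, r1):
    """Server side: unmask the value returned by the client."""
    quotient, remainder = divmod(value - r1, r0)
    if remainder != 0:
        # A value that is not of the form r0 * d + r1 cannot be a valid reply.
        raise ValueError("returned puzzle value is inconsistent")
    return quotient

# Usage: the client only ever sees the masked value, never the distance.
r0 = secrets.randbelow(2 ** 16) + 1  # non-zero multiplier
r1 = secrets.randbelow(2 ** 16)
masked = make_puzzle(5, r0, r1)      # in the protocol this stays encrypted
print(solve_puzzle(masked, r0, r1))
```

In the actual protocol the masking would be applied to the encrypted distance and the client would decrypt only the masked value, so a client that alters the reply without knowing r_0 and r_1 is unlikely to produce a consistent result.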

Figure 5. In the enrollment stage, the mobile device expands the biometric feature vector x, encrypts the hashed feature template with key_pub, and transmits the vector to the encrypted cloud database with the user id c.
Figure 6 shows the sequence diagram of the matching phase.

Results
The main goal of this work is to explore the practical implementation of fully homomorphic encryption scheme based biometric data matching for border controls. The matching process should be completed in less than 2 seconds. In this section, we report the execution results of both the sequential and the parallel implementation of the proposed method. As explained before, the biggest drawback of all fully homomorphic encryption schemes is execution time. To improve the time performance of the proposed method, we applied image reduction in both the sequential and the parallel implementation of the Euclidean distance calculation.
To evaluate the encrypted biometric matching system experimentally, we implemented our model in Python.

Data preprocessing
In this work, we used the Sokoto Coventry Fingerprint Dataset (SOCOFing) biometric fingerprint database to demonstrate the efficiency of our approach [23]. Figure 8 shows example fingerprint images. Each fingerprint picture is recorded in 96 × 103 dimensions and in grayscale format. These fingerprint images have been converted from matrix format to vector format (R^{96×103} → R^{9888}), and the final feature vector for each fingerprint image contains 6776 features after the fingerprint enhancement techniques are applied.
High dimensionality is one of the main sources of computation cost in biometric data matching. Although parallel calculation methods are used in this study, the total computation time can be reduced further by applying size reduction methods. We therefore resize the fingerprint images to reduce their dimensions.
Although other dimensionality reduction techniques exist, such as principal component analysis (PCA), we simply resized all fingerprint images and then converted them into a compressed vector representation. The Harris corner detection method [24] is applied to find the corners. The function runs the Harris corner detector on the image. For each pixel (x, y), it calculates a 2 × 2 gradient covariance matrix M^{(x,y)} over a blockSize × blockSize neighborhood. Then, it computes the following characteristic:

$$\mathrm{dst}(x, y) = \det M^{(x,y)} - k \cdot \big(\operatorname{tr} M^{(x,y)}\big)^2$$

where k is the free parameter of the Harris detector. Corners in the image can be found as the local maxima of this response map. After these operations are applied, the matrix size of each fingerprint image is 30 × 28, and the vector created by flattening the matrix has 840 elements. Thus, the vector length decreases from 9888 to 840. As a result, the new vector is roughly one twelfth the size of the initial vector, which reduces the size of the encrypted data and consequently shortens the calculation time.
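The corner response described above can be sketched without external libraries. This is a simplified illustration, not the library routine used in the experiments: it assumes central-difference gradients instead of a Sobel operator, an unweighted block sum, and the commonly used value k = 0.04; the function name and the synthetic test image are hypothetical.

```python
def harris_response(img, block_size=3, k=0.04):
    """Return det(M) - k * trace(M)^2 for every pixel of a 2-D list image."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference gradients; border pixels keep a zero gradient.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    half = block_size // 2
    resp = [[0.0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            sxx = sxy = syy = 0.0
            # Accumulate the 2 x 2 gradient covariance matrix over the block.
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp

# Synthetic 12 x 12 image: a bright quadrant whose corner sits at (6, 6).
image = [[1.0 if (y >= 6 and x >= 6) else 0.0 for x in range(12)] for y in range(12)]
response = harris_response(image)
print(response[6][6], response[6][9], response[2][2])
```

The three printed values illustrate the usual reading of the response map: positive at a corner, negative along a straight edge, and zero in a flat region. A binary feature template could then be obtained by thresholding this map, with the threshold chosen experimentally.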
We used the BFV implementation in the Microsoft SEAL library to obtain the homomorphically encrypted representation of the feature template x of the fingerprint image. The Microsoft SEAL library works with an abstract plaintext representation. According to the SEAL implementation of BFV, the plaintext space is a polynomial ring over a prime number ring. Thus, plaintext values are polynomials whose coefficients are integers modulo a prime number. The library has some critical parameters: the plaintext modulus (p) and the polynomial modulus (m). The parameter p is the prime modulus for the coefficients, and m is the degree of the irreducible polynomial x^m + 1.
Figure 7. The overall privacy-preserving biometrics data matching process. Accordingly, (1) the mobile device extracts the user's biometric data features, (2) the feature vector is expanded using the RNG database, (3) the expanded vector is encrypted and transmitted to the cloud server, (4) the server gets the user's template vector that was stored in the enrollment stage and calculates the distance in the encrypted domain, (5) the server creates an encrypted client puzzle and sends it to the mobile device, (6) the mobile device decrypts the puzzle and sends the plain value to the server.
Each ciphertext carries a noise budget, and every arithmetic operation consumes part of it. The noise is the quantity that must be rounded away correctly for decryption to succeed. Because it is a small fraction that is inconvenient to work with directly, a logarithmic measure is used instead: if n is the size of the noise in a ciphertext, the noise budget of the ciphertext is defined as −log2(2n). With this definition, every decryptable ciphertext has a positive noise budget, and every operation spends a portion of this budget. Once the noise budget reaches 0, the ciphertext becomes undecryptable [25]. Addition consumes far less of the noise budget than multiplication. Operations, and multiplications in particular, should therefore be used sparingly, since exhausting the noise budget makes ciphertexts undecryptable.

Sequential Fully Homomorphic Encryption
In sequential matching, the system calculates the matching score without splitting the feature vector: the Euclidean distance between the two biometric inputs is computed over all sub-vectors on a single CPU. This serves as the baseline for showing the execution-time improvement of the proposed parallel matching protocol. Other critical parameters that affect the execution time are the polynomial coefficient modulus and the AES-equivalent security level (128, 192, or 256). Figure 9 shows the execution time, in seconds, of matching two biometric data items. As expected, the execution time increases exponentially with both the polynomial coefficient modulus and the AES security level. As can be seen from the figure, we observed that the value of the plaintext modulus, p, does [...]

Figure 10 shows the speed-up for each AES security level over the polynomial coefficient modulus on the biometrics dataset. To assess the effectiveness of the parallel matching algorithm, the time is measured with varying modulus size. As can be seen from the figure, the system achieved a performance improvement in matching time. The system achieves a non-linear speed-up as the polynomial coefficient modulus increases. For example, if the m = 2048, p = 1024, and AES-256 parameters in Table 1 are used, the execution time is 1.274226, while the execution time with the same parameters in Table 2 [...]

[...] After that, the system encrypts and stores the sensitive biometric data in the cloud database. We also applied parallel execution techniques to reduce the execution time of the matching process, which is calculated in the encrypted domain.
Our results suggest that using parallel computation together with hash expansion and homomorphic encryption is a promising solution for border control systems, and that the solution scales gracefully with the size of the data. Moreover, many of the ideas presented here can be used to match other biometric data, for example, iris or face. The main limitation of the suggested matching system is the time consumed by fingerprint matching in the encrypted domain. Real-time identification of travelers using large fingerprint images needs more processing capability, computing resources, and storage.