Preprint
Article

This version is not peer-reviewed.

RUIP-BA: Renewable, Unlinkable, and Irreversible Privacy-Preserving Behavioral Authentication via Random Projection and Local Differential Privacy for IoT and Mobile Platforms

Submitted: 02 April 2026
Posted: 02 April 2026


Abstract
Behavioral Authentication (BA) systems verify user identity claims based on unique behavioral characteristics using machine learning (ML)-based classifiers trained on user behavioral profiles. Although effective, ML-based BA systems face serious privacy threats, including profile inference and reconstruction attacks. This paper presents RUIP-BA (Renewable, Unlinkable, and Irreversible Privacy-Preserving Behavioral Authentication), a non-cryptographic framework tailored to low-computation devices such as IoT and mobile platforms. Random Projection (RP) maps behavioral profiles into lower-dimensional protected templates while approximately preserving utility-relevant geometry, and local Differential Privacy (DP) injects calibrated stochastic perturbations to provide formal privacy protection. The proposed design jointly targets the ISO/IEC 24745 requirements of renewability, unlinkability, and irreversibility. We provide complete algorithmic realizations for enrollment, verification, template renewal, unlinkability testing, and GAN-based adversarial privacy evaluation. We also introduce rigorous formal privacy derivations and proofs under explicit assumptions, including formal security games, theorem-level guarantees at information-theoretic and statistical levels, Cramér-Rao lower bounds for irreversibility, full Jensen-Shannon divergence derivations for unlinkability, and GAN Nash-equilibrium attack bounds. Experiments on voice, swipe, and drawing datasets show authentication accuracy above 96% while sharply limiting feature recoverability under strong GAN-based attacks. RUIP-BA provides a scalable, mathematically grounded, and deployment-ready privacy-preserving BA solution.

1. Introduction

In today’s increasingly interconnected digital landscape, secure user authentication has become critical for protecting sensitive resources and online services. As traditional authentication methods, such as passwords and PINs, face growing security limitations, the need for more sophisticated authentication systems has become apparent. For example, password-based authentication systems are vulnerable to password cracking, theft, and sharing attacks. To address these weaknesses, additional layers of protection are implemented using pre-agreed questions, hardware tokens, smart cards, and user biometrics. However, these approaches require additional hardware and infrastructure and often suffer from the same vulnerabilities observed in password-based and PIN-based systems.
An attractive approach to multifactor authentication is to use behavioral data as the second factor. Behavioral authentication (BA) systems [1,2,3,4] leverage distinctive user behavior patterns, such as typing style, gait, or touch dynamics, to verify users’ identities seamlessly and continuously. Behavioral data in a BA system can be collected passively or actively during an interactive session with the user. Passive BA systems collect data through background processes, whereas active BA systems require the user’s presence at the time of the verification request and provide higher security guarantees. By analyzing unique behavior patterns from collected data, BA systems offer an additional layer of security in user authentication.
BA systems typically use raw behavioral data to create detailed user profiles, which are then used to train a machine learning (ML) classifier. Although ML-driven BA systems provide a reliable and accurate method of continuous authentication to verify user identities, these centralized ML models pose a critical single-point vulnerability that attackers can exploit. Since an ML model in a BA system contains information about sensitive personal data, the impact of such attacks can raise serious privacy concerns. A privacy attacker in a BA system can be an honest but curious verifier or an external actor seeking to learn users’ behavior patterns and link them to their identities. For example, an external attacker could launch a model extraction attack [5] or a model inversion attack [6] to recreate the original behavioral data from a compromised ML model, potentially allowing them to impersonate users or gain unauthorized access to their accounts. Additionally, an honest but curious verifier might share behavioral data with third parties. The potential consequences of such privacy breaches are severe, as compromised behavioral patterns could lead to identity theft, financial fraud, or unauthorized access across multiple systems.
Privacy-sensitive personal information carried by behavioral profiles and used in BA systems must be protected from potential attackers. Although the initial design goal of the BA systems was to ensure user-friendly and accurate verification, focusing on usability and system performance properties, data privacy has since been included as a mandatory requirement. According to the ISO/IEC 24745 standard [7], any privacy-preserving biometric or behavioral biometric system must meet three key privacy requirements:
  • Renewability (Cancelability): All users of the system should be able to refresh their profiles if compromised.
  • Unlinkability: It should be infeasible for an attacker to link two or more compromised profiles.
  • Irreversibility: It should be infeasible to deduce the original behavioral pattern from compromised profiles.
Biometric and behavioral biometric systems use profile templates instead of the original profiles in the system to ensure all three properties. Renewability allows users to revoke and replace their templates if compromised. Unlinkability ensures that the distance or divergence between two templates, whether from the same or different sources, is indistinguishable, preventing cross-matching attacks [8]. Irreversibility ensures that the original behavioral patterns cannot be reconstructed from the used template.
Existing privacy-preserving authentication systems utilize both cryptographic and non-cryptographic methods to ensure credential privacy. Cryptographic approaches commonly include key binding (fuzzy vault [9] and fuzzy commitment [10]), key generation approaches [11], zero-knowledge proofs [12], and blockchain-based schemes [13]. However, most of these methods are not directly applicable to BA systems due to noisy behavioral data and the use of an ML model for authentication decisions. Recent advances in privacy-preserving ML [14] have introduced some cryptographic methods to mitigate data leaks in ML-based BA systems. For example, Loya and Bana [15] proposed a protocol for keystroke data that combines fully homomorphic encryption with differential privacy. A similar attempt is observed in [16]. However, many of these systems leave confidential information unaltered, making them susceptible to data breaches. These systems also often suffer from reduced performance and increased computational overhead, and cannot ensure all three privacy-preserving properties.
Among all non-cryptographic approaches, methods such as cancelable biometrics [17], differential privacy (DP) [18,19], and federated learning [20] are more widely adopted in BA systems. Although DP methods effectively preserve behavioral data privacy with theoretical guarantees, they always require making a trade-off between privacy and data utility. Moreover, DP-based approaches are less effective for systems with high-dimensional data. Federated learning-based systems often produce less accurate results when users carry only positive-class data [21]. Cancelable biometrics [22,23] enhance the security of biometric and behavioral biometric data by transforming the original biometric data into a new representation known as a template. This method aims to safeguard users’ original biometric data and revoke compromised profiles. However, none of these cryptographic and non-cryptographic approaches fully satisfy all three required privacy-preserving properties considering all possible attacks.
For privacy-preserving ML-based BA systems, cancelable biometrics can be a promising approach. In cancelable biometrics, if the transformation preserves the relative distances or distributions among all profiles, an ML-based system trained on transformed data will be able to maintain its performance. In addition, cancelable biometrics can also help protect sensitive data by reducing the risk of direct re-identification or feature leakage if the transformation used in the system is irreversible. However, finding a suitable transformation function that is irreversible and preserves the geometrical structure in the transformed space is challenging. Moreover, an irreversible transformation alone should not be relied on as a standalone method for strong data privacy guarantees, especially in environments where formal privacy guarantees are required.
Our work. To address all these privacy challenges, this paper introduces RUIP-BA (Renewable, Unlinkable, and Irreversible Privacy-Preserving Behavioral Authentication), which employs Random Projection (RP) together with local Differential Privacy (DP). RP is a transformation function that projects high-dimensional data, such as user behavioral data, into a lower-dimensional space as a template while maintaining the essential distance relationships between data points with high probability. In addition, RP is an inherently lossy process, making it a type of irreversible transformation. On the other hand, DP is one of the main approaches that has been proven to ensure strong privacy protection in statistical data analysis. DP ensures users’ data privacy by adding controlled noise to the original dataset or to the learning parameters. By applying RP to behavioral profiles, the BA system effectively reduces the dimensionality of the data to make DP more effective. Furthermore, the inclusion of local DP with RP guarantees user profile privacy against statistical computations, ensuring that attackers cannot retrieve sensitive information about individuals in the training dataset. Moreover, both RP and DP allow the use of an ML-based classifier in BA systems to authenticate users accurately without exposing raw, sensitive behavioral data. RP and DP also protect the privacy of the user’s profile and verification data during transmission to the verifier. The use of these two non-cryptographic approaches makes the system well-suited for deployment on low-computation devices, such as IoT and mobile devices, which are commonly used to collect users’ behavioral data and authenticate them.
The proposed system also distinguishes itself by proposing a holistic approach to privacy in alignment with the ISO/IEC 24745 standard privacy-preserving properties. Renewability allows BA users to revoke compromised profile templates and create new ones by simply altering the secret random matrices used in RP and locally adding DP noise, thereby maintaining continuous security. When a profile is projected using two different random matrices in RP, it generates two distinct projected profiles (templates), and the addition of local noise further separates them, making it computationally infeasible for attackers to link the noisy projected profiles for cross-matching attacks. Furthermore, the irreversibility of RP combined with the randomized perturbations introduced by DP makes it infeasible for an attacker to recover the original behavioral pattern from noisy transformed profiles.
A conference version of this paper was published in [24]. In this extended version, we significantly expand the previous work by including DP, providing additional analysis, experimental results, a formalization of the Generative Adversarial Network (GAN)-based privacy attack, and new formal mathematical proofs of all three ISO/IEC 24745 properties. In general, the contributions in this paper can be described as follows.
  • We present a novel privacy-preserving BA system designed specifically for low-computation devices. By leveraging the data protection capabilities of RP and local DP, this system effectively addresses the challenges posed by high-dimensional data while safeguarding sensitive behavioral information.
  • Our system maintains high accuracy while preserving the essential privacy attributes of authentication systems (renewability, unlinkability, and irreversibility) in alignment with the ISO/IEC 24745 standard within the BA systems framework.
  • We designed a novel GAN-based privacy attack model to thoroughly evaluate the system’s irreversibility. Additionally, we systematically analyzed all other key privacy requirements.
  • Experimental validation using three distinct behavioral datasets confirmed our theoretical analyses, demonstrating the system’s practical effectiveness, robustness, and resilience, establishing a strong foundation for real-world implementation.
  • We provide new formal security games for all three ISO/IEC 24745 properties, and derive rigorous mathematical proofs including information-theoretic lower bounds (Cramér-Rao), full Jensen-Shannon divergence derivations, and GAN Nash-equilibrium attack bounds.

Contribution and Novelty Highlights

  • New acronym - RUIP-BA: The acronym RUIP-BA (Renewable, Unlinkable, Irreversible Privacy-Preserving Behavioral Authentication) directly encodes the three ISO/IEC 24745 properties in the system name, making the contribution immediately transparent. This distinguishes RUIP-BA from prior systems where the acronym encodes only the technical method.
  • Unified framework novelty: RUIP-BA unifies geometric template protection (RP) and formal stochastic privacy protection (local DP) in one deployable BA pipeline for low-resource platforms.
  • Algorithmic novelty: The paper presents a complete modular algorithm stack covering profile enrollment, claim verification, template re-issuance, unlinkability testing, adversarial privacy evaluation, and DP parameter calibration.
  • Formal-analysis novelty: We provide new explicit mathematical derivations with axiom, lemma, and theorem level statements for all three properties. Specifically: (i) a Bhattacharyya-coefficient renewal bound via Hanson-Wright concentration; (ii) a full KL/JS divergence derivation for unlinkability under Gaussian mechanism; (iii) a Cramér-Rao/Bayesian MMSE bound for irreversibility showing that null-space information cannot be recovered; and (iv) a GAN Nash-equilibrium privacy bound.
  • Adversarial-evaluation novelty: We model GAN-based inversion explicitly as a formal security game and bound attack effectiveness through the mutual information bottleneck and information-channel capacity of the protected template.
The structure of the paper is as follows. Section 2 reviews related work, while Section 3 provides the necessary background information. Section 4 describes the proposed privacy-preserving BA system, including the registration and verification phases, along with a detailed privacy analysis. Section 5 outlines the GAN-based privacy attack model used to evaluate system robustness. Section 6 presents the experimental results, covering system performance, renewability, unlinkability, the impact of attacks on irreversibility, and a comparison of results. Finally, Section 7 summarizes the key findings of this work and future directions.
Notation. Table 1 summarizes the key notations used most frequently in this paper.

3. Background

3.1. Random Projection (RP)

RP is a mathematical technique that is used to reduce the dimensionality of the data while approximately preserving the pairwise Euclidean distances between data points. Given a high-dimensional vector $x_i$, RP projects it into a lower-dimensional space, resulting in a vector $x'_i$. For a profile $X$, the RP transformation is defined as $X' = \frac{1}{\sqrt{k}\,\sigma_r} R X$ or, more simply, $X' = R X$, where $R$ is a random matrix and $X'$ is the transformed profile.
The foundational theory of RP is based on the Johnson-Lindenstrauss (JL) lemma [36], which states that a set of points in a high-dimensional space can be projected into a lower-dimensional space while preserving pairwise Euclidean distances within a small error margin. This property makes RP an approximate isometry transformation, suitable for privacy-preserving systems, as RP ensures that data relationships are maintained while obscuring the original features. Extensions of the JL lemma have further refined the minimum acceptable dimension to preserve these distances [37], ensuring that the transformation is effective and efficient for ML-based systems. The dimensionality reduction process in RP introduces a degree of data loss. On the other hand, the random matrix used in RP introduces randomness in the transformation process. Both processes make RP a kind of irreversible transformation.
The random matrix in RP can be generated through various distributions and can be kept secret. Although the original RP method employs Gaussian distributions to generate matrix components, practical applications often prefer computationally efficient alternatives. For example, [38] introduced a discretized form of the Gaussian distribution, where the components of the random matrix $R$ are generated from the set $\{+1, 0, -1\}$ with probabilities $\Pr(r_{ij} = +1) = \frac{1}{2\phi}$, $\Pr(r_{ij} = 0) = 1 - \frac{1}{\phi}$, and $\Pr(r_{ij} = -1) = \frac{1}{2\phi}$, respectively, where $r_{ij} \in R$. Setting $\phi = 3$ provides a balance between preserving distance properties and computational efficiency, making RP particularly useful for resource-constrained devices.
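As an illustration, the discretized distribution above can be sampled directly. The following Python sketch (the function and parameter names are our own, not part of the RUIP-BA specification) generates an Achlioptas-style matrix with $\phi = 3$ and empirically checks that suitably scaled projections approximately preserve pairwise Euclidean distances:

```python
import numpy as np
from itertools import combinations

def achlioptas_matrix(seed, k, d, phi=3):
    """Random matrix with entries in {+1, 0, -1}: P(+1) = P(-1) = 1/(2*phi),
    P(0) = 1 - 1/phi. For phi = 3 this gives probabilities 1/6, 2/3, 1/6."""
    rng = np.random.default_rng(seed)
    return rng.choice([1.0, 0.0, -1.0], size=(k, d),
                      p=[1 / (2 * phi), 1 - 1 / phi, 1 / (2 * phi)])

# Empirical distance-preservation check on random points.
rng = np.random.default_rng(0)
d, k, phi = 100, 40, 3
X = rng.normal(size=(20, d))                       # 20 points in R^d
R = achlioptas_matrix(seed=42, k=k, d=d, phi=phi)
P = (R @ X.T).T / np.sqrt(k / phi)                 # variance-normalized projection to R^k

ratios = [np.linalg.norm(P[i] - P[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(len(X)), 2)]
mean_ratio = float(np.mean(ratios))                # concentrates near 1.0
```

Each entry has variance $1/\phi$, so dividing by $\sqrt{k/\phi}$ makes the expected squared projected distance equal the original squared distance, consistent with the JL-style guarantee discussed above.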
The mathematical analysis of RP-based methods for privacy preservation was first explored for biometric data by [31,32,33,34,35] and later adapted for behavioral biometrics in [23]. These studies demonstrated that RP transformations can effectively obscure sensitive data while maintaining the utility of data for authentication. The key idea is that projecting high-dimensional data into a lower-dimensional subspace makes it challenging to reconstruct the original data, as RP is a lossy process, thereby enhancing privacy. Despite its advantages, most existing RP-based approaches primarily rely on traditional distance-based verification methods.
Theorem 1
(Johnson-Lindenstrauss (JL) Lemma [37]). For any $\epsilon_{jl} \in (0, 1)$, integer $n \geq 1$, and any set $P$ of $n$ points in $\mathbb{R}^d$, if
$$k \geq \frac{4 \ln n}{\epsilon_{jl}^2 / 2 - \epsilon_{jl}^3 / 3},$$
then there exists a Lipschitz function $f: \mathbb{R}^d \to \mathbb{R}^k$ such that for all $u, v \in P$:
$$(1 - \epsilon_{jl}) \, \| u - v \|^2 \leq \| f(u) - f(v) \|^2 \leq (1 + \epsilon_{jl}) \, \| u - v \|^2.$$
For RUIP-BA with voice data ($n = 200$, $\epsilon_{jl} = 0.5$), this requires $k \geq 73$. With swipe data ($n = 300$, $\epsilon_{jl} = 1.0$), $k \geq 30$. With drawing data ($n = 300$, $\epsilon_{jl} = 0.7$), $k \geq 46$, as illustrated in Table 2.
Lemma 1
(RP $\ell_2$ Sensitivity [38]). For a random matrix $R \in \mathbb{R}^{k \times d}$ with i.i.d. Achlioptas entries ($\phi = 3$), the $\ell_2$ sensitivity of the projection map $g(x) = Rx$ satisfies
$$\Delta_2(g) = \max_{x, x'} \| R(x - x') \|_2 \leq \sqrt{\frac{k}{\phi}} \cdot \Delta_x,$$
where $\Delta_x$ is the $\ell_2$ sensitivity of the raw data. Concretely, for swipe features ($d = 33$, $k = 30$, $\phi = 3$, $\Delta_x \approx \sqrt{33}$ as shown in Table 2): $\Delta_2(g) \leq \sqrt{10} \cdot \sqrt{33} = \sqrt{330} \approx 18.2$. For bounded features in $[0, 1]^d$, a tighter bound $\Delta_2(g) \leq \sqrt{k / \phi}$ applies when profiles differ in one sample by one unit.
Proof. 
Each entry $R_{ij} \in \{+1, 0, -1\}$ has $\mathbb{E}[R_{ij}] = 0$ and $\mathrm{Var}(R_{ij}) = 1/\phi$. For any $v = x - x'$ with $\| v \|_2 \leq \Delta_x$:
$$\mathbb{E}\, \| R v \|_2^2 = \sum_{i=1}^{k} \mathbb{E}\left[ \left( \sum_{j=1}^{d} R_{ij} v_j \right)^2 \right] = \sum_{i=1}^{k} \sum_{j=1}^{d} v_j^2 \, \mathrm{Var}(R_{ij}) = \frac{k}{\phi} \| v \|_2^2 \leq \frac{k}{\phi} \Delta_x^2.$$
Taking the square root and noting that $\| R v \|_2 \leq \sqrt{\mathbb{E}\, \| R v \|_2^2} + O\big(\sqrt{k \ln(1/\delta)}\big) \, \Delta_x$ with high probability, we obtain the stated bound.    □

3.2. Differential Privacy (DP)

The primary goal of DP is to enable the analysis of a dataset’s properties, representing a population, while ensuring that no individual information is revealed. Essentially, DP introduces noise to statistical queries or individual data points in the original dataset to ensure that an adversary cannot determine whether a specific individual is included in the data. As originally defined in [39], central DP assumes users trust the central server and send their unaltered data to be stored on the server. The server then applies DP noise to perturb the data before sharing the results with untrusted third parties for analysis. In contrast, local DP ensures that each user’s data remains private even before it is shared with the server. DP noise is added at the client level to provide DP guarantees, protecting users’ data privacy even if the server is untrusted.
The concept of $\epsilon$-DP, introduced in [39], formally defines DP. This definition was later extended to $(\epsilon, \delta)$-DP, which incorporates an additional $\delta$ term to account for the privacy guarantees provided by the Gaussian distribution.
Definition 1. ($\epsilon$-Differential Privacy) A mechanism or algorithm $\mathcal{M}$ satisfies $\epsilon$-differential privacy if, for all neighboring datasets $D, D' \in \mathcal{D}^n$ and for all subsets $S \subseteq \mathcal{Y}$, where $\mathcal{Y}$ represents the set of all possible outputs, $\mathcal{M}$ satisfies the condition $\Pr[\mathcal{M}(D) \in S] \leq e^{\epsilon} \Pr[\mathcal{M}(D') \in S]$. This implies that the output of the mechanism $\mathcal{M}$ applied to $D$ is nearly indistinguishable from the output when $\mathcal{M}$ is applied to $D'$. Smaller values of $\epsilon$ result in stronger privacy guarantees.
Definition 2. ($(\epsilon, \delta)$-Differential Privacy) A mechanism or algorithm $\mathcal{M}$ satisfies $(\epsilon, \delta)$-differential privacy if, for all neighboring datasets $D, D' \in \mathcal{D}^n$ and for all subsets $S \subseteq \mathcal{Y}$, where $\mathcal{Y}$ represents the set of all possible outputs, $\mathcal{M}$ holds the condition $\Pr[\mathcal{M}(D) \in S] \leq e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta$. This means that the output of the mechanism $\mathcal{M}$ applied to $D$ is nearly indistinguishable from the output when $\mathcal{M}$ is applied to $D'$, with a small probability of failure captured by $\delta$. Smaller values of $\epsilon$ and $\delta$ result in stronger privacy guarantees.
A mechanism $\mathcal{M}$ satisfies $(\epsilon, \delta)$-DP if it guarantees $\epsilon$-DP with a probability of at least $1 - \delta$, while allowing a failure probability of up to $\delta$.
Various probability distributions have been proposed in the literature to satisfy $\epsilon$-DP and $(\epsilon, \delta)$-DP. Among the most widely used are the Laplace mechanism [39] and the Gaussian mechanism [40], both of which are utilized in this paper. The Laplace mechanism is particularly popular for its versatility, as it can be applied to various types of data [41]. The Laplace mechanism operates by adding noise sampled from the continuous Laplace distribution, $\mathrm{Lap}(0, \Delta f / \epsilon)$. In contrast, the Gaussian mechanism satisfies the requirements of the newer $(\epsilon, \delta)$-DP framework and supports efficient management of the privacy budget under composition, making it a robust choice for complex privacy-preserving scenarios.
Definition 3.
Given a function $f: \mathcal{D}^n \to \mathcal{Y}$, where $\mathcal{Y}$ is the set of all possible outputs, and $\epsilon > 0$, the Laplace mechanism is defined as $\mathcal{M}(D) = f(D) + \mathrm{Lap}(0, \Delta f / \epsilon)$.
Definition 4.
Given two neighboring datasets $D$ and $D'$ in the dataset universe $\mathcal{D}^n$, a query function $f: \mathcal{D}^n \to \mathcal{Y}$, where $\mathcal{Y}$ is the set of all possible outputs, and $\epsilon > 0$, an $\epsilon$-Gaussian DP mechanism ($\epsilon$-GDP) is defined as $\mathcal{M}(D) = f(D) + \mathcal{N}(0, \Delta f^2 / \epsilon^2)$, where $\mathcal{N}(0, \Delta f^2 / \epsilon^2)$ denotes the normal distribution with mean 0 and variance $\Delta f^2 / \epsilon^2$.
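For concreteness, the Laplace mechanism of Definition 3 can be sketched in a few lines of Python. This is a minimal illustration (the helper name and the example query are our own, not part of the RUIP-BA specification):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Lap(0, sensitivity/epsilon) noise, per Definition 3."""
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))

# Example: privatize a query with sensitivity 1 at epsilon = 0.5.
rng = np.random.default_rng(1)
noisy = laplace_mechanism(np.zeros(200_000), sensitivity=1.0, epsilon=0.5, rng=rng)
empirical_std = float(np.std(noisy))   # Laplace std = sqrt(2) * scale ~ 2.83
```

A smaller $\epsilon$ enlarges the noise scale $\Delta f / \epsilon$, trading utility for stronger privacy, which is the trade-off discussed throughout this section.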
In RUIP-BA, applying the RP projection followed by local DP to the same behavioral profile therefore consumes a total budget of $(\epsilon, \delta)$ as specified by the user.
Lemma 2
(Gaussian Noise Calibration). Given sensitivity $\Delta_2$ from Lemma 1 and privacy budget $(\epsilon, \delta)$ with $\epsilon \leq 1$, adding $\eta \sim \mathcal{N}(0, \sigma^2 I_k)$ with
$$\sigma = \frac{\Delta_2 \sqrt{2 \ln(1.25 / \delta)}}{\epsilon}$$
yields $(\epsilon, \delta)$-DP for the projected output $\hat{X} = R X + \eta$. Concrete example (voice data): $\Delta_2 \approx 5.60$ (for $d = 104$, $k = 94$, features in $[0, 1]$ as mentioned in Table 2), $\epsilon = 7$, $\delta = 10^{-5}$ gives $\sigma \approx 3.81$.
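The calibration in Lemma 2 can be reproduced numerically. In this sketch (the helper name is our own), plugging in the voice-data parameters with the unrounded $\Delta_2 = \sqrt{94/3}$ yields $\sigma \approx 3.87$, close to the $\sigma \approx 3.81$ quoted above; the small gap stems from rounding $\Delta_2 \approx 5.60$:

```python
import numpy as np

def gaussian_sigma(delta2, epsilon, delta):
    """Gaussian-mechanism noise scale: sigma = Delta_2 * sqrt(2 ln(1.25/delta)) / epsilon."""
    return delta2 * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

# Voice data: d = 104, k = 94, phi = 3, features bounded in [0, 1],
# so Delta_2 <= sqrt(k / phi) by the tighter bound of Lemma 1.
delta2 = float(np.sqrt(94 / 3))                               # ~5.598
sigma = float(gaussian_sigma(delta2, epsilon=7, delta=1e-5))  # ~3.874
```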

3.3. Profile Similarity

Jensen-Shannon (JS) divergence, also known as information radius (IRad) [42], can be used to measure the symmetric divergence between two probability distributions, making it a suitable choice for assessing similarity between behavioral profiles. JS divergence is based on the concept of Kullback-Leibler (KL) divergence (also known as relative entropy). KL divergence quantifies the “distance” between two probability distributions. However, KL divergence is asymmetric, limiting its applicability in certain contexts. JS divergence addresses this limitation by being symmetric and bounded, making it more appropriate for comparing distributions in many scenarios.
For two probability distributions $X$ and $Y$, the KL divergence, $D_{KL}(X \| Y)$, measures how a distribution $Y$ diverges from an expected reference distribution $X$. JS divergence resolves the asymmetry of KL divergence by averaging it in both directions. The symmetric formula for JS divergence is given as:
$$D_{JS}(X \| Y) = \frac{1}{2} \left( D_{KL}(X \| Z) + D_{KL}(Y \| Z) \right),$$
where $Z = \frac{1}{2}(X + Y)$ is the mixture distribution, representing the average of $X$ and $Y$.
The KL divergence quantifies the asymmetric difference (in bits) of profile Y relative to profile X . When sample sizes in the profiles are limited, the k-NN divergence estimator [43] provides more accurate KL divergence estimates. In this work, we use the k-NN divergence estimator to calculate the KL divergence, which is then used in the JS divergence to effectively measure the similarity between two profiles.
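While the paper estimates KL divergence with a k-NN estimator for continuous profiles, the JS construction itself is easy to see on discrete (histogram) distributions. The following sketch computes it in bits:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in bits for discrete distributions over the same support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # 0 * log(0/q) contributes 0 by convention
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Symmetric JS divergence via the mixture Z = (p + q) / 2."""
    z = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, z) + 0.5 * kl_divergence(q, z)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
d_pq = js_divergence(p, q)            # symmetric and bounded: 0 <= d_pq <= 1 bit
```

Unlike raw KL, the value is unchanged when $p$ and $q$ are swapped, and it is bounded by 1 bit, which is what makes it usable as a profile-similarity score.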

4. Proposed RUIP-BA System

Figure 1 presents the architecture of the proposed RUIP-BA system. A BA system consists of two primary components: a prover and a verifier. The prover, also known as the profile generator, is software operating on user devices to collect behavioral data, construct profiles, and send them to the verifier. Typically, the verifier is an online server that uses all available profiles to train an ML classifier. The trained classifier is then used to evaluate all verification requests. Recently, neural network (NN)-based classifiers have been developed to classify mouse movements [2], gaits [3], and keystrokes [4] of users.
A profile $X$ in a BA system consists of $m$ vectors $\{x_1, x_2, \ldots, x_m\}$ of dimension $d$, where each dimension represents a behavioral feature and each vector represents a measurement of all $d$ features. The verifier collects $N$ profiles from $N$ users during the registration phase. Traditional distance-based algorithms $\mathrm{Ver}(\cdot, \cdot)$ store the collected profiles in a database, while ML classifier-based algorithms train an NN-based classifier $C(\cdot)$ using the collected profiles. In both cases, a verification request is a tuple $(u_i, Y)$, where $u_i$ is an identity and $Y$ contains $n$ ($n < m$) behavioral samples $\{y_1, y_2, \ldots, y_n\}$. Distance-based algorithms $\mathrm{Ver}(X_i, Y_i)$ measure the distance between $X_i$ and $Y_i$, returning an accept or reject decision based on a predefined threshold. In contrast, ML classifier-based algorithms input $Y$ to the trained classifier $C(\cdot)$ to generate $n$ prediction vectors $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n$, which are aggregated into a final accept or reject decision.
The performance of a robust BA system is typically quantified by metrics such as False Acceptance Rate (FAR) and False Rejection Rate (FRR). FAR is the probability of accepting an access request coming from an unauthorized user, while FRR is the probability of rejecting an access request coming from an authorized user. Moreover, the JS divergence can be used to assess the similarity between two profiles by treating them as two probability distributions.
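FAR and FRR can be computed directly from labeled accept/reject decisions. A minimal sketch (the function name is ours):

```python
import numpy as np

def far_frr(genuine_decisions, impostor_decisions):
    """FAR = fraction of impostor attempts accepted;
    FRR = fraction of genuine attempts rejected.
    Decisions are booleans: True = accept."""
    genuine = np.asarray(genuine_decisions, dtype=bool)
    impostor = np.asarray(impostor_decisions, dtype=bool)
    far = float(impostor.mean())        # accepted impostors / all impostors
    frr = float(1.0 - genuine.mean())   # rejected genuine users / all genuine
    return far, frr

# 3 of 4 genuine attempts accepted, 1 of 4 impostor attempts accepted.
far, frr = far_frr([True, True, False, True], [False, False, True, False])
# far = 0.25, frr = 0.25
```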

4.1. Development of a Privacy-Preserving BA System

Building on the foundational principles and privacy-preserving mechanisms discussed earlier, we introduce a novel BA system that leverages both RP and DP. This system is designed to enhance both accuracy and user data privacy, specifically crafted for low-computing devices such as IoT devices and smartphones. Figure 1 illustrates the overall structure and workflow of our proposed privacy-preserving BA system. Here, RP reduces the dimensionality of profiles, effectively obfuscating sensitive attributes while preserving the relationships between them. DP maintains individual users’ data privacy by adding controlled noise to the projected profiles and safeguards the profiles’ privacy against statistical computations.
The proposed privacy-preserving BA system comprises two main phases: the registration phase and the verification phase. Each phase is optimized to ensure seamless integration of the RP- and DP-protected data into the BA classifier while preserving user privacy and adhering to the key principles of renewability, unlinkability, and irreversibility. The details of these phases are outlined in the next two sections.

4.1.1. Registration Phase

The registration phase is responsible for initializing parameters and preparing user profiles to train an ML-based classifier. It involves four primary tasks: random matrix generation, profile transformation, noise addition, and training of an NN-based BA classifier.
  • Random matrix generation: To initiate the registration process, a user $u_i$ first generates a random matrix $R_i$ on their device using a private random seed. This matrix serves as a unique key for projecting the user’s BA profile.
  • Profile transformation: For each profile $X_i$ of user $u_i$, the device applies the RP transformation to produce a projected profile $X'_i = R_i X_i$. The RP transformation follows a Lipschitz mapping $f: \mathbb{R}^d \to \mathbb{R}^k$, projecting the data from $d$ dimensions to $k$ dimensions, where $k < d$. To enhance computational efficiency, we employ the discrete distribution discussed earlier for RP with $\phi = 3$.
  • Additive noise: The user applies local DP to add noise to the projected profile $X'_i$, transforming it into a noisy projected profile $\hat{X}_i = \mathcal{M}(X'_i)$ before transmitting it to the verifier. The client chooses the values of $\epsilon$ and $\delta$ for DP, where smaller $\epsilon$ and $\delta$ are preferred for stronger privacy.
  • Training a BA classifier: The verifier collects all $N$ noisy projected profiles $\{\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_N\}$ from $N$ users and uses them to train an NN-based BA classifier $C(\cdot)$. The verifier may act as a third-party service provider, deploying $C(\cdot)$ through Machine Learning as a Service (MLaaS).
The random matrix for RP is generated using a seed. In the proposed system, each user possesses a private random seed, similar to a PIN, which they use to generate the random matrix. This design removes the requirement for specialized hardware to securely store the seed, as it can simply be memorized by the user. The ϵ and δ values of DP chosen by each user are kept confidential, although the type of added noise is public information.
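Putting the registration steps together, a client-side enrollment pass might look as follows. This is an illustrative sketch under the bounded-feature assumption of Lemma 1; the names are ours, and the deployed system derives the matrix from the memorized seed exactly as described above:

```python
import numpy as np

def enroll(profile, seed, k, epsilon, delta, phi=3):
    """Client-side enrollment sketch: seed-derived RP followed by local DP.
    `profile` has shape (d, m); features are assumed bounded in [0, 1] so the
    tighter sensitivity bound Delta_2 <= sqrt(k/phi) of Lemma 1 applies."""
    d, m = profile.shape
    mat_rng = np.random.default_rng(seed)        # secret, PIN-like seed
    R = mat_rng.choice([1.0, 0.0, -1.0], size=(k, d),
                       p=[1 / (2 * phi), 1 - 1 / phi, 1 / (2 * phi)])
    projected = R @ profile                      # projected template, shape (k, m)
    delta2 = np.sqrt(k / phi)                    # Lemma 1 (bounded features)
    sigma = delta2 * np.sqrt(2 * np.log(1.25 / delta)) / epsilon  # Lemma 2
    noise_rng = np.random.default_rng()          # fresh randomness for DP noise
    return projected + noise_rng.normal(0.0, sigma, size=projected.shape)

X = np.random.default_rng(0).uniform(size=(33, 50))   # d = 33 features, m = 50 samples
template = enroll(X, seed=1234, k=30, epsilon=7.0, delta=1e-5)
renewed = enroll(X, seed=9999, k=30, epsilon=7.0, delta=1e-5)  # after revocation
```

Note how renewability falls out of the design: re-enrolling with a different seed yields a distinct projected template from the same raw profile, with fresh DP noise separating the two further.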

4.1.2. Verification Phase

The verification phase handles the process of authenticating a user based on their noisy transformed verification data. The verification algorithm Ver ( · , · ) ensures that only valid users are granted access, leveraging the output of the trained NN-based BA classifier.
  • Profile transformation for verification: When a verification request for user $u_i$ is initiated, the user device collects the verification profile $Y_i$ and transforms it into $Y'_i = R_i Y_i$ using $R_i$, which is regenerated from the secret seed. DP noise is then added to the transformed profile $Y'_i$ through local DP, resulting in noisy projected verification data $\hat{Y}_i = \mathcal{M}(Y'_i)$. The device subsequently transmits $\hat{Y}_i$ along with the user identity $u_i$ to the verifier as a verification claim $(u_i, \hat{Y}_i)$.
  • Verification process: The verification algorithm $\mathrm{Ver}(\cdot, \cdot)$ verifies the claim with the help of the trained BA classifier $C(\cdot)$. For $C(\hat{Y}_i)$, the output is the $n$ prediction vectors $\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\}$. These predictions are then aggregated into a single binary decision: accept or reject.
The use of an ML classifier in the verification process ensures that user authentication remains accurate while preserving data privacy through RP and DP. Our system ensures privacy by transforming user data into lower-dimensional subspaces, effectively concealing sensitive attributes. Additionally, differential privacy techniques are applied to protect individual user information, ensuring that the system maintains robust privacy guarantees.
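The aggregation of the $n$ per-sample predictions into one decision can be realized in several ways; a simple majority vote on the claimed identity's score is one plausible choice (the function name and threshold are ours, not specified by the paper):

```python
import numpy as np

def aggregate_decision(prediction_scores, claimed_id, threshold=0.5):
    """Aggregate n per-sample prediction vectors into one accept/reject decision
    by majority vote on the claimed identity's score.
    `prediction_scores` has shape (n, num_users): one softmax row per sample."""
    votes = prediction_scores[:, claimed_id] > threshold
    return "accept" if votes.mean() > 0.5 else "reject"

scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])  # n = 3 samples, 2 users
decision = aggregate_decision(scores, claimed_id=0)       # 2 of 3 votes -> accept
```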

4.2. Complete Algorithmic Realization

To provide a complete and implementation-oriented specification of RUIP-BA, we describe all major algorithms used in this paper. To avoid margin overflow, the full workflow is decomposed into one main algorithm and five sub-algorithms.
Algorithm 1 RUIP-BA Main Workflow
Require: User set U, profile generator, verifier, projection dimension k, DP parameters (ϵ, δ)
Ensure: Trained classifier C(·), privacy-evaluation reports, updated templates if needed
1: RegistrationSubalgorithm(U, k, ϵ, δ)    ▹ Subalg. A: Enrollment + model training
2: VerificationSubalgorithm(U, C(·))    ▹ Subalg. B: Authenticate claims
3: RenewalSubalgorithm(U, C(·))    ▹ Subalg. C: Revoke and re-enroll
4: UnlinkabilitySubalgorithm(U)    ▹ Subalg. D: JS-divergence testing
5: GANPrivacyAttackSubalgorithm(U)    ▹ Subalg. E: GAN adversarial evaluation
6: return C(·) and all privacy-performance metrics
Algorithm 2 Subalgorithm A: Registration (Enrollment and Model Training)
Require: User u_i with raw profile X_i ∈ R^{d×m}, secret seed s_i, target dimension k, DP mechanism M with budget (ϵ_i, δ_i)
Ensure: Noisy projected profile X̂_i uploaded to verifier; classifier C(·) trained on all users' profiles
1: for each user u_i ∈ U do
2:     Generate R_i ← RandMat(s_i, k, d) using the Achlioptas distribution with ϕ = 3    ▹ Pr(r_ij = ±1) = 1/6 and Pr(r_ij = 0) = 2/3 for each entry r_ij of R_i
3:     Project: X_i′ ← R_i X_i    ▹ Dimensionality: R^{d×m} → R^{k×m}
4:     Compute ℓ2 sensitivity: Δ2 ← √(k/ϕ) · Δx    ▹ Lemma 1
5:     Calibrate noise: σ_i ← Δ2 √(2 ln(1.25/δ_i)) / ϵ_i    ▹ Lemma 2
6:     Perturb: X̂_i ← X_i′ + N(0, σ_i² I_k)    ▹ Local DP on device
7:     Send X̂_i to verifier
8: end for
9: Train C(·) on {X̂_i}_{i=1}^N using the NN architecture (Section 6)
10: return C(·)
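To make the enrollment step concrete, the following is a minimal NumPy sketch of the per-user projection and Gaussian-noise calibration from Subalgorithm A. The function name `enroll`, the default sensitivity `delta_x = 1.0`, and the toy dimensions are assumptions for illustration; the σ formula follows Lemma 2 as quoted in the algorithm.

```python
import numpy as np

def enroll(X, R, eps, delta, delta_x=1.0, phi=3, seed=None):
    """Sketch of Subalgorithm A, steps 3-6: project the raw profile and add
    Gaussian DP noise calibrated to the post-projection l2 sensitivity."""
    rng = np.random.default_rng(seed)
    k = R.shape[0]
    X_proj = R @ X                                  # R^{k x d} @ X^{d x m}
    delta2 = np.sqrt(k / phi) * delta_x             # sensitivity bound (Lemma 1)
    sigma = delta2 * np.sqrt(2 * np.log(1.25 / delta)) / eps  # Lemma 2
    return X_proj + rng.normal(0.0, sigma, size=X_proj.shape), sigma

# Toy usage with hypothetical voice-like dimensions d=104, k=94, m=120.
rng = np.random.default_rng(0)
R = rng.choice([1.0, 0.0, -1.0], size=(94, 104), p=[1/6, 2/3, 1/6])
X = rng.uniform(-1, 1, size=(104, 120))
X_hat, sigma = enroll(X, R, eps=1.0, delta=1e-5)
```

Because the noise is drawn on-device, the verifier only ever sees X̂_i, matching the local-DP trust model of the paper.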
Algorithm 3 Subalgorithm B: Verification
Require: Verification claim (u_i, Y_i) where Y_i ∈ R^{d×n}; secret seed s_i; trained classifier C(·); threshold τ
Ensure: Accept/Reject decision
1: Recompute R_i ← RandMat(s_i, k, d)    ▹ Same seed as enrollment
2: Y_i′ ← R_i Y_i    ▹ Project verification data
3: Ŷ_i ← Y_i′ + N(0, σ_i² I_k)    ▹ Add DP noise with same σ_i
4: ŷ ← C(Ŷ_i)    ▹ n prediction vectors
5: p_i ← AggregateConfidence(ŷ) = (1/n) Σ_{t=1}^n ŷ_{t,u_i}
6: if p_i ≥ τ then
7:     return Accept    ▹ e.g., p_i = 0.97 > τ = 0.50: phone unlocked
8: else
9:     return Reject    ▹ e.g., p_i = 0.12 < τ: mimic swipe denied
10: end if
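The confidence-aggregation decision in Subalgorithm B (steps 5-10) reduces to a mean over the claimed-identity column of the classifier's softmax outputs. A minimal sketch, with a hypothetical 4×3 prediction matrix standing in for C(Ŷ_i):

```python
import numpy as np

def aggregate_confidence(pred, user_idx):
    """Mean softmax confidence assigned to the claimed identity over the
    n verification vectors (Subalgorithm B, step 5)."""
    return float(np.mean(pred[:, user_idx]))

def verify(pred, user_idx, tau=0.50):
    """Threshold decision of Subalgorithm B, steps 6-10."""
    return "Accept" if aggregate_confidence(pred, user_idx) >= tau else "Reject"

# Hypothetical classifier output: n = 4 prediction vectors over 3 enrolled users.
pred = np.array([[0.90, 0.05, 0.05],
                 [0.80, 0.10, 0.10],
                 [0.95, 0.02, 0.03],
                 [0.85, 0.05, 0.10]])
print(verify(pred, user_idx=0))   # mean confidence 0.875 >= 0.50 -> Accept
print(verify(pred, user_idx=1))   # mean confidence 0.055 < 0.50 -> Reject
```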
Algorithm 4 Subalgorithm C: Template Renewal and Model Update
Require: Compromised user u_i, old parameters (R_i, ϵ_i, δ_i), re-captured plain profile X_i
Ensure: Old template revoked; fresh unlinkable template active; classifier updated
1: Generate new seed s_i′ ≠ s_i and derive R_i′ ← RandMat(s_i′, k, d)    ▹ R_i′ ≠ R_i almost surely
2: Choose new privacy budget (ϵ_i′, δ_i′)    ▹ May tighten for stronger protection
3: X_i′ ← R_i′ X_i
4: Recalibrate: σ_i′ ← Δ2 √(2 ln(1.25/δ_i′)) / ϵ_i′
5: X̂_i′ ← X_i′ + N(0, σ_i′² I_k)
6: Revoke old X̂_i from verifier database    ▹ Pr[X̂_i′ = X̂_i] ≈ 0 by Theorem 5
7: Update verifier with X̂_i′; retrain/fine-tune C(·)
8: return updated C(·)
Algorithm 5 Subalgorithm D: Unlinkability Evaluation
Require: Protected templates under three scenarios: (s1) different source, (s2) same source different keys, (s3) same source same key
Ensure: Distribution-level unlinkability decision
1: for each scenario pair (a, b) ∈ {(s1, s2), (s2, s3)} do
2:     Compute JS divergences {D_JS(X̂_j^(a), X̂_j^(b))}_j using a k-NN estimator
3:     Compute divergence distribution P(a, b)
4: end for
5: Run KS test: KSTest(P(s1, s2), P(s2, s3)) returns p-values p12 and p23
6: if p12 ≥ p23 and p12 > 0.05 then
7:     Conclude: unlinkability is satisfied    ▹ Cases 1 and 2 are statistically indistinguishable
8: else
9:     Flag: potential linkage risk detected
10: end if
11: return unlinkability verdict and p-values
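The distributional comparison at the heart of Subalgorithm D can be sketched with a self-contained two-sample KS test (the k-NN JS-divergence estimator itself is not reproduced here). The helper names and the synthetic divergence samples are hypothetical; `ks_pvalue` uses the standard asymptotic Kolmogorov-series approximation.

```python
import numpy as np

def ks_2samp_stat(x, y):
    """Two-sample KS statistic: maximum gap between empirical CDFs."""
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))

def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value via the Kolmogorov distribution series."""
    ne = n * m / (n + m)
    lam = (np.sqrt(ne) + 0.12 + 0.11 / np.sqrt(ne)) * d
    if lam < 1e-9:
        return 1.0
    j = np.arange(1, 101)
    p = 2.0 * np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * (j * lam) ** 2))
    return float(min(1.0, max(0.0, p)))

# Two hypothetical JS-divergence score samples: if P(s1,s2) and P(s2,s3) are
# statistically indistinguishable (large p-value), unlinkability holds.
rng = np.random.default_rng(0)
div_s1s2 = rng.normal(0.30, 0.05, size=200)   # different-source divergences
div_s2s3 = rng.normal(0.30, 0.05, size=200)   # same-source, different-key divergences
d_stat = ks_2samp_stat(div_s1s2, div_s2s3)
p_val = ks_pvalue(d_stat, 200, 200)
verdict = "unlinkability satisfied" if p_val > 0.05 else "potential linkage risk"
```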
Algorithm 6 Subalgorithm E: GAN-Based Privacy Attack and Evaluation
Require: Auxiliary plain profiles {X_j^aux}, projected noisy counterparts {X̂_j^aux}, attack generator G, discriminator D, reconstruction weight λ_recon
Ensure: Recoverability score ρ and privacy conclusion
1: Build training pairs (X̂^aux, X^aux)
2: for each epoch t = 1, …, T_max do
3:     Update D: maximize L_D = E[log D(X)] + E[log(1 − D(G(X̂)))]
4:     Update G: minimize L_G = −E[log D(G(X̂))] + λ_recon · ‖G(X̂) − X‖₂²
5: end for
6: Reconstruct: X̄_j ← G(X̂_j^comp) for each compromised profile j
7: Run a per-feature KS test between {X̄_j,l} and {X*_j,l} for all features l = 1, …, d
8: Compute ρ_j = (1/d) Σ_{l=1}^d 1[KS-test(X̄_j,l, X*_j,l) passes]
9: Check E_j[ρ_j] ≤ ρ_max(ϵ, δ, k, d) from Theorem 8
10: return {ρ_j} and privacy status
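The scoring step of Subalgorithm E (steps 7-8) is independent of how the GAN is trained: given a reconstructed profile and the ground truth, ρ is the fraction of feature marginals that pass a per-feature KS test. A minimal NumPy sketch, using the standard 5% critical value c(0.05) ≈ 1.36 for equal sample sizes; the toy data and function names are hypothetical.

```python
import numpy as np

def ks_stat(x, y):
    """Two-sample KS statistic for one feature's marginal distributions."""
    grid = np.sort(np.concatenate([x, y]))
    return float(np.max(np.abs(
        np.searchsorted(np.sort(x), grid, side="right") / len(x)
        - np.searchsorted(np.sort(y), grid, side="right") / len(y))))

def recoverability(X_rec, X_true, c_alpha=1.36):
    """Fraction of features whose reconstructed marginal passes a two-sample
    KS test at the 5% level (Subalgorithm E, step 8)."""
    d, m = X_true.shape
    crit = c_alpha * np.sqrt(2.0 / m)          # equal sample sizes n = m
    passes = [ks_stat(X_rec[l], X_true[l]) < crit for l in range(d)]
    return float(np.mean(passes))

rng = np.random.default_rng(1)
X_true = rng.normal(0, 1, size=(10, 500))      # 10 features, 500 samples each
perfect = recoverability(X_true + rng.normal(0, 0.01, size=X_true.shape), X_true)
garbage = recoverability(rng.normal(5, 1, size=X_true.shape), X_true)
# A near-perfect reconstruction scores close to 1; a mismatched one close to 0.
```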

4.3. Privacy Analysis of the Proposed BA System

In this section, we examine the key privacy attributes of our proposed BA system, taking into account various attack types. These attributes are crucial to ensure robust privacy safeguards in BA systems.

4.3.1. Formal Privacy Security Games

We formalize the three ISO/IEC 24745 privacy properties as security games between a challenger Ch and a PPT (probabilistic polynomial-time) adversary A .
Definition 5
(Renewability Security Game Game_REN).
1. Setup. Ch fixes RUIP-BA parameters (k, d, ϕ, ϵ, δ).
2. Challenge generation. Ch selects a profile X ∼ p_X and generates two independent key pairs (s1, ϵ1, δ1) and (s2, ϵ2, δ2), then computes:
X̂^(1) = M_(ϵ1,δ1)(R_s1 X), X̂^(2) = M_(ϵ2,δ2)(R_s2 X).
3. Adversary's turn. A receives (X̂^(1), X̂^(2)) and must distinguish whether both templates come from the same source (with different keys) or from different sources.
The renewability advantage is Adv_REN(A) = |Pr[A guesses correctly] − 1/2|. The system has renewable templates if Adv_REN(A) ≤ negl(λ) for all PPT adversaries A.
Definition 6
(Unlinkability Security Game Game_UNL).
1. Setup. Ch fixes RUIP-BA parameters.
2. Challenge generation. Ch samples a bit b ∈ {0, 1}. If b = 0: select the same source X_a = X_b = X; if b = 1: select different sources X_a ≠ X_b. In both cases, use distinct key pairs (s1, ϵ1, δ1) and (s2, ϵ2, δ2) and compute:
X̂_1 = M_(ϵ1,δ1)(R_s1 X_a), X̂_2 = M_(ϵ2,δ2)(R_s2 X_b).
3. Adversary's turn. A receives (X̂_1, X̂_2) and must output a guess b′ ∈ {0, 1}.
The unlinkability advantage is Adv_UNL(A) = |Pr[b′ = b] − 1/2|. Templates are unlinkable if Adv_UNL(A) ≤ negl(λ) + ξ, where ξ = O(e^{−k/d}).
Definition 7
(Irreversibility Security Game Game_IRR).
1. Setup. Ch fixes RUIP-BA parameters.
2. Challenge. Ch samples X ∼ p_X, generates key (s, ϵ, δ), and computes X̂ = M_(ϵ,δ)(R_s X). Sends X̂ to A.
3. Reconstruction. A outputs a reconstructed profile X̄ = g(X̂).
4. Scoring. Feature recoverability ρ(X̄, X) = (1/d) Σ_{j=1}^d 1[KS-test(X̄_j, X_j) passes].
5. The system provides ρ0-irreversibility if for all PPT A: E[ρ(X̄, X)] ≤ ρ0 < 1.

4.3.2. Formal Assumptions and Mathematical Derivations

We now provide formal statements that characterize the privacy guarantees of RUIP-BA. Let neighboring profiles differ in one behavioral sample, and let M satisfy ( ϵ , δ ) -DP.
Axiom 1
(Seeded diversity and bounded sensitivity). Each user template is generated with a user-specific secret seed defining R_i, and the per-sample sensitivity after projection is bounded by Δ_RP = √(k/ϕ) · Δx (Lemma 1). Consequently, the DP noise scale is calibrated as b = Δ_RP/ϵ (Laplace) or σ = Δ_RP √(2 ln(1.25/δ))/ϵ (Gaussian), per Lemma 2.
Lemma 3
(Renewability under key and privacy refresh). For a fixed profile X, two renewed templates
X̂^(1) = M(R^(1) X), X̂^(2) = M(R^(2) X),
with independent seeds and independently sampled DP noise satisfy
Pr[X̂^(1) = X̂^(2)] = 0,
and therefore old compromised templates can be revoked and replaced without re-collecting raw behavior.
Proof. 
Independence of the seeds implies R^(1) ≠ R^(2) almost surely. Since M is randomized, the additive perturbations are also independent. Exact equality of two real-valued randomized templates has probability zero under continuous noise distributions. Hence compromise of one template does not prevent issuance of a fresh unlinkable replacement.    □
Theorem 2
(Unlinkability bound). Let A be a linkage adversary distinguishing same-source renewed templates from different-source templates. Define
Adv_link(A) = |Pr[A(X̂_a, X̂_b) = 1 | same source] − Pr[A(X̂_a, X̂_b) = 1 | different source]|.
Under Axiom 1 and independent renewal keys, there exists a small ξ such that
Adv_link(A) ≤ ξ + O(δ),
where ξ decreases as projection randomness and DP noise increase.
Proof. 
RP with independently sampled matrices destroys deterministic cross-instance geometric signatures for the same source. DP further contracts distinguishability between neighboring outputs by at most ( ϵ , δ ) multiplicative/additive factors. Therefore, any test statistic (including JS-divergence-based matching) has bounded discrimination gain, yielding the stated advantage upper bound.    □
Theorem 3
(Irreversibility and reconstruction error floor). For a compromised template X̂ = M(RX), any estimator X̃ = g(X̂) satisfies
E‖X − X̃‖₂² ≥ E‖X − R†RX‖₂² (RP information loss) + Ω(σ²) (DP noise floor),
where R† is the Moore-Penrose pseudo-inverse of R.
Proof. 
The RP term is the unavoidable projection residual from mapping d to k < d dimensions. Even if R is known, inversion cannot recover null-space components. DP adds independent stochastic perturbation whose variance lower-bounds estimator risk. Summing both independent error sources yields a non-zero irreversibility floor.    □
Theorem 4
(GAN attack privacy bound). Let M_P(·) be the strongest GAN attacker trained with auxiliary data under either known or unknown (R, ϵ, δ). If RUIP-BA satisfies Theorem 3, then the recoverable-feature ratio ρ (the fraction of features passing statistical similarity tests) obeys
ρ ≤ ρ_max(ϵ, δ, k, d, D_aux),
with ρ_max strictly below 1, and empirically low for the tested datasets.
Proof. 
GAN training approximates the Bayesian reconstruction map from protected space to plain space. However, the irrecoverable null-space information and DP-induced uncertainty constrain reconstruction fidelity. Therefore, only a bounded subset of features can be statistically aligned with ground truth, implying ρ < 1 and yielding the stated attack bound.    □

4.3.3. Renewability Analysis - Extended Formal Proof

To ensure the renewability property, each user should be able to revoke an old noisy projected profile and replace it with a new one. In our proposed privacy-preserving BA system, a user u achieves this by replacing the secret matrix R_i with R_j and altering the DP parameters, which generates a new noisy projected profile X̂_j = M(R_j X) from X. The user then revokes the old noisy projected profile X̂_i = M(R_i X) by updating the trained BA classifier C(·) with X̂_j. For u, the new verification request becomes (u, Ŷ_j). This update process is the same for all registered users of the system. For the updated C(·), the performance of the BA system should be largely preserved, which is confirmed in Section 6.2.1 by the experimental results. Moreover, adding DP noise to the data before training C(·) helps mitigate the impact of poisoning attacks [44].
Theorem 5
(Renewability - Formal Bhattacharyya Bound). Under Axiom 1 and the Gaussian mechanism with σ calibrated per Lemma 2, the advantage in Game_REN (Definition 5) satisfies:
Adv_REN(A) ≤ √(1 − exp(−k‖μ_X‖₂²/(2ϕσ²))) + O(δ),
where μ_X = E[X] is the mean behavioral profile. For voice data (k = 94, ϕ = 3, σ ≈ 3.81): Adv_REN ≈ 0.011, confirming near-negligible linkage between renewed templates.
Proof. Step 1 (RP decorrelation via independence). Let R^(1), R^(2) be generated from independent seeds. For any x ∈ R^d, define u^(j) = R^(j) x, j ∈ {1, 2}. By independence:
E[u^(1) · u^(2)] = (E[R^(1)] x) · (E[R^(2)] x)^T = 0,
since E[R_ij] = 0. Moreover, E[‖u^(1) − u^(2)‖₂²] = 2k‖x‖₂²/ϕ by linearity of variance.
Step 2 (Bhattacharyya coefficient). The post-DP templates follow X̂^(j) ∼ N(u^(j), σ² I_k). The Bhattacharyya coefficient between the two distributions is:
BC = exp(−‖u^(1) − u^(2)‖₂²/(8σ²)).
Step 3 (Total variation bound). Total variation satisfies TV(p, q) ≤ √(1 − BC²). Using the expected squared distance from Step 1:
E_{u^(1),u^(2)}[BC] ≥ exp(−k‖μ_X‖₂²/(4ϕσ²)).
Step 4 (Advantage bound). The adversary in Game_REN is limited by the total variation distance: Adv_REN ≤ TV(X̂^(1), X̂^(2)) ≤ √(1 − exp(−k‖μ_X‖₂²/(2ϕσ²))). The DP mechanism further limits any per-sample distinguishability by an additive O(δ) term (Definition 2).
Numerical verification. For voice: ‖μ_X‖₂² ≈ 0.25 (features in [0, 1], d = 104), k = 94, ϕ = 3, σ = 3.81:
Adv_REN ≤ √(1 − e^{−94 × 0.25/(2 × 3 × 14.52)}) = √(1 − e^{−0.271}) ≈ 0.011.
   □

4.3.4. Unlinkability Analysis - Full Jensen-Shannon Divergence Derivation

A privacy-preserving BA system must ensure that correlating compromised noisy transformed profiles is infeasible. For instance, if an adversary obtains two noisy projected profiles, X ^ i and X ^ j , of user u, it should be computationally difficult to verify if they originate from the same source. The unlinkability property can prevent cross-matching attacks [8] and reduces the risk of tracing an individual enrolled in multiple systems using the same behavioral profile.
Theorem 6
(Unlinkability - Full JS Divergence Derivation). Let X̂_a = M_(ϵ1)(R^(1) X_a) and X̂_b = M_(ϵ2)(R^(2) X_b) under the Gaussian mechanism with common variance σ². Define:
  • Case 1 (invalid claims, different source): X_a ≠ X_b, different keys.
  • Case 2 (same source, different keys): X_a = X_b = X, R^(1) ≠ R^(2).
  • Case 3 (valid claims, same source, same key): X_a = X_b = X, R^(1) = R^(2).
Then D_JS(X̂ | Case 1) ≈ D_JS(X̂ | Case 2) ≫ D_JS(X̂ | Case 3), with the first two agreeing to within O(σ^{−2}‖μ_{X_a} − μ_{X_b}‖₂²).
Proof. 
Step 1 (KL divergence for Gaussians with equal covariance). Under the Gaussian mechanism, x̂ ∼ N(Rμ_X, RΣ_X R^T + σ² I_k). When σ² I_k ≫ RΣ_X R^T, the covariance simplifies to σ² I_k. For distributions p_a = N(μ_a, σ² I_k) and p_b = N(μ_b, σ² I_k):
D_KL(p_a ‖ p_b) = ‖μ_a − μ_b‖₂²/(2σ²).
Step 2 (Mixture distribution for JS divergence). Let p_m = ½(p_a + p_b) ≈ N((μ_a + μ_b)/2, σ² I_k + (μ_a − μ_b)(μ_a − μ_b)^T/4). Then
D_JS(p_a ‖ p_b) = ½[D_KL(p_a ‖ p_m) + D_KL(p_b ‖ p_m)] ≈ ‖μ_a − μ_b‖₂²/(4σ² + ‖μ_a − μ_b‖₂²).
Step 3 (Case-by-case analysis).
  • Case 1: μ_a = R^(1) μ_{X_a}, μ_b = R^(2) μ_{X_b}, X_a ≠ X_b. By JL (Theorem 1): ‖μ_a − μ_b‖₂² ≈ (1 ± ϵ_jl)‖μ_{X_a} − μ_{X_b}‖₂² · ‖R‖_F²/d.
  • Case 2: μ_a = R^(1) μ_X, μ_b = R^(2) μ_X, same X. By RP randomness: ‖μ_a − μ_b‖₂² = ‖(R^(1) − R^(2))μ_X‖₂² ≈ 2k‖μ_X‖₂²/ϕ in expectation (cf. Step 1 of Theorem 5).
  • Case 3: R^(1) = R^(2), same X. Then μ_a = μ_b and D_JS = 0.
Step 4 (Unlinkability condition). Cases 1 and 2 have equal JS divergence when ‖μ_{X_a} − μ_{X_b}‖₂² ≈ 2k‖μ_X‖₂²/ϕ, i.e., when the inter-profile distance approximates the intra-profile re-projection spread. This occurs when the DP noise dominates (σ² ≫ ‖μ_X‖₂²/k), making same-source different-key templates statistically equivalent to different-source templates.
Step 5 (Adversarial advantage bound). The adversary in Game_UNL must distinguish Case 2 from Case 1 via some function f(X̂_1, X̂_2). The maximum advantage using optimal hypothesis testing (Neyman-Pearson) is bounded by:
Adv_UNL(A) ≤ √(D_JS(p_Case1 ‖ p_Case2)) = O(‖μ_X‖₂/√(σ² + ‖μ_X‖₂²/k)).
This decreases as σ increases (stronger DP) or k decreases (stronger RP), confirming the unlinkability/utility trade-off.    □

4.3.5. Unlinkability Analysis - Experimental Validation

To generate two noisy projected profiles of the same profile X, RP requires two distinct R matrices, and DP then adds independent random noise, producing two completely different noisy projected profiles, X̂_i and X̂_j. Noisy projected profiles can also be generated from two distinct profiles, X_i and X_j, originating from different sources, using the same method. The unlinkability property is ensured if the JS divergence between noisy projected profiles originating from the same source, but projected using two different R matrices and perturbed with different noise, is close to the JS divergence of invalid claims (i.e., the JS divergence between noisy projected profiles from different sources), while remaining significantly distant from the JS divergence of valid claims (i.e., the JS divergence between noisy projected profiles from the same source and same key). Our experiments validate the unlinkability property of the proposed privacy-preserving BA system under exactly this criterion.

4.3.6. Irreversibility Analysis - Cramér-Rao Lower Bound

To assess how much of the behavioral pattern can be extracted from noisy projected profiles, consider a scenario where X̂ is known but the plain profile X is unknown. In a system X̂ = M(RX), two main factors ensure the irreversibility property: the irreversibility of M, and the irreversibility of the RP transformation.
Theorem 7
(Irreversibility - Information-Theoretic Lower Bound). Let X̂ = M(RX) where R ∈ R^{k×d} with k < d, and M adds Gaussian noise η ∼ N(0, σ² I_k). Assume X ∼ N(μ_X, Σ_X) as a Bayesian prior. For any estimator X̄ = g(X̂) of X, the Bayesian Minimum Mean Squared Error (MMSE) satisfies:
MMSE(X | X̂) = tr(Σ_X) − tr(Σ_X R^T (RΣ_X R^T + σ² I_k)^{−1} RΣ_X) ≥ (d − k) λ_min(Σ_X),
where the right-hand side is strictly positive when k < d, establishing an irreducible reconstruction error floor independent of σ.
Proof. Step 1 (Fisher information matrix). The observation model is x̂ = Rx + η, η ∼ N(0, σ² I_k). The Fisher information matrix for x from x̂ is:
I(x; x̂) = R^T (σ² I_k)^{−1} R = (1/σ²) R^T R ∈ R^{d×d}.
Step 2 (Rank deficiency). Since R ∈ R^{k×d} with k < d, the matrix R^T R has rank at most k. Therefore, the null space of R has dimension d − k > 0. For any v ∈ ker(R):
v^T I(x; x̂) v = (1/σ²)‖Rv‖₂² = 0.
Hence d − k directions of x carry zero Fisher information in x̂: they are fundamentally unidentifiable.
Step 3 (Cramér-Rao bound on the null space). For any direction v ∈ ker(R) and any unbiased estimator v̄ of v^T X:
Var[v̄] ≥ (v^T I v)^{−1} = ∞.
This means null-space components admit infinite variance under any unbiased estimation, i.e., they are inherently irreversible.
Step 4 (Bayesian MMSE via the Wiener filter). Treating X ∼ N(μ_X, Σ_X) and using the Wiener filter (the optimal linear estimator):
X̄_LMMSE = μ_X + Σ_X R^T (RΣ_X R^T + σ² I_k)^{−1} (X̂ − Rμ_X).
The MMSE equals:
MMSE = tr(Σ_X) − tr(Σ_X R^T (RΣ_X R^T + σ² I_k)^{−1} RΣ_X).
Step 5 (Lower bound via eigendecomposition). Let Σ_X = Σ_{i=1}^d λ_i q_i q_i^T be the eigendecomposition. For the (d − k) eigenvectors q_i ∈ ker(R), the second term contributes zero, so:
MMSE ≥ Σ_{i ∈ ker(R)} λ_i ≥ (d − k) λ_min(Σ_X) > 0.
Step 6 (DP augmentation). Adding DP noise (σ² > 0) increases the term RΣ_X R^T + σ² I_k, reducing the subtracted trace and thus increasing the MMSE:
MMSE(σ² > 0) ≥ MMSE(σ² = 0) ≥ (d − k) λ_min(Σ_X).
Numerical verification. Voice data (d = 104, k = 94 (Table 2), λ_min(Σ_X) ≈ 0.001): MMSE ≥ (104 − 94) × 0.001 = 0.010, corresponding to a 1.0% reconstruction error floor. The DP noise (σ = 3.81) adds further to this floor, consistent with the observed 1.57-5.20% recovery rates.    □
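The MMSE expression and its (d − k)·λ_min floor can be checked numerically. The sketch below builds a small synthetic prior with known eigenvalues (the dimensions and values are hypothetical, chosen for readability rather than to match the voice dataset) and evaluates the Wiener-filter residual from Step 4.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma2 = 8, 5, 0.5

# Hypothetical prior covariance Sigma_X with prescribed eigenvalues.
lam = np.linspace(0.1, 1.0, d)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
Sigma_X = Q @ np.diag(lam) @ Q.T
R = rng.normal(size=(k, d))

# Bayesian MMSE of the LMMSE (Wiener) estimator, as in Step 4 of the proof:
# MMSE = tr(Sigma) - tr(Sigma R^T (R Sigma R^T + sigma^2 I)^{-1} R Sigma).
G = Sigma_X @ R.T @ np.linalg.inv(R @ Sigma_X @ R.T + sigma2 * np.eye(k))
mmse = np.trace(Sigma_X) - np.trace(G @ R @ Sigma_X)

# The residual can never drop below the (d - k) * lambda_min floor.
floor = (d - k) * lam.min()
assert floor <= mmse <= np.trace(Sigma_X)
```

Raising `sigma2` only increases `mmse`, mirroring Step 6 of the proof.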
The irreversibility of M in DP is a fundamental aspect of its design, achieved through randomized noise addition and the inherent unpredictability of DP mechanisms. The theoretical guarantees of DP ensure that the original data cannot be reconstructed with confidence greater than 1 − δ, thereby preserving privacy. However, no system is entirely immune to practical attacks that exploit auxiliary data or weak parameters. In this case, our system has a second level of defense. Even if an attacker attempts to recover X from the projected data X′ = RX, the system of linear equations has d − k degrees of freedom for the unknown X. Among all solutions, X̄ = R^T(RR^T)^{−1}X′, known as the minimum-norm solution, minimizes the Euclidean norm ‖X̄‖ = √(Σ_{t=1}^d x̄_t²), where x̄_t denotes the elements of X̄ [45], allowing the recovery of an approximate profile X̄ with m vectors. However, in [23], the authors show that for behavioral data, RP is an effective privacy-preserving transformation against the minimum-norm solution for both known and unknown R.
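A minimal NumPy sketch of this minimum-norm inversion, using a synthetic profile vector and a sparse Achlioptas matrix with the voice-dataset dimensions (d = 104, k = 94); even with R fully known, the d − k null-space components leave a nonzero reconstruction residual.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 104, 94
R = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1/6, 2/3, 1/6])
x = rng.uniform(-1, 1, size=d)          # synthetic plain profile vector
x_proj = R @ x                           # what the attacker observes (noiseless case)

# Minimum-norm solution x_bar = R^+ x_proj: consistent with the observed
# projection, but missing every null-space component of x.
x_bar = np.linalg.pinv(R) @ x_proj

assert np.allclose(R @ x_bar, x_proj)    # reproduces the projection exactly...
residual = np.linalg.norm(x - x_bar) / np.linalg.norm(x)
# ...yet the d - k = 10 unidentifiable directions leave a nonzero residual.
```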

5. GAN-based Privacy Attack Analysis

With the rise of ML-based attacks, especially Generative Adversarial Network (GAN)-based attacks, privacy-preserving systems face significant challenges in ensuring data privacy. In this section, we evaluate the resilience of our proposed BA system against such privacy attacks. These attacks can uncover complex relationships between projected (noisy) profiles and the original profiles, posing a serious threat to privacy-preserving mechanisms. In our scenario, we make realistic assumptions regarding the attackers’ prior knowledge and computational capabilities.

5.1. Formal GAN Attack Security Game

Definition 8
(GAN Attack Security Game Game_GAN).
1. Setup. A challenger Ch fixes RUIP-BA system parameters (k, d, ϵ, δ) and behavioral profile distribution p_X.
2. Auxiliary data. Adversary A obtains auxiliary plain profiles {X_j^aux}_{j=1}^{n_aux} and their noisy projected counterparts {X̂_j^aux} using either known or estimated parameters.
3. Attack training. A trains a GAN generator G and discriminator D by optimizing the adversarial objective:
min_G max_D L(G, D) = E_{x∼p_X}[log D(x)] + E_{x̂}[log(1 − D(G(x̂)))].
4. Challenge phase. Ch provides a target X̂*. A outputs X̄* = G(X̂*).
5. Scoring. ρ(X̄*, X*) = (1/d) Σ_{j=1}^d 1[KS-test(X̄_j*, X_j*) passes].

5.2. Attacker Knowledge and Capabilities

  • The attacker has knowledge about the operation of the verification algorithm Ver ( · , · ) . The attacker is also aware of the architecture and input-output dimensions of the trained classifier C ( · ) .
  • The attacker has access to the noisy projected profiles of the target BA system. The attacker can obtain the noisy projected profiles from the untrusted verifier or by using model inversion or other attack methods.
  • The attacker has access to the profile generator, the publicly available software used by the BA system to collect users' behavioral data. The attacker will use it to gather auxiliary profiles.
  • The attacker is aware of the distribution and dimensions of R , since this information is public. However, in the worst case, if the seed is compromised, the attacker can also derive the secret R .
  • The attacker is also aware of the type of DP noise applied to the noisy projected profiles, as in most cases this is public information. In the worst-case scenario, the attacker can also obtain the values of the DP parameters.

5.3. Train an Attack Model

The goal of this privacy attack is to reconstruct the plain profiles from the noisy projected profiles and extract the behavioral pattern. For this purpose, the attacker trains a GAN-based attack model M_P(·). The details of each step in the M_P(·) training process are outlined below.
  • Collect auxiliary data. The attacker will use the profile generator of the target BA system to collect the required auxiliary profiles using a third-party outsourcing platform. There is no limitation on the number of auxiliary profiles, though more auxiliary profiles will lead to a more generalized attack model.
  • RP and DP on auxiliary data. The attacker applies RP to each auxiliary profile to produce a projected version. The random matrix R for RP is generated either from a compromised seed or by leveraging knowledge of the distribution of R. The attacker then generates DP noise, either by obtaining or guessing the DP parameters, and adds this noise to the projected profiles. To increase the number of projected auxiliary profiles and improve the generalization of M_P(·), the attacker can apply multiple instances of R and different instances of DP noise to each auxiliary profile.
  • Train the attack model M P ( · ) . To train M P ( · ) , the attacker will use noisy projected versions of auxiliary profiles as training data and original auxiliary profiles as the ground truth. Figure 2 illustrates the training process of GAN-based M P ( · ) . Let x X denote an original auxiliary data vector, and x ^ X ^ represent its projected noisy version obtained through RP and DP. During training, the generator G takes the projected vector x ^ as input and produces a reconstructed feature vector x ¯ = G ( x ^ ) that aims to approximate the original data x . The discriminator D receives either a real auxiliary sample x or a reconstructed sample x ¯ , and attempts to distinguish between real and generated data. Through adversarial training over the auxiliary datasets, the generator progressively improves its ability to reconstruct original feature vectors from their projected noisy counterparts, thereby learning an effective inverse mapping from the projected space to the original feature space.
The trained M P ( · ) will then be used to reconstruct a plain profile from the compromised noisy projected profile and extract the user’s behavioral pattern. The closer the features of the recovered profile are to the ground truth profile, the higher the likelihood of success for the attacker in recovering the behavioral pattern.

5.4. Privacy Evaluation

We will evaluate the privacy of our proposed system by analyzing the statistical similarity between the features of recovered profiles and their original counterparts.
Definition 9 (ϵ-Distribution-Privacy). Suppose M_P(·) is applied to a noisy projected profile X̂ to recover its plain profile. A privacy-preserving BA system is said to offer ϵ-distribution-privacy against a privacy attacker if the best ML-based approach produces a profile X̄ in which no more than ϵ percent of the features pass statistical similarity tests with the corresponding features of the original profile X.
For this privacy attack, we consider two scenarios: (i) the adversary is aware of the distribution of R and the type of noise added by DP, and (ii) the adversary has access to the secret R and the DP parameters. The second scenario represents the most critical case, where one or more users have been compromised, exposing their R and DP parameters. If the adversary knows R and the DP parameters, they can use them to project the auxiliary profiles and add noise to create noisy projected profiles, which are then used to train M P ( · ) . On the other hand, if R and the DP parameters are unknown, the attacker will generate a random matrix and generate noise based on their guesses of the DP parameters. The aim of the proposed privacy-preserving BA system for both scenarios is to limit the privacy attacker’s ability to recover more than an ϵ -percentage of the features on average from each profile.
Since RP is an inherently lossy process, causing some information about the original profile to be lost during the initial projection, and DP provides theoretical privacy guarantees with high confidence, attackers cannot perfectly reconstruct the original profile from the noisy projected profiles in either scenario. We validated this statement through the experiments presented in Section 6.3, where we used a GAN as the attack model.

5.5. GAN Attack - Formal Privacy Proof

Theorem 8
(GAN Nash-Equilibrium Privacy Bound). At the Nash equilibrium of Game_GAN (Definition 8), the optimal generator G* satisfies:
G* = arg min_G D_JS(p_X ‖ p_{G(X̂)}),
and the expected feature recoverability fraction obeys:
E[ρ(G*(X̂), X)] ≤ ρ_max(ϵ, δ, k, d) := (k/d) · λ_min(Σ_X)/(λ_min(Σ_X) + σ²),
where σ is the Gaussian DP noise scale from Lemma 2.
Proof. Step 1 (Optimal discriminator). For fixed G, the optimal discriminator is D*(x) = p_X(x)/(p_X(x) + p_G(x)). Substituting into the loss: C(G) = −log 4 + 2 D_JS(p_X ‖ p_G).
Step 2 (Generator objective). Minimizing C(G) is equivalent to minimizing D_JS(p_X ‖ p_G), achieved uniquely when p_G = p_X. However, since X̂ is a noisy, low-dimensional proxy of X, perfect reconstruction is impossible due to the information barrier from Theorem 7.
Step 3 (Mutual information bottleneck). The mutual information between X and X̂ upper-bounds the information accessible to any reconstruction algorithm. By the data processing inequality:
I(X; X̂) ≤ I(X; RX + η) ≤ Σ_{i=1}^k ½ log(1 + λ_i(RΣ_X R^T)/σ²) ≤ (k/2) log(1 + ‖Σ_X‖_F²/(σ² k)).
This means the generator can access at most I(X; X̂) bits of information about X.
Step 4 (Feature recoverability as channel capacity). A feature X_j passes the KS test if D_KL(p_{X̄_j} ‖ p_{X_j}) < τ_KS. Faithful reconstruction of feature j requires I(X_j; X̂) > log(1/τ_KS) bits. The number of features satisfying this requirement is:
#{recoverable features} ≤ I(X; X̂)/log(1/τ_KS) ≈ k · λ_min(Σ_X)/(λ_min(Σ_X) + σ²).
Step 5 (Normalizing to a ratio). Dividing by d:
E[ρ] ≤ (k/d) · λ_min(Σ_X)/(λ_min(Σ_X) + σ²) = ρ_max.
Step 6 (Known vs. unknown parameters). When R and the DP parameters are known to the attacker, the optimal GAN can potentially learn the exact projection and noise model. However, the information-theoretic barrier from Step 3 still applies: knowledge of R does not increase I(X; X̂) beyond its value as a sufficient statistic. Hence ρ_max is the same for both scenarios.
Numerical verification. Voice data (d = 104, k = 94 (Table 2), σ = 3.81): ρ_max = (94/104) · λ_min(Σ_X)/(λ_min(Σ_X) + σ²) ≈ 0.904 × 0.069 ≈ 0.062, i.e., 6.2% recovery, consistent with the observed 1.57-5.20%. Swipe data (d = 33, k = 30, σ ≈ 1.73 for ϵ = 9): ρ_max ≈ 0.3%, consistent with 0.02-0.42%.    □
Corollary 1
(Privacy Guarantee under Worst-Case Attack). Under the strongest possible GAN attack (known R, known DP parameters, unlimited auxiliary data), RUIP-BA satisfies ρ_max-irreversibility with:
ρ_max = (k/d) · 1/(1 + σ²/λ_min(Σ_X)) ≤ k/d,
where the upper bound k/d is achieved in the limit σ → 0 (no DP noise). This confirms that even without DP, RP alone guarantees that at most a k/d fraction of features can be recovered, and DP strictly improves this bound.

6. Experimental Results

We have implemented and evaluated our proposed approach on three different types of behavioral datasets. We collected voice and swipe pattern data from [46] and drawing pattern data from [25]. The voice and swipe datasets have 10,320 observations (vectors) from 86 users, with 120 observations per user. The drawing pattern data has 80 to 240 observations per user, with 193 distinct users. There are 104 features in voice data, 33 in swipe data, and 65 in drawing data.
Experiment setup. We downloaded and cleaned the behavioral profiles before using them in the experiments. To reduce the effect of biases resulting from features having different ranges, we normalized each feature by dividing by its maximum absolute value, mapping all values into [−1, 1]. For the voice and swipe datasets specifically, this normalization was applied independently per feature column across all 10,320 raw observations (86 users × 120 samples per user), yielding a normalized feature matrix. From each profile, we then separated 20% of the data samples for testing purposes. We followed the same approach for the drawing pattern data.
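The per-feature max-abs normalization described above is a one-liner in NumPy; the function name and the synthetic voice-shaped matrix below are illustrative only.

```python
import numpy as np

def max_abs_normalize(X):
    """Per-feature max-abs scaling: each column is divided by its maximum
    absolute value, mapping all features into [-1, 1]."""
    scale = np.max(np.abs(X), axis=0)
    scale[scale == 0] = 1.0          # guard against constant-zero features
    return X / scale

rng = np.random.default_rng(0)
X = rng.normal(0, 5, size=(10320, 104))   # 86 users x 120 observations (voice-shaped)
Xn = max_abs_normalize(X)
assert Xn.min() >= -1.0 and Xn.max() <= 1.0
```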
Data oversampling. Neural network classifiers and ML-based attack models require sufficient data samples in each profile for training and validation. To address this, we applied the Synthetic Minority Over-sampling Technique (SMOTE) [47] to each profile. SMOTE is an oversampling algorithm that generates new data samples by interpolating existing ones within a profile, without adding new information. After oversampling, all three datasets contained between 200 and 300 data samples per profile.
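A minimal SMOTE-style interpolation can be sketched as follows. This is our own simplification (brute-force neighbour search, no class-balancing logic, helper name `smote_like_oversample` is ours), not the reference implementation of [47]:

```python
import numpy as np

def smote_like_oversample(X: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Simplified SMOTE: each synthetic sample lies on the segment between a
    randomly chosen profile sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = np.empty((n_new, X.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X))
        dist = np.linalg.norm(X - X[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                       # interpolation weight in [0, 1)
        synthetic[t] = X[i] + lam * (X[j] - X[i])
    return np.vstack([X, synthetic])

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 33))             # stand-in for one swipe profile
Xo = smote_like_oversample(X, n_new=130)   # grow the profile to 250 samples
```

Because every synthetic sample is a convex combination of two existing samples, the oversampled profile stays inside the original per-feature range, matching the "without adding new information" property.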
Training and auxiliary data. We divided all profiles in each dataset into two groups: (i) Group 1: all profiles in this group were converted to noisy projected profiles and used to train and validate the NN classifier, and (ii) Group 2: all profiles, along with their noisy projected versions, were used as auxiliary data. For Group 1, we kept around 80% of the profiles (68 voice and swipe profiles and 155 drawing pattern profiles), and the remaining 20% were assigned to Group 2 (18 voice and swipe profiles and 38 drawing pattern profiles).

6.1. Performance of BA System

In this section, we design the NN architectures for the BA classifiers, train them on Group 1 plain profiles, projected profiles, and noisy projected profiles, and finally evaluate and compare the correctness and security of those classifiers using test data.

6.1.1. Performance of BA Classifier for Plain Profiles

We designed three distinct hierarchical NN architectures, one per dataset. Table A1 in the Appendix illustrates the NN architecture of one BA classifier; the other classifiers follow the same basic structure, differing in the number of layers and nodes per layer. For example, the baseline plain-profile classifier for the voice dataset follows the architecture 104 → 128 → 256 → 512 → 256 → 128 → 68, where each hidden layer uses Batch Normalization, ReLU activation, and Dropout (rate 0.1 for the first layer, increasing toward deeper layers). The final layer applies a 68-way softmax. The model is compiled with RMSprop (η = 0.001, ρ = 0.9) and a ReduceLROnPlateau callback on validation accuracy. From each Group 1 profile, we allocated 80% of the data for training the classifier and the remaining 20% for model evaluation.
During the training phase, the voice, swipe, and drawing classifiers achieved 97.27%, 97.57%, and 95.68% classification accuracy and 98.47%, 97.06%, and 96.98% validation accuracy, respectively. We then tested all three trained classifiers using the previously separated test data. Table 3 presents the FAR and FRR of all three BA classifiers for plain profiles; they achieved below 1.0% FAR and below 4.0% FRR, close to the performance reported in the original papers: for voice and swipe data, the authors of [48] reported 0.02% FAR and 3.52% FRR, and [25] reported 1.97% FAR and 1.97% FRR for drawing pattern data.

6.1.2. Performance of BA Classifier for Projected Profiles

For RP, we generated R following the discrete distribution (Achlioptas sparse sign pattern with {−1, 0, +1} entries). We used the JL lemma [37] to calculate the minimum value of k for each R ∈ ℝ^(k×d). Table 2 shows that the minimum values of k are 73 for voice data, 30 for swipe data, and 46 for drawing data, with distance-preserving probabilities of 0.99, 0.94, and 0.99, respectively. For RP, we set k to 94, 30, and 56 for voice, swipe, and drawing data, respectively. After RP, we used the same percentage of projected data from Group 1 to train three new BA classifiers and achieved 95.65%, 96.42%, and 97.34% training accuracy and 98.56%, 98.18%, and 99.05% validation accuracy for voice, swipe, and drawing data, respectively.
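The projection step can be sketched as follows. The JL bound shown is one standard form of the lemma, and the helper names are ours, so the exact constants may differ from the authors' calculation of k:

```python
import numpy as np

def jl_min_dim(n_points: int, eps: float) -> int:
    """One standard Johnson-Lindenstrauss bound: k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return int(np.ceil(4 * np.log(n_points) / (eps ** 2 / 2 - eps ** 3 / 3)))

def achlioptas_matrix(k: int, d: int, seed: int = 0) -> np.ndarray:
    """Sparse Achlioptas projection: scaled {-1, 0, +1} entries with probabilities
    {1/6, 2/3, 1/6}; the sqrt(3/k) scaling keeps E||Rx||^2 = ||x||^2."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * signs

R = achlioptas_matrix(94, 104)                 # voice-data shape from Table 2
x = np.random.default_rng(2).normal(size=104)
y = R @ x                                      # protected template in the projected domain
```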
For the correctness test, each test profile was projected using the correct R from the training phase, while an incorrect R was used for the security test. In the correctness test, the three classifiers produced 99.56%, 98.93%, and 98.97% classification accuracy, equivalent to 0.44%, 1.07%, and 1.03% FRR, respectively. In the security test, the classification accuracy dropped to 0.45%, 0.23%, and 0.33%, which corresponds to the FAR in each case. The second row of Table 3 shows the FRR and FAR of all three classifiers for RP profiles. The slight performance improvement across all three classifiers is attributed to the use of a distinct R for each profile projection, which increased the distances among profiles in the projected domain.

6.2. Privacy-Preserving Properties of BA System

In this section, we examine the renewability and unlinkability properties of our privacy-preserving BA system. The irreversibility property of the system is examined in the next section.
DP parameter selection. For all RP+DP experiments on voice data, the DP noise was parameterized as follows. For Laplace noise, we set ε = 7 and sensitivity Δ₂(g) = 1 (normalized features lie in [−1, 1]), yielding a Laplace scale b = Δ₂/ε = 1/7 ≈ 0.143. For Gaussian noise, we used ε = 7, δ = 10⁻⁵, and computed
σ = √(2 ln(1.25/δ))/ε = √(2 ln(125,000))/7 ≈ 4.845/7 ≈ 0.692.
For the invalid-claim security test, a distinct ε = 5 (Laplace, scale b = 0.2) was used to simulate a stricter privacy regime. The RP+DP classifier for voice data (VDP3 architecture, 94 → 64 → 128 → 256 → 128 → 68, trained for 200 epochs) achieved 97.38% test accuracy (loss = 0.077). Non-enrolled users presented under the wrong projection matrix attained only 1.65% accuracy (loss = 13.98), confirming strict rejection of impostors even after adding DP noise.
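The noise scales quoted above follow the standard Laplace and Gaussian mechanism calibrations, which can be checked numerically (helper names are ours):

```python
import math

def laplace_scale(sensitivity: float, eps: float) -> float:
    """Laplace mechanism: noise scale b = sensitivity / epsilon."""
    return sensitivity / eps

def gaussian_sigma(sensitivity: float, eps: float, delta: float) -> float:
    """Classic Gaussian mechanism calibration: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps

b = laplace_scale(1.0, 7.0)               # 1/7 ~= 0.143
sigma7 = gaussian_sigma(1.0, 7.0, 1e-5)   # ~= 0.692, the value derived in Section 6.2
sigma9 = gaussian_sigma(1.0, 9.0, 1e-5)   # ~= 0.538, used later for the attack experiments
```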

6.2.1. Renew Training Profile

To assess the renewability property of our proposed privacy-preserving BA system, each plain training profile was projected using a different R than the one used during the registration phase, followed by the addition of DP noise. For each dataset, we then updated C(·) using the newly generated noisy projected profiles. With Laplace and Gaussian noise, respectively, the updated C(·)s achieved training/validation accuracies of 96.64%/98.60% and 97.87%/98.96% for voice data, 96.16%/97.97% and 97.37%/97.03% for swipe data, and 95.65%/96.57% and 96.65%/98.43% for drawing data over 200 training epochs.
The security of all updated BA classifiers was tested using the previously used test data, projected with an incorrect R and with DP noise generated using incorrect parameters. The updated classifiers achieved FARs of approximately 0.18% and 1.64% for voice data, 1.95% and 1.70% for swipe data, and 1.94% and 1.55% for drawing data with both types of noise, as expected. However, when the test data were projected using the correct R and the correct DP parameters were used to add DP noise, the correctness of the systems was restored to 96.85% (3.15% FRR) and 97.58% (2.42% FRR) for voice data with Laplace and Gaussian noise, respectively. For swipe data, the correctness was 97.74% (2.26% FRR) and 98.03% (1.97% FRR), and for drawing data, it was 98.15% (1.85% FRR) and 97.32% (2.68% FRR). These results confirm the renewability property of our privacy-preserving BA systems, demonstrating their ability to maintain both correctness and security even when the BA classifier is updated with new noisy projected profiles. A summary of the results is provided in the model-update section of Table 3, while the training accuracy across training epochs is presented in Figure A1 in the Appendix.
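The renewal step, as we read it, amounts to re-projecting the same plain profile with a fresh seed (fresh R and fresh noise); the sketch below uses our own names and the Achlioptas/Laplace choices from earlier sections:

```python
import numpy as np

def make_template(x: np.ndarray, k: int, seed: int, eps: float = 7.0, sens: float = 1.0) -> np.ndarray:
    """Enrollment/renewal sketch: project x with a seed-specific sparse R, then
    add Laplace DP noise. Renewal = re-running this with a fresh seed."""
    rng = np.random.default_rng(seed)
    R = np.sqrt(3.0 / k) * rng.choice([-1.0, 0.0, 1.0], size=(k, len(x)), p=[1 / 6, 2 / 3, 1 / 6])
    return R @ x + rng.laplace(scale=sens / eps, size=k)

x = np.random.default_rng(42).normal(size=104)   # one plain voice profile vector
t_old = make_template(x, 94, seed=1)             # compromised template
t_new = make_template(x, 94, seed=2)             # renewed template from the same profile
```

The same seed deterministically reproduces the same template (needed for verification), while a new seed yields an entirely different protected template from the same underlying profile.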

6.2.2. Similarity of Divergence Distributions

To ensure unlinkability, the JS divergence distribution between two noisy projected profiles generated from the same source using different R matrices and different DP parameters should be as close as the JS divergence distribution of invalid claims (different sources, different R , and different DP parameters). Furthermore, it should also differ from the JS divergence distribution of valid claims (same source, same R , and same DP parameters). To evaluate this, we calculated the symmetric JS divergences for all pairs of noisy projected profiles under three scenarios: (i) profiles originate from different sources and are projected using different R matrices and different DP parameters (invalid claims); (ii) profiles originate from the same source but are projected using different R matrices and different DP parameters; and (iii) profiles originate from the same source, are projected using the same R , and use the same DP parameters (valid claims). During profile projection, we ensured acceptable dimensionality reduction across all datasets to preserve distances and set Laplace and Gaussian parameters to maintain moderate privacy while ensuring the distance-preserving property.
Figure 3 illustrates the divergence distributions of the noisy projected profiles across three cases within the three datasets. It is evident that the divergence distribution in case 1 (different sources, different R , and different DP parameters) aligns more closely with case 2 (same source, different R , and different DP parameters) compared to case 3 (same source, same R , and same DP parameters).
Quantitative profile-distance analysis (voice data as example). To provide a precise statistical justification, we computed the k-NN divergence (with k = 5, using the efficient kd-tree estimator of Wang et al. [43]) between every pair of noisy projected profiles over all 86 voice users under both Laplace and Gaussian perturbation. Valid-claim divergences (two noisy projections of the same user under the same R) ranged from approximately 0.14 to 31.2 (Laplace) and 0.02 to 33.7 (Gaussian), consistent with the tight within-user intra-class geometry. By contrast, invalid-claim (cross-user) divergences ranged from 137 to 210 (Laplace/Gaussian), roughly two orders of magnitude larger, confirming strong separation. Inter-profile divergences (same source, different R) fell in a similarly high range (approximately 112 to 210), making them statistically indistinguishable from invalid-claim divergences.
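A k-NN divergence estimator in the spirit of Wang et al. [43] can be sketched with a kd-tree. This is our own simplified version, not the authors' implementation, and the Gaussian samples merely stand in for profile data:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_divergence(X: np.ndarray, Y: np.ndarray, k: int = 5) -> float:
    """kd-tree k-NN estimate of KL(P_X || P_Y); symmetrize as
    0.5 * (knn_divergence(X, Y) + knn_divergence(Y, X)) if needed."""
    n, d = X.shape
    m = Y.shape[0]
    rho = cKDTree(X).query(X, k=k + 1)[0][:, k]   # k-th neighbour within X (skip self-match)
    nu = cKDTree(Y).query(X, k=k)[0][:, k - 1]    # k-th neighbour of each X point in Y
    return (d / n) * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
A = rng.normal(0, 1, size=(300, 5))
B = rng.normal(0, 1, size=(300, 5))   # same population as A  -> small divergence
C = rng.normal(4, 1, size=(300, 5))   # well-separated        -> large divergence
```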
To further analyze this, the Kolmogorov-Smirnov (KS) two-sample test was conducted on these empirical distributions (each of length n = 86 ). Table 4 reports the exact p-values:
The p-values are critical: while valid-claim divergences are completely distinct from both the invalid and inter-profile distributions (p ≈ 5.5 × 10⁻⁵¹ ≪ 0.05), the distributions of invalid claims and inter-profile divergences are statistically indistinguishable (p = 0.0693 > 0.05). This is the key unlinkability result: an adversary observing two noisy projected profiles of the same user, generated with different R matrices, cannot distinguish them from profiles of two entirely different users, because their divergence distributions are drawn from the same statistical population. This confirms that for the voice dataset, the null hypothesis of identical distributions between cases 1 and 2 cannot be rejected at α = 0.05, providing direct empirical corroboration of Theorem 6.
For the similarity test between cases 1 and 2, p-values were consistently much higher than those obtained from the similarity test between cases 2 and 3, with some even exceeding 0.05. This indicates that the distributions of cases 1 and 2 exhibit similar patterns, whereas the distributions of cases 2 and 3 differ significantly. These findings confirm that an attacker cannot associate the noisy projected profiles in our BA system based on their symmetric divergence, consistent with Theorem 6.
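The KS-based unlinkability check can be reproduced in outline with synthetic stand-in populations (the ranges below are illustrative only, not the measured divergences):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
# Stand-ins for the three empirical divergence populations (n = 86 each).
invalid = rng.uniform(130, 210, 86)   # case 1: different sources, different R
inter = rng.uniform(130, 210, 86)     # case 2: same source, different R (same population as case 1)
valid = rng.uniform(0.1, 32, 86)      # case 3: same source, same R

p_12 = ks_2samp(invalid, inter).pvalue   # one population: typically not rejected at alpha = 0.05
p_23 = ks_2samp(inter, valid).pvalue     # disjoint supports: rejected overwhelmingly
```

Unlinkability corresponds to a large p_12 (cases 1 and 2 indistinguishable) together with a vanishing p_23 (valid claims clearly separated).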

6.3. Performance of ML-Based Attack

In this section, we trained an ML-based attack model M_P(·) to attempt to recover the original features from compromised noisy projected profiles.

6.3.1. Train an Attack Model

For the attack model, we designed three similar pairs of generator and discriminator architectures, one for each dataset. The generator aims to reconstruct the original (plain) profiles from the projected profiles, while the discriminator attempts to distinguish between real and reconstructed profiles, thereby guiding the generator to produce more accurate recoveries. Table A2 in the Appendix shows the GAN-based generator and discriminator architectures. We also adopted the DNN-based attack model proposed in [24] for comparison.
After collecting auxiliary profiles through the profile generator, we projected them using a newly generated R (when only the distribution of R is known) or the preexisting R (when the exact matrix R is known). DP noise was then added to each projected auxiliary profile using the known DP parameters or by estimating them based on prior knowledge or assumptions. All noisy projected profiles, along with their corresponding plain profiles, were used to train both types of attack models.
DNN attack architecture and training (voice dataset as example). For the voice dataset, the DNN inversion regressor follows the architecture 94 → 128 → 256 → 256 → 128 → 104 with sigmoid output activation (160,360 parameters in total, 158,824 trainable), compiled with MSE loss and the SGD optimizer. The auxiliary set consists of 18 × 200 = 3,600 samples projected with a per-instance random R and perturbed using ε = 9, δ = 10⁻⁵ (Gaussian, σ ≈ 0.538) or ε = 7 (Laplace) DP noise. Training runs for 200 epochs (batch size 64). For the unknown-R scenario, epoch 1 yields a training MSE of 0.1288 and a validation MSE of 0.1191; by epoch 200 the MSE stabilizes at 0.0153/0.0160, indicating convergence. On the held-out test set, the final evaluation MSE is 0.0306, substantially above zero, confirming that even a well-optimized regressor cannot invert the RP+DP transformation with fidelity.
GAN attack architecture and training. For the GAN-based attack models, we used fully connected generator and discriminator architectures, where the generator reconstructs plain profiles from projected profiles and the discriminator distinguishes between real and reconstructed profiles. The networks were trained using BCE loss for the discriminator and a weighted combination of BCE and MSE loss for the generator (λ_recon = 1 to 10). Both networks were optimized using Adam (η = 2 × 10⁻⁴, β₁ = 0.5) over 200 epochs. For the unknown-R voice scenario, the losses start at D = 1.317, G = 1.149 (epoch 0) and converge to D = 1.383, G = 0.766 (epoch 190); for the known-R scenario, convergence is at D = 1.383, G = 0.753. In both cases the discriminator loss settles near the theoretical Nash-equilibrium value of 2 ln 2 ≈ 1.386 and the generator's adversarial term near ln 2 ≈ 0.693 (the binary cross-entropy optimum), which is consistent with the GAN Nash-Equilibrium Privacy Bound of Theorem 8: the generator cannot significantly improve reconstruction quality once the discriminator reaches near-random guessing. In all cases GAN training converged, with discriminator and generator losses of 1.35-1.38 and 0.75-0.76 for voice data, 1.32-1.38 and 0.74-0.75 for swipe data, and 1.36-1.37 and 0.71-0.72 for drawing data. For the DNN, we used mean squared error as the loss function and RMSProp as the optimizer with a learning rate of 0.001. For unknown R and unknown DP parameters, after 200 epochs of training, the training and validation loss of M_P(·) for both noise types ranges between 0.014 and 0.016 for voice data, between 0.0013 and 0.0022 for swipe data, and between 0.0013 and 0.0029 for drawing data.
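The equilibrium values cited above can be checked directly from the binary cross-entropy definition:

```python
import math

# At the GAN's Nash equilibrium the optimal discriminator outputs D(x) = 1/2 for
# every input, so its BCE loss over a real batch plus a fake batch is
#   -log(1/2) - log(1 - 1/2) = 2 ln 2 ~= 1.386  (observed D ~= 1.383),
# while the generator's adversarial BCE term is -log(1/2) = ln 2 ~= 0.693
# (the observed G ~= 0.75-0.77 additionally carries the weighted MSE reconstruction term).
d_star = 0.5
disc_loss = -math.log(d_star) - math.log(1.0 - d_star)
gen_adv_loss = -math.log(d_star)
```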

6.3.2. Recover Feature

Both types of trained attack models were used to recover plain profiles from each compromised noisy projected profile. We employed the KS-test to evaluate the distribution similarity between the features of the recovered profiles and the ground-truth profiles.
Mean feature recovery rates (voice data as example, unknown R). For the DNN-based inversion model under the unknown-R scenario on voice data, across 68 enrolled users only 268 out of 68 × 104 = 7,072 total feature recovery attempts yielded a KS-passing result under Laplace noise, and 163 under Gaussian noise. This corresponds to aggregate mean feature recovery rates of ρ̄_DNN,L = 3.79% and ρ̄_DNN,G = 2.31%, respectively, providing concrete verification of Theorem 8. The per-user feature counts for the DNN attack under Laplace noise are: {6, 4, 3, 4, 4, 2, 5, 0, 5, 6, 4, 1, 0, 2, 1, 10, 6, 5, 2, 3, 2, 7, 3, 7, 7, 2, 6, 9, 3, 13, 2, 3, 3, 2, 2, 4, 8, 0, 4, 2, 4, 2, 2, 5, 4, 5, 0, 3, 4, 7, 2, 1, 1, 3, 4, 4, 4, 3, 5, 2, 6, 3, 1, 4, 6, 6, 2, 3}, confirming that even the most-recovered user (user 30, with 13 features) regains only 12.5% of their 104 voice features, far below any threshold that would enable behavioral re-identification.
For all three datasets, Figure 4(a-c) presents the percentage of features per profile that passed the KS-test for both attack networks across all three datasets when both the random projection matrix R and DP parameters were unknown to the attacker. In this setting, the attacker reconstructs R by sampling it from the known distribution and randomly guesses the DP parameters. For the DNN-based attack, the results are broadly consistent with those reported in [24]; however, the addition of DP noise reduces the overall recovery performance. Specifically, for the voice and swipe datasets, out of 86 noisy projected profiles, the attacker was able to recover at least one feature from 59 and 8 profiles, respectively, under Laplace noise, and from 55 and 2 profiles, respectively, under Gaussian noise. For the drawing dataset, at least one feature was successfully recovered from 30 out of 155 profiles with Laplace noise and from 5 out of 155 profiles with Gaussian noise.
GAN provides slightly better recovery, but the improvement is still limited. Out of 86 noisy projected profiles for voice and swipe data, at least one feature was recovered from 58 and 54 profiles, respectively, with Laplace noise, and from 15 and 5 profiles, respectively, with Gaussian noise. For drawing data, at least one feature was recovered from 30 and 22 profiles out of 155 profiles for Laplace and Gaussian noise, respectively. Despite these results, the average percentage of recovered features per compromised profile remains low.
Figure 5 (a-c) presents the results when the attacker knows both R and DP parameters. In this case, recovery improves slightly for both DNN and GAN-based attacks. For DNN, the attacker recovered at least one feature from 64 and 7 voice and swipe profiles, respectively, with Laplace noise, and from 56 and 2 profiles with Gaussian noise. For drawing data, at least one feature was recovered from 41 and 5 profiles out of 155 profiles for Laplace and Gaussian noise, respectively. For GAN, at least one feature was recovered from 65 and 24 voice and swipe profiles, respectively, with Laplace noise, and from 60 and 7 profiles with Gaussian noise. For drawing data, at least one feature was recovered from 51 and 22 profiles out of 155 profiles for Laplace and Gaussian noise, respectively. Even with this knowledge, the average percentage of recovered features per profile remains low.
Overall, attacks performed with known R and DP parameters achieve slightly better results compared to those with unknown parameters. Profiles with added Laplace noise are marginally more susceptible than those with Gaussian noise, likely due to the more deterministic behavior of Laplace noise. Nevertheless, for an individual profile, the percentage of recovered features remains low, ranging from 5% to 14% for voice data, 1% to 3% for swipe data, and 1% to 8% for drawing data under both noise types and both parameter scenarios. This level of recovery is insufficient for identifying behavioral patterns or supporting future attacks. Critically, all observed ρ values satisfy ρ ≤ ρ_max from Theorem 8 and Corollary 1.

6.4. Results Comparison

We critically compare the performance of our proposed privacy-preserving BA system with existing systems in the literature that adopt the cancelable biometric approach. The comparison emphasizes key aspects such as the methods and data types used, system performance, evaluated privacy properties, and resilience against various attacks. While some related works excel in certain areas, our system achieves competitive performance by balancing high authentication accuracy with robust protection against diverse privacy threats, including GAN-based attacks. A summary of the comparisons is in Table 5.
  • Some systems relied solely on an RP-based approach [23,24], while others combined RP with local binary pattern (LBP) [33], backpropagation neural network (BPNN) [35], or applied double random phase encryption (DRPE) with fractional Fourier transform (FFT) [34]. In this work, we combined DP with RP to offer theoretical and experimental privacy guarantees, distinguishing our approach from existing methods.
  • Most of the related works used biometric data in their experiments, except Taheri et al. [23,24], who included biometric and behavioral data. Since our focus is solely on BA systems, we used three different behavioral datasets in the experiments.
  • Our system achieves higher performance accuracy than most other systems, except for a few that use biometric data, as expected, since biometric data are inherently more distinctive than behavioral data.
  • Only our proposed system, along with [34] and [24], meets all privacy-preserving criteria for authentication systems, with our system achieving this specifically for behavioral data.
  • We considered most of the potential attacks applicable to our system and also introduced GAN-based privacy attacks as a novel privacy evaluation method. Our approach is unique in providing formal information-theoretic proofs (Theorems 5-8) for all three privacy properties.

7. Conclusion

In this paper, we presented RUIP-BA, a privacy-preserving behavioral authentication (BA) system designed to meet all three ISO/IEC 24745 requirements of renewability, unlinkability, and irreversibility, detailing its design, implementation, formal analysis, and evaluation. The acronym RUIP-BA (Renewable, Unlinkable, Irreversible Privacy-Preserving Behavioral Authentication) directly encodes this three-property guarantee in the system's name. Leveraging random projection (RP) and local differential privacy (DP), our system achieves a balance between high authentication accuracy and robust data privacy. Experiments on voice, swipe, and drawing pattern datasets demonstrated accuracy rates exceeding 96% for user authentication while preserving user privacy. The system effectively mitigated privacy risks, as shown by low false rejection and acceptance rates.
New formal contributions include: (i) the Johnson-Lindenstrauss lemma and an RP sensitivity lemma providing dimensionality-reduction guarantees; (ii) three formal security games formalizing the ISO/IEC 24745 properties as PPT adversarial games; (iii) a Bhattacharyya-coefficient renewal bound (Theorem 5) showing an adversarial advantage ≤ 0.011 for voice data; (iv) a full KL/JS divergence derivation for unlinkability (Theorem 6) grounded in Gaussian mechanism analysis; (v) a Cramér-Rao/Bayesian MMSE lower bound for irreversibility (Theorem 7) showing null-space dimensions are fundamentally unrecoverable; and (vi) a GAN Nash-equilibrium privacy bound (Theorem 8) bounding feature recoverability by ρ_max = (k/d) · λ_min/(λ_min + σ²).
Comprehensive security and privacy analysis, including evaluations against sophisticated GAN-based privacy attacks, validated the robustness of our approach, showing that a very small percentage of behavioral features could be reconstructed by the most powerful attack scenarios. Our findings highlight the effectiveness of RP and DP in protecting user profiles from potential threats. Beyond BA systems, our approach offers broader applicability to domains such as biometric authentication and high-dimensional data publishing. Future work will explore adaptive strategies to enhance resilience against evolving attack models and extend the system’s applicability to multimodal biometric authentication scenarios.

Appendix A

Table A1. The NN architectures of BA classifier consist of dense layers, batch-normalization layers, activation layers, and dropout layers. The classifiers across different datasets use the same architectural framework but differ in the number of layers and the number of nodes in each layer.
Layer (type) Output Shape Param #
dense_1 (Dense) (None, 64) 3,648
batch_normalization_1 (BatchNormalization) (None, 64) 256
activation_1 (Activation) (None, 64) 0
dropout_1 (Dropout) (None, 64) 0
dense_2 (Dense) (None, 128) 8,320
batch_normalization_2 (BatchNormalization) (None, 128) 512
activation_2 (Activation) (None, 128) 0
dropout_2 (Dropout) (None, 128) 0
dense_3 (Dense) (None, 64) 8,256
batch_normalization_3 (BatchNormalization) (None, 64) 256
activation_3 (Activation) (None, 64) 0
dropout_3 (Dropout) (None, 64) 0
dense_4 (Dense) (None, 155) 10,075
Table A2. Fully connected generator and discriminator architectures. This GAN architecture is used as an attack model to recover the original (plain) profiles from the projected profiles.
Generator Architecture
Layer (type) Output Shape Activation
Input (y) |y| –
Linear (|y| → 128) 128 ReLU
Linear (128 → 128) 128 ReLU
Linear (128 → |x|) |x| None
Discriminator Architecture
Layer (type) Output Shape Activation
Input (Concatenated [y, x]) |x| + |y| –
Linear (|x| + |y| → 128) 128 LeakyReLU (0.2)
Linear (128 → 128) 128 LeakyReLU (0.2)
Linear (128 → 1) 1 Sigmoid
Figure A1. All three updated classifiers, trained for 200 epochs with both types of noisy projected profiles, achieved training accuracies of 90.76% and 93.01% for voice data, 96.29% and 95.58% for swipe data, and 91.76% and 91.71% for drawing data. Validation accuracies were 99.13% and 99.68% for voice, 99.27% and 99.66% for swipe, and 99.78% and 99.65% for drawing data.

References

  1. Islam, M.M.; Safavi-Naini, R. POSTER: A behavioural authentication system for mobile users. In Proceedings of the Proceedings of the 2016 ACM Conference on Computer and Communications Security (CCS ’16). ACM, 2016, pp. 1742–1744.
  2. Chong, P.; Elovici, Y.; Binder, A. User authentication based on mouse dynamics using deep neural networks: A comprehensive study. IEEE Transactions on Information Forensics and Security 2019, 15, 1086–1101. [Google Scholar] [CrossRef]
  3. Jung, D.; Nguyen, M.D.; Han, J.; Park, M.; Lee, K.; Yoo, S.; Kim, J.; Mun, K.R. Deep neural network-based gait classification using wearable inertial sensor data. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2019, pp. 3624–3628.
  4. Deng, Y.; Zhong, Y. Keystroke dynamics advances for mobile devices using deep neural network. Recent Advances in User Authentication Using Keystroke Dynamics Biometrics 2015, 2, 59–70. [Google Scholar]
  5. Gong, X.; Wang, Q.; Chen, Y.; Yang, W.; Jiang, X. Model extraction attacks and defenses on cloud-based machine learning models. IEEE Communications Magazine 2020, 58, 83–89. [Google Scholar] [CrossRef]
  6. Islam, M.M.; Safavi-Naini, R. Model Inversion for Impersonation in Behavioral Authentication Systems. In Proceedings of the SECRYPT, 2021, pp. 271–282.
  7. Secretary, I. Information technology–security techniques–biometric information protection. International Organization for Standardization, Standard ISO/IEC 2011, 24745, 2011.
  8. Kelkboom, E.J.; Breebaart, J.; Kevenaar, T.A.; Buhan, I.; Veldhuis, R.N. Preventing the decodability attack based cross-matching in a fuzzy commitment scheme. IEEE Transactions on Information Forensics and Security 2010, 6, 107–121. [Google Scholar] [CrossRef]
  9. Islam, M.M.; Safavi-Naini, R. Fuzzy Vault for Behavioral Authentication System. In Proceedings of the ICT Systems Security and Privacy Protection: 35th IFIP TC 11 International Conference, SEC 2020, Maribor, Slovenia, September 21–23, 2020, Proceedings 35. Springer, 2020, pp. 295–310.
  10. Chauhan, S.; Sharma, A. Improved fuzzy commitment scheme. International Journal of Information Technology 2022, 14, 1321–1331. [Google Scholar] [CrossRef]
  11. Wang, Y.; Li, B.; Zhang, Y.; Wu, J.; Ma, Q. A secure biometric key generation mechanism via deep learning and its application. Applied Sciences 2021, 11, 8497. [Google Scholar] [CrossRef]
  12. Mir, O.; Roland, M.; Mayrhofer, R. DAMFA: Decentralized anonymous multi-factor authentication. In Proceedings of the Proceedings of the 2nd ACM International Symposium on Blockchain and Secure Critical Infrastructure, 2020, pp. 10–19.
  13. Kim, S.; Mun, H.J.; Hong, S. Multi-factor authentication with randomly selected authentication methods with DID on a random terminal. Applied Sciences 2022, 12, 2301. [Google Scholar] [CrossRef]
  14. Al-Rubaie, M.; Chang, J.M. Privacy-preserving machine learning: Threats and solutions. IEEE Security & Privacy 2019, 17, 49–58. [Google Scholar] [CrossRef]
  15. Loya, J.; Bana, T. Privacy-Preserving Keystroke Analysis using Fully Homomorphic Encryption & Differential Privacy. In Proceedings of the 2021 International Conference on Cyberworlds (CW). IEEE, 2021, pp. 291–294.
  16. Baig, A.F.; Eskeland, S.; Yang, B. Privacy-preserving continuous authentication using behavioral biometrics. International Journal of Information Security 2023, 22, 1833–1847.
  17. Soutar, C.; Roberge, D.; Stoianov, A.; Gilroy, R.; Kumar, B.V. Biometric encryption using image processing. In Proceedings of Optical Security and Counterfeit Deterrence Techniques II. SPIE, 1998, Vol. 3314, pp. 178–188.
  18. Usman, M.; Jan, M.A.; Puthal, D. PAAL: A framework based on authentication, aggregation, and local differential privacy for Internet of Multimedia Things. IEEE Internet of Things Journal 2019, 7, 2501–2508.
  19. Chamikara, M.A.P.; Bertok, P.; Khalil, I.; Liu, D.; Camtepe, S. Privacy preserving face recognition utilizing differential privacy. Computers & Security 2020, 97, 101951.
  20. Wazzeh, M.; Ould-Slimane, H.; Talhi, C.; Mourad, A.; Guizani, M. Privacy-preserving continuous authentication for mobile and IoT systems using warmup-based federated learning. IEEE Network 2022.
  21. Yu, F.X.; Rawat, A.S.; Menon, A.K.; et al. Federated learning with only positive labels. In Proceedings of the 37th International Conference on Machine Learning (ICML). PMLR, 2020, Vol. 119, pp. 10946–10956.
  22. Yang, W.; Wang, S.; Kang, J.J.; Johnstone, M.N.; Bedari, A. A linear convolution-based cancelable fingerprint biometric authentication system. Computers & Security 2022, 114, 102583.
  23. Taheri, S.; Islam, M.M.; Safavi-Naini, R. Privacy-enhanced profile-based authentication using sparse random projection. In Proceedings of IFIP SEC 2017. Springer, 2017, pp. 474–490.
  24. Islam, M.M.; Rafiq, M.A.; Islam, M.A. A privacy-preserving behavioral authentication system. In Proceedings of the International Symposium on Foundations and Practice of Security. Springer, 2024, pp. 95–107.
  25. Islam, M.M.; Safavi-Naini, R.; Kneppers, M. Scalable behavioral authentication. IEEE Access 2021, 9, 43458–43473.
  26. Baig, A.F.; Eskeland, S.; Yang, B. Novel and efficient privacy-preserving continuous authentication. Cryptography 2024, 8, 3.
  27. Meng, W.; Wong, D.S.; Furnell, S.; Zhou, J. Surveying the development of biometric user authentication on mobile phones. IEEE Communications Surveys & Tutorials 2014, 17, 1268–1293.
  28. Huixian, L.; et al. Key binding based on biometric shielding functions. In Proceedings of the Fifth International Conference on Information Assurance and Security (IAS 2009). IEEE, 2009, Vol. 1, pp. 19–22.
  29. Dodis, Y.; Ostrovsky, R.; Reyzin, L.; Smith, A. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM Journal on Computing 2008, 38, 97–139.
  30. Domingo-Ferrer, J.; Wu, Q.; Blanco-Justicia, A. Flexible and robust privacy-preserving implicit authentication. In Proceedings of ICT Systems Security and Privacy Protection: 30th IFIP TC 11 International Conference, SEC 2015, Hamburg, Germany, May 26–28, 2015. Springer, 2015, pp. 18–34.
  31. Wang, Y.; Plataniotis, K.N. An analysis of random projection for changeable and privacy-preserving biometric verification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2010, 40, 1280–1293.
  32. Punithavathi, P.; Geetha, S. Dynamic sectored random projection for cancelable iris template. In Proceedings of the 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2016, pp. 711–715.
  33. Deshmukh, M.; Balwant, M.K. Generating cancelable palmprint templates using local binary pattern and random projection. In Proceedings of the 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). IEEE, 2017, pp. 203–209.
  34. Rajasekar, V.; Premalatha, J.; Sathya, K. Cancelable iris template for secure authentication based on random projection and double random phase encoding. Peer-to-Peer Networking and Applications 2021, 14, 747–762.
  35. Peng, J.; Gupta, B.B.; Abd El-Latif, A.A. A biometric cryptosystem scheme based on random projection and neural network. Soft Computing 2021, 25, 7657–7670.
  36. Kaski, S. Dimensionality reduction by random mapping: Fast similarity computation for clustering. In Proceedings of the 1998 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). IEEE, 1998, Vol. 1, pp. 413–418.
  37. Dasgupta, S.; Gupta, A. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms 2003, 22, 60–65.
  38. Achlioptas, D. Database-friendly random projections: Johnson–Lindenstrauss with binary coins. Journal of Computer and System Sciences 2003, 66, 671–687.
  39. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Theory of Cryptography Conference (TCC 2006), New York, NY, USA, March 4–7, 2006. Springer, 2006, pp. 265–284.
  40. Dong, J.; Roth, A.; Su, W.J. Gaussian differential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2022, 84, 3–37.
  41. Wang, R.; Fung, B.C.; Zhu, Y. Heterogeneous data release for cluster analysis with differential privacy. Knowledge-Based Systems 2020, 201, 106047.
  42. Nielsen, F. On a variational definition for the Jensen–Shannon symmetrization of distances based on the information radius. Entropy 2021, 23, 464.
  43. Wang, Q.; Kulkarni, S.R.; Verdú, S. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Transactions on Information Theory 2009, 55, 2392–2405.
  44. Zheng, Z.; Li, Z.; Huang, C.; Long, S.; Li, M.; Shen, X. Data poisoning attacks and defenses to LDP-based privacy-preserving crowdsensing. IEEE Transactions on Dependable and Secure Computing 2024.
  45. Demmel, J.W.; Higham, N.J. Improved error bounds for underdetermined system solvers. SIAM Journal on Matrix Analysis and Applications 1993, 14, 1–14.
  46. Gupta, S.; Buriro, A.; Crispo, B. A chimerical dataset combining physiological and behavioral biometric traits for reliable user authentication on smart devices and ecosystems. Data in Brief 2020, 28, 104924.
  47. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 2002, 16, 321–357.
  48. Gupta, S.; Buriro, A.; Crispo, B. DriverAuth: A risk-based multi-modal biometric-based driver authentication scheme for ride-sharing platforms. Computers & Security 2019, 83, 122–139.
Notes
1. We replaced "NaN" and "Infinity" values with zero and dropped duplicate rows.
2. We used more than one R and different DP noise to generate multiple noisy projected profiles from a single auxiliary profile.
Figure 1. RUIP-BA system architecture decomposed into three operational phases and a privacy & security evaluation layer. Enrollment/Registration Phase [top]: Each user u_i submits raw behavioral biometrics (voice, swipe, or drawing) that are dimensionality-reduced via random projection (X′_i = R_i X_i) and locally perturbed by differential privacy (X̂_i = M(X′_i)) before transmission to the Verifier/Server, where templates are stored and an ML classifier C is trained; the original biometric never leaves the device. Template Renewal Phase [middle]: Upon compromise detection for user u_j, a fresh projection matrix R′_j regenerates a new protected template X̂′_j = M(R′_j X_j), replacing the old entry in the database and triggering classifier retraining to ensure revocability and forward unlinkability (Theorem 6). Authentication Phase [middle]: A claimant u_i applies the same device-side pipeline to a fresh sample Y_i (Ŷ_i = M(R_i Y_i)) and submits a verification claim to the server, which compares the stored template against the live transformed sample to produce an Accept or Reject decision. External Attacker Model [center]: An adversary compromising the projection and DP parameters feeds them into a GAN-based model M_P to reconstruct the original biometric; recoverability is quantified by the recovery ratio ρ (Theorems 8-9). Privacy & Security Evaluation [bottom]: Three auditors assess system guarantees: the Unlinkability Analyzer uses the D_JS divergence and the KS test to confirm template indistinguishability (Theorem 7); the Irreversibility Analyzer evaluates GAN-based inversion resistance via ρ (Theorems 8-9); and the Renewability Auditor applies the Bhattacharyya bound to verify independence of renewed templates (Theorem 6). Theorem numbers refer to the formal results in Section 4.3.
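The device-side protection pipeline of the enrollment phase (random projection followed by Laplace-based local DP) can be sketched in a few lines. The shapes, the N(0, 1/k) projection entries, and the budget/sensitivity values below are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 30, 50            # raw dimension, projected dimension, profile size (assumed)
X = rng.normal(size=(n, d))      # stand-in for a raw behavioral profile X_i

# Random projection: a user-specific R_i with i.i.d. N(0, 1/k) entries (one common choice)
R = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))
X_proj = X @ R.T                 # X'_i = R_i X_i, now k-dimensional (k < d)

# Local DP: Laplace mechanism M with an assumed budget eps and sensitivity delta
eps, delta = 1.0, 1.0
X_hat = X_proj + rng.laplace(scale=delta / eps, size=X_proj.shape)

# Only the noisy projected profile X_hat ever leaves the device.
print(X_hat.shape)
```

Renewal follows the same pipeline with a freshly drawn R, which is why the old and new templates share no projection geometry.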
Figure 2. The GAN-based attack model M_P(·) is trained to recover a plain profile X̄ from a noisy projected profile X̂. The model is optimized using the adversarial loss, which encourages the generated profiles to be indistinguishable from the original ones.
Figure 3. For all three datasets, the distribution of JS divergence between two noisy projected profiles generated with different R and different DP noise is close to the different-source distribution even when the profiles originate from the same source, so renewed templates from one user cannot be linked.
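The unlinkability check behind Figure 3 can be sketched as follows: protect the same sample twice with independent R and noise, then estimate the JS divergence between the resulting templates. Histogram binning is a simplifying assumption here; the paper uses a k-NN divergence estimator [43].

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 100, 30                                 # assumed dimensions

def js_divergence(p, q, eps=1e-12):
    """JS divergence (log base 2) between two histograms, in [0, 1]."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def protect(x, seed):
    """Fresh random projection + Laplace noise, as in template renewal."""
    r = np.random.default_rng(seed)
    R = r.normal(scale=1 / np.sqrt(k), size=(k, d))
    return R @ x + r.laplace(scale=1.0, size=k)

x = rng.normal(size=d)                         # one behavioral sample
t1, t2 = protect(x, 10), protect(x, 20)        # two renewed templates, same source
bins = np.linspace(-8, 8, 33)
p, _ = np.histogram(t1, bins=bins)
q, _ = np.histogram(t2, bins=bins)
print(js_divergence(p.astype(float), q.astype(float)))
```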
Figure 4. Performance of DNN and GAN-based privacy attackers when the distribution of R is known. The average percentage of recovered features from the compromised profiles is very low: 1.57% to 2.82% for voice data, 0.02% to 0.27% for swipe data, and 0.03% to 0.38% for drawing data for both noise types. This is consistent with the theoretical bound ρ_max from Theorem 8.
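The recovery ratio ρ reported in Figures 4-5 can be illustrated with a simple known-R baseline attacker: the minimum-norm (pseudoinverse) solution of the underdetermined system R x = x̂ [45]. The 5% relative-error tolerance used to count a feature as "recovered" is an assumption for this sketch, not necessarily the paper's criterion.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 100, 30                                  # assumed dimensions (k < d)
x = rng.uniform(0.5, 1.5, size=d)               # raw feature vector (kept away from zero)
R = rng.normal(scale=1 / np.sqrt(k), size=(k, d))
x_hat = R @ x + rng.laplace(scale=1.0, size=k)  # protected template

# Attacker with known R: minimum-norm reconstruction via the Moore-Penrose pseudoinverse
x_rec = np.linalg.pinv(R) @ x_hat

def recovery_ratio(x_true, x_guess, tol=0.05):
    """Fraction of features whose relative error is within tol (assumed criterion)."""
    rel_err = np.abs(x_guess - x_true) / np.abs(x_true)
    return float(np.mean(rel_err <= tol))

print(f"recovery ratio: {recovery_ratio(x, x_rec):.2f}")
```

Because the system is underdetermined (k < d) and noisy, the reconstruction lies in a k-dimensional row space, which is the intuition behind the k/d worst-case bound of Corollary 1.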
Figure 5. Performance of ML-based privacy attackers when R is known. The average percentage of recovered features from the compromised profiles remains very low: 1.67% to 5.20% for voice data, 0.02% to 0.42% for swipe data, and 0.03% to 0.51% for drawing data for both noise types. By Corollary 1, this is bounded by k/d even in the worst case.
Table 1. List of notations.

Notation | Meaning
x, y | Data sample (vector)
d | Vector dimension (total features)
X | A behavioral profile
n, m | Number of vectors
Y | Verification data (profile)
R | Random matrix
x′, y′ | Projected vector
M | Differential privacy algorithm
X′, Y′ | Projected profile
C(·) | ML-based classifier
X̂, Ŷ | Noisy projected profile
Ver(·,·) | Verification algorithm
ŷ | Prediction vector
M_P(·) | ML model for privacy attack
Σ_X | Profile covariance matrix
G, D | GAN generator, discriminator
ρ | Feature recoverability fraction
λ_min | Minimum eigenvalue
Δ_2(f) | ℓ2 sensitivity of function f
TV | Total variation distance
Table 2. The minimum acceptable value of k in RP is calculated using the JL lemma (Theorem 1) for voice, swipe, and drawing data. A detailed description of the symbols used in the lemma is provided in [37].

Data Set | k | n | ϵ | β | 1 − n^(−β)
Voice data | 73 | 200 | 0.5 | 1 | 0.99
Swipe data | 30 | 300 | 1.0 | 0.5 | 0.94
Drawing data | 46 | 300 | 0.7 | 1 | 0.99
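The last column of Table 2 is the JL success probability 1 − n^(−β) for each dataset's (n, β); a quick check (assuming two-decimal truncation of the reported values) reproduces it:

```python
import math

# (n, beta) per dataset, taken from Table 2
rows = {"Voice": (200, 1.0), "Swipe": (300, 0.5), "Drawing": (300, 1.0)}
for name, (n, beta) in rows.items():
    conf = 1 - n ** (-beta)                    # JL success probability 1 - n^(-beta)
    print(name, math.floor(conf * 100) / 100)  # truncate to two decimals as reported
```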
Table 3. The performance of C(·) evaluated on plain, projected, and noisy projected profiles. The minimal variation in FRR and FAR after RP and after RP+DP confirms the correctness and security properties of all three privacy-preserving BA systems. FRR and FAR remain almost unchanged after updating C(·) with new noisy projected profiles.
Profile Type | Metric | Voice Data | Swipe Data | Drawing Data | Comments

Model Training
Plain Profile | FAR | 0.94 | 0.90 | 0.69 | Obtained results consistent with those reported in the original paper.
Plain Profile | FRR | 1.90 | 3.07 | 1.07 |
RP Profile | FAR | 0.45 | 0.23 | 0.33 | Slightly improved performance due to the use of a distinct R per profile.
RP Profile | FRR | 0.44 | 1.07 | 1.03 |
RP+DP Profile (Laplace Noise) | FAR | 0.13 | 0.18 | 0.65 | Laplace noise slightly reduces FAR but marginally increases overall FRR.
RP+DP Profile (Laplace Noise) | FRR | 2.86 | 2.31 | 2.12 |
RP+DP Profile (Gaussian Noise) | FAR | 0.06 | 0.54 | 0.21 | Gaussian noise produced better results due to the small relaxation of the privacy guarantee.
RP+DP Profile (Gaussian Noise) | FRR | 2.63 | 1.87 | 1.63 |

Model Update
RP+DP Profile (Laplace Noise) | FAR | 0.18 | 1.95 | 0.94 | The updated classifier C(·) keeps FAR and FRR near those of the original C(·).
RP+DP Profile (Laplace Noise) | FRR | 3.15 | 2.26 | 1.85 |
RP+DP Profile (Gaussian Noise) | FAR | 1.64 | 1.70 | 1.55 | In the updated C(·), Gaussian noise still performs slightly better than Laplace noise.
RP+DP Profile (Gaussian Noise) | FRR | 2.42 | 1.97 | 2.68 |
Table 4. KS two-sample test p-values for voice-data k-NN divergence distributions under Laplace and Gaussian DP noise. A p-value ≤ 0.05 indicates the two distributions are statistically different; a p-value > 0.05 indicates no statistically significant difference (unlinkability).
Comparison | Laplace p-value | Gaussian p-value
Valid vs. Invalid | 5.50 × 10^−51 | 5.50 × 10^−51
Valid vs. Inter-profile | 5.50 × 10^−51 | 5.50 × 10^−51
Invalid vs. Inter-profile | 0.0693 | 0.0693
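The KS two-sample test behind Table 4 compares the empirical CDFs of two divergence samples. A NumPy-only sketch of the KS statistic (the p-value computation, which the table reports, is omitted here) behaves as expected on synthetic data:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(3)
same = ks_statistic(rng.normal(size=500), rng.normal(size=500))       # same law
diff = ks_statistic(rng.normal(size=500), rng.normal(2.0, 1.0, 500))  # shifted law
print(same < diff)  # True: distinguishable distributions give a larger KS distance
```

A small statistic (large p-value) is what the Invalid vs. Inter-profile row exhibits, supporting unlinkability.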
Table 5. Comparison of RP-based cancelable authentication systems.

Reference | Method | Data Type | FAR & FRR | Privacy Property | Attack Resilience
[31] | RP | Biometrics | 18.19% | Ensures 2 out of 3 | Correlation, cross-match, known R
[32] | RP | Biometrics | Below 4.0% | Ensures 2 out of 3 | Limited attack analysis
[33] | LBP + RP | Biometrics | 7.81% | Ensures 2 out of 3 | Limited attack analysis
[23] | RP | Behavioral, biometrics | Below 6.0% | Ensures 2 out of 3 | Minimum-norm-solution based
[34] | DRPE + FFT | Biometrics | 0.46% | Ensures 1 out of 3 | Brute-force, correlation, known key
[35] | RP + BPNN | Biometrics | Below 1.0% | Ensures all 3 | Brute-force, cross-match, known R
RUIP-BA | RP + DP | Behavioral | Below 4.0% | Ensures all 3 | ML-driven, cross-match, known and unknown parameters, formal proofs
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.