Submitted:
10 June 2025
Posted:
12 June 2025
Abstract

Keywords:
1. Introduction
- It combines HMC and LDP to address vulnerabilities in mobility data anonymization, a combination that has not been explored before.
- It integrates LDP through LDPTrace to enhance HMC’s resistance against Aggregate Privacy Attacks (AP-Attacks), a novel improvement over using HMC alone.
- The approach emphasizes preserving both individual privacy and utility of aggregate mobility data, achieving a balance not previously demonstrated in comparable methods.
2. Heat Map Confusion
- Heat Map Creation: The first step in the HMC process is to turn a user’s mobility trace into a heat map. A mobility trace is a sequence of records, each consisting of a latitude, a longitude and a timestamp. Heat map creation divides a given geographical area into a grid of square cells, where each cell represents a location. The intensity of each cell shows how often the user has visited that location (see Figure 1). This generalizes the raw GPS data: it gives a coarser view of movement patterns and hides precise location details.
- Heat Map Alteration: To anonymize the data, HMC alters each user’s heat map by comparing and merging it with the heat map of a similar user from a reference set. This step ensures that the final heat map no longer shows the unique patterns of the original user. The similarity of two heat maps is measured with the Topsøe divergence.
- Mobility Trace Reconstruction: After the heat map has been modified into the obfuscated heat map, the next step is to reconstruct an anonymized mobility trace that corresponds to it. The process is described in detail in [2].
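The heat map creation step can be illustrated with a minimal Python sketch. The function name `build_heat_map`, the bounding box, the grid size and the normalization to a probability distribution are assumptions made for illustration; they are not taken from the HMC implementation in [2].

```python
import numpy as np

def build_heat_map(trace, bounds, grid_size):
    """Turn (lat, lon, timestamp) records into a normalized visit-frequency grid.

    Hypothetical helper: `bounds` is (lat_min, lat_max, lon_min, lon_max) and
    `grid_size` is the number of cells per side; both are illustrative choices.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    heat = np.zeros((grid_size, grid_size), dtype=float)
    for lat, lon, _timestamp in trace:
        # Ignore records that fall outside the chosen geographical area.
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue
        # Map the coordinate to a cell index; row 0 is the southernmost row.
        row = min(int((lat - lat_min) / (lat_max - lat_min) * grid_size), grid_size - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * grid_size), grid_size - 1)
        heat[row, col] += 1.0
    total = heat.sum()
    # Normalize so that the cell intensities sum to one (assumed here so that
    # heat maps can be compared as probability distributions).
    return heat / total if total > 0 else heat

# Example with made-up coordinates: two visits to one cell, one to another.
example_bounds = (37.70, 37.80, -122.50, -122.40)
example_trace = [(37.71, -122.49, 0), (37.71, -122.49, 60), (37.79, -122.41, 120)]
example_map = build_heat_map(example_trace, example_bounds, grid_size=4)
```

In this sketch the timestamps are not used; only the visit counts per cell enter the heat map, which is what hides the exact points and their order.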
3. Local Differential Privacy
- Feature Extraction: LDPTrace extracts three main features from individual users’ trajectories and obfuscates them under a privacy budget ε:
- Intra-trajectory Transitions: The movements between consecutive grid cells of a trajectory, capturing local movement behavior.
- Start and End Points: Virtual markers indicating where trajectories begin and terminate, which help in preserving trajectory structures.
- Trajectory Length: Each trajectory contributes an obfuscated trajectory length, from which a central data curator determines a probability distribution of trajectory lengths.
- Frequency Estimation: To estimate frequencies for trajectory synthesis, Optimized Unary Encoding (OUE) is applied to each of the extracted features. OUE represents each feature value as a binary vector whose length equals the maximal value of the feature; the entry at the index that equals the feature value is one, and all other entries are zero. OUE then adds noise to this binary vector to protect privacy before the data are combined [3]. Equation 6 defines the probability that the perturbed vector is 1 at index i.
- Adaptive Synthesis Process: The framework builds a probabilistic model using these extracted features, allowing it to generate synthetic trajectories that mimic real movement patterns [12]. The synthesis process is adaptive, meaning it selects transitions and trajectory lengths based on learned distributions without needing exact user data, thus enhancing privacy.
- Enhanced Privacy Protection: By employing local differential privacy, LDPTrace reduces the risk associated with data aggregation and central storage, ensuring that users' real movement data is not exposed [4].
- Improved Utility: The framework maintains high data utility, as synthetic trajectories generated by LDPTrace closely match real-world movement patterns [4]. This is particularly beneficial for analyses that depend on aggregated mobility trends rather than specific individual behaviors.
- Low Computational Cost: Unlike older methods that rely on intensive computations, such as linear programming or external data integration, LDPTrace simplifies the synthesis process, making it feasible for use on devices with limited resources [4].
- Resistance to Attacks: The method is designed to withstand common location-based attacks, such as reidentification and outlier analysis. By generating trajectories that do not closely mirror any specific user’s data, LDPTrace effectively minimizes vulnerabilities [4].
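The frequency estimation step can be sketched in Python as follows. The sketch uses the standard OUE parameters from the LDP literature: a 1 bit is kept with probability p = 1/2, and a 0 bit is flipped to 1 with probability q = 1/(e^ε + 1). The function names `oue_perturb` and `oue_estimate` and the zero-based indexing are assumptions for illustration and are not LDPTrace's actual API.

```python
import numpy as np

def oue_perturb(value, domain_size, epsilon, rng):
    """Perturb one categorical value with Optimized Unary Encoding (sketch)."""
    encoded = np.zeros(domain_size, dtype=int)
    encoded[value] = 1                      # one-hot encoding, zero-based index
    p = 0.5                                 # probability that a 1 stays 1
    q = 1.0 / (np.exp(epsilon) + 1.0)       # probability that a 0 becomes 1
    one_prob = np.where(encoded == 1, p, q)
    return (rng.random(domain_size) < one_prob).astype(int)

def oue_estimate(reports, epsilon):
    """Unbiased frequency estimate computed by the curator from all reports."""
    reports = np.asarray(reports)
    n = reports.shape[0]
    p = 0.5
    q = 1.0 / (np.exp(epsilon) + 1.0)
    observed = reports.sum(axis=0) / n      # share of reports with bit i set
    return (observed - q) / (p - q)         # remove the expected noise

# Example with synthetic values: 4 categories, privacy budget 1.0.
rng = np.random.default_rng(0)
true_values = rng.choice(4, size=20000, p=[0.5, 0.3, 0.15, 0.05])
reports = [oue_perturb(v, 4, 1.0, rng) for v in true_values]
estimated = oue_estimate(reports, 1.0)
```

Each user runs only `oue_perturb` locally, so the curator never sees the true value; the estimate becomes more accurate as the number of reports grows or the privacy budget increases.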
4. Combination of HMC and LDPTrace
- Synthetic Trajectory Generation with LDPTrace: First, LDPTrace creates synthetic mobility traces based on real mobility data from the San Francisco taxi dataset. These synthetic trajectories simulate movement patterns without directly revealing actual user locations. By introducing randomness in location data, LDPTrace makes it harder to link specific movements to real individuals while maintaining patterns that resemble real-world data.
- Heat Map Confusion on Synthetic Data: Once the synthetic trajectories are generated, HMC further anonymizes the data. HMC creates heat maps from these synthetic traces and then alters them by merging similar patterns. This step reduces the likelihood of reidentification based on movement patterns, as individual traces are grouped into less specific patterns. By transforming synthetic data into altered heat maps, this combined approach provides additional protection for individual locations and routes.
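The two-stage pipeline can be sketched as follows. This is a deliberately simplified first-order Markov sampler over grid cells, assumed here only to show the data flow from estimated distributions to a heat map; LDPTrace's actual adaptive synthesis [4] differs in detail, and the names `synthesize_trajectory` and `cells_to_heat_map` are hypothetical.

```python
import numpy as np

def synthesize_trajectory(start_probs, transition_probs, length_probs, rng):
    """Sample one cell sequence from start, transition and length distributions."""
    n_cells = len(start_probs)
    # Index k of length_probs stands for a trajectory of k + 1 cells.
    length = rng.choice(len(length_probs), p=length_probs) + 1
    cell = rng.choice(n_cells, p=start_probs)
    trajectory = [int(cell)]
    while len(trajectory) < length:
        # Move to the next cell according to the current cell's transition row.
        cell = rng.choice(n_cells, p=transition_probs[cell])
        trajectory.append(int(cell))
    return trajectory

def cells_to_heat_map(trajectory, n_cells):
    """Count visits per cell and normalize; the result is the input to HMC."""
    counts = np.bincount(trajectory, minlength=n_cells).astype(float)
    return counts / counts.sum()

# Toy example with 3 cells and made-up distributions.
rng = np.random.default_rng(1)
start = [1.0, 0.0, 0.0]
transitions = np.array([[0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [1.0, 0.0, 0.0]])
lengths = [0.0, 0.0, 1.0]
synthetic = synthesize_trajectory(start, transitions, lengths, rng)
synthetic_map = cells_to_heat_map(synthetic, n_cells=3)
```

In the combined approach, the heat map of each synthetic trace would then go through the HMC alteration step, so the attacker faces both the LDP noise and the merged heat maps.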
5. Aggregate Privacy Attack
6. Experimental Evaluation
6.1. Dataset
6.2. Evaluation with Aggregate Privacy Attack
- AP-Attack on unaltered Heat Maps: The AP-Attack is first applied to the original heat maps, which have not been anonymized with HMC. This provides a baseline reidentification rate, showing how easily a taxi can be reidentified without any further anonymization.
- AP-Attack on obfuscated Heat Maps: The AP-Attack is applied to the heat maps after the HMC anonymization process. Comparing the reidentification rates from the original and altered maps shows whether the HMC process is effective. A lower reidentification rate indicates successful anonymization.
- (Non-)Reidentified Total: Refers to the total number of (un-)successful reidentifications.
- (Non-)Reidentified (%): Refers to the percentage of (un-)successful reidentifications.
- Seed: The NumPy random seed (set via np.random). It determines how the data sets are partitioned into training and test sets.
- Average Distance: The average minimum distance, computed with the Topsøe divergence (see Equation 1) over all 536 data sets.
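The metrics above can be made concrete with a short Python sketch. The Topsøe divergence is written in its usual form, T(P, Q) = Σ_i [p_i ln(2 p_i / (p_i + q_i)) + q_i ln(2 q_i / (p_i + q_i))]. The matching rule in `reidentification_rate` is a simplified nearest-neighbour assumption in the spirit of the AP-Attack, not the exact procedure of the original attack, and all function and variable names are hypothetical.

```python
import numpy as np

def topsoe_divergence(p, q):
    """Topsøe divergence between two heat maps given as probability distributions."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    m = p + q
    # Cells with p_i = 0 (or q_i = 0) contribute 0, by the convention 0 * ln 0 = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        term_p = np.where(p > 0, p * np.log(2 * p / m), 0.0)
        term_q = np.where(q > 0, q * np.log(2 * q / m), 0.0)
    return float(term_p.sum() + term_q.sum())

def reidentification_rate(known_maps, anonymous_maps):
    """Percentage of anonymous heat maps whose closest known map has the same id."""
    hits = 0
    for user_id, anonymous in anonymous_maps.items():
        guess = min(known_maps, key=lambda k: topsoe_divergence(known_maps[k], anonymous))
        hits += int(guess == user_id)
    return 100.0 * hits / len(anonymous_maps)

# Toy example with two users and two-cell heat maps (made-up values).
known = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
anonymous = {"a": [0.8, 0.2], "b": [0.6, 0.4]}
rate = reidentification_rate(known, anonymous)  # "a" is matched, "b" is not
```

The divergence is 0 for identical heat maps and reaches its maximum of 2 ln 2 for heat maps with no common cells, so a larger average distance means the anonymized heat maps have moved further from the originals.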
7. Conclusions
- HMC alone is effective at lowering Reidentified (%) rates, with average distances from 0.05 to 0.10, as shown in Table 6. This indicates that HMC can offer privacy protection while keeping data reasonably useful.
- HMC combined with LDPTrace provides stronger privacy, as shown by the lower Reidentified (%) rates between 0% and 0.37% in Table 7, and larger average distances up to 0.20. This demonstrates that the additional application of HMC further strengthens anonymization.
- Application-specific studies: Analyzing how different levels of data modification affect specific applications or use cases, e.g., environmental pollution or traffic simulations.
- Including other utility metrics: Creating or utilizing other metrics beyond average distance, such as checking for time-based consistency and the accuracy of points of interest (POI), can offer a clearer view of data usability.
- Experiments with other attacks: To strengthen the robustness of privacy protection methods and ensure comprehensive evaluation, future work should focus on testing additional attack scenarios and analyzing their impact on reidentification rates and data utility. This could include exploring attacks such as outlier detection, POI (Point of Interest)-based attacks, and PIT (Point-in-Time)-based attacks, see [2].
Author Contributions
Funding
Appendix A
| Metrics | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | Trial 7 | Trial 8 | Trial 9 |
|---|---|---|---|---|---|---|---|---|---|
| #Taxi Id | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| Reidentified Total | 152 | 82 | 141 | 45 | 140 | 83 | 90 | 96 | 97 |
| Non-Reidentified Total | 384 | 454 | 395 | 491 | 396 | 453 | 446 | 440 | 439 |
| Reidentified (%) | 28.36 | 15.30 | 26.31 | 8.40 | 26.12 | 15.49 | 16.79 | 17.91 | 18.10 |
| Non-Reidentified (%) | 71.64 | 84.70 | 73.69 | 91.60 | 73.88 | 84.51 | 83.21 | 82.09 | 81.90 |
| Seed | 845 | 286 | 742 | 301 | 87 | 123 | 581 | 445 | 4 |
| ⌀ Topsøe Divergence | 0.02 | 0.03 | 0.03 | 0.07 | 0.03 | 0.03 | 0.04 | 0.03 | 0.03 |
| Metrics | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | Trial 7 | Trial 8 | Trial 9 |
|---|---|---|---|---|---|---|---|---|---|
| #Taxi Id | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| Reidentified Total | 11 | 12 | 10 | 3 | 9 | 11 | 7 | 12 | 5 |
| Non-Reidentified Total | 525 | 524 | 526 | 533 | 527 | 525 | 529 | 524 | 531 |
| Reidentified (%) | 2.50 | 2.24 | 1.87 | 0.56 | 1.68 | 2.50 | 1.31 | 2.24 | 0.93 |
| Non-Reidentified (%) | 97.50 | 97.76 | 98.13 | 99.44 | 98.32 | 97.50 | 98.69 | 97.76 | 99.07 |
| Seed | 845 | 286 | 742 | 301 | 87 | 123 | 581 | 445 | 4 |
| ⌀ Topsøe Divergence | 0.08 | 0.05 | 0.10 | 0.09 | 0.08 | 0.06 | 0.07 | 0.05 | 0.05 |
| Metrics | ε | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | Trial 7 | Trial 8 | Trial 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| #Taxi Id | | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| Reidentified Total | 1 | 1 | 1 | 1 | 2 | 1 | 0 | 1 | 1 | 1 |
| | 1.5 | 2 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| | 2 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Non-Reidentified Total | 1 | 535 | 535 | 535 | 534 | 535 | 536 | 535 | 535 | 535 |
| | 1.5 | 534 | 534 | 535 | 535 | 534 | 534 | 534 | 534 | 534 |
| | 2 | 535 | 536 | 536 | 535 | 535 | 535 | 535 | 535 | 535 |
| Reidentified (%) | 1 | 0.19 | 0.19 | 0.19 | 0.37 | 0.19 | 0 | 0.19 | 0.19 | 0.19 |
| | 1.5 | 0.37 | 0.37 | 0.19 | 0.19 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 |
| | 2 | 0.19 | 0 | 0 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 |
| Non-Reidentified (%) | 1 | 99.81 | 99.81 | 99.81 | 99.63 | 99.81 | 100 | 99.81 | 99.81 | 99.81 |
| | 1.5 | 99.63 | 99.63 | 99.81 | 99.81 | 99.63 | 99.63 | 99.63 | 99.63 | 99.63 |
| | 2 | 99.81 | 100 | 100 | 99.81 | 99.81 | 99.81 | 99.81 | 99.81 | 99.81 |
| Seed | | 845 | 286 | 742 | 301 | 87 | 123 | 581 | 445 | 4 |
| ⌀ Topsøe Divergence | 1 | 0.15 | 0.15 | 0.16 | 0.2 | 0.16 | 0.15 | 0.17 | 0.16 | 0.16 |
| | 1.5 | 0.13 | 0.14 | 0.15 | 0.18 | 0.14 | 0.13 | 0.15 | 0.14 | 0.14 |
| | 2 | 0.13 | 0.13 | 0.14 | 0.18 | 0.13 | 0.13 | 0.14 | 0.13 | 0.14 |
| Metrics | ε | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Trial 6 | Trial 7 | Trial 8 | Trial 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| #Taxi Id | | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| Reidentified Total | 1 | 2 | 2 | 1 | 2 | 1 | 2 | 1 | 3 | 1 |
| | 1.5 | 2 | 3 | 2 | 2 | 2 | 3 | 3 | 4 | 2 |
| | 2 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 1 |
| Non-Reidentified Total | 1 | 534 | 534 | 535 | 534 | 535 | 534 | 535 | 533 | 535 |
| | 1.5 | 534 | 533 | 534 | 534 | 534 | 533 | 533 | 531 | 534 |
| | 2 | 535 | 535 | 534 | 535 | 534 | 534 | 535 | 535 | 535 |
| Reidentified (%) | 1 | 0.37 | 0.37 | 0.19 | 0.37 | 0.19 | 0.37 | 0.19 | 0.56 | 0.19 |
| | 1.5 | 0.37 | 0.56 | 0.37 | 0.37 | 0.37 | 0.56 | 0.56 | 0.75 | 0.37 |
| | 2 | 0.19 | 0.19 | 0.37 | 0.19 | 0.37 | 0.37 | 0.19 | 0.19 | 0.1 |
| Non-Reidentified (%) | 1 | 99.63 | 99.63 | 99.81 | 99.63 | 99.81 | 99.63 | 99.81 | 99.44 | 99.81 |
| | 1.5 | 99.63 | 99.44 | 99.63 | 99.63 | 99.63 | 99.44 | 99.44 | 99.25 | 99.63 |
| | 2 | 99.81 | 99.81 | 99.63 | 99.81 | 99.63 | 99.63 | 99.81 | 99.81 | 99.81 |
| Seed | | 845 | 286 | 742 | 301 | 87 | 123 | 581 | 445 | 4 |
| ⌀ Topsøe Divergence | 1 | 0.15 | 0.15 | 0.16 | 0.2 | 0.16 | 0.15 | 0.17 | 0.15 | 0.16 |
| | 1.5 | 0.13 | 0.13 | 0.14 | 0.18 | 0.14 | 0.13 | 0.15 | 0.14 | 0.14 |
| | 2 | 0.12 | 0.13 | 0.14 | 0.18 | 0.13 | 0.13 | 0.14 | 0.13 | 0.14 |
References
- Khalfoun, B.; Maouche, M.; Ben Mokhtar, S.; Bouchenak, S. MooD: MObility Data Privacy as Orphan Disease – Experimentation and Deployment Paper. Proc. ACM/IFIP/USENIX Int. Middleware Conf., 2019, Davis, CA, USA.
- Maouche, M.; Ben Mokhtar, S.; Bouchenak, S. HMC: Robust Privacy Protection of Mobility Data against Multiple Re-Identification Attacks. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018, 2, 1–25.
- Gatzert, N.; Knorre, S.; Müller-Peters, H.; Wagner, F.; Jost, T. Big Data in der Mobilität: Akteure, Geschäftsmodelle und Nutzenpotenziale für die Welt von morgen. Springer Gabler, Wiesbaden, Deutschland, 2023.
- Du, Y.; Hu, Y.; Zhang, Z.; Fang, Z.; Chen, L.; Zheng, B.; Gao, Y. LDPTrace: Locally Differentially Private Trajectory Synthesis. Proc. VLDB Endow. 2023, 16.
- Xu, F.; Tu, Z.; Li, Y.; Zhang, P.; Fu, X.; Jin, D. Trajectory Recovery from Ash: User Privacy is NOT Preserved in Aggregated Mobility Data. Proc. 26th Int. World Wide Web Conf., 2017, Perth, Australia.
- Buchholz, E.; Abuadbba, A.; Wang, S.; Nepal, S.; Kanhere, S.S. SoK: Can Trajectory Generation Combine Privacy and Utility? Proc. Privacy Enhancing Technologies Symposium, 2024, 3, 75–93.
- Primault, V.; Ben Mokhtar, S.; Lauradoux, C.; Brunie, L. Differentially Private Location Privacy in Practice. Transactions on Data Privacy, 2023, 16.
- Maouche, M.; Ben Mokhtar, S.; Bouchenak, S. AP-Attack: A Novel User Re-identification Attack on Mobility Datasets. Proc. 14th EAI Int. Conf. Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous 2017), Melbourne, Australia, Nov 2017.
- Walter, P.; Efremidis, A.; Gühring, G. Anonymization of Mobility Data and its Meta Information using Local Differential Privacy in Combination with Bidirectional Graphs. IEEE Transactions on Dependable and Secure Computing, preprint 2025.
- Kapp, A.; Nuñez von Voigt, S.; Mihaljević, H.; Tschorsch, F. Towards Mobility Reports with User-Level Privacy. Journal of Location Based Services, 2023, 17(2), 95–121.
- Topsøe, F. Some Inequalities for Information Divergence and Related Measures of Discrimination. IEEE Transactions on Information Theory, 2000, 46(4), 1602–1609.
- Abul, O.; Bonchi, F.; Nanni, M. Anonymization of Moving Objects Databases by Clustering and Perturbation. Information Systems, 2010, 35, 884–910.
- Piorkowski, M.; Sarafijanovic‐Djukic, N.; Grossglauser, M. CRAWDAD epfl/mobility. IEEE Dataport, 2022.



| Metrics | Mean | Standard Deviation |
|---|---|---|
| Reidentified Total | 103 | 32.784 |
| Non-Reidentified Total | 433 | |
| Reidentified (%) | 19.20 | 6.116 |
| Non-Reidentified (%) | 80.80 | |
| Average Distance (Topsøe Divergence) | 0.034 | 0.014 |
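The summary values for the unaltered heat maps can be re-derived from the nine per-trial results listed in Appendix A. The following sketch assumes that the reported standard deviation is the population standard deviation (division by the number of trials), which reproduces the tabulated values; the variable names are illustrative.

```python
from statistics import mean, pstdev

# Reidentified Total per trial for the unaltered heat maps (Appendix A).
reidentified_total = [152, 82, 141, 45, 140, 83, 90, 96, 97]
n_taxis = 536

mean_total = mean(reidentified_total)    # about 102.9, reported as 103
std_total = pstdev(reidentified_total)   # about 32.78 (population std)

# Express both values as a percentage of the 536 taxis.
mean_pct = 100 * mean_total / n_taxis    # about 19.20
std_pct = 100 * std_total / n_taxis      # about 6.12

print(f"mean = {mean_total:.1f} ({mean_pct:.2f} %), "
      f"std = {std_total:.3f} ({std_pct:.3f} %)")
```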
| Metrics | Mean | Standard Deviation |
|---|---|---|
| Reidentified Total | 9 | 3.035 |
| Non-Reidentified Total | 527 | |
| Reidentified (%) | 1.76 | 0.659 |
| Non-Reidentified (%) | 98.24 | |
| Average Distance (Topsøe Divergence) | 0.070 | 0.019 |
| Metrics | ε | Mean | Standard Deviation |
|---|---|---|---|
| Reidentified Total | 1 | 1 | 0.5 |
| | 1.5 | 1.78 | 0.44 |
| | 2 | 0.78 | 0.44 |
| Non-Reidentified Total | 1 | 535 | 0.5 |
| | 1.5 | 534.22 | 0.44 |
| | 2 | 535.22 | 0.44 |
| Reidentified (%) | 1 | 0.19 | 0.09 |
| | 1.5 | 0.33 | 0.08 |
| | 2 | 0.15 | 0.08 |
| Non-Reidentified (%) | 1 | 99.81 | 0.09 |
| | 1.5 | 99.67 | 0.08 |
| | 2 | 99.85 | 0.08 |
| Average Distance (Topsøe Divergence) | 1 | 0.162 | 0.016 |
| | 1.5 | 0.144 | 0.015 |
| | 2 | 0.139 | 0.016 |
| Metrics | ε | Mean | Standard Deviation |
|---|---|---|---|
| Reidentified Total | 1 | 1.67 | 0.71 |
| | 1.5 | 2.56 | 0.73 |
| | 2 | 1.33 | 0.5 |
| Non-Reidentified Total | 1 | 534.33 | 0.71 |
| | 1.5 | 533.33 | 0.73 |
| | 2 | 534.67 | 0.5 |
| Reidentified (%) | 1 | 0.31 | 0.13 |
| | 1.5 | 0.48 | 0.14 |
| | 2 | 0.24 | 0.1 |
| Non-Reidentified (%) | 1 | 99.69 | 0.13 |
| | 1.5 | 99.52 | 0.14 |
| | 2 | 99.75 | 0.1 |
| Average Distance (Topsøe Divergence) | 1 | 0.161 | 0.016 |
| | 1.5 | 0.142 | 0.016 |
| | 2 | 0.138 | 0.017 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).