Subject: Mathematics & Computer Science, Information Technology & Data Management
Keywords: microaggregation; k-anonymity; privacy; data utility
Online: 23 July 2019 (11:42:34 CEST)
With the data revolution well underway, there is growing demand for formal privacy protection mechanisms that are less destructive to data. In this regard, microaggregation is a popular high-utility approach designed to satisfy the k-anonymity criterion while introducing low distortion. However, standard performance metrics are commonly based on mean squared error, which hardly captures the utility degradation relevant to a specific application domain. In this work, we evaluate the performance of k-anonymous microaggregation in terms of the loss in classification accuracy of machine-learned models built from perturbed data. Systematic experimentation is carried out on four microaggregation algorithms tested over four data sets. The empirical utility of the resulting microaggregated data is assessed using the learning algorithm that obtains the highest accuracy on the original data. Validation is performed on a test set of non-perturbed data. The results confirm k-anonymous microaggregation as a high-utility privacy mechanism in this context, and show that distortion based on mean squared error is a poor predictor of practical utility. Finally, we corroborate the beneficial effects on empirical utility of exploiting the statistical properties of data when constructing privacy-preserving algorithms.
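
The evaluation protocol described above (microaggregate the training data, learn a classifier from it, and validate on non-perturbed data) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the fixed-size grouping heuristic in `microaggregate` is a naive baseline rather than one of the four algorithms evaluated, the 1-nearest-neighbour classifier stands in for the best-performing learner, the synthetic two-class data replaces the four data sets, and all function names are hypothetical.

```python
import random

def microaggregate(records, k):
    """Replace each record by the centroid of a group of at least k records.

    Assumed baseline: records are ordered by the sum of their attributes and
    cut into consecutive groups of k; the last group absorbs the remainder,
    so every group has between k and 2k - 1 members.
    """
    n = len(records)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(records)")
    dim = len(records[0])
    order = sorted(range(n), key=lambda i: sum(records[i]))
    n_groups = n // k
    out = [None] * n
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = tuple(
            sum(records[i][d] for i in members) / len(members) for d in range(dim)
        )
        for i in members:
            out[i] = centroid
    return out

def mse(original, perturbed):
    """Mean squared Euclidean distance between original and perturbed records."""
    total = sum(
        (a - b) ** 2 for x, y in zip(original, perturbed) for a, b in zip(x, y)
    )
    return total / len(original)

def predict_1nn(train_x, train_y, x):
    """Label of the training record closest to x (1-nearest neighbour)."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[nearest]

def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test records classified correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

def make_data(n, rng):
    """Synthetic two-class data: one Gaussian blob per class (illustrative only)."""
    xs, ys = [], []
    for _ in range(n):
        label = rng.randrange(2)
        center = 0.0 if label == 0 else 3.0
        xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        ys.append(label)
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(100, rng)  # the test set is never perturbed

baseline = accuracy(train_x, train_y, test_x, test_y)
for k in (2, 5, 10):
    anon_x = microaggregate(train_x, k)  # class labels are kept unchanged
    loss = baseline - accuracy(anon_x, train_y, test_x, test_y)
    print(f"k={k}  MSE={mse(train_x, anon_x):.3f}  accuracy loss={loss:.3f}")
```

Reporting the mean squared error next to the accuracy loss for each k is what allows the two utility measures to be compared; the sketch only shows how such a comparison might be set up and makes no claim about what the numbers will be.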