Preserving Privacy from Unsafe Data Correlation
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Forum Proceedings, Volume 2011, Issue 1, Nov 2011, CSP9
Abstract
With the emergence of cloud computing, safe data outsourcing has become an active research topic. Several regulations have been issued to ensure that individual and corporate information is kept private in a cloud computing environment. To guarantee that these regulations are fully met, the research community has proposed privacy constraints such as k-anonymity, l-diversity, and t-closeness. These constraints rely on generalization, which transforms identifying attribute values into a more general form and partitions the data to eliminate possible linking attacks. Despite their effectiveness, generalization techniques severely degrade the quality of the outsourced data and the correlations within it. To cope with these defects, Anatomy has been proposed. Anatomy releases quasi-identifier values and sensitive values in separate tables, which preserves privacy while capturing a large amount of data correlation. However, there are situations where data correlation can lead to an unintended leak of information. For example, if an adversary knows that patient Roan (P1) takes a regular drug, joining the Prescription quasi-identifier table (QIT) and the Prescription sensitive table (SNT) on the attribute GID leads to the association of RetinoicAcid with patient P1 due to correlation.
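The Python sketch below illustrates the kind of correlation attack described above on a toy Anatomy-style decomposition. All table contents, the second drug in each group, and the helper name candidate_sets are illustrative assumptions; only the table roles (QIT, SNT), the join attribute GID, patient P1, and the drug RetinoicAcid come from the abstract.

```python
# Hypothetical mini-instance of Anatomy: quasi-identifiers and sensitive
# values are released in separate tables, linked only by a group id (GID).

# Prescription (QIT): one row per prescription, linking a patient to a group.
# P1 takes a regular (recurring) drug, so P1 appears in several groups.
qit = [
    ("P1", 1), ("P2", 1),
    ("P1", 2), ("P3", 2),
    ("P1", 3), ("P4", 3),
]

# Prescription (SNT): sensitive drug values per group, unlinked to patients.
snt = [
    (1, "RetinoicAcid"), (1, "Penicillin"),
    (2, "RetinoicAcid"), (2, "Ibuprofen"),
    (3, "RetinoicAcid"), (3, "Aspirin"),
]

def candidate_sets(patient):
    """Per group, the QIT-SNT join only yields a candidate set of drugs."""
    groups = {gid for p, gid in qit if p == patient}
    return [{drug for gid, drug in snt if gid == g} for g in groups]

# Background knowledge: P1 takes the same drug regularly, so the true drug
# must appear in every group containing P1. Intersecting the candidate sets
# exploits this correlation and pins the drug down exactly.
per_group = candidate_sets("P1")
leaked = set.intersection(*per_group)
print(leaked)  # {'RetinoicAcid'} -- leaked despite each group looking diverse
```

Each group on its own hides P1's drug among several candidates; it is the repetition of the same sensitive value across P1's groups, i.e. the correlation, that defeats the decomposition.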
In this paper, we present a study to counter privacy violations caused by data correlation while at the same time improving aggregate analysis. We show that privacy requirements affect table decomposition based on what we call correlation dependencies. We propose a safe grouping principle that ensures correlated values are grouped together in unique partitions that obey l-diversity and, at the same time, preserve the correlation. We also design an optimization strategy to reduce the number of anonymized tuples. Finally, we extend the UTD Anonymization Toolbox to implement the proposed algorithm and demonstrate its efficiency.
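A minimal sketch of the safe grouping idea follows, under stated assumptions: all tuples sharing a correlated value (here, the same patient) are kept together as one unit, whole units are packed into a partition, and the partition is emitted only once it holds at least l distinct sensitive values (a distinct-l-diversity check). The function name, the greedy packing order, and the diversity test are illustrative; this is not the paper's algorithm, and its optimization strategy for reducing anonymized tuples is omitted.

```python
from collections import defaultdict

def safe_grouping(records, l):
    """records: iterable of (patient, drug) pairs -> (partitions, leftovers)."""
    # Step 1: correlated tuples (all prescriptions of one patient) form one
    # unit that must land in a single partition.
    units = defaultdict(set)
    for patient, drug in records:
        units[patient].add(drug)

    # Step 2: greedily pack units into a partition until it is l-diverse,
    # then start a new one. Units left over at the end cannot be safely
    # grouped and would need suppression or generalization -- the costly
    # case an optimization strategy would try to minimize.
    partitions, current, sensitive = [], [], set()
    for patient, drugs in units.items():
        current.append(patient)
        sensitive |= drugs
        if len(sensitive) >= l:
            partitions.append((current, sensitive))
            current, sensitive = [], set()
    return partitions, current

# Hypothetical usage: P1's recurring drug collapses into one unit.
records = [("P1", "RetinoicAcid"), ("P1", "RetinoicAcid"),
           ("P2", "Penicillin"), ("P3", "Ibuprofen")]
partitions, leftovers = safe_grouping(records, l=2)
print(partitions)  # [(['P1', 'P2'], {'RetinoicAcid', 'Penicillin'})]
print(leftovers)   # ['P3'] -- could not reach l distinct sensitive values
```

Because every tuple of a given patient lands in a single partition, the cross-group intersection attack sketched earlier collapses: the adversary's intersection is simply that one partition's l-diverse candidate set.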