A group of researchers partly supported by SAP has taken a look at one of the big problems with so-called “anonymised” data: the way spatial correlations in mobile data can be used to re-identify individuals in large data sets.
Location data is the big problem, the Singapore-led group says: even if the resolution of a phone's GPS records is reduced in a stored dataset, following a user's track (trajectory in the paper) for long enough will easily identify that user.
“Removing identifiers from location information, or reducing the granularity of the location or time, does not prevent disclosure of personally identifiable information,” the paper states. “Individuals are highly re-identifiable with only a few spatio-temporal points”.
Just how revealing location trajectories are is revealed in their analysis of 56 million records: “with two random points, more than 60 per cent of the trajectories are unique”, they write.
The researchers say anonymisation of mobile datasets is improved if the “trajectories” – literally, the “where the user has been” location datasets – are reduced. This way, they write, anonymity can be better protected, without trashing the utility of the dataset.
The researchers, Yi Song of the National University of Singapore (working under an SAP internship), Daniel Dahlmeier of SAP and Stephane Bressan of the National University of Singapore, note that the longer a user's location data can be strung into a trajectory, the easier it is to identify that user. In other words: a couple of location data points is nowhere near as useful as 24-hours' worth of the user's movements.
Splitting a user's trajectory can help preserve anonymity
The attraction of their approach, they hope, is that only one parameter needs to be adjusted to give users better anonymity: the time window that trajectories are cut into. That's a simple enough operation that they believe it will be scalable to very large data sets. ®