More destructive to privacy still, once you know the source, you end up learning a tremendous amount about that source. Medical publications, for example, routinely publish information about patients, but attempt to "anonymise" this information by stating things like "patient X, a 24-year-old Caucasian from Mobile, Alabama, presented to the Emergency Department with a history of..."
Given enough material with which to cross-reference the data, you can probably figure out who the patient was and, from this, learn the patient's entire medical history. Massive databases make this job easier and therefore threaten privacy.
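To make the cross-referencing concrete, here is a minimal sketch in Python of how such a linkage might work. Everything in it is hypothetical: the records, the names, and the choice of age, ethnicity, and city as the matching fields are assumptions made purely for illustration, not data from any real publication or database.

```python
from collections import defaultdict

# Hypothetical "anonymised" case reports: no names, but identifying details remain.
case_reports = [
    {"age": 24, "ethnicity": "Caucasian", "city": "Mobile, AL", "history": "condition A"},
    {"age": 61, "ethnicity": "Caucasian", "city": "Mobile, AL", "history": "condition B"},
]

# Hypothetical second database that does carry names (a customer list, say).
named_records = [
    {"name": "Person 1", "age": 24, "ethnicity": "Caucasian", "city": "Mobile, AL"},
    {"name": "Person 2", "age": 61, "ethnicity": "Caucasian", "city": "Mobile, AL"},
    {"name": "Person 3", "age": 61, "ethnicity": "Caucasian", "city": "Mobile, AL"},
    {"name": "Person 4", "age": 24, "ethnicity": "Asian", "city": "Mobile, AL"},
]

# Fields assumed to be shared by both databases.
MATCH_FIELDS = ("age", "ethnicity", "city")


def match_key(record):
    """Build the tuple of shared fields used to join the two databases."""
    return tuple(record[field] for field in MATCH_FIELDS)


def link(reports, named):
    """Return (history, candidate names) for each supposedly anonymous report."""
    index = defaultdict(list)
    for person in named:
        index[match_key(person)].append(person["name"])
    return [(report["history"], index.get(match_key(report), [])) for report in reports]


matches = link(case_reports, named_records)
for history, candidates in matches:
    # A single candidate means the "anonymous" patient has been identified.
    status = "re-identified" if len(candidates) == 1 else "ambiguous"
    print(history, candidates, status)
```

In this toy data the first report matches exactly one named person, so that patient is no longer anonymous, while the second matches two people and stays ambiguous. The more fields the databases share, and the bigger they are, the fewer ambiguous cases tend to remain.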
This problem is illustrated in a recent patent application filed by Amazon.com. On 10 August, Amazon filed a patent application for something called a "gift cluster" that would look at a person's past purchasing history (including things they looked at but didn't buy), what they ordered, what they had delivered and where (zip code and other demographics), together with other information Amazon either collects or purchases from others (linking other massive databases) to profile its users.
OK, well, they already do this. Indeed, according to their Terms of Service, they share this information with their partners and affiliates, and use it to suggest items for me to buy next time I log in. That's why Amazon keeps trying to sell me these silly lawyer books by Scott Turow.
The patent application also says that Amazon will essentially be able to profile me in terms of my religion, income, purchasing habits, and even sexual orientation.
I am reminded of an episode of The Mind of the Married Man, a now-defunct HBO series, in which the lead character’s TiVo recorder decided that he was gay, and suggested programming like Liza Minnelli specials for him to watch. Even this fact in the Amazon patent application is not so startling, and one assumes that they are already doing this.
What the Amazon patent allows Amazon to do is to suggest gifts for me based on the information they have collected and the ubiquitous "other information". It is not clear in the application whether this will be done with or without my knowledge or consent. Thus, the HAL 9000 computer at Jeff Bezos’ office may decide that I like golf, and suggest that my cousin buy me new clubs for my birthday (please don't). Or that I need a new Treo 750 (please do).
It is possible that, under this patent, a person could ask Amazon to suggest a gift for me, and in that way essentially "mine" the database and learn my likes, dislikes, preferences, etc. This is one of the ways that anonymous databases (in this case, an anonymous database about a known person – me) can become exposed, but there are many more ways to compromise the database. Cross-referenced databases, "phishing"-type attacks, and social engineering are all ways to compromise the privacy of the database.
This is all well and good if the only thing that happens as a result is that I get offered more appropriate advertising (guess I don't need that Rochester Big and Tall ad after all), or that the gifts I get are more to my liking (hmmm... that tie from Indonesia vs. a new video iPod?). It's also fine if I can control the content and use of the information.
The problem is that the information can also be used substantially to my detriment. I can be denied employment or insurance because of a perception of my preferences. "Don't ask, don't tell" doesn't seem to apply to my purchasing habits. For example, many medical professionals may refuse to take on patients who either have a history of filing malpractice actions, or who have, based on their profile, a perceived propensity toward litigation. Finally, you never find out about the offers you didn't get because someone decided, based on your profile, that you wouldn't want them.
While laws such as the EU Data Privacy Directives and their equivalents in Asia generally give the data subject the right to access and correct personal information collected, this right may not extend to aggregated information – which ultimately is nothing more than lots of personal information.
The laws need to be tightened. We need to redefine personal information as any information from which the identity of a person can reasonably (and sometimes unreasonably) be determined (this is actually the general standard for laws like HIPAA, but is generally not well enforced).
More importantly, we need to have some guidelines on what general information can be collected, collated, analysed, and processed, both by governments and the private sector. Until then, it's generally a free-for-all. And oh, my birthday is coming up...
This article originally appeared in Security Focus.
Copyright © 2006, SecurityFocus