This article is more than 1 year old

Has somebody shared your 'anonymised' health data? Bad news

Harvard boffins unmask 100% of 'encrypted' S Korean records

Researchers from Harvard University have published a paper claiming a 100 per cent success rate in de-anonymising patients from their supposedly anonymised healthcare data in South Korea.

The study, which bears the ronseal title of "De-anonymizing South Korean Resident Registration Numbers Shared in Prescription Data", was published this week in Technology Science.

Two de-anonymisation experiments were conducted in the study on prescription data from deceased South Koreans, with encrypted national identifiers - Resident Registration Numbers (RNN) - included.

The researchers found significant vulnerabilities in the anonymisation process which is applied to identifiers contained within prescription data, data which is often sold to multinational health companies.

The RNNs, similar to Blighty's National Insurance numbers, are unique 13-digit codes which represent demographic information.

Finding that "weakly encrypted RRNs" may be vulnerable to de-anonymisation, both experiments were 100 per cent successful, and revealed all 23,163 of the unencrypted RNNs.

Each experiment was conducted independently of the other, and provided each other with complementary validations as the boffins were able to match the same RRNs to the same patients in both experiments. Both are detailed in the journal article's methodology section.

Scandalous slurpage

The Harvard experiment used decedent information, however the experiments demonstrated "how others could associate an actual RRN for a living patient with his sensitive medical information."

The researchers noted a civil suit involving multinational IMS Health, regarding its access to that sensitive information. IMS Health claims to possess over 10PBs of "unique healthcare data", and is one of the largest vendors of physician prescribing data worldwide.

Confirmation of the suit is provided through an internal IMS Health document (PDF, pg. 14) which alleges an "affiliate collected plaintiffs' personal information without the necessary consent in violation of applicable privacy laws and transferred such information to IMS Korea for sale to customers."

If IMS Health did receive these kinds of data, then our study exposes the real-world realities of imperfect anonymization. Further research is necessary to propose alternatives to lawsuits.

Nearly the entirety of South Korea's population was victim to RNN breaches back in 2011, when an attack on the Cyworld social network managed to expose the information of 35 million users.

UK omnishambles

Medical information in the UK is not held on the NHS' Personal Demographics Serivce (PDS) by National Insurance number, but by a unique 10-digit NHS Number.

Responding to The Register regarding this security of this number's generation algorithm, the University of Cambridge's Ross Anderson said that the security of its generation was not really the point.

"The point isn't the generation algorithm, but the fact that about 800,000 people who work for the NHS and other organisations need to be able to tie people to NHS numbers in order to do their jobs," Anderson said.

The PDS is stated to "not hold any clinical or sensitive data items such as ethnicity or religion," however it does hold information such a name, birth data, address, gender, and telecommunication details.

Anderson has written extensively the security of information stored by Britain's NHS for 20 years now, and considers it to be "awful".

For example, the Department of Health refuses to do any incident monitoring centrally as ministers just don't want to know. If they were serious about privacy they'd have data centrally on every single incident.

[W]hen we tried to get this under freedom of information, we were referred to the thousands of individual NHS organisations. It's simply not credible for them to pretend they're managing security and privacy when they just don't want to know what's going on.

Anderson added:

The NHS is the main source of public-sector information leaks most years (the HMRC incident was an exception). Then there was the care.data scandal last year, which was also covered in one of the papers in [Latanya Sweeney's Technology Science].

Talking to El Reg, Dr Neil Bhatia, a Hampshire GP, explained that accessing the PDS couldn't possibly be audited or controlled. "It relies on trust," he said. There are too many users for it not to rely on trust as it's set up.

Although the potential research applications for granular medical data is enormous, it may "not be just medical researchers who would receive such access," said Bhatia, "but also those with commercial or political interests."

Bhatia suggested that particularly limited forms of aggregated data could be a decent compromise, protecting "extremely vulnerable individual records", however but insufficiently secured aggregated data sets would not be fit for purposes.

Asked whether there may be achievable balance between the provision, including sale, of healthcare data and patients' privacy, Anderson told El Reg that: "There could be, but given current governance arrangements it's extremely unlikely to happen."

The professor of security engineering added: "[M]inisters are still totally intimidated by the drugs and medical devices industries, who see access to our health data as a nifty marketing tool." ®

More about

TIP US OFF

Send us news


Other stories you might like