Developers of encrypted databases and security researchers are at loggerheads – and it's over a study that claims property-preserving encrypted databases may be vulnerable to attack.
The researchers – Muhammad Naveed of the University of Illinois at Urbana-Champaign, Charles Wright of Portland State University, and Seny Kamara of Microsoft Research – reckon inference attacks on encrypted database (EDB) systems like CryptDB, Cipherbase, and Encrypted BigQuery are possible using only the encrypted column and publicly available auxiliary information. Developers contend that the research is invalid because the cracked systems don't correspond to recommended deployment scenarios.
EDB systems, many of which are based on the design of CryptDB, make use of property-preserving encryption (PPE) schemes such as deterministic (DTE) and order-preserving encryption (OPE). The technology – which is still in its infancy – is seen as a way to minimize the impact of data breaches and hack attacks.
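To make the two properties concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how CryptDB or any other EDB system implements DTE or OPE: a keyed hash stands in for deterministic encryption (unlike a real scheme, it cannot be decrypted), and a random increasing lookup table stands in for order-preserving encryption. The key, the function names, and the sample columns are all hypothetical.

```python
import hashlib
import hmac
import random

KEY = b"demo-key-not-for-real-use"  # hypothetical key, for illustration only


def toy_dte(value: str) -> str:
    """Deterministic stand-in: equal inputs always give equal outputs.

    A real DTE scheme is also decryptable; this keyed hash is not.
    """
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def toy_ope_table(domain_size: int, seed: int = 0) -> list[int]:
    """Toy order-preserving map: a random, strictly increasing lookup table."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(domain_size * 1000), domain_size))


# A DTE-style column: repeated plaintexts produce repeated ciphertexts,
# so the server can test equality without seeing the values.
sexes = ["F", "M", "F", "F"]
dte_column = [toy_dte(s) for s in sexes]
print(dte_column[0] == dte_column[2])  # equality is visible on ciphertexts

# An OPE-style column: ciphertexts sort in the same order as plaintexts,
# so the server can answer range queries without seeing the values.
ages = [34, 71, 34, 8]
table = toy_ope_table(120)
ope_column = [table[a] for a in ages]
print(sorted(ope_column) == [table[a] for a in sorted(ages)])
```

The same preserved properties that let the server run equality and range queries are what an observer of the encrypted column gets to see.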
Naveed told El Reg: "As far as we know, these PPE-based encrypted databases are not deployed yet, but there is considerable interest in their potential use – mostly fuelled by the recent rashes of high-profile data breaches. In fact, even large companies are evaluating these systems (e.g., Google's Encrypted Bigquery, SAP's SEED, Microsoft's Cipherbase, and Microsoft's SQL Server 2016)."
"The potential applications would include electronic medical records, human resources databases, university databases; basically any application using a database for sensitive and private information," he added.
In a paper, Inference Attacks on Property-Preserving Encrypted Databases [PDF], Naveed and his colleagues point to potential security flaws in the technology that might allow hackers to infer metadata, and perhaps more, about entries in encrypted databases.
The authors write: "In this paper, we study the concrete security provided by such systems. We present a series of attacks that recover the plaintext from DTE- and OPE-encrypted database columns using only the encrypted column and publicly-available auxiliary information. We consider well-known attacks, including frequency analysis and sorting, as well as new attacks based on combinatorial optimization.
"We evaluate these attacks empirically in an electronic medical records (EMR) scenario using real patient data from 200 US hospitals. When the encrypted database is operating in a steady state where enough encryption layers have been peeled to permit the application to run its queries, our experimental results show that an alarming amount of sensitive information can be recovered. In particular, our attacks correctly recovered certain OPE-encrypted attributes (e.g., age and disease severity) for more than 80 per cent of the patient records from 95 per cent of the hospitals; and certain DTE-encrypted attributes (e.g., sex, race, and mortality risk) for more than 60 per cent of the patient records from more than 60 per cent of the hospitals."
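For readers who want to see what the two well-known attacks look like, the following is a textbook-style Python sketch. It is not the authors' code and does not cover their combinatorial optimization attacks. The function names, the opaque ciphertext tokens, and the tiny auxiliary dataset are made up for illustration.

```python
from collections import Counter


def frequency_attack(cipher_column, aux_plain_column):
    """Guess a ciphertext -> plaintext map for a DTE-style column.

    The most common ciphertext is paired with the most common value in the
    public auxiliary data, the second most common with the second, and so on.
    """
    c_ranked = [c for c, _ in Counter(cipher_column).most_common()]
    p_ranked = [p for p, _ in Counter(aux_plain_column).most_common()]
    return dict(zip(c_ranked, p_ranked))


def sorting_attack(cipher_column, domain):
    """Guess a ciphertext -> plaintext map for an OPE-style column.

    Only works when the column is dense, i.e. every value of the (small,
    ordered) domain appears at least once. Returns None otherwise.
    """
    unique = sorted(set(cipher_column))
    if len(unique) != len(domain):
        return None
    return dict(zip(unique, sorted(domain)))


# Hypothetical DTE-encrypted "mortality risk" column (opaque tokens).
risk_ct = ["x9", "a1", "x9", "q7", "x9", "a1"]
# Hypothetical public statistics with a similar distribution.
aux = ["low", "low", "low", "medium", "medium", "high"]
print(frequency_attack(risk_ct, aux))

# Hypothetical OPE-encrypted "disease severity" column, levels 1 to 4.
severity_ct = [512, 77, 903, 77, 512, 310]
print(sorting_attack(severity_ct, [1, 2, 3, 4]))
```

Neither function touches a key. Both rely only on what the encrypted column reveals (repeats for DTE, order for OPE) plus outside knowledge of what the data ought to look like.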
In an associated blog post, Microsoft described the research as an advance in the "database security 'arms race'" – high praise from Redmond's official research blog.
Raluca Ada Popa, one of the original developers of CryptDB, told El Reg that the researchers' findings are invalid because the authors have failed to use CryptDB systems as intended.
Popa explained: "The authors of the paper have used the CryptDB system in an unsafe way. The CryptDB system provides guidelines for safe usage: it says that if a database administrator wants to protect a data field (same as column), it must mark the field as 'sensitive.' Then, the CryptDB system will encrypt the field with strong encryption schemes, which do not allow any inference attacks, including the specific attack of Naveed et al."
She said the research was akin to claiming an attack had succeeded against a firewall that had never been configured by an admin who knows which connections should be blocked. Popa pointed interested parties towards sections of her thesis on CryptDB that cover the "sensitive annotation" issue (see pages 45 and 53 of the thesis PDF).
CryptDB recommends that admins use OPE only for fields that are less sensitive, and provides timestamps as an example. Timestamps are commonly used in apps – they also do not repeat, and are from a sparse domain, so the attacks of Naveed et al don't apply, according to Popa.
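The argument about timestamps can be illustrated with a small Python sketch. It illustrates the reasoning as reported here and is not a security proof. The helper names, the sample ciphertexts, and the assumed timestamp domain (one value per second over a year) are hypothetical.

```python
from collections import Counter


def max_repeat(column):
    """Largest number of times any single ciphertext appears."""
    return max(Counter(column).values())


def is_dense(column, domain_size):
    """True if the column contains as many distinct values as the domain."""
    return len(set(column)) == domain_size


# Toy OPE outputs for a 4-level "severity" field: values repeat and
# every level shows up, so ranks map directly onto plaintexts.
severity_ct = [77, 310, 77, 903, 512, 310]
print(is_dense(severity_ct, 4), max_repeat(severity_ct))

# Toy OPE outputs for four distinct timestamps drawn from a year of seconds:
# nothing repeats and almost none of the domain is present.
timestamp_ct = [18, 4051, 90233, 771204]
seconds_per_year = 365 * 24 * 60 * 60
print(is_dense(timestamp_ct, seconds_per_year), max_repeat(timestamp_ct))
```

With no repeated ciphertexts there is no frequency histogram to match, and with a sparse column the sorted ciphertexts no longer line up one-to-one with the domain.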
"I think their research is useful in furthering the understanding about leakage of OPE and DET when an attacker has side information. However, their conclusions do not apply to CryptDB when used correctly," Popa concludes.
CryptDB offers a means to create secure cloud-based database applications, so research into its security is important. Naveed and his colleagues have not backed down in the face of Popa's rebuttal, continuing to argue that CryptDB is insufficiently secure for the storage of electronic medical records and other similarly sensitive data.
Kamara argues that he and his colleagues used the systems in the way they would have to be used in practice by engineers to run real-world applications. That's because marking everything as "sensitive" would defeat the purpose of the technology, he explains in a blog post responding to Popa's criticism.
"For example, PPE-based EDB systems are typically claimed to be secure if a database administrator labels all 'sensitive' fields (for some undefined notion of sensitivity) so that they are encrypted with standard encryption schemes. But of course, this also means that these fields then cannot be queried at all – ever. So this leaves us with an EDB system that only works over non-sensitive data. If it's non-sensitive, one could ask how much value we are getting from encrypting it at all. Is this really the point of an encrypted DB system? To do SQL over encrypted non-sensitive data? Is this really consistent with how these systems are motivated and understood?
"The point is that to claim that these systems are secure, you effectively have to cripple them until you have to use regular plain encryption – at least for the kinds of data you actually care to protect."
Kamara reckons that making encrypted database systems secure against the sort of attacks he and his colleagues have developed would cripple their utility. He said the trio had set out to answer the question: "What security do we get when we run an EMR application on top of these [encrypted database] systems?"
Popa said that the strong encryption schemes used in CryptDB still allow a wealth of queries, contrary to the researchers' claims.
"One can continue doing a wide range of queries on the strongly encrypted sensitive fields," Popa explained. "This would not cripple the system."
"In fact, 70 per cent to 90 per cent of the data fields in common applications can be supported only with these strong encryption schemes," she added.
The Inference Attacks on Property-Preserving Encrypted Databases paper (PDF) is due to be presented in October at the ACM Conference on Computer and Communications Security. ®