Google on Wednesday released source code for a project called Private Join and Compute that allows two parties to analyze and compare shared sets of data without revealing the contents of each set to the other party.
This is useful if you want to see how your private encrypted data set of, say, ad-clicks-to-sales conversion rates, correlates to someone else's encrypted conversion rate data set without disclosing the actual numbers to either side.
This particular technique is a type of secure multiparty computation that builds upon a cryptographic protocol called Private Set-Intersection (PSI). Google employs this approach in a Chrome extension called Password Checkup that lets users test logins and passwords against a dataset of compromised credentials without revealing the query to the internet goliath.
Private Join and Compute, also known as Private Intersection-Sum (PIS), takes PSI further by hiding the data that represents the intersection of the two data sets and revealing only the results of calculations based on the data.
The technique is described in a research paper, "On Deploying Secure Computing Commercially: Private Intersection-Sum Protocols and their Business Applications," penned by nine Google researchers: Mihaela Ion, Ben Kreuter, Ahmet Erhan Nergiz, Sarvar Patel, Mariana Raykova, Shobhit Saxena, Karn Seth, David Shanahan, and Moti Yung.
The paper describes how PIS can be computed using three cryptographic protocols: Random Oblivious Transfer, encrypted Bloom filters, and Pohlig–Hellman double masking.
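To make the double-masking idea concrete, here is a toy Python sketch, not Google's code. It assumes a Diffie-Hellman-style flow in which each side masks hashed identifiers with a secret exponent, and it pairs that with a textbook Paillier scheme so the values can be summed while still encrypted. The function names, the three-round layout, and the tiny Mersenne-prime parameters are all illustrative assumptions and far too weak for real use.

```python
import hashlib
import math
import secrets

# Toy parameters (assumptions for illustration only). All are Mersenne primes.
P = 2**127 - 1                       # modulus for the masking group
PA_P, PA_Q = 2**61 - 1, 2**89 - 1    # factors of the toy Paillier modulus
N = PA_P * PA_Q
N2 = N * N
PHI = (PA_P - 1) * (PA_Q - 1)


def hash_to_group(identifier: str) -> int:
    """Map an identifier to a nonzero element modulo P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1


def new_mask_key() -> int:
    """Pick an exponent invertible mod P-1, so masking is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def mask(element: int, key: int) -> int:
    """Exponentiation masking: mask(mask(x, a), b) == mask(mask(x, b), a)."""
    return pow(element, key, P)


def encrypt(m: int) -> int:
    """Textbook Paillier encryption with generator N + 1."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Textbook Paillier decryption using phi(N)."""
    u = pow(c, PHI, N2)
    return (u - 1) // N * pow(PHI, -1, N) % N


def private_intersection_sum(ids_a, pairs_b):
    """Party A holds identifiers; party B holds identifier -> value pairs
    and the Paillier key. Both roles run in one process here."""
    k_a, k_b = new_mask_key(), new_mask_key()

    # Round 1 (A -> B): A's identifiers, masked once with A's key.
    a_once = [mask(hash_to_group(i), k_a) for i in ids_a]

    # Round 2 (B -> A): B adds its mask to A's items, and sends its own
    # items masked once alongside encrypted values, in shuffled order.
    a_twice = {mask(x, k_b) for x in a_once}
    b_once = [(mask(hash_to_group(i), k_b), encrypt(v))
              for i, v in pairs_b.items()]
    secrets.SystemRandom().shuffle(b_once)

    # Round 3 (A): add A's mask to B's items; doubly masked values match
    # exactly when the identifiers match. Multiplying Paillier ciphertexts
    # adds the hidden values.
    total = encrypt(0)
    for masked_id, ciphertext in b_once:
        if mask(masked_id, k_a) in a_twice:
            total = total * ciphertext % N2

    # B decrypts only the aggregate.
    return decrypt(total)


riders = ["alice", "bob", "carol"]
purchases = {"bob": 20, "carol": 35, "dave": 50}
print(private_intersection_sum(riders, purchases))  # 55
```

In this sketch neither side sees the other's raw identifiers, only masked ones, and the individual values stay encrypted until they have been combined into a single sum.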
"Private Intersection-Sum is not an arbitrary question, but rather arose naturally and was concretely defined based on a given central business need: computing aggregate conversion rate (or effectiveness) of advertising campaigns," Google's researchers explain in their paper. "This problem has both great practical value and important privacy considerations, and represents a type of analysis that occurs surprisingly commonly."
As an example, Google researchers describe a scenario in which a city wants to know whether the cost of operating weekend train service is offset by increased revenues at local businesses. The city's rider data set and the point-of-sale data set from merchants can be processed using Private Join and Compute in a way that allows the city to determine the total number of train riders who made a purchase at a local store without revealing any identifying information.
Google's researchers argue that reconciling organizations' hunger for data mining with rising interest in privacy requires secure computing protocols. "Indeed, the consideration given to privacy by users and governments around the world is growing rapidly," they observe.
In an email to The Register, Mike Rosulek, assistant professor of computer science at Oregon State University in the US, explained that PSI can replace the status quo, whereby Google and another company draft a legal agreement in which they promise to share data to measure ad campaign effectiveness, generate aggregate data, and then dispose of each other's source data sets under contractual duress.
These PSI techniques let companies do this without the legal ritual. "With PSI there is no way to violate the 'agreement' because the cryptography literally prevents you from learning more than you are allowed," he said.
For those appearing in one of these data sets – an individual who saw a Google ad or bought an advertised product – PSI-sum computation offers a privacy proposition similar to that of the contract scenario, said Rosulek.
"Imagine a ghost appears to Sergey Brin in a dream and says 'people who saw this advertisement collectively spent $824,852 at Company X!'" he said. "If you feel like this ghastly vision is not a significant violation of your personal privacy, then you should be comfortable with PSI-sum, since it releases exactly the same information about you into the world."
Rosulek suggests the greatest benefit of this technology accrues to companies that would otherwise have forgone analytics altogether for fear of privacy problems.
While Google developed its technology as a privacy-preserving way to attribute aggregate ad conversions, the web giant says it hopes PIS can advance research into public policy, diversity and inclusion, healthcare, and vehicle safety by making secure computing more widely accessible.
At the moment, however, the code is not quite secure enough. The PIS security model envisions "honest-but-curious adversaries" and as the GitHub repo notes, "If a participant deviates from the protocol, it is possible they could learn more than the prescribed information." What's more, the protocol doesn't check that parties supply legitimate inputs, so nothing stops them from feeding in arbitrary ones. And the intersection-sum itself may leak information.
"For example, if an identifier has a very unique associated integer values, then it may be easy to detect if that identifier was in the intersection simply by looking at the intersection-sum," the GitHub repo cautions.
The code isn't officially supported by Google and comes with no guarantees. ®
Speaking of encryption... MongoDB Server 4.2 RC, unveiled at MongoDB World 2019 this week, includes a feature called client-side field level encryption. This allows clients to "selectively encrypt individual document fields, each optionally secured with its own key and decrypted seamlessly on the client," according to the software's maker.
This ensures data is encrypted by a client before it is sent to the database to store, and decrypted by the client when it is fetched, providing end-to-end encryption. Whoever is hosting the MongoDB database cannot decipher the data, therefore, because only the client, ideally, has the necessary keys.