HTTPS may be good at securing financial transactions, but it isn't much use as a privacy tool: US researchers have found that a traffic analysis of ten HTTPS-secured Web sites yielded “personal data such as medical conditions, legal or financial affairs or sexual orientation”.
In I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis, (Arxiv, here), UC Berkeley researchers Brad Miller, AD Joseph and JD Tygar and Intel Labs' Ling Huang show that even encrypted Web traffic can leave enough breadcrumbs on the trail to be retraced.
Sites tested in the study included healthcare services, banking and finance, legal services, as well as Netflix and YouTube. Their “traffic analysis attack” covered 6,000 individual pages on the ten Websites, and got close to 90 per cent accuracy in associating users with the pages they viewed.
It's not the first time that such work has been conducted, but the paper's authors say they've obtained the highest-quality reconstruction of Internet users' browsing of sites secured by HTTPS. The researchers were able to work out which pages users were viewing with 89 per cent accuracy.
The researchers call their analysis a “Bag of Gaussians” (BoG) “due to similarity with the Bag-of-Words approach to document classification”:
“Our attack applies clustering techniques to identify patterns in traffic. We then use a Gaussian distribution to determine similarity to each cluster and map traffic samples into a fixed width representation compatible with a wide range of machine learning techniques,” they write.
The attack isn't trivial: as the authors note, the attacker has to be able to visit the same Web pages as the target, and has to be able to capture the victim's traffic. That way, the attacker can identify patterns in the encrypted traffic that can be matched against the pages the attacker and victim both visited.
However, they note, ISPs, employers – and by extension, spies and censors – have exactly this view of user traffic.
The analysis attack can be mitigated, the paper says, with a padding technique they refer to as “Burst padding” that would change the signature of encrypted pages on separate visits.
As well as Netflix and YouTube, the researchers used traffic analysis covering the Mayo Clinic, Planned Parenthood, Kaiser Permanente, Wells Fargo, Bank of America, Vanguard, the ACLU and Legal Zoom. ®