Microsoft Edge and Yandex are "much more worrisome" compared to Brave, Chrome, Firefox and Safari, according to a paper on browser privacy (PDF) published this week.
Douglas J Leith, a comp sci professor at Trinity College Dublin, investigated the network activity of six browsers – Google Chrome, Mozilla Firefox, Apple Safari, Brave, Microsoft Edge and Yandex – using a proxy to capture encrypted traffic.
Using a default install, he inspected the data sent when first starting the browser, as well as data transferred when navigating to a web page by pasting a URL into the address bar and by typing the URL – the latter being interesting because it may use a cloud-driven autocomplete feature. All the tests were conducted on a Mac, even for Edge. Finally, he checked on activity when the browsers were left idle for 24 hours.
In the paper, Leith said: "We find that the browsers split into three distinct groups from this privacy perspective. In the first (most private) group lies Brave, in the second Chrome, Firefox and Safari and in the third (least private) group lie Edge and Yandex."
The, er, borne identity
Is Edge really worse than Chrome – the latter being from one of the biggest data collectors in the business? The problem, according to Leith, is to do with identifiers that browsers send to the vendors to enable different searches and sessions to be tied together.
"Edge and Yandex both use hardware identifiers," he said. "That's tied to the physical hardware of the device and can't easily be changed. Whereas Chrome and Firefox use identifiers that are essentially random numbers generated when the browser first starts." The Chrome and Firefox identifiers do persist between sessions, but are reset if you do a fresh install. Leith explained that to ensure a true fresh install, he deleted configuration data left behind in the user profile.
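The distinction Leith draws can be illustrated with a minimal sketch. It is not taken from any browser's code: the function names, the `install_id` file name and the use of the MAC address as the "hardware" value are all assumptions made for illustration. It shows one identifier derived from the machine and another that is a random number stored in the user profile.

```python
import hashlib
import uuid
from pathlib import Path


def hardware_identifier() -> str:
    """Hypothetical hardware-derived ID: the same on every install on this machine."""
    # uuid.getnode() returns the MAC address as an integer where one can be
    # read (Python falls back to a random number if it cannot).
    return hashlib.sha256(str(uuid.getnode()).encode()).hexdigest()[:16]


def install_identifier(profile_dir: Path) -> str:
    """Hypothetical random ID: generated on first start, then kept in the profile."""
    id_file = profile_dir / "install_id"  # file name is an assumption
    if id_file.exists():
        # Profile data survived, so the old identifier comes back.
        return id_file.read_text().strip()
    profile_dir.mkdir(parents=True, exist_ok=True)
    new_id = uuid.uuid4().hex
    id_file.write_text(new_id)
    return new_id
```

In this sketch, deleting the profile directory yields a new `install_identifier` on the next start, while `hardware_identifier` comes out the same, which is the sense in which a hardware identifier "can't easily be changed".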
We suggested to Leith that most users, if they ever reinstall their browsers, will not bother with deleting user profile data, in which case the difference between the identifiers melts away. "Absolutely – if you leave your profile around, then some of the identifiers are tied to the browser," he told us.
There is also the matter of what happens if users log into Google, Microsoft, Apple or Firefox services while using the browser. "If you log on to Google or Apple services through the browser, of course it's all integrated together," he said.
But the focus of Leith's research is on what happens with a default install where you choose not to log on. How private is your browsing history?
Leith started by pasting (not typing) a URL into the browser's top bar. In the case of Chrome: "This generates a request to www.google.com/complete/search with the URL details (i.e. http://leith.ie/nothingtosee.html) passed as a parameter and also two identifier-like quantities (psi and sugkey)." Similarly, Edge sent the URL to the Bing autocomplete API complete with an identifying cookie. Yandex also transmitted the URL to its own servers before navigation. Firefox, Brave and Safari did not collect any data from a pasted URL.
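To make that concrete, here is a rough sketch of what such a request looks like from the receiving end. The endpoint and the `psi` and `sugkey` names come from the paper as quoted above; the `q` parameter name, the `https` scheme, the helper names and the placeholder values are assumptions for illustration. Nothing is actually sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_suggest_request(pasted_text: str, psi: str, sugkey: str) -> str:
    """Build (but do not send) an autocomplete-style request URL."""
    # "q" as the name of the text parameter is an assumption.
    params = {"q": pasted_text, "psi": psi, "sugkey": sugkey}
    return "https://www.google.com/complete/search?" + urlencode(params)


def what_the_server_sees(request_url: str) -> dict:
    """Decode the query string the way a receiving server could."""
    query = parse_qs(urlsplit(request_url).query)
    return {name: values[0] for name, values in query.items()}


request = build_suggest_request(
    "http://leith.ie/nothingtosee.html",
    psi="placeholder-psi",        # placeholder, not a real value
    sugkey="placeholder-sugkey",  # placeholder, not a real value
)
print(what_the_server_sees(request))
```

The point of the sketch is simply that the pasted address arrives intact alongside the identifier-like values, so the receiving end can read both together.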
What if the user types into the browser? In this respect, the URL or search autocomplete feature is key. Browsers used to distinguish between URLs and searches, but this distinction has largely been lost in favour of a single Omnibox, as Google calls it, which works as a search box unless you type a full and properly formed URL. Most users do not bother, which means web navigation is largely driven by search, plus a few bookmarked sites.
"Safari has the most aggressive autocomplete behaviour, generating a total of 32 requests to both Google and Apple," Leith reported. "The requests to Apple include identifiers that persist across browser restarts and so can be used to link requests together and so reconstruct browsing history.
"Chrome is the next most aggressive, generating 19 requests to a Google server which, once again, include an identifier that persists across browser restarts.
"Firefox is significantly more private, sending no identifiers with requests and terminating requests after the first word, so generating a total of 4 requests. Better still, Brave disables autocomplete by default and sends no requests at all as a user types in the top bar."
Edge, on the other hand, "sends text to www.bing.com as it is typed. A request is sent for almost every letter typed, resulting in a total of 25 requests. Each request contains a cvid value that is persistent across requests although it changes across browser restart."
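The differences in request counts come down to when a browser decides to fire off a suggestion request. The following is a simplified sketch, not a model of any particular browser: the three policy names are invented for illustration, and it will not reproduce the exact counts in the paper, which depend on details such as what was typed and how requests are batched.

```python
def suggestion_requests(typed_text: str, policy: str) -> list[str]:
    """Return the text that would be sent to a suggestion server, one entry per request."""
    # Every prefix of the text, i.e. what is in the bar after each keystroke.
    prefixes = [typed_text[:i] for i in range(1, len(typed_text) + 1)]
    if policy == "disabled":
        # Autocomplete off: nothing leaves the browser while typing.
        return []
    if policy == "first_word_only":
        # Stop sending once the first word has been completed.
        first_word = typed_text.split(" ", 1)[0]
        return prefixes[:len(first_word)]
    if policy == "every_keystroke":
        return prefixes
    raise ValueError(f"unknown policy: {policy}")


for policy in ("every_keystroke", "first_word_only", "disabled"):
    print(policy, len(suggestion_requests("hello world", policy)))
```

For the 11-character input "hello world", the per-keystroke policy produces 11 requests, the first-word policy 5, and the disabled policy none.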
The key point here is that in some cases the vendor and/or search provider gets all the data they need to construct a user's browsing history. Why does this matter? "The user's browsing history is generally seen as being sensitive data," Leith told us. "It's about a person's interests, it's fine-grained. Sharing that with a third party without clear knowledge and consent seems like a privacy problem."
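Why a persistent identifier matters can be shown in a few lines. This is a hypothetical sketch of the receiving side, with an invented log format and invented sample entries; it is not based on anything the vendors are known to do with the data. Requests that carry the same identifier can be grouped into a per-user history, while requests with no identifier cannot be tied together this way.

```python
from collections import defaultdict


def link_by_identifier(request_log):
    """Group logged request text by the identifier that accompanied it."""
    histories = defaultdict(list)
    unlinkable = []
    for identifier, text in request_log:
        if identifier is None:
            # No identifier was sent, so this request stands alone.
            unlinkable.append(text)
        else:
            histories[identifier].append(text)
    return dict(histories), unlinkable


# Invented sample data: (identifier, text sent) pairs.
log = [
    ("id-a", "news site"),
    ("id-b", "train times"),
    ("id-a", "health symptoms"),
    (None, "weather"),
]
histories, unlinkable = link_by_identifier(log)
print(histories)
print(unlinkable)
```

An identifier that survives browser restarts extends this grouping across sessions, which is the behaviour the paper flags.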
Leith's study is narrow, though he does demonstrate significant differences between browsers. Whether Edge is worse than Chrome is open to debate, but Edge, Yandex, Chrome and Safari seem to lead the field in terms of calling home with a user's browsing data. Mozilla's Firefox seems better, and Brave better still. Other relevant questions are how the user's search history data is used by the companies that collect it, and what the impact is when users sign in to get the benefit of synchronised bookmarks and indeed browser history across different devices.
Leith is right to highlight the significance of the search/autocomplete feature, which is now standard in most web browsers, and its potential to give away our browsing history even when not logged in to any service.
Mozilla gave us the following comment: "Mozilla publicly documents our data practices and we have a public data review process. Browsing history is only sent to Mozilla if a user turns on our Sync service, whose purpose is to share data across a user's devices. Unlike other browsers, Sync data is end-to-end encrypted, so Mozilla cannot access it.
"Firefox does collect some technical data about how users interact with our product, but that does not include the user's browsing history. This data is transmitted along with a unique randomly generated identifier. IP addresses are retained for a short period for security and fraud detection and then deleted. They are stripped from telemetry data and are not used to correlate user activity across browsing sessions." ®