Updated: In 2015, as part of a privacy review conducted under the auspices of the World Wide Web Consortium (W3C), Nick Doty flagged a potential problem with web applications.
This week, after five years of debate about whether or how to mitigate this privacy concern, the technical types discussing the matter have simply given up and kicked the can down the road to browser makers in the hope that maybe they can do something.
The year 2015 marked the debut of Progressive Web Applications (PWAs). These are web apps that can be installed on a device and can function when offline. They require a manifest file, a set of JSON-formatted keys and values that describe various app characteristics and capabilities.
One of these keys is start_url, which, if used, is the preferred URL that gets loaded when the web app is launched from its installed shortcut.
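The manifest mechanics described above can be sketched in Python. This is a simplified, assumed reading of how a browser picks the launch URL (start_url resolved relative to the manifest's own URL, with a fallback to the page URL when the key is missing or points at another origin); the manifest contents and the example.com URLs are hypothetical.

```python
import json
from urllib.parse import urljoin, urlsplit

# Hypothetical manifest, as it might be served at MANIFEST_URL below.
MANIFEST_TEXT = """
{
  "name": "Example PWA",
  "display": "standalone",
  "start_url": "index.html?launcher=homescreen"
}
"""

MANIFEST_URL = "https://example.com/app/manifest.json"  # assumed location
DOCUMENT_URL = "https://example.com/app/"               # page that linked the manifest


def same_origin(a: str, b: str) -> bool:
    """Simplified origin check: scheme, host and explicit port must match."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


def resolve_start_url(manifest_text: str, manifest_url: str, document_url: str) -> str:
    """Return the URL a launcher would open, falling back to the document URL."""
    manifest = json.loads(manifest_text)
    start = manifest.get("start_url")
    if not isinstance(start, str):
        return document_url  # key missing or not a string
    resolved = urljoin(manifest_url, start)  # relative to the manifest, not the page
    if not same_origin(resolved, document_url):
        return document_url  # a cross-origin start_url is ignored
    return resolved


print(resolve_start_url(MANIFEST_TEXT, MANIFEST_URL, DOCUMENT_URL))
```

Whatever string ends up in start_url is opened on every launch, which is what makes it interesting for both analytics and tracking.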
In his privacy review, Doty, then a privacy analyst for the W3C and founding director of UC Berkeley's Center for Technology, Society & Policy, concluded that start_url represents a potential mechanism for device fingerprinting and associating individuals with an identifier.
"I believe this should be marked as contributing to fingerprinting and creating a new cookie-like local state mechanism," he wrote.
His concern has been captured in a section of the W3C's Web Application Manifest draft specification:
It's conceivable that the start_url could be crafted to indicate that the application was launched from outside the browser (e.g., "start_url": "index.html?launcher=homescreen"). This can be useful for analytics and possibly other customizations. However, it is also conceivable that developers could encode strings into the start_url that uniquely identify the user (e.g., a server assigned UUID). This is fingerprinting/privacy sensitive information that the user might not be aware of.
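To make the spec's distinction concrete, here is a hypothetical server-side helper, render_manifest, that emits either the same start_url for everyone (the analytics case) or one carrying a per-user UUID (the fingerprinting case). The function name and the uid parameter are illustrative assumptions, not part of any spec.

```python
import json
import uuid
from typing import Optional


def render_manifest(user_id: Optional[str] = None) -> str:
    """Build a manifest; passing a per-user ID turns start_url into an identifier."""
    start_url = "index.html?launcher=homescreen"  # identical for everyone: analytics only
    if user_id is not None:
        start_url += "&uid=" + user_id  # unique per user: a tracking identifier
    return json.dumps({"name": "Example PWA", "start_url": start_url})


shared = render_manifest()
alice = render_manifest(uuid.uuid4().hex)  # server-assigned UUIDs, as in the spec text
bob = render_manifest(uuid.uuid4().hex)
print(json.loads(shared)["start_url"])
print(alice != bob)  # each install carries its own start_url
```

Nothing in the manifest format distinguishes the two cases; both are just strings.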
A unique string of this sort could be used to respawn cookies that had been cleared. For example, a PWA that set its start_url to include a user identifier such as "index.html?uid=abcdef" could reference that identifier to re-associate the user with previously deleted cookies.
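The respawning trick can be simulated in a few lines. This sketch assumes a hypothetical server that keeps a profile store keyed by the uid value baked into start_url; handle_launch and PROFILES are invented names, and real cookie handling is reduced to plain dictionaries.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical server-side store: profiles keyed by the ID baked into start_url.
PROFILES: Dict[str, dict] = {"abcdef": {"segment": "frequent-buyer"}}


def handle_launch(url: str, cookies: Dict[str, str]) -> Dict[str, str]:
    """Return the cookies the server would set when the installed app is launched."""
    if "uid" in cookies:
        return {}  # cookie still present, nothing to restore
    uid: Optional[str] = parse_qs(urlsplit(url).query).get("uid", [None])[0]
    if uid is not None and uid in PROFILES:
        # Cookies were cleared, but start_url still names the user: respawn the cookie.
        return {"uid": uid}
    return {}


# The user has cleared their cookies, then launches the PWA from its shortcut.
print(handle_launch("https://example.com/index.html?uid=abcdef", cookies={}))
```

Clearing cookies does not touch the installed shortcut, so the identifier comes back on the next launch.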
Since Doty first raised the issue, various W3C participants have been discussing what can be done in a GitHub Issues thread. After about a year, Mozilla standards engineer Marcos Cáceres closed the issue with a commit recommending that browser makers include a way for users to inspect and change the start_url.
Last year, Lukasz Olejnik, an independent privacy researcher and consultant and former member of the W3C Technical Architecture Group, shamed those involved into reopening the issue.
"Correct me if I'm mistaken," he wrote, "but is throwing the problem on users the recommended solution to the security/privacy issues of start_url here?" He included a smiley face emoticon at the end of the sentence to soften the blow.
That prompted a bug entry for Firefox that remains open. It's unclear how other browser makers see the issue. On iOS, at least, PWA isolation prevents cookie respawning, though not unique ID creation.
Olejnik analyzed the top 10,000 web pages and found that 1,672 include a manifest.json file, 828 use a dedicated start_url, 274 append parameters to that URL, and none appear to be using randomly generated identifiers. From that he concludes that start_url isn't currently being used to track people.
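A survey like this needs some way to bucket start_url values. Olejnik's actual method isn't described here, so the following is a hypothetical heuristic: flag a query parameter as a possible identifier if it is UUID-shaped or long with high per-character entropy. The regex, the 16-character and 3.5-bit thresholds, and the sample URLs are all assumptions, and the samples are toy inputs rather than survey data.

```python
import math
import re
from collections import Counter
from typing import Optional
from urllib.parse import parse_qsl, urlsplit

# UUID-shaped values, with or without hyphens (an assumed "obviously random" pattern).
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$", re.IGNORECASE
)


def shannon_entropy(s: str) -> float:
    """Bits per character, estimated from the string's own character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def looks_random(value: str) -> bool:
    """Guess whether a parameter value is a generated ID (thresholds are arbitrary)."""
    return bool(UUID_RE.match(value)) or (len(value) >= 16 and shannon_entropy(value) > 3.5)


def classify(start_url: Optional[str]) -> str:
    """Bucket one manifest's start_url into categories like those in the counts above."""
    if start_url is None:
        return "no start_url"
    params = parse_qsl(urlsplit(start_url).query)
    if not params:
        return "start_url without parameters"
    if any(looks_random(value) for _, value in params):
        return "parameters, possible identifier"
    return "parameters, static-looking"


# Toy inputs, not real survey data.
samples = [
    None,
    "/",
    "/index.html?launcher=homescreen",
    "/?uid=3f2b8c1e-9a4d-4e7b-8c2a-1d5e6f7a8b9c",
]
print(Counter(classify(s) for s in samples))
```

A check like this only catches identifiers that look random; a value such as launcher=homescreen-42 would pass as static-looking.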
That suggests there are better tracking mechanisms available at the moment, though that may not always be the case as new privacy defenses get implemented in browsers.
"I think this problem should be taken seriously," wrote Maciej Stachowiak, a software engineer who leads the development of Apple's WebKit. "Tracking via URL parameters is an increasingly common technique on the web in general, to the point that WebKit deployed active mitigations for it. If this technique hasn't made it to PWAs yet, that is only good fortune, not a trait to be relied on."
Olejnik argues that there could be legal implications under the California Consumer Privacy Act of 2018 and Europe's General Data Protection Regulation if start_url is used as an identifier.
Discussions of the issue continued until about a week ago, when Cáceres said there was nothing to be done.
"I honestly don't think there is a way to solve this," he wrote. "It's inherent in the design of URLs that you can encode unique identifiers into them by using an unlimited range of patterns and by mixing and matching their structures."
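Cáceres's point can be illustrated by encoding one identifier several ways and running the results through a naive filter that strips query parameters with identifier-like names. The encodings, SUSPECT_NAMES and naive_strip are hypothetical and for illustration only; this is not any browser's actual mitigation.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

USER_ID = "abcdef"  # placeholder identifier, as in the article's example


def encodings(uid: str) -> List[str]:
    """A few of the many URL shapes that can carry the same identifier."""
    return [
        f"https://example.com/index.html?uid={uid}",              # obvious query parameter
        f"https://example.com/index.html?launcher=home-{uid}",    # hidden in an innocuous value
        f"https://example.com/index.html?v=2.{uid[:3]}&t={uid[3:]}",  # split across parameters
        f"https://example.com/u/{uid}/index.html",                # path segment
        f"https://example.com/index.html#{uid}",                  # fragment
    ]


# A naive mitigation: drop query parameters whose names look like identifiers.
SUSPECT_NAMES = {"uid", "user", "id", "session"}


def naive_strip(url: str) -> str:
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in SUSPECT_NAMES]
    return urlunsplit(parts._replace(query=urlencode(kept)))


for url in encodings(USER_ID):
    print(naive_strip(url) == url, url)  # True means the filter left the URL untouched
```

Only the first URL is altered; the identifier survives in the other four, and each new rule invites a new encoding.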
On Wednesday, with agreement from others, he reiterated that the problem is unsolvable and closed the discussion, again.
Privacy is hard. ®
Updated to add
On August 2, after some further discussion, Cáceres re-opened the issue, again.