Mulled Chrome API shines light on long-neglected privacy gap: Sites can snoop on your find-in-page searches

Naughty JS can watch you hit control+F, start typing, see what's on your mind

Updated A browser feature being developed for the open-source Chromium platform has raised data-leakage privacy concerns – though the Google engineers working on the project contend the potential benefits outweigh the risks.

The issue – potential leakage of text entered into the find-in-page search box invoked by pressing Ctrl+F – has been a worry for more than eight years. Separate but related Chromium bug reports filed in 2012 and 2017 highlight the problem. Google's engineers have apparently marked it "won't fix" because a fix might break things.

Earlier this month, Google software engineer Joey Arhar announced plans to develop the beforematch event, an API that lets code running in a page (e.g. JavaScript files) take website text that has been hidden through page styling controls and make it available for scrolling and searching.

For example, if a portion of a webpage has been collapsed so the text is not visible, a find-in-page request would not work as expected. Similarly, a link that points to hidden text would fail, whether it uses the recently implemented Scroll to Text Fragment API or element fragment navigation (an #anchor link).

But with beforematch, web developers can write code that listens for the event and makes hidden text visible, and thus searchable, before the page handles the user's interaction.
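As a rough illustration of that listener pattern, here is a minimal sketch. It assumes the section was collapsed with a `hidden` flag, and `revealOnBeforeMatch` is a hypothetical helper name, not something from Google's proposal. A plain `EventTarget` stands in for a page element so the sketch runs outside a browser, and the event is dispatched by hand; in a supporting browser, the browser itself would fire it. The API as proposed or shipped may differ in detail.

```javascript
// Hypothetical helper: un-collapse a section when a "beforematch" event arrives.
// Assumption: the section is collapsed by setting `hidden = true`.
function revealOnBeforeMatch(section) {
  section.addEventListener("beforematch", () => {
    section.hidden = false; // make the text visible again
  });
}

// Stand-in for a collapsed page element (a real page would pass a DOM element).
const section = new EventTarget();
section.hidden = true;
revealOnBeforeMatch(section);

// Simulate the browser signalling that a search or link matched text in here.
section.dispatchEvent(new Event("beforematch"));
console.log(section.hidden); // prints: false
```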

"The net effect is that the user is able to use find-in-page or link navigation to find content in collapsed sections – something that is not currently possible," the documentation explains.

The privacy implications, also outlined in the documentation, are that the beforematch event expands the amount of information available to those able to run software on web pages, specifically web publishers and possibly their ad tech partners, depending on whether they have cross-origin access.

"In particular, the page can know which section of text was found using find-in-page, fragment navigation, and scroll-to-text navigation," the documentation says, adding that developers could also glean information about what the user navigated to – via scroll-to-text navigation, or typed into a find-in-page search box – based on which section of the page receives an event.

The privacy risk of beforematch is not that of key logging – recording exactly what a web page user typed into a search dialog. Rather, it's that those able to run code on the page can infer something about the searched text based on the section of the page that receives the event.

For example, if a search about "worker rights" took the user to a section on unionization, the topic could be inferred because that section responded to the beforematch event even if the specific search keywords were not exposed.
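To make that inference concrete, here is a hedged sketch of how page code could note which section received the event without ever seeing the search text. The topic labels and the `watchSections` helper are hypothetical, and `EventTarget` objects again stand in for page elements so the example runs outside a browser.

```javascript
// Hypothetical helper: record the topic of whichever section gets "beforematch".
function watchSections(sections, log) {
  for (const [topic, target] of sections) {
    target.addEventListener("beforematch", () => log.push(topic));
  }
}

// Stand-ins for two collapsed sections of a page, keyed by topic.
const sections = new Map([
  ["unionization", new EventTarget()],
  ["holiday-pay", new EventTarget()],
]);
const inferred = [];
watchSections(sections, inferred);

// Simulate a find-in-page hit landing in the unionization section.
sections.get("unionization").dispatchEvent(new Event("beforematch"));
console.log(inferred.join(",")); // prints: unionization
```

The listener never learns the words that were typed; it only learns where they landed, which is the exposure the explainer describes.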

"We believe that the risk of exposing this information to the page is low," the explainer says.

Lukasz Olejnik, independent researcher and consultant, expressed concern about the API over the weekend via Twitter, noting the technology makes it easier to profile users and discover their interests. In a direct message he told The Register, "Blurring the lines between the browser's user interface and the web content is a potential risk in the long run."

In the Twitter discussion, Eric Lawrence, program manager on the Microsoft Edge team, pointed out that other browser APIs like Intersection Observer and various ways to read the scroll position on a page can be abused to violate privacy in the same way.
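A sketch of the scroll-position variant Lawrence describes might look like the following. It is a pure function under stated assumptions: in a browser the offset would come from something like `window.scrollY` and the section positions from the page layout, while here both are hard-coded, hypothetical values.

```javascript
// Hypothetical helper: given a vertical scroll offset and sections sorted by
// their top offset, return the id of the section at the top of the viewport.
function sectionAtScroll(scrollY, sections) {
  let current = null;
  for (const s of sections) {
    if (s.top <= scrollY) current = s.id;
    else break; // sections are sorted, so nothing further can match
  }
  return current;
}

// Made-up layout for illustration only.
const layout = [
  { id: "intro", top: 0 },
  { id: "unionization", top: 1200 },
  { id: "holiday-pay", top: 2600 },
];
console.log(sectionAtScroll(1350, layout)); // prints: unionization
```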

In other words, the privacy problem here – that users don't expect a search on a locally loaded web page to be potentially readable like a search query sent out over the network – goes beyond Chromium's beforematch API. It's present in other APIs. And while concerns about beforematch may seem minor, more serious attacks that allow information interception rather than just inference continue to be possible via event hijacking or misusing window scrolling.

And this isn't merely a matter of academic discussion. Website code that presents find-in-page search boxes as if they were native browser constructs can be found on websites today. There's nothing necessarily nefarious about this, but abuse would be easy. At the very least, the practice is at odds with user expectations.
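For a sense of how little code such interception takes, here is a minimal sketch. It is not taken from any real website; `isFindShortcut` and `interceptFind` are hypothetical names, and an `EventTarget` with a hand-built event stands in for the page and a real keyboard event so the sketch runs outside a browser.

```javascript
// Hypothetical helper: true when a key event looks like Ctrl+F (or Cmd+F).
function isFindShortcut(event) {
  const modifier = event.ctrlKey === true || event.metaKey === true;
  return modifier && String(event.key).toLowerCase() === "f";
}

// Hypothetical helper: swap the native find bar for page-controlled UI.
function interceptFind(target, showCustomSearchBox) {
  target.addEventListener("keydown", (event) => {
    if (!isFindShortcut(event)) return;
    event.preventDefault(); // asks the browser not to open its own find bar
    showCustomSearchBox(); // whatever is typed next goes to page code
  });
}

// Simulated page and keypress.
const page = new EventTarget();
let customBoxShown = false;
interceptFind(page, () => { customBoxShown = true; });

const keydown = new Event("keydown", { cancelable: true });
keydown.key = "f";
keydown.ctrlKey = true;
page.dispatchEvent(keydown);
console.log(customBoxShown, keydown.defaultPrevented); // prints: true true
```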

Code from AppsFlyer website, showing custom search box

In an email to The Register, Serge Egelman, director of usable security and privacy at the International Computer Science Institute (ICSI) in Berkeley, California, and CTO of privacy analysis biz AppCensus.io, said he recently came across an ad tech company, AppsFlyer.com, that had implemented its own search box (type ctrl-F to see it) to handle find-in-page searches instead of relying on the built-in browser capability.

"I noticed that the search box was several pixels lower than it should be," Egelman explained. "I also happened to have the inspector open (a tool in the developer console that allows you to view the source code responsible for any element of the webpage that is moused over), and noticed that the search box was being rendered by website code, rather than being part of the browser."

"My assumption was that they're using it to figure out whether certain topics are missing from their documentation," he added. "However, there are so many privacy and security abuses that this can enable as well."

The Register asked AppsFlyer if anyone could explain why the company website implements its own search box instead of using the native browser find-in-page popup. We've not heard back.

Egelman said Ahmad Bashir, a post-doctoral student in his lab at ICSI, has been mining popular websites to gather their JavaScript code for another project. He asked Bashir to look for other examples of find-in-page interception in the code he has collected but the data hasn't come back yet.

"I was really surprised that browser security policies don't currently prevent this from happening," he said. ®

Updated to add

In a statement provided after this story was filed, a spokesperson for AppsFlyer said, “We implemented our own search within articles because some of the information needed by our customers cannot be accessed by native search (as it resides in accordions, tabs, and other HTML structures). AppsFlyer does not collect or share search data. The last searches are stored locally in the browser for better user experience.”
