Mulled Chrome API shines light on long-neglected privacy gap: Sites can snoop on your find-in-page searches

Naughty JS can watch you hit control+F, start typing, see what's on your mind


Updated A browser feature being developed for the open-source Chromium platform has raised data-leakage privacy concerns – though the Google engineers working on the project contend the potential benefits outweigh the risks.

The issue, potential leakage of text entered into the find-in-page search box invoked by pressing Ctrl+F, has been a worry for more than eight years. Separate but related Chromium bugs filed in 2012 and in 2017 highlight the problem. Google's engineers have apparently marked it "won't fix" because a fix might break things.

Earlier this month, Google software engineer Joey Arhar announced plans to develop the beforematch event, an API that allows code running on a page (JavaScript, for example) to make text that has been hidden through page styling available for scrolling and searching.

For example, if a portion of a webpage has been collapsed so its text is not visible, a find-in-page search will not work as expected because the hidden text cannot be found. Similarly, a link that points to hidden text would fail, whether it uses the recently implemented Scroll to Text Fragment API or element fragment navigation (an #anchor link).

But with beforematch, web developers can write a listener that makes hidden text visible, and thus searchable, before the browser handles the user's interaction with the page.
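
A minimal sketch of such a listener, in TypeScript, might look like the following. The `collapsed` class name and the `revealOnMatch` helper are hypothetical, and the small `CollapsibleSection` interface stands in for a DOM element so the snippet does not depend on browser typings. Exactly when a browser fires beforematch was still being worked out at the time of writing, so treat this as an illustration rather than the proposal's reference code.

```typescript
// Stand-in for the parts of a DOM element this sketch touches (assumption:
// a real HTMLElement would be passed in on an actual page).
interface CollapsibleSection {
  classList: { remove(name: string): void };
  addEventListener(type: string, handler: () => void): void;
}

// Hypothetical helper: when the browser signals that a match lies inside a
// collapsed section, drop the (hypothetical) CSS class that hides it.
function revealOnMatch(
  section: CollapsibleSection,
  collapsedClass: string = "collapsed"
): void {
  section.addEventListener("beforematch", () => {
    section.classList.remove(collapsedClass);
  });
}
```

The section stays collapsed until the event arrives, so the page's layout is unchanged for visitors who never search or follow a link into it.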

"The net effect is that the user is able to use find-in-page or link navigation to find content in collapsed sections – something that is not currently possible," the documentation explains.

The privacy implications, also outlined in the documentation, are that the beforematch event expands the amount of information available to those able to run software on web pages, specifically web publishers and possibly their ad tech partners, depending on whether they have cross-origin access.

"In particular, the page can know which section of text was found using find-in-page, fragment navigation, and scroll-to-text navigation," the documentation says, adding that developers could also glean information about what the user navigated to – via scroll-to-text navigation, or typed into a find-in-page search box – based on which section of the page receives an event.

The privacy risk of beforematch is not that of key logging – recording exactly what a web page user typed into a search dialog. Rather, it's that those able to run code on the page can infer something about the searched text based on the section of the page that receives the event.

For example, if a search about "worker rights" took the user to a section on unionization, the topic could be inferred because that section responded to the beforematch event even if the specific search keywords were not exposed.
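
The inference described here needs very little code. As a rough sketch, assuming a site has labeled its own sections by topic, a script could record which label's section receives the event. The `LabeledSection` shape, the topic strings and the `logMatchedTopics` helper are all hypothetical, and `MatchTarget` is a stand-in for a DOM element.

```typescript
// Stand-in for a DOM element (assumption: real page sections would be used).
interface MatchTarget {
  addEventListener(type: string, handler: () => void): void;
}

// A page section paired with a topic label chosen by the site (hypothetical).
interface LabeledSection {
  topic: string;
  element: MatchTarget;
}

// Returns a log that grows each time a labeled section receives beforematch.
// No search text is captured, only the topic of the section that matched.
function logMatchedTopics(sections: LabeledSection[]): string[] {
  const inferredTopics: string[] = [];
  sections.forEach((section) => {
    section.element.addEventListener("beforematch", () => {
      inferredTopics.push(section.topic);
    });
  });
  return inferredTopics;
}
```

The finer the sections, the more closely the logged topic tracks what the user was actually looking for.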

"We believe that the risk of exposing this information to the page is low," the explainer says.

Lukasz Olejnik, independent researcher and consultant, expressed concern about the API over the weekend via Twitter, noting the technology makes it easier to profile users and discover their interests. In a direct message he told The Register, "Blurring the lines between the browser's user interface and the web content is a potential risk in the long run."

In the Twitter discussion, Eric Lawrence, program manager on the Microsoft Edge team, pointed out that other browser APIs like Intersection Observer and various ways to read the scroll position on a page can be abused to violate privacy in the same way.
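
To illustrate the scroll-position variant of Lawrence's point, here is a small sketch under stated assumptions: the page knows the vertical extent of each of its sections, and it reads the scroll offset after the browser has jumped to a match. The `SectionBounds` shape and the `topicAtScroll` helper are hypothetical, and the pixel values a real page would feed in depend on its layout.

```typescript
// Vertical extent of a page section in pixels (hypothetical layout data; on a
// real page this could be derived from element offsets).
interface SectionBounds {
  topic: string;
  top: number;
  bottom: number;
}

// Given a scroll offset, return the topic of the section that contains it,
// or null if the offset falls outside every known section.
function topicAtScroll(scrollY: number, sections: SectionBounds[]): string | null {
  for (let i = 0; i < sections.length; i++) {
    if (scrollY >= sections[i].top && scrollY < sections[i].bottom) {
      return sections[i].topic;
    }
  }
  return null;
}
```

This yields the same kind of inference as beforematch, a topic rather than the typed text, using capabilities pages already have.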

In other words, the privacy problem here – that users don't expect a search on a locally loaded web page to be potentially readable like a search query sent out over the network – goes beyond Chromium's beforematch API. It's present in other APIs. And while concerns about beforematch may seem minor, more serious attacks that allow information interception rather than just inference continue to be possible via event hijacking or misusing window scrolling.

And this isn't merely a matter of academic discussion. Website code that presents find-in-page search boxes as if they were native browser constructs can be found on websites today. There's nothing necessarily nefarious about this, but abuse would be easy. At the very least, the practice is at odds with user expectations.

Code from AppsFlyer website, showing custom search box

In an email to The Register, Serge Egelman, director of usable security and privacy at the International Computer Science Institute (ICSI) in Berkeley, California, and CTO of privacy analysis biz AppCensus.io, said he recently came across an ad tech company, AppsFlyer.com, that had implemented its own search box (press Ctrl+F to see it) to handle find-in-page searches instead of relying on the built-in browser capability.

"I noticed that the search box was several pixels lower than it should be," Egelman explained. "I also happened to have the inspector open (a tool in the developer console that allows you to view the source code responsible for any element of the webpage that is moused over), and noticed that the search box was being rendered by website code, rather than being part of the browser."

"My assumption was that they're using it to figure out whether certain topics are missing from their documentation," he added. "However, there are so many privacy and security abuses that this can enable as well."

The Register asked AppsFlyer if anyone could explain why the company website implements its own search box instead of using the native browser find-in-page popup. We've not heard back.
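
Taking over the shortcut is not complicated. The following is a minimal sketch of the general technique, not AppsFlyer's code: a keydown handler recognizes the find shortcut, cancels the browser's default action and shows a page-rendered box instead. The `KeyLike` interface stands in for a DOM keyboard event, and `handleKeydown` and `showCustomSearchBox` are hypothetical names.

```typescript
// Stand-in for the fields of a DOM KeyboardEvent that this sketch reads.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  preventDefault(): void;
}

// True for Ctrl+F, or Cmd+F on macOS.
function isFindShortcut(event: KeyLike): boolean {
  return (event.ctrlKey || event.metaKey) && event.key.toLowerCase() === "f";
}

// Hypothetical keydown handler: suppress the browser's own find bar and
// show a page-rendered search box instead. Returns true if it took over.
function handleKeydown(event: KeyLike, showCustomSearchBox: () => void): boolean {
  if (!isFindShortcut(event)) {
    return false;
  }
  event.preventDefault();
  showCustomSearchBox();
  return true;
}
```

On a real page a handler like this would be registered for the document's keydown events. Whatever the user then types goes into an ordinary page input, which the site's scripts can read in full.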

Egelman said Ahmad Bashir, a post-doctoral student in his lab at ICSI, has been mining popular websites to gather their JavaScript code for another project. He asked Bashir to look for other examples of find-in-page interception in the code he has collected but the data hasn't come back yet.

"I was really surprised that browser security policies don't currently prevent this from happening," he said. ®

Updated to add

In a statement provided after this story was filed, a spokesperson for AppsFlyer said, “We implemented our own search within articles because some of the information needed by our customers cannot be accessed by native search (as it resides in accordions, tabs, and other HTML structures). AppsFlyer does not collect or share search data. The last searches are stored locally in the browser for better user experience.”

Biting the hand that feeds IT © 1998–2022