Ad-tech firms grab email addresses from forms before they're even submitted

Researchers find widespread harvesting of info without consent


Tracking, marketing, and analytics firms have been exfiltrating the email addresses of internet users from web forms prior to submission and without user consent, according to security researchers.

Some of these firms are said to have also inadvertently grabbed passwords from these forms.

In a research paper scheduled to appear at the Usenix '22 security conference later this year, authors Asuman Senol (imec-COSIC, KU Leuven), Gunes Acar (Radboud University), Mathias Humbert (University of Lausanne), and Frederik Zuiderveen Borgesius (Radboud University) describe how they measured data handling in web forms on the top 100,000 websites, as ranked by research site Tranco.

The boffins created their own software to measure email and password data gathering from web forms – structured web input boxes through which site visitors can enter data and submit it to a local or remote application.

Providing information through a web form by pressing the submit button generally indicates the user has consented to provide that information for a specific purpose. But web pages, because they run JavaScript code, can be programmed to respond to events prior to a user pressing a form's submit button.

And many companies involved in data gathering and advertising appear to believe that they're entitled to use scripts to grab the information website visitors enter into forms before the submit button has been pressed.

"Our analyses show that users’ email addresses are exfiltrated to tracking, marketing and analytics domains before form submission and without giving consent on 1,844 websites in the EU crawl and 2,950 websites in the US crawl," the researchers state in their paper, noting that the addresses may be unencoded, encoded, compressed, or hashed depending on the vendor involved.
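To picture how a leak can be spotted when the address may be unencoded, encoded, or hashed, one approach is to derive several representations of a known test address and search each outgoing request payload for any of them. The sketch below is illustrative only and is not the researchers' tool: the function names, the test address, and the choice of encodings (URL-encoding, Base64, SHA-256) are assumptions, and it ignores compression, uppercase digests, and layered encodings.

```python
import base64
import hashlib
from urllib.parse import quote


def candidate_encodings(email: str) -> dict[str, str]:
    """Return a few common representations of an email address.

    The set of encodings here is an assumption made for illustration.
    """
    raw = email.encode("utf-8")
    return {
        "plain": email,
        "urlencoded": quote(email, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email: str, payload: str) -> list[str]:
    """Name every representation of `email` that occurs in an outgoing payload."""
    return [
        name
        for name, value in candidate_encodings(email).items()
        if value in payload
    ]


# Hypothetical request body carrying only the SHA-256 of the test address.
test_email = "test@example.com"
payload = "ev=keyup&em=" + hashlib.sha256(test_email.encode("utf-8")).hexdigest()
print(find_leaks(test_email, payload))  # ['sha256']
```

A real measurement would need a far larger set of encodings and hash functions, applied in combination, since a vendor can transform the address in whatever way it likes before sending it.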

Most of the email addresses grabbed were sent to known tracking domains, though the boffins say they identified 41 tracking domains that are not found on any of the popular blocklists.

"Furthermore, we find incidental password collection on 52 websites by third-party session replay scripts," the researchers say.

Replay scripts are designed to record keystrokes, mouse movements, scrolling behavior, other forms of interaction, and webpage contents in order to send that data to marketing firms for analysis. In an adversarial context, they'd be called keyloggers or malware; but in the context of advertising, somehow it's just session-replay scripts.

Gunes Acar, one of the report co-authors, was also the co-author of a similar research project in 2017 that looked at data gathering by session-replay companies Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale, and SessionCam.

Evidently, not much has changed since then, except perhaps that email addresses have become more desirable as unique identifiers now that privacy-oriented browsers like Brave, Firefox, and Safari are taking more steps to block cookies and tracking scripts.

Email addresses, the researchers observe, represent a cookie replacement because they're unique, persistent, and can be used to track people across applications, platforms, and even offline interactions that may be tied to an email address like loyalty card transactions.

The website categories with the most leaking forms include: Fashion/Beauty (11.1 per cent EU; 19 per cent US); Online Shopping (9.4 per cent EU; 15.1 per cent US); and General News (6.6 per cent EU; 10.2 per cent US).

Websites categorized as Pornography had the best privacy when it comes to surreptitious form data harvesting.

"A somehow surprising result was the following: despite filling email fields on hundreds of websites categorized as Pornography, we have not a single email leak," the researchers say, noting that previous studies have found adult-oriented websites to have relatively fewer third-party trackers than similarly popular general-interest websites.

Those pesky regulations

The report authors say that EU websites practicing email exfiltration may be in violation of at least three GDPR requirements: transparency, purpose limitation, and prior consent. Firms found to be violating these rules can be fined up to €20m or 4 per cent of annual worldwide turnover, whichever is higher, per Article 83(5).
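As a rough worked example of that ceiling: Article 83(5) sets the maximum at the higher of the fixed amount and the turnover-based amount. The function name and the turnover figures below are made up for illustration, and the result is only the statutory upper bound, not a prediction of any actual penalty.

```python
FIXED_CAP_EUR = 20_000_000   # fixed ceiling under Article 83(5)
TURNOVER_PERCENT = 4         # share of annual worldwide turnover


def gdpr_max_fine_eur(annual_turnover_eur: int) -> int:
    """Return the upper bound on a fine: the higher of the two caps."""
    turnover_cap = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_cap)


# Hypothetical firms: for the smaller one the fixed cap is the higher figure,
# for the larger one the turnover-based cap takes over.
print(gdpr_max_fine_eur(100_000_000))    # 20000000
print(gdpr_max_fine_eur(2_000_000_000))  # 80000000
```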

The US doesn't have a federal data privacy law, though it's conceivable one of the handful of US states with applicable privacy rules could take action against pre-submission form harvesting. But given the toothlessness of US privacy regulation over the past decade, don't expect much.

The authors say they attempted to contact 58 first-parties and 28 third-parties with GDPR requests. They report receiving 30 responses from the first-parties, which varied from surprise and remediation to justifications of one sort or another.

"fivethirtyeight.com (via Walt Disney’s DPO), trello.com (Atlassian), lever.co, branch.io and cision.com were among the websites that said they had not been aware of the email collection prior to form submission on their websites and removed the behavior," the report says.

Marriott, meanwhile, said the information collected by digital analytics firm Glassbox helps with customer care, technical support, and fraud prevention.

Third-parties Taboola, Zoominfo, and ActiveProspect defended their data collection practices.

Facebook, aka Meta, is among the third-parties involved in this. The researchers say that email addresses or their hashes were spotted being sent to facebook.com from 21 different websites in the EU.

"On 17 of these, Facebook Pixel’s Automatic Advanced Matching feature was responsible for sending the SHA-256 of the email address in a SubscribedButtonClick event, despite not clicking any submit button," the report says.

Advanced Matching – called out recently for harvesting student loan data – is designed to collect hashed customer data, such as email addresses, phone numbers, and names from checkout, sign-in, and registration forms. The researchers speculate that on these sites, Facebook's script treats clicks on non-submit buttons as a click event for the submit button.
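Hashing the address does not stop it working as an identifier, because the same input always yields the same digest. The sketch below shows two unrelated sites producing matching values for one visitor. It assumes the address is trimmed and lowercased before hashing, which is an assumption for illustration and may differ from what any given vendor does; the site records and the address are hypothetical.

```python
import hashlib


def hashed_email(email: str) -> str:
    """SHA-256 hex digest of an address after trimming and lowercasing."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical events reported by scripts on two unrelated websites.
site_a = {"id": hashed_email("Jane.Doe@example.com "), "event": "viewed shoes"}
site_b = {"id": hashed_email("jane.doe@example.com"), "event": "read article"}

# The digests match, so whoever receives both events can link them
# to the same person without ever seeing the plain address.
same_person = site_a["id"] == site_b["id"]
print(same_person)  # True
```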

Facebook did not respond to a request for comment.

The report argues that browser vendors, regulators, and privacy tool makers need to deal with this issue because it isn't going away. "Based on our findings, users should assume that the personal information they enter into web forms may be collected by trackers – even if the form is never submitted," the report concludes. ®



Biting the hand that feeds IT © 1998–2022