Study: How Amazon uses Echo smart speaker conversations to target ads

Web giant milks advertisers with data harvested from digital assistant


Updated Amazon and third-party services have been using smart speaker interaction data for ad targeting, in violation of privacy commitments, according to researchers at four US universities.

Academics at the University of Washington, University of California-Davis, University of California-Irvine, and Northeastern University claim "Amazon processes voice data to infer user interests and uses it to serve targeted ads on-platform (Echo devices) as well as off-platform (web)."

The researchers – Umar Iqbal, Pouneh Nikkhah Bahrami, Rahmadi Trimananda, Hao Cui, Alexander Gamero-Garrido, Daniel Dubois, David Choffnes, Athina Markopoulou, Franziska Roesner, Zubair Shafiq – describe their findings in a paper titled, "Your Echos are heard: Tracking, profiling, and ad-targeting in the Amazon smart speaker ecosystem."

The ten academics say that smart speaker interaction – that's you talking to the device – generates ad auction bids from advertisers that are as much as 30x higher than the bids would be without Amazon Echo speaker data. What's more, they say, the way Amazon and Skills developers – makers of software integrated with Amazon's Alexa voice assistant service – operate is often inconsistent with privacy policies.

That may not be a surprise to you; either way, the paper provides a technical dive into, and a thorough analysis of, how Amazon, Echo devices, Alexa, and adverts are all tied together.

To understand how Amazon and Skills developers handle audio data, the boffins created an auditing framework to evaluate how voice data gets collected, used, and shared. They did so because Amazon Echo smart speakers do not provide an interface to assess how data gets used and there's no ready-made mechanism for understanding what happens to smart speaker data sent over the internet.

So the researchers created multiple fake personas with different smart-speaker usage profiles, and then simulated interactions to test statistical differences in amounts bid for audio and web advertisements. In this way, they claim they were able to infer the effect of smart speaker interactions involving the constructed personas.
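The core of that comparison boils down to contrasting ad-auction bid values collected for a treated persona against those for an untreated control account. A minimal sketch in Python — the `bid_ratio` helper and the bid figures are illustrative assumptions, not the researchers' actual code or data:

```python
from statistics import median

def bid_ratio(persona_bids, control_bids):
    """Ratio of median ad-auction bids for a treated persona
    versus an untreated control account."""
    return median(persona_bids) / median(control_bids)

# Hypothetical CPM bids (USD) collected for one interest category.
control = [0.10, 0.12, 0.09, 0.11]
persona = [2.80, 3.30, 3.10, 2.95]

print(f"Persona bids are {bid_ratio(persona, control):.0f}x the control bids")
```

Using the median rather than the mean keeps a single outlier bid from dominating the comparison, which matters when only a handful of auctions are observed per persona.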

Technically, the auditing framework involved setting up a custom Raspberry Pi router to record the network endpoints contacted by Amazon Echo devices, and emulating an Echo with the Alexa Voice Service SDK in order to capture unencrypted network traffic.
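The router half of that setup amounts to logging which hosts the device talks to and how often. A toy sketch of that tallying step, assuming captured traffic has already been exported as (timestamp, destination host) pairs — the record format and hostnames here are illustrative, not taken from the study:

```python
from collections import Counter

def tally_endpoints(records):
    """Count how often each network endpoint appears in
    captured (timestamp, destination host) traffic records."""
    return Counter(host for _, host in records)

# Illustrative capture of outbound connections from an Echo-like device.
capture = [
    (1653400000, "avs-alexa-na.amazon.com"),
    (1653400002, "device-metrics-us.amazon.com"),
    (1653400005, "avs-alexa-na.amazon.com"),
    (1653400009, "example-tracker.invalid"),
]

for host, hits in tally_endpoints(capture).most_common():
    print(f"{hits:3d}  {host}")
```

In practice the interesting signal is which endpoints belong to third-party advertising and tracking services rather than to Amazon itself, which is why the researchers needed visibility at the router rather than on the device.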

This hardware and software setup was used to issue voice commands to an Amazon Echo in order to watch how the data was used for audio ads (via Amazon Music, Spotify, and Pandora), for display ads on the web (personas using a browser logged into an Amazon account and the Alexa web companion app), and on non-Echo devices.

The fake personas were configured to install and interact with Skills associated with their respective interests. These included: Connected Car, Dating, Fashion & Style, Pets & Animals, Religion & Spirituality, Smart Home, Wine & Beverages, Health & Fitness, and Navigation & Trip Planners.

The researchers say that Echo interaction data is collected both by Amazon and third parties and that Amazon shares user data with as many as 41 ad partners. They say that ad targeting enabled by the data leads to ad bids as much as 30x higher, and that Amazon's inference of ad interests from voice data is a clear violation of the company's privacy policy and public statements.

In addition, over 70 percent of Skills fail to mention Alexa or Amazon in their privacy policies, and a mere 2.2 percent make their data collection practices clear in those policies, the researchers contend.

Transparency

"Amazon’s inference of advertising interests from users’ voice is a clear violation of their policies and public statements," the researchers state in their paper. "Amazon does not provide transparency in usage of data and thus cannot be reliably trusted to protect user privacy."

The paper also observes that Amazon was granted a patent in 2018 titled "Voice-based determination of physical and emotional characteristics of users" that describes how the "current physical and/or emotional condition of the user may facilitate the ability to provide highly targeted audio content, such as audio advertisements or promotions, to the user."

Asked to comment, an Amazon spokesperson challenged some of the paper's conclusions without citing specific inaccuracies. "Many of the conclusions in this research are based on inaccurate inferences or speculation by the authors, and do not accurately reflect how Alexa works," the spokesperson told The Register in an emailed statement.

Umar Iqbal, postdoctoral researcher at the University of Washington and lead author of the paper, responded, "This is a broad statement without any concrete details."

Asked to provide specifics, Amazon said with regard to the 30x bid price change, "The price of ad auctions are influenced by several factors that are not linked to voice requests or Alexa interactions." Amazon also defended its handling of privacy policy compliance. "We require Skill developers that collect personal information to provide a privacy policy, which we display on the Skill's detail page, and to collect and use that information in compliance with their privacy policy and applicable law," the company said.

Amazon also insisted it does not sell data, which the paper does not allege. "We are not in the business of selling data and we do not share Alexa requests with advertising networks," the US giant said.

Iqbal countered, with references to the study's paper: "We find evidence of Alexa Skills directly communicating with advertising/tracking services (section 4.2). We also note that Amazon's advertising partners sync their cookies with Amazon and bid higher than non-partner advertisers (section 5.5). A logical explanation for this behavior is that Amazon/Skills share/sell user interest data with their advertising partners. We do not claim that Amazon directly shares voice input/transcripts with advertising networks."

That is to say, records of Alexa conversations are not handed over verbatim, though the nature of what is discussed is seemingly used to guide the kinds of adverts you'll be targeted with.

Amazon's statement continues, "Similar to what you'd experience if you made a purchase on Amazon.com or requested a song through Amazon Music, if you ask Alexa to order paper towels or to play a song on Amazon Music, the record of that purchase or song play may inform relevant ads shown on Amazon or other sites where Amazon places ads. Customers can opt out of interest-based ads from Amazon at any time on our website."

Iqbal noted: "This statement actually supports our findings." ®

Updated to add

On Thursday, researcher Umar Iqbal contacted The Register to say that some clarifications had been made to the paper to avoid potential misinterpretation.

"Most notably, we did not state that Amazon shared raw voice recordings/transcripts with advertisers, but we did find evidence that Amazon processes voice recordings from skill interactions to infer user interests and uses those interests to target ads," Iqbal said.

"We also clarified that Amazon’s inference of advertising interests from users’ voice is potentially inconsistent with their public statements, but not their privacy policy. Amazon’s privacy policy neither acknowledges nor denies the usage of Echo interactions for ad targeting."
