FYI: Twitter's API still spews enough metadata to reveal exactly where you lived, worked

Old tweets betray sensitive data under new tools

Analysis Researchers have demonstrated yet again that location metadata from Twitter posts can be used to infer private information like users' home addresses, workplaces, and sensitive locations they've visited.

Computer science boffins Kostas Drakonakis, Panagiotis Ilia, Sotiris Ioannidis, and Jason Polakis, affiliated with The Foundation for Research & Technology in Greece and the University of Illinois in the US, published their findings in a paper titled "Please Forget Where I Was Last Summer: The Privacy Risks of Public Location (Meta)Data," which is scheduled to be presented at the Network and Distributed System Security Symposium in February.

"We show that location metadata enables the inference of sensitive information that could be misused for a wide range of scenarios (eg: from a repressive regime de-anonymizing an activist’s account to an insurance company inferring a customer’s health issues, or a potential employer conducting a background check)," they claim in their paper.

The privacy risk associated with Twitter geolocation data was explored in academic research published in 2015. Since then, Twitter has given users more control over location data and limited the precision of recorded coordinates. The company presently disables precise location by default and requires users to opt in to share their location.

"Account holders choose to share their location when they Tweet," a Twitter spokesperson said in an email to The Register on Monday.

"Please note this is opt-in; we never attach location to a Tweet without the person's permission. If someone chooses to share their location in a Tweet, the location is also available via our APIs. Again, this is strictly when a person opts in."

Some progress, but not enough

But Twitter's changes haven't really mitigated the privacy risk, since the company continues to offer historical location data through its developer API. Versions of the Twitter mobile app for Android and iOS released before April 2015 automatically included precise GPS coordinates as metadata in tweets tagged with a low-precision location label.

"In the dataset we collected we found that tweets with coarse grained location labels (e.g., the name of a city) also have GPS coordinates in the metadata dating back to 2010," said Polakis. "After April 2015 tweets started appearing with coarse grained labels but without GPS coordinates in the metadata, indicating that around that time there was a change in Twitter's app."

For the researchers, the Twitter policy that allowed the inclusion of precise location data represents a privacy problem that should be addressed.

"This privacy violation is invisible to users, as the GPS coordinates are only contained in the metadata returned by the API and not visible through the Twitter website or app," the paper explains. "To make matters worse, this historical metadata currently remains publicly accessible through the API."
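For readers wondering what that looks like in practice, here is a minimal Python sketch of spotting the mismatch the researchers describe: a tweet whose visible label is coarse but whose metadata holds an exact point. It assumes the tweet has already been fetched and parsed into a dictionary shaped like the v1.1 API's tweet object, with a GeoJSON `coordinates` field and a `place` object. The sample tweets and the helper names are made up for illustration, and none of this is taken from the paper's tooling.

```python
from typing import Optional, Tuple

# Place types treated as "coarse" labels in this sketch (an assumption).
COARSE_PLACE_TYPES = {"neighborhood", "city", "admin", "country"}


def precise_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) if the tweet metadata holds an exact point, else None."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]  # GeoJSON order is longitude, then latitude
    return (lat, lon)


def leaks_precise_location(tweet: dict) -> bool:
    """True when a coarse visible label sits alongside exact coordinates."""
    place = tweet.get("place") or {}
    is_coarse = place.get("place_type") in COARSE_PLACE_TYPES
    return is_coarse and precise_point(tweet) is not None


# Hypothetical sample data, not real tweets.
old_tweet = {
    "created_at": "Mon Jun 02 23:10:00 +0000 2014",
    "place": {"place_type": "city", "full_name": "Chicago, IL"},
    "coordinates": {"type": "Point", "coordinates": [-87.6298, 41.8781]},
}
new_tweet = {
    "created_at": "Mon Jun 06 23:10:00 +0000 2016",
    "place": {"place_type": "city", "full_name": "Chicago, IL"},
    "coordinates": None,
}

for label, tweet in (("old", old_tweet), ("new", new_tweet)):
    print(label, leaks_precise_location(tweet), precise_point(tweet))
```

Both sample tweets would show the same "Chicago, IL" label to a reader; only the first carries a point that could be mapped to a street.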

Location data presents businesses with a challenge: It's potentially so valuable for ad targeting that companies appear to be disinclined to discourage its disclosure and don't go to great lengths to explain how such data might be used. Last week, the Los Angeles City Attorney filed a lawsuit against IBM's weather company for failing to adequately disclose how it uses the location data harvested through its Weather Channel app.

For Twitter users, the problem is privacy. To outline possible risks, the paper describes how a user's negative statements about a doctor on Twitter allowed the individual to be placed at the office of a mental health professional. It also recounts a user complaining about blood testing in a tweet geo-tagged to a rehab center.

Some tools better left unshared

In the course of their work, the researchers developed and tested a location data auditing tool called LPAuditor to examine tweets for location metadata and infer sensitive personal information.

The tool, which relies on publicly accessible geolocation databases, will not be open sourced due to the potential for misuse, said Jason Polakis, assistant professor of computer science at the University of Illinois at Chicago and one of the paper's co-authors, in an email to The Register.

The software can pinpoint the locations associated with homes and workplaces much more accurately than previously demonstrated techniques.

"Our system is able to identify the home and workplace for 92.5 per cent and 55.6 per cent of the users respectively," the paper says.

That's between 18.9 per cent and 91.6 per cent more accurate for homes and 8.7 per cent to 21.8 per cent more accurate for workplaces than has been demonstrated in the past, the researchers say.
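LPAuditor is not public and the paper's method is not reproduced here. As a rough illustration of why timestamped coordinates give so much away, the following Python sketch uses a deliberately naive heuristic, not the researchers' technique: bucket points into cells of roughly 100 metres, call the busiest night-time cell "home" and the busiest weekday office-hours cell "work." The hour thresholds, the cell size, the function names, and the sample points are all assumptions chosen for illustration, and timestamps are assumed to be in the user's local time.

```python
from collections import Counter
from datetime import datetime


def grid_cell(lat, lon, precision=3):
    """Snap a point to a coarse grid; 3 decimal places is roughly 100 m."""
    return (round(lat, precision), round(lon, precision))


def guess_home_and_work(points):
    """points: iterable of (local datetime, lat, lon) tuples.

    Naive heuristic (an assumption, not the paper's algorithm):
    home = most frequent cell between 21:00 and 06:00,
    work = most frequent cell on weekdays 09:00-17:00 that is not home.
    """
    night, office = Counter(), Counter()
    for ts, lat, lon in points:
        cell = grid_cell(lat, lon)
        if ts.hour >= 21 or ts.hour < 6:
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:
            office[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = next((c for c, _ in office.most_common() if c != home), None)
    return home, work


# Hypothetical geotagged posts over three days in June 2014.
sample_points = [
    (datetime(2014, 6, 2, 23, 10), 41.8701, -87.6501),  # late evening
    (datetime(2014, 6, 3, 1, 30), 41.8702, -87.6499),   # after midnight
    (datetime(2014, 6, 3, 10, 0), 41.8819, -87.6278),   # weekday morning
    (datetime(2014, 6, 4, 14, 45), 41.8821, -87.6281),  # weekday afternoon
    (datetime(2014, 6, 4, 12, 0), 41.8701, -87.6501),   # lunch at home
]

home, work = guess_home_and_work(sample_points)
print("home cell:", home)
print("work cell:", work)
```

Even this crude version needs only a handful of points per user, which is why years of retained coordinates matter more than any single tweet.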

Polakis and his colleagues found "71 per cent of users have tweeted from sensitive locations, 27.5 per cent of which can be placed there with high confidence based on the content of their tweets."

When users can choose whether location data gets published, there's a 94.6 per cent reduction in tweets tagged with GPS coordinates, according to the researchers. They argue such stats underscore the benefit of giving people control over location data. But location controls are not retroactive – developers presently have access to years of location data through the Twitter API.


Out of 290,162 users in the survey dataset, 87,114 posted geotagged tweets via the official Twitter and Foursquare apps. The researchers did not consider other third-party apps, which they said "may handle geolocation data differently as Twitter’s Geo Guidelines are neither mandatory nor enforceable."

Using the Twitter API, the researchers were able to find precise geolocation data for about 30 per cent of those in the user dataset. They say the Twitter policies that allowed such data to be published resulted in "an almost 15-fold increase in the number of users whose key locations are successfully identified by our system."

What's in the databases?

The fact that third parties may have collected this data and stored it without the explicit consent of Twitter users is troubling for Polakis.

"So much data is being collected and shared/sold to third parties without the users being explicitly aware of that (or able to prevent it)," he said. "And indeed it is problematic when users have no way to delete that data in third-party databases, even though the first party may offer such an option."

Cautioning that he's not a legal scholar, he nonetheless says that given the research findings and the sensitive nature of what can be inferred from location data, legislation or more explicit oversight may make sense for such data.

"We hope to see a change in how major companies collect and share location data, and the adoption of more privacy-preserving approaches," he said.

"We also hope that our work can help educate users on the risks that they face when they share their location data (either explicitly or inadvertently) with web services or other users. Being aware of what someone could infer about you using that data can be a powerful incentive towards being more cautious during your online activities." ®
