Think file-hosting sites guard your private data? Think again
Attacks already under way
Academic researchers say they've uncovered weaknesses in dozens of the most popular file hosting sites that allow people to gain unauthorized access to data that's supposed to be available only to those selected by the user.
The services, which include sites such as RapidShare, FileFactory, and Easyshare, allow users to upload large files and make them available to anyone who knows the unique URI (or Uniform Resource Identifier) that's bound to each one. Users may post the link on websites or forums available to the public or share it in a single email to prevent all but the recipient from downloading it. RapidShare, for instance, says it can be used to “share your data with your friends, colleagues or family.”
But according to academics in Belgium and France, a “significant percentage” of the 100 FHSs (or file hosting services) they studied made it trivial for outsiders to access the files simply by guessing the URIs that are bound to each uploaded file. What's more, they presented evidence that such attacks, far from being theoretical, are already happening in the wild.
“These services adopt a security-through-obscurity mechanism where a user can access the uploaded files only by knowing the correct download URIs,” the researchers wrote in a paper presented at the most recent USENIX Workshop on Large-Scale Exploits and Emergent Threats. “While these services claim that these URIs are secret and cannot be guessed, our study shows that this is far from being true.”
The researchers said they trained web crawlers on the file services and uncovered hundreds of thousands of private files in less than a month. They also used the sites to store private files that contained internet beacons, so they'd know if anyone opened them. Over a month's span, 80 unique IP addresses accessed the so-called honey files 275 times, indicating that the weakness is already being exploited in the wild to harvest data many users believe isn't available for general consumption.
The weakness that's easiest to exploit was found on sites that use sequential identifiers in the download URIs. By writing scripts that enumerated the IDs character by character, the researchers were able to locate almost 311,000 unique files over a period of 30 days. They then ran searches on Microsoft's Bing.com to arrive at an estimate that 168,320, or 54 percent of them, were private because they hadn't been shared online.
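To see why sequential identifiers offer so little protection, consider the minimal Python sketch below. It is not the researchers' crawler. It assumes a hypothetical service whose IDs are zero-padded decimal counters, and the function name and sample ID are invented for illustration. Given one known ID, every later one follows by simple counting.

```python
from itertools import islice


def sequential_neighbors(known_id: str):
    """Yield the IDs that follow known_id, keeping its zero-padded width.

    Assumption: IDs are plain decimal counters, as on a hypothetical
    service; real services may use other alphabets.
    """
    width = len(known_id)
    value = int(known_id)
    while True:
        value += 1
        yield str(value).zfill(width)


# Starting from a single known ID, the next few are trivial to predict.
print(list(islice(sequential_neighbors("004217"), 3)))
# ['004218', '004219', '004220']
```

Nothing in the sketch touches a network. The point is only that one legitimately obtained link reveals the position of every other upload in the sequence.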
“Unfortunately, the problem is extremely serious since the list of insecure FHSs using sequential IDs also includes some of the most popular names, often highly ranked by Alexa in the list of the top internet websites,” the researchers wrote. To prevent their findings from being abused, their report didn't say which sites are vulnerable to specific types of attacks.
Another common weakness involved the use of pseudorandom URIs for each uploaded file. By using brute-force attacks that cycled through every possible combination, the researchers were able to successfully guess a file's unique ID 1.1 times for every thousand attempts. Part of the weakness is the result of websites that used purely numeric IDs with a maximum length of six digits. But even when services used alphanumeric IDs or eight-digit numeric IDs, the researchers achieved similar success rates.
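The arithmetic behind those ID lengths can be sketched in a few lines of Python. The function name is illustrative, and the 36-character alphabet (lowercase letters plus digits) and the eight-character alphanumeric length are assumptions, since the report as described here does not give either. The figures count possible values only and say nothing about how any service actually generated its IDs.

```python
import secrets


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct IDs of a fixed length over a given alphabet."""
    return alphabet_size ** length


print(keyspace(10, 6))   # six-digit numeric IDs: 1000000
print(keyspace(10, 8))   # eight-digit numeric IDs: 100000000
print(keyspace(36, 8))   # assumed 36-character alphabet: 2821109907456

# For contrast, Python's standard library can draw an ID from a
# cryptographically secure source. 16 random bytes give 2**128 values.
token = secrets.token_urlsafe(16)
print(len(token))        # 22
```

A six-digit numeric space holds only a million values, so a populated service leaves little room between valid IDs. The researchers' similar success rate against longer IDs suggests that length by itself was not enough on the sites they tested.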
In other cases, file services used ID systems complex enough to render brute-force techniques ineffective, or used CAPTCHAs or other mitigations. But the researchers were often able to guess the names anyway, in some cases by exploiting a directory traversal vulnerability in a webhosting program used by multiple services.
In other cases, they defeated the mitigations by using a feature that allows people to report copyright violations and other abuse to the site admins and combining it with a separate feature for deleting files. Because the feature on one site exposed the first 10 characters of a file's 14-character ID, the number of combinations to brute force was a manageable 65,536.
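The 65,536 figure is consistent with four unknown hexadecimal characters, since 16 to the fourth power is 65,536. The hexadecimal alphabet is an inference from that number, not something stated in the article, and the function below is a hypothetical helper written only to show the calculation.

```python
def remaining_combinations(alphabet_size: int, id_length: int, exposed: int) -> int:
    """Count the IDs still possible once a prefix of the ID is known."""
    return alphabet_size ** (id_length - exposed)


HEX = 16  # assumption: each ID character is a hexadecimal digit

# With 10 of 14 characters leaked, only four remain to be guessed.
print(remaining_combinations(HEX, 14, 10))  # 65536

# With nothing leaked, the full 14-character space is 2**56.
print(remaining_combinations(HEX, 14, 0))   # 72057594037927936
```

Under that assumption, leaking the first 10 characters shrinks the search space by a factor of 16 to the tenth power, roughly a trillion.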
The researchers said the most effective countermeasure against the attacks is the use of encryption on the user's computer. They developed a proof-of-concept Firefox add-on that automatically encrypts and decrypts files upon upload and download and uses steganographic techniques to hide the encrypted files.
The researchers included Nick Nikiforakis, Steven Van Acker, and Wouter Joosen of the Katholieke Universiteit of Leuven in Belgium, and Marco Balduzzi and Davide Balzarotti of the Institute Eurecom in France. A PDF of their paper is here. ®