I know what you downloaded from Freenet
Anonymous P2P network open to easy forensic attack
Exclusive The Freenet Project has been around since 2000. It was designed as a stealthy P2P network (some have called it a "darknet") that distributes its content so broadly that it's impossible to censor.
There are a number of security features in Freenet that other P2P networks lack. Because data that the network's various nodes exchange is encrypted, it's difficult, though not impossible, for an outside observer to know what's being passed between two nodes. It is also nearly impossible to identify the author of a Freesite, or to identify the person responsible for inserting content into the network, unless they wish to be known. Most importantly, it's nearly impossible for an outside attacker to determine whether a given node is requesting the data being sent to it, or is merely relaying it to another node.
These layers of obscurity and limited anonymity are what enable Freenet participants to exchange information freely. Content that is illegal, whether rightly or wrongly, flows freely through the network, cached on thousands of computers worldwide.
Who knows where that stuff came from?
Each participant necessarily operates a Freenet node, which caches encrypted data that has either been requested by that node's owner, or requested by other Freenet nodes. That is, one's node will cache data that it is merely proxying for others. Caching enables the broad distribution of content that makes Freenet impossible to censor. It also introduces doubt about the origin of any data found in a given node's cache. It helps to provide deniability.
Of course, anyone can find out what data is in their cache by decrypting it. If one applies the correct Content Hash Key (CHK), the data will be revealed. But because it's encrypted, one can avoid knowing what's in their cache simply by neglecting to run a list of CHKs against it - hence deniability in case a forensic examiner should locate illegal files in one's Freenet cache. It is, or rather, ought to be, impossible to determine whether the owner of a particular machine requested the files in his cache, or if his node merely proxied and cached them for others.
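To make the deniability argument concrete, here is a toy Python sketch of content-derived encryption. It is not Freenet's actual key format or cipher: the function names, the SHA-256 keystream, and the XOR step are all our own stand-ins, chosen only to show that a cached block stays opaque until someone supplies a matching key, and that running a list of candidate keys against it is all an examiner needs to do.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. A stand-in, not Freenet's cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_with_key(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so this both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def insert_block(plaintext: bytes):
    # The key is derived from the content itself (hypothetical scheme).
    key = hashlib.sha256(plaintext).digest()
    return key, xor_with_key(plaintext, key)

def try_keys(cached_block: bytes, candidate_keys):
    # Run a list of keys against a cached block; return the plaintext
    # for the first key whose hash check passes, else None.
    for key in candidate_keys:
        plaintext = xor_with_key(cached_block, key)
        if hashlib.sha256(plaintext).digest() == key:
            return plaintext
    return None

key, cached_block = insert_block(b"some cached content")
print(try_keys(cached_block, [b"\x00" * 32]))  # wrong key: nothing revealed
print(try_keys(cached_block, [key]))           # right key: content revealed
```

A node owner who never calls anything like try_keys() on his own cache can plausibly claim ignorance of its contents; an examiner with a list of keys has no such limitation.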
Obviously, this works only so long as cached data that the node's owner has requested, and cached data that his node has proxied, are indistinguishable. Unfortunately, The Register has discovered that this is not the case for large files.
Behavioral differences
Let there be a file of, say, 700 MB - maybe a movie, maybe warez, and possibly illegal, that you wish to have. Your node will download portions of this "splitfile" from numerous other nodes, where they are distributed. To enable you to recover quickly from interruptions during the download, your node will cache all of the chunks it receives. Thus when you re-start the download after an interruption, you will download only those portions of the file that you haven't already received. When the download is complete, the various chunks will be decrypted and assembled, and the file will be saved in your ~/freenet-downloads directory.
If you destroy the file but leave your cache intact, you can request it again, and the file will appear almost instantly. And there's the problem.
Freenet distributes files in a way that tends to select for frequently-requested, or "popular" data. This is partly because the other nodes that one's requests pass through will also cache parts of any files one requests. The more often a file is requested, the more often it will be cached, and the more nodes it will appear on.
We tested this, and found that a 50 MB file took six hours to download the first time we tried. After we eliminated the contents of our own local cache, we requested the file again, and it took only two hours and 20 minutes. Clearly, our "neighborhood" nodes had been caching a good deal of it while we downloaded it the first time. That behavior is by design, and it's nothing to be concerned about. The difference in download times between files never downloaded before and ones cached nearby is not revealing, because anyone else nearby might have initiated the request.
However, it is quite easy to distinguish between a large file cached in nearby nodes and one cached locally. And that is a very big deal.
As we noted earlier, a large splitfile will be cached locally to enable quick recovery from download interruptions. The problem is, the entire file will be cached. This means that, when a file is downloaded once, so long as the local cache remains intact, it can be reconstructed wholly from the local cache in minutes, even when the computer is disconnected from the internet. And this holds even when the browser cache is eliminated as a factor.
We tested this by downloading the same 50 MB file and removing it from our ~/freenet-downloads directory, while leaving the local Freenet cache intact. On our second attempt, it "downloaded" in one minute, nine seconds.
We ran the test again after disconnecting our computer from the internet, with the Freenet application still running, and it "downloaded" in one minute, fifty seconds.
So, it took six hours initially; two hours, twenty minutes with neighboring nodes caching it thanks to our request; and less than two minutes with our local cache intact, even when disconnected from the net.
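For readers who want the gap in numbers, this short Python sketch converts the timings reported above to seconds and works out how much faster each repeat fetch was than the first. Only the timings come from our tests; the labels and the speedup() helper are ours, for illustration.

```python
# Timings reported in the tests above, converted to seconds.
timings = {
    "first download": 6 * 3600,
    "repeat, local cache wiped": 2 * 3600 + 20 * 60,
    "repeat, local cache intact": 1 * 60 + 9,
    "repeat, local cache intact, offline": 1 * 60 + 50,
}

baseline = timings["first download"]

def speedup(seconds: int) -> float:
    # How many times faster than the very first download.
    return baseline / seconds

for label, seconds in timings.items():
    print(f"{label}: {seconds} s ({speedup(seconds):.1f}x the first download's speed)")
```

On these figures, help from neighboring nodes made the fetch between two and three times faster, while an intact local cache made it roughly 200 to 300 times faster.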
The difference in download time between a splitfile cached locally (a minute or two) and one cached nearby (hours) is so great that we can safely dismiss the possibility that any part of it is coming from nearby nodes, even under the best possible network conditions. It's absolutely clear that the entire file is being rebuilt from the local cache. Forensically speaking, that information is golden.
The attack
Exploiting that information would be trivial. Only a bit of statistical data, of the sort that any government agency in the world could easily afford to obtain, will be needed.
Here's what we need to know: after a given amount of uptime, how many chunks of a splitfile will appear on a node that has only relayed requests for it? That's it. We already know that for a node requesting a splitfile, the answer is 100% of the chunks, in whatever uptime is needed to fetch them. By running several nodes and observing them, we can easily determine how long it takes to cache, by relaying alone, an entire file of a given size.
Since Freenet logs uptime, a forensic examiner can easily learn how long your node has been alive, even if there have been interruptions. So it is quite possible to estimate how many intact files, and of what size, your node ought to have cached without your participation. If the examiner finds many more files, or many larger files, than predicted, and they are illegal, you are in trouble.
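The comparison an examiner would make can be sketched in a few lines of Python. Everything here is hypothetical: the CachedFile record, the linear hours-per-megabyte model, and the sample numbers are invented for illustration, and a real examiner would substitute whatever relationship his own test nodes actually showed.

```python
from dataclasses import dataclass

@dataclass
class CachedFile:
    label: str        # placeholder name, not a real CHK
    size_mb: float
    complete: bool    # True if every chunk is present in the local cache

def relay_hours_needed(size_mb: float, hours_per_mb: float) -> float:
    # ASSUMPTION: the uptime a relay-only node needs to accumulate a whole
    # file grows linearly with file size. Measured data may say otherwise.
    return size_mb * hours_per_mb

def unexplained_files(cache, uptime_hours: float, hours_per_mb: float):
    # Complete files that relaying alone could not have produced in the
    # node's logged uptime, according to the model above.
    return [
        f for f in cache
        if f.complete and relay_hours_needed(f.size_mb, hours_per_mb) > uptime_hours
    ]

# Made-up example values, not measurements.
cache = [
    CachedFile("small-file", 5, True),
    CachedFile("large-file", 700, True),
    CachedFile("partial-file", 700, False),
]
suspicious = unexplained_files(cache, uptime_hours=500, hours_per_mb=10)
print([f.label for f in suspicious])
```

With these invented numbers, only the complete 700 MB file is flagged: the small file could have arrived by relaying within the logged uptime, and the partial file proves nothing.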
Using a tool called FUQID, which queues Freenet file requests, one could easily run a list of forbidden CHKs against a disk image. If the number or size of whole files containing naughty stuff is significantly higher than predicted by your node's uptime, you are in trouble.
A forensic attack can be made more damning if the examiner has statistical information about the density of certain types of files on the network overall, which, again, running several test nodes will reveal. If the density of intact naughty files in your cache doesn't mimic within reasonable tolerances the density of such files on the network, you are in trouble.
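One way to put a number on "reasonable tolerances" is a simple binomial tail probability, sketched below in Python. This is our own illustration, not a method anyone has told us they use: it assumes each intact file in a relay-only cache is independently "naughty" with the network-wide probability, and the density and counts are invented.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float) -> float:
    # P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    # flagged files among n intact ones if the cache mirrors the network.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Made-up example values, not measurements.
network_density = 0.01   # share of flagged files seen on test nodes
intact_files = 200       # intact files found in the suspect's cache
flagged_files = 12       # of those, how many match the forbidden list

p_value = binomial_tail(flagged_files, intact_files, network_density)
print(f"expected about {network_density * intact_files:.0f} flagged files, found {flagged_files}")
print(f"chance of at least that many by relaying alone: {p_value:.2e}")
```

The smaller that probability, the harder it becomes to argue that the files arrived by innocent relaying.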
You might be smart enough to disable your browser cache and its downloads history, and smart enough to wipe properly or encrypt dangerous files you've downloaded, but your Freenet cache, over which you have little control, will still tell on you.
The fix
We ran these observations by Freenet founder Ian Clarke. He agreed that the caching behavior does reveal far too many clues. But the next major revision is expected to eliminate the problem. Sometime later this year, it is hoped, the Freenet developers will release a version that employs premix routing.
According to current plans, requests will be relayed through at least three nodes before any caching is performed. The three nodes nearest the one making the request will therefore not be able to determine what has been requested. The requesting node will cache downloads temporarily, although the mechanism by which locally cached data will be purged or randomized once the download is complete is not clear to us. Perhaps it has yet to be established.
In a forthcoming article, we will consider Freenet more generally, and offer suggestions for using it with greater safety, such as it is. ®
Update Due to an ambiguous comment in e-mail exchanged between The Register and Freenet founder Ian Clarke, we reported above that the forthcoming premix routing version of Freenet will temporarily cache splitfiles locally. According to Clarke, no premix nodes (including the requesting node) will cache any content.