A university research project in Brazil has had its Tor relay node banned after it was caught harvesting the .onion addresses of hidden services.
Marcus Rodrigues, a junior researcher with the University of Campinas in São Paulo, claims he and others were working to create a tool that could tell malicious hidden services from benign ones when they decided to begin poking .onion addresses and fetching their webpages in bulk.
"My research in particular is about malicious hidden services. I'm developing a method to automatically categorize a malicious hidden service by its content (eg, drug traffic website, malware propagation)," Rodrigues told The Register.
"We would then publish an academic paper containing up-to-date statistics regarding what kind of malicious websites there are on the dark web. We were also going to develop a platform on which the user could verify if a certain .onion website is trustworthy or malicious before entering it."
To do this, Rodrigues says, he modified the node to collect specific data about the hidden services, though he notes nothing was collected that could de-anonymize the user or the specific service.
"That would provide information about the Hidden Services running at the time, such as their .onion addresses, their popularity and some technical data – none of which would allow me to deanonymize or harm the hidden service in any way," he explained.
In a Tor mailing list post on Thursday, Rodrigues described the system in more detail:
"My relay was harvesting .onion addresses and I apologize if that breaks any rule or ethical guideline.
"We were conducting some research on malicious Hidden Services to study their behavior and how we could design a tool that could tell malicious and benign Hidden Services apart.
"Because we focus mainly on web pages, we use a crawler to get almost all of the data we need. However, there are some statistics (such as the size of the Tor network, how many HSs run HTTP(s) protocol, how many run other protocols and which protocols do they run, etc) which cannot be obtained through a crawler. That's why we were harvesting .onion addresses.
"We would run a simple portscan and download the index page, in case it was running a web server, on a few random addresses we collected. We would also try and determine the average longevity of those few HSs. However, after collecting the data we needed for statistical purposes, the .onion addresses we collected would be deleted and under no circumstances we would disclose the information we collected on a specific .onion address we harvested. In addition, we would never target specific harvested HS, but only a random sample."
The moral of the story: always read the rules.
Now, Rodrigues says, his group is unable to bring its Tor relay node back online, and so far nobody from the Tor Project has given the group any indication that the ban will ever be lifted. Still, he says, the research will continue.
"I can use other methods to discover the Hidden Services," he explains, "but none is as informative or as efficient." ®