US law firm cleared of robots.txt DMCA hacking charge

Wayback Machine just screwed up, court says

Analysis Sometimes plaintiffs just don't know when to quit.

After losing a trademark infringement suit against a competitor, Healthcare Advocates - a patient advocacy organization based in Philadelphia - sued the intellectual property law firm that represented the defendant in the trademark action, alleging that the firm had "hacked" the Wayback Machine in order to view blocked archives of its website.

The firm - Harding, Earley, Follmer & Frailey - used the Wayback Machine to look at past incarnations of Healthcare Advocates' site in order to gather evidence to defend against the original trademark infringement charges. Healthcare Advocates had a robots.txt file in place to prevent anyone from viewing the archived versions of its site, but the law firm was still able to bring up certain archived pages.
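The blocking mechanism itself is disarmingly simple. The Wayback Machine honored the standard robots exclusion convention, and its crawler answered to the user agent ia_archiver. The exact contents of Healthcare Advocates' file aren't reproduced here, but a file of roughly this shape, served from the site root, would ask the archive to block everything:

    User-agent: ia_archiver
    Disallow: /
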

Healthcare Advocates argued that this constituted a circumvention of a technical measure designed to control access to a copyrighted work, which would violate the Digital Millennium Copyright Act. The company alleged that the firm used the Wayback Machine to bypass its technical measure, the robots.txt file, in order to view its copyrighted website.

The US District Court for the Eastern District of Pennsylvania wasn't buying it, however. The court last week pointed out that the law firm didn't do anything out of the ordinary in order to gain access to the archived pages that Healthcare Advocates had intended to block. Instead, the Wayback Machine simply malfunctioned and allowed the firm to view material that should have been blocked.

Normally, when the Wayback Machine receives a request for the archives of a site with a robots.txt file, it displays a blocked site error message. If any of the site's pages are not blocked, the error message will contain a link to those past versions.

Such a link came up when the law firm searched for the Healthcare Advocates site, even though the robots.txt file should have blocked all of the site's pages.

Apparently, heavy server load on the days in question triggered a caching error that led certain Internet Archive servers to "forget" that they had a copy of the Healthcare Advocates robots.txt file. Then, for unknown reasons, the servers overlooked the robots.txt file when querying Healthcare Advocates' website directly.
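None of the Internet Archive's actual code is quoted in the case, but the failure mode the court describes is easy to sketch. In this hypothetical Python fragment (every name here is invented for illustration), a cache miss followed by a failed live lookup falls through to "not blocked":

    # Hypothetical illustration of the failure mode described in the case;
    # this is not Internet Archive code.

    robots_cache = {}  # site -> robots.txt text; entries lost under heavy load

    def fetch_live_robots(site):
        """Stand-in for querying the site's robots.txt directly; here it
        fails, as the servers' direct queries did for unknown reasons."""
        raise IOError("robots.txt lookup failed")

    def is_blocked(site):
        rules = robots_cache.get(site)  # the cache has "forgotten" the file
        if rules is None:
            try:
                rules = fetch_live_robots(site)  # the direct query misses too
            except IOError:
                rules = None
        # With no rules in hand, the code falls through to "not blocked",
        # serving archived pages as if no robots.txt existed at all.
        return rules is not None and "Disallow: /" in rules

    print(is_blocked("healthcareadvocates.com"))  # False: pages slip through
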

This allowed the law firm to view some pages, but not others. Healthcare Advocates never asserted that the law firm had anything to do with causing the excessive load on the days when it tried to view the archived pages, and it was undisputed that the firm used nothing more than an ordinary web browser to access the Wayback Machine.

The court held that, since the technical prevention measure never actually stood between the law firm and the copyrighted material, the law firm couldn't have circumvented it. Or, to use the court's phrase, "[t]hey did not 'pick the lock' . . . because there was no lock to pick."

The law firm never bypassed the Wayback Machine's obstructions in order to view the pages. The archives simply showed up as if there had been no robots.txt file in place at all. With those facts, the judge concluded, there had been no "hack," and the law firm had not violated the DMCA.

Healthcare Advocates' real beef should have been with Internet Archive for allowing the pages to slip through, but the San Francisco organization settled its way out of the lawsuit last year. The terms of the agreement are unknown, but the settlement allowed IA to avoid having a judgment against it appear in the public record.

This is fairly important when considered in the context of the judge's conclusion that a robots.txt file, under the facts of this case, actually does constitute a technical measure subject to the DMCA. That view, if adopted in other jurisdictions, could have widespread implications for Internet programmers.

While the facts of the current case limit the reach of the robots.txt ruling, the decision does open the door to a more expansive view of robots.txt files in the future. There may come a day when anyone writing code that ignores a robots.txt file could be on the hook for violating the DMCA.
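If that day ever arrives, the defensive check is at least cheap to write. Python's standard library ships a robots.txt parser, and a crawler only has to consult it before each fetch; the bot name and URLs below are placeholders, not anything from the case:

    from urllib.robotparser import RobotFileParser

    # Minimal polite-crawler check using Python's standard library.
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    url = "https://example.com/archive/old-page.html"
    if rp.can_fetch("ExampleBot", url):
        print("allowed to fetch", url)
    else:
        print("robots.txt disallows", url)  # honor the block; no lock-picking
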

Thus, it was a very good thing for Internet Archive that it got out of the suit when it did. If it had remained a defendant, the court would have had to consider a much broader set of facts, and the judge would have had an opportunity to rule on whether a program that overlooked a robots.txt file violated the DMCA.

An adverse ruling on that issue could have caused some serious problems for Internet Archive, as well as for some big-name deep-pockets out there (*cough* GOOGLE! *cough*). It would have undoubtedly created a whole new class of lawsuit against web services that missed or ignored a robots.txt as they scoured the Internet.

Which, ironically, would have been good news for the law firm defendant in this case.

They won the battle, but they may have lost a huge new revenue stream. ®

Kevin Fayle is an attorney, web editor and writer in San Francisco.
