Scraping public data from the web still OK: US court
Latest LinkedIn blow / Profile harvesting legal / HiQ case rolls on
Scraping data from a public website doesn't violate America's Computer Fraud and Abuse Act (CFAA), the US Ninth Circuit Court of Appeals ruled on Monday.
The decision [PDF] echoes the appeals court's 2019 decision, which upheld a lower court's 2017 determination in HiQ v. LinkedIn that web scraping doesn't qualify as accessing a protected computer without authorization.
The case began in California in 2017 when HiQ, an employment analytics firm, filed a lawsuit challenging LinkedIn's legal and technical efforts to block HiQ from copying public profile data from LinkedIn users.
The district judge hearing the case granted a preliminary injunction to HiQ that barred LinkedIn from interfering with HiQ's data scraping while the case progressed. He decided it didn't make any sense to apply the CFAA – a law that criminalizes accessing a protected computer "without authorization" or in a way that "exceeds authorized access" – to the collection of public data from LinkedIn's website.
LinkedIn nonetheless appealed, and two years later the Ninth Circuit sided with HiQ and sent the case back to the Northern District of California to be resolved.
Undeterred, LinkedIn appealed to the US Supreme Court. In March 2020, it asked the Supreme Court to review the Ninth Circuit ruling. The company argued that technical barriers to web scraping, combined with a cease-and-desist letter, should qualify as an authorization mechanism. In effect, the Microsoft-owned social media site wants the competitive benefits of gated access without the consequence: invisibility to search engine traffic.
"Under the Ninth Circuit’s rule, every company with a public portion of its website that is integral to the operation of its business – from online retailers like Ticketmaster and Amazon to social networking platforms like Twitter – will be exposed to invasive bots deployed by free-riders unless they place those websites entirely behind password barricades," LinkedIn's attorneys wrote in the company's petition [PDF] to be heard by the Supreme Court.
"But if that happens, those websites will no longer be indexable by search engines, which will make information less available to discovery by the primary means by which people obtain information on the Internet."
On June 3, 2021, the Supreme Court in a related case, Van Buren v. United States, narrowed the CFAA, which had for years been criticized for failing to define "without authorization" and "exceeds authorized access."
The high court in Van Buren said that breaking terms of service alone does not qualify as "exceeds authorized access" under the CFAA. Yet it left some ambiguity about whether credential-based gating is the only way to determine whether access was "without authorization."
Then, two weeks later, the Supreme Court sent HiQ v. LinkedIn back to the Ninth Circuit for reconsideration in light of how Van Buren had reshaped CFAA liability. Now the appeals court has revisited its earlier decision and reached the same conclusion it did in 2019, this time bolstered by the Van Buren ruling.
"[A] defining feature of public websites is that their publicly available sections lack limitations on access; instead, those sections are open to anyone with a web browser," the Ninth Circuit ruling [PDF] says.
"In other words, applying the 'gates' analogy to a computer hosting publicly available web pages, that computer has erected no gates to lift or lower in the first place. Van Buren therefore reinforces our conclusion that the concept of 'without authorization' does not apply to public websites."
The ruling doesn't resolve HiQ's dispute with LinkedIn, however. It merely prevents LinkedIn from blocking HiQ's gathering of public data and from making a claim against the analytics biz under the CFAA. Issues related to unfair competition, privacy, and state law have yet to be addressed.
In a statement emailed to The Register, a spokesperson for LinkedIn indicated the company intends to keep fighting in court.
"We’re disappointed, but this was a preliminary ruling and the case is far from over," a company spokesperson said. "We will continue to fight to protect our members' ability to control the information they make available on LinkedIn." ®