
Google's data minefield

Search engine vs Government

The US Government's broad subpoena to search engines effectively seeks to mine the data of the internet. While Google has resisted the subpoena, there may be little they can do to protect our privacy from many prying eyes.

In an effort to protect children from being able to see indecent materials (technically, materials that are "harmful to minors"), Congress passed the Child Online Protection Act (COPA). This act, a response to the US Supreme Court's rejection of a similar but more restrictive law, the Communications Decency Act, made it unlawful to sell any materials that any court in the United States deemed to be "harmful" to minors anywhere on the internet.

A lawsuit challenging this statute on First Amendment free expression and speech grounds ensued, and the U.S. Court of Appeals essentially told the U.S. government to prove that other means that were less restrictive of free speech - like web filters, white lists, black lists, and others - could not achieve the same governmental objectives of protecting children without reducing all of the Internet to pabulum.

To do this, the Department of Justice decided that it had to know virtually everything that virtually everyone was doing online - at least for a representative period of time. As a result, the government recently subpoenaed records from all of the major ISPs and search engines. Not surprisingly, almost all of these companies complied with the subpoenas. Only Google objected. As a result of attempting to protect both its trade secrets and the public perception of privacy, Google was rewarded by an immediate drop in its stock price.

The case has much broader implications than the subpoena itself - it raises the continuing question of the ability of the government or others to essentially usurp massive commercial databases. The immediate problem is that both Google and others may find their efforts to quash such subpoenas thwarted by the courts. The real problem is not that the records can be subpoenaed - of course they can. The problem is that these massive databases exist - or more accurately persist - at all.

The Question of COPA

The issue in the COPA litigation is whether it's better to make it a crime to sell things to your average 4 year old instead of giving parents the ability to filter that stuff out. The government argues that no filtering technology keeps out all the things that parents might not want their kids to see, and therefore, we have to make sure that nobody can ever sell that kind of stuff without some effective means of verifying that the recipient is not a kid - even to a 17 year old kid with mom or dad's credit card. The argument suggests we have to make it a crime because filters aren't 100 per cent effective. Sure. And by the way, drugs have been illegal for years. How is that working out in reducing their use and availability?

This entire debate would be academic, except for the means the government has decided to employ to attempt to prove its case. A civil subpoena for, well, the Internet.

What the government subpoenaed from Google, and presumably from all the other search engines, was "[a]ll URLs that are available to be located through a query on your company's search engine as of July 31, 2005" and further demanded production of "[a]ll queries that have been entered on [Google's] search engine between June 1, 2005 and July 31, 2005, inclusive." Although the government ultimately narrowed the scope of the subpoena somewhat after negotiations with Google's lawyers, both the original and modified requests are startlingly broad in scope and remarkably irrelevant to the underlying litigation.

Privacy and trade secrets

When you get a subpoena from the government, you have relatively few grounds in the law to object. First, you can assert some kind of privilege - attorney-client, priest-penitent, or doctor-patient (but not journalist-source so much anymore.) While doctors, lawyers, clergy and journalists all use the Internet and search engines to find information relevant to research for clients, patients, and the like, a broad subpoena for general information is not likely to significantly impact these privileges. This does not mean that the privilege doesn't apply, or that everything you do online is "public" and therefore entitled to no privilege.

Say a client goes to a lawyer about an arcane area of the law. The lawyer hops on Google to do research (with Lexis and Westlaw being too expensive for this solo practitioner). The terms searched for by themselves would reveal the nature of the research, the privileged communications, and the legal strategy. The fact it occurs in the "public" internet is no more relevant than having opposing counsel following a lawyer in the law library to see what books and pages he or she is reading. It's just not kosher. So Google's objection to the broad subpoena on privilege grounds is less than fanciful, but hardly compelling.

Another ground for objecting to the production of documents is that compliance would be unreasonable and oppressive. If the demand for documents is so extensive, and so difficult to amass, the subpoena can be quashed or modified by the court. There is little doubt that this Google subpoena is broad - probably excessively so. But Google prides itself on being able to search for and deliver targeted search results for - well, a Google of information in about 3.24 seconds.

In response to Google's concerns about compliance, a Berkeley statistics professor essentially said it can't be too burdensome because Yahoo! complied. Of course, this is much like when my eight year old demands a later bedtime because of the later bedtime of his older brother - particularly when there is no indication in the record that Yahoo! ever complained or attempted to quash its subpoena. Moreover, I probably missed the part of my college statistics class where the statistics professor was deemed an expert on the difficulty of compliance with a subpoena. In fact, the statistics professor also claims to be an expert on the privacy implications of compliance with the subpoena to Google, glibly stating under oath that Google has no legitimate privacy concerns because "[o]ther vendors have been able to produce samples of queries with all information that might identify a user removed". Hmmm, see what happens when you fall asleep in statistics class? You miss that part about the need to protect privacy.

The problem with subpoenaing search engines

So what about Google's argument about privacy? Certainly Government hasn't subpoenaed what I personally am searching for. They state that what they seek to do is to "characterize sites that can be found using search engines and what people see in practice when they use search engines". If no specific searches are sought, just trend data, what is the big problem? The answer is, plenty.

First of all, there is no reason to believe that any of this is even remotely relevant to the issue in the underlying litigation - whether electronic filters for children work. Let's assume that the Google subpoena shows that there is tons of pornography and smut out there and that people are actively looking for it - and actually finding it. Ooh. So what? This is like trying to determine the effectiveness of the "V chip" by subpoenaing the Nielsen or Arbitron aggregate records of popular television shows, and TV Guide's records of everything broadcast. While the Government eschews Google's argument that they are attempting to see "what is out there" on the internet, effectively this is exactly what they are trying to do.

If the statistics show that people aren't using Google to search for porn, Government will simply argue that porn is available on un-indexed sites, and therefore COPA is necessary. We also don't know what percentage of the Google users subpoenaed are children, or adults. Were they using Google's safe image search? Were the sites sought active, and were the URLs available? Was anything delivered to the user? Did the user have any filters on the machines that would block delivery of the URL? Were those filters configured properly? Indeed, while the government agreed to allow Google to produce a random sample of URLs delivered, Google's counsel correctly pointed out that the Government's statistician would still need access to the entire database in order to ensure the sample was statistically random.
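
For readers who want to see why a "random sample" still touches the whole database, here is a rough Python sketch of one standard technique, reservoir sampling. It is only an illustration: nothing in the record says how Google or the Government's statistician would actually draw a sample, and the function name, the example URLs, and the sample size are all made up for this purpose.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Pick k records uniformly at random from an iterable of unknown length.

    Hypothetical helper for illustration only; not taken from the subpoena
    or from any party's filings.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Record i survives with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


# Made-up stand-in for a log of delivered URLs.
urls = [f"http://example.com/page{n}" for n in range(10_000)]
picked = reservoir_sample(urls, k=5, seed=42)
print(picked)
```

Note that the loop has to visit every record before the sample can be called uniform. Anyone who wants to check that a sample was drawn this way, rather than simply being handed five convenient URLs, needs to see the population it was drawn from, which is the point Google's counsel was making.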

Moreover, the subpoena makes Google and other search engines or ISPs the source of first resort for any information about what people's preferences are, what they like or dislike, what they do and don't do, what they read and don't read.

I remember years ago finding an article in an obscure medical journal that was relevant to a case I was litigating. I asked the publisher for a copy of the publication, and he explained that I would have to buy a subscription - for several thousand dollars. So I reached into my desk drawer and pulled out a subpoena - cheaper by far than actually buying the data. Copyright, smopyright. Indeed, having subpoenaed the data, I could now introduce it into the public record at trial - and with today's electronic filing, even make the entire thing available online.

Subpoenaing the world's information

One can imagine thousands of cases where aggregate or even specific Google information might be useful to one party or another in litigation. Did my publisher act in good faith in promoting my book? Let's subpoena Amazon to see how it sold over time and compare it to comparable books. Did my advertising agency meet its contractual obligations? Let's subpoena Yahoo! Indeed, as long as a plausible claim can be made not that the information is relevant to the litigation, but that it may lead to the discovery of relevant information, it is subject to subpoena. As Google's counsel pointed out, "Google objects to [the Government's] view of [its] highly proprietary search database - the primary reason for the company's success - as a free resource that [the government] can access and use, some levels removed, to formulate its own defense." In other words, Government, if you want this data from Google, buy it.

Now the Government's statistician has eschewed the need for identifying information. But as I just noted, without this kind of information, the relevance of the data to the COPA litigation is seriously diminished. So once they obtain the general information, there is little to stop the government from asking, "oh, and by the way, what else do you know about those Google users?" What were their IP addresses? What time of day did they perform their searches (during school hours, or between 3pm and 8pm local time?) Did the same people search for kids sites (like Disney or Nickelodeon) and then search for smut? What sites were actually delivered up as a result of these searches? What information does Google keep cached? Oh, and of course, how does Google collect, store, and collate this information in the first place?

The last one is a real kicker. One of Google's principal objections to the subpoena is that compliance will reveal its trade secrets - what it called its "crown jewels". Google argued that it would require Google to disclose the approximate number of URLs in its database, and some details about how it maintains crawled URLs, such as the number of servers, server distribution, and how often Google crawls the worldwide web. This information, according to Google, would be highly valuable to competitors, or miscreants seeking to do harm. What Google didn't mention was the fact that, because Google's competitors have already turned over their versions of this information, even with the protective order in place, it would become public which of the major search engines delivers the most or most accurate results based upon an enormous database in the Government's hands. This could hurt Google's advertising revenues.

Finally, there is the matter of public perception about privacy. Not actual privacy. Indeed, Google's own privacy policy expressly states that Google "may share aggregated non-personal information with third parties outside of Google". This means exactly the kind of information that the government has subpoenaed. Indeed, in its objection to the subpoena, Google argued that compliance would "suggest that Google is willing to reveal information about those who use its services". Damn straight. That's exactly what Google's privacy policy says it will do - not revealing directly who is using its services, but revealing information about the aggregate people who do. The American public has strange attitudes about privacy. It seems to be OK for Google to collect, store and maintain this massive database, sell it, lease it, or let other companies have access to it, charge advertising revenues based on it, but heaven forbid it should fall in the hands of the Government.

Indeed, the Google privacy policy goes on to ask the rhetorical question, "what protections do I have against intrusions by the Government into my use of Google services?" It answers this by saying: "Google does comply with valid legal process, such as search warrants, court orders, or subpoenas seeking personal information. These same processes apply to all law-abiding companies. As has always been the case, the primary protections you have against intrusions by the Government are the laws that apply to where you live." This is pretty standard fare. What is different in this case is that Google is actually challenging the validity of a subpoena - a rare event for any company that gets paid little if anything from the people about whom it collects data. The more general practice is for the Government to send over a copy of a subpoena or search warrant, and the ISP or search company to send over the documents - sometimes not even in that order. Indeed, there is no requirement that the entity retaining your personal records notify you about the legal process to allow you to challenge it at your own expense - and often the Government requests, demands, or passes a law prohibiting the recipient from ever telling you about it - even if the underlying subpoena is itself invalid. Indeed, there have been several reported cases where law enforcement officials have created "fake" subpoenas or court orders for ISP information, and even then the courts have held that the information was OK to use, because it didn't belong to the data subject.

The Google subpoena fight isn't really about the anonymous data at issue here today. It is really about the way the Government can "deputize" unwilling private companies who collect and maintain massive databases to act as their agents in the future. Want someone's credit report? Don't subscribe to Experian and subject yourself to the Fair Credit Reporting Act, just whip out a subpoena. Want to engage in massive warrantless domestic surveillance of e-mail communications? Don't mess with FISA, Title III, ECPA, or even any Presidential inherent authority. Just pass a law (like the ones just passed in Europe) mandating that ISPs and phone companies retain such data, and then subpoena not just one person's emails, but everyone's - as long as it is relevant to some issue in some litigation somewhere. Let's just create a single massive database of what everyone is doing all the time, and let anyone "dip" into it whenever it is deemed to be relevant to settling some dispute.

It seems Orwell was off by about 22 years.

This article originally appeared in SecurityFocus

Copyright © 2006, SecurityFocus

Mark D. Rasch, J.D., is a former head of the Justice Department's computer crime unit, and now serves as Senior Vice President and Chief Security Counsel at Solutionary Inc.
