I see that Google has posted its official response to the Department of Justice's request for a "million URLs." You can read it here. Google argues three main points: that the data as requested is useless, that producing it may expose Google trade secrets, and that it would be too much work for Google to supply a million "random" URLs. Privacy concerns and the "chilling effect" are thrown in for good measure, but it appears to me that these are more in support of Google's business model, which is based on the perception of user privacy. While I remain unconvinced that there isn't some bit of evil lurking in the heart of Google, this response is generally a good thing.
I'm most interested in writing about it from an analytical point of view. It is nice to see a case where "data" is held out as NOT being the answer to a question. This is not to say that data is useless (no one would argue that), but it is a clear statement that a particular set of data may not be suited to a particular purpose. In this case, crafting a law based on search results just seems to be a bad idea. Here is what Google says about it:
"First, the Government's presentation falls woefully short of demonstrating that the requested information will lead to admissible evidence. This burden is unquestionably the Government's. Rather than meet it, the Government concedes that Google's search queries and URLs are not evidence to be used at trial at all. Instead, the Government says, the data will be "useful" to its purported expert in developing some theory to support the Government's notion that a law banning materials that are harmful to minors on the Internet will be more effective than a technology filter in eliminating it.

Google is, of course, concerned about the availability of materials harmful to minors on the Internet, but that shared concern does not render the Government's request acceptable or relevant. In truth, the data demanded tells the Government absolutely nothing about either filters or the effectiveness of laws. Nor will the data tell the Government whether a given search would return any particular URL. Nor will the URL returned, by its name alone, tell the Government whether that URL was a site that contained material harmful to minors."
Earlier you may have caught that I feel the privacy argument is gratuitous and perhaps a bit disingenuous. To see why, here is a sample entry from my logs today:
xx.xxx.137.74 - - [22/Feb/2006:10:25:50 -0500] "GET /blog/archives/pmi-and-pmp/pmp-exam-cheats.html HTTP/1.1" 200 13096 zo-d.com "http://www.google.co.uk/search?hl=en&q=PMP+cheat+test+answers" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {0EF9B069-A48C-18A5-1EEF-88AC09646F5E}; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
I x'ed out the IP address, but this is typical of what shows up in my logs when someone arrives here from a Google search. The first field is the visitor's IP address, followed by the date and time, the page on my site they were directed to, the HTTP response code and size in bytes, my site's host name, and then the referring Google URL, which contains the search itself. Last comes the browser identification. A simple lookup of the IP address shows that it comes from inside one of the big computer manufacturing companies. The fact that Google passes along the search terms when it refers a user to my site is great for me. I use them to understand what people are looking for when they arrive here, and I occasionally write things that respond to those sorts of requests. But since the user's IP address arrives right alongside the search, it is not particularly private.
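To make the point concrete, here is a minimal Python sketch of how little effort it takes to pull the visitor's address and search terms out of an entry like the one above. It assumes the field layout inferred from that single sample (the Apache "combined" style with the virtual host inserted between the size and the referrer); the pattern, the `parse_entry` function, and the field names are hypothetical illustrations, not anything from Google or my actual server setup.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed layout, inferred from one sample entry: combined-style log
# with the virtual host placed between the response size and the referrer.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '                # client IP, identd, user
    r'\[(?P<time>[^\]]+)\] '               # timestamp
    r'"(?P<request>[^"]*)" '               # request line
    r'(?P<status>\d{3}) (?P<size>\S+) '    # status code and bytes sent
    r'(?P<host>\S+) '                      # virtual host (assumed position)
    r'"(?P<referrer>[^"]*)" '              # referring URL
    r'"(?P<agent>[^"]*)"'                  # browser identification
)


def parse_entry(line):
    """Hypothetical helper: split one log line into named fields."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else None
    # The search terms travel in the q= parameter of Google's referring URL;
    # parse_qs also decodes '+' back into spaces.
    ref = urlparse(entry["referrer"])
    terms = parse_qs(ref.query).get("q")
    entry["search_terms"] = terms[0] if terms and "google." in ref.netloc else None
    return entry


SAMPLE = (
    'xx.xxx.137.74 - - [22/Feb/2006:10:25:50 -0500] '
    '"GET /blog/archives/pmi-and-pmp/pmp-exam-cheats.html HTTP/1.1" 200 13096 zo-d.com '
    '"http://www.google.co.uk/search?hl=en&q=PMP+cheat+test+answers" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; '
    '{0EF9B069-A48C-18A5-1EEF-88AC09646F5E}; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"'
)

if __name__ == "__main__":
    entry = parse_entry(SAMPLE)
    print(entry["ip"], entry["path"], entry["search_terms"])
```

Run over a whole log file, a loop around `parse_entry` would pair every Google search that led here with the address it came from.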
Most people ending up here are looking for things they don't need to keep to themselves, but if my content were a bit more shady, I can imagine that I'd be getting a lot of information from Google about the dark side, information that includes where each person is on the internet. This sort of information is not what the government should be using to fish for new ways to make laws, but it is hardly the hallmark of privacy protection.