I was quite pleased to learn last week that by proving 'control' over a website, I could view detailed statistics about how the Google crawler sees it.
The signup process involved putting a blank file with an arbitrary name at the site root; when Google probed for that file and got it back, that confirmed that you, the Google account holder who had just requested the filename, controlled the site.
I signed up and happily viewed some data on a site I control. Later, though, while offline, I recalled that many sites will give an OK response to *any* URL path requested of them. These "soft 404s" can cause some confusion for web crawlers, which wind up collecting pages of negligible value. Would a site that gave a "soft 404" OK for any path let anyone claim the right to view stats at Google?
As this is an old and well-known problem in crawling, I figured Google had accounted for it -- for example, they could probe a site with random paths, determine that it gives false OK indications, and then require a more rigorous test in those cases. But I didn't check that they actually did this.
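To make the random-path idea concrete, here is a minimal sketch in Python. It is not Google's actual verification logic, which I haven't seen; the function names (`http_status`, `gives_soft_404`, `file_proves_control`) and the choice of three probes are my own assumptions for illustration. The idea is simply to request a few paths that almost certainly don't exist and to distrust the verification file if the site says OK to those too.

```python
import secrets
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a GET of `url` (redirects followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we want here.
        return err.code


def gives_soft_404(site_root: str,
                   fetch: Callable[[str], int] = http_status,
                   probes: int = 3) -> bool:
    """True if the site answers 200 OK for a random, surely nonexistent path."""
    root = site_root.rstrip("/")
    for _ in range(probes):
        # 32 random hex characters: no real site should have this file.
        random_url = f"{root}/{secrets.token_hex(16)}.html"
        if fetch(random_url) == 200:
            return True
    return False


def file_proves_control(site_root: str,
                        filename: str,
                        fetch: Callable[[str], int] = http_status) -> bool:
    """Hypothetical hardened check: the file must exist AND the site must
    return a real error for paths that don't exist."""
    root = site_root.rstrip("/")
    if fetch(f"{root}/{filename}") != 200:
        return False
    # An OK for the verification file means nothing if everything gets an OK.
    return not gives_soft_404(root, fetch)
```

The `fetch` parameter is there only so the logic can be exercised without touching the network, for instance by passing `lambda url: 200` to imitate a site that says OK to everything. A site that fails this check would need some other proof of control, such as specific content inside the file rather than mere existence.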
Give 'em a few more launches, they'll eventually get this right. And it is important that Google does -- because the theme of most of their recent launches has been linking more web content and behavioral data than ever to precise human identities.
I think there's a master plan at work -- more on this in a future post. But both the 'false claim of ownership' and cross-site-scripting exploits are forms of identity theft, and if Google is cross-referencing all your web trails to a single identity, then that identity is going to be a very attractive target for hijacking.