Google 'Instant Previews' hit Google Analytics with fake traffic
Real-time page fetch
Update: This story has been updated with comment from Google, and it has been clarified to show that Google did tell webmasters when "Instant Previews" launched that it would be doing real-time fetches in some cases. We've also added stats concerning the fetches from The Register's site logs.
Google's new "Instant Previews" search tool is skewing traffic stats for sites using Google Analytics, creating page views before pages are actually viewed.
Rolled out across Google's search engine earlier this month, Instant Previews lets searchers, yes, preview sites before they visit them. Users click on a small icon that appears beside a search result, and this launches an image of the site in question on the right-hand side of Google's results page.
As Google pointed out when "Instant Previews" was launched, Google is, in some cases, fetching these previews in real time. Soon after the tool's launch, webmasters posting to Google's help forums noticed that these pre-fetches were skewing Google Analytics numbers. And as noticed by Search Engine Land, a Google employee later confirmed this with a post of his own.
This same employee goes on to reiterate that the preview fetches use their own user agent, so webmasters can filter them out if they're using other analytics methods.
"If you are using other website metrics tracking solutions, it might make sense to also filter that user-agent out."
The company has now posted a FAQ that details the user agent in question:
Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13
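For webmasters who run their own metrics, filtering on that user agent might look something like the Python sketch below. It is an illustration only, not anything Google has published: the function names and the (path, user agent) request shape are our own assumptions, and the only detail taken from the FAQ is the "Google Web Preview" token in the string above.

```python
# Token taken from the user agent string quoted in Google's FAQ.
PREVIEW_MARKER = "Google Web Preview"


def is_preview_fetch(user_agent: str) -> bool:
    """Return True if the user agent identifies an Instant Previews fetch."""
    return PREVIEW_MARKER in user_agent


def filter_page_views(requests):
    """Drop preview fetches from an iterable of (path, user_agent) pairs.

    The (path, user_agent) shape is a hypothetical stand-in for whatever
    record format a site's own analytics code uses.
    """
    return [(path, ua) for path, ua in requests if not is_preview_fetch(ua)]
```

Matching on the token rather than the full string is a design choice here: it should keep working if the WebKit or Safari version numbers in the user agent change.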
Asked to comment, Google told us: "Webmasters have the ability to control whether Instant Previews are counted as page views. This works in the same way they control how crawls by regular Googlebot count as page views. Instant Previews sometimes gets enough information from Google’s regular crawl. Occasionally, Google will need to refetch this information when the user needs it, and in these situations we will do so using the 'Google Web Preview' useragent. Webmasters can configure their sites to treat this useragent in the same way that they handle crawls by googlebot."
The FAQ page explains that Google fetches previews in real time when it lacks a cached copy of the page previously collected by its crawl bots. "We mostly generate preview images based on content we’ve crawled with Googlebot," it says. "When we don’t have a cached preview image (which primarily happens when we can’t fetch the contents of important resources), we may choose to create a preview image on-the-fly based on a user’s request."
The company also says that because the preview fetches use a separate user agent, the previews may include data that webmasters have blocked the crawl bots from collecting. "As on-the-fly rendering is only done based on a user request (when a user activates previews), it’s possible that it will include embedded content which may be blocked from Googlebot using a robots.txt file."
The Google Analytics situation is reminiscent of the AVG Linkscanner, which started spewing fake traffic across the net in early- to mid-2008. In late February of that year, AVG paired its anti-virus engine with a real-time malware scanner that would vet search results before users clicked on them. If you searched Google, for instance, it would automatically visit each address that turned up on Google's results page.
According to the company, more than 20 million people had downloaded the new AVG 8 by late June 2008, and this caused a huge uptick in traffic on sites across the web. Under pressure from webmasters, the company soon disabled its real-time scanning.
But judging from site logs at The Register, the number of real-time preview fetches from Google is relatively small. Over the past 24 hours, we've had 1,244 page requests from the user agent in question, coming from a mere 60 unique IPs. ®
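A tally like this can be pulled from a web server's access log. The Python sketch below is one rough way to do it, not the method used for the figures above. It assumes the common Apache/Nginx "combined" log format, with the client IP first and the user agent as the last quoted field; the function name and regular expression are our own.

```python
import re

# Assumption: "combined" log format, where the line starts with the client IP
# and ends with the user agent in double quotes.
LINE_RE = re.compile(r'^(\S+) .* "([^"]*)"\s*$')


def count_preview_fetches(lines):
    """Return (total requests, unique IPs) for 'Google Web Preview' fetches."""
    total = 0
    ips = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "Google Web Preview" in match.group(2):
            total += 1
            ips.add(match.group(1))
    return total, len(ips)
```

Passing it an open log file, as in `count_preview_fetches(open("access.log"))`, would walk the file line by line; the file name is a placeholder.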