Over the past couple of months we noticed a very odd trend in the geo-location / city data being saved in Omniture. The great metropolis of Redmond, Washington was now the number 1 city of origin for our web visitors. And this was not a small lead: Redmond had more than double the traffic of the number 2 city.
Trying to find an explanation for all this, I checked the twitter-verse and did a little digging online, but did not have a chance to really dive into the issue until yesterday.
From Bing Forum: I’m getting a ton of hits from IP 65.55.* that appear to be coming from user searches such as referrer “http://www.bing.com/search?q=copper” and user agents similar to “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)”
The swell folks over at Microsoft claim that they are working on a new spider and working out the kinks. Well, this “kink” has been mucking with my data and is very annoying. Kink FAIL! I pulled the web log files and searched for all traffic from the 65.55.*.* IP range. Here is what I found:
2009-10-04 00:12:34 W3SVC1 127.0.0.1 GET /filename - 80 - 126.96.36.199 msnbot/2.0b+(+http://search.msn.com/msnbot.htm) 404 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 188.8.131.52 Mozilla/4.0 200 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 184.108.40.206 Mozilla/4.0 200 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 220.127.116.11 Mozilla/4.0 200 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 18.104.22.168 Mozilla/4.0 200 0 0
The first line above looks OK. The bot clearly identifies itself and does not fire off the tracking JS code. The next few lines appear to be normal “real” user traffic and do result in the tracking code being executed. I checked these IP addresses over at http://www.dnsstuff.com, and they do indeed belong to Microsoft.
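For anyone who would rather script that ownership check than paste addresses into a web tool, here is a minimal Python sketch of a reverse-DNS lookup. The function names and the list of host-name suffixes are my own assumptions for illustration; a reverse lookup is also a weaker check than a full WHOIS query, since not every address has a PTR record.

```python
import socket

# Assumed suffixes for Microsoft-owned host names; adjust to what you actually see.
MICROSOFT_SUFFIXES = (".msn.com", ".microsoft.com")

def reverse_host(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS host name for an IP, or None if the lookup fails."""
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        return resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return None

def looks_like_microsoft(host, suffixes=MICROSOFT_SUFFIXES):
    """True when the host name ends with one of the assumed Microsoft suffixes."""
    return host is not None and host.lower().rstrip(".").endswith(suffixes)
```

Passing an address from your own logs to `reverse_host` and the result to `looks_like_microsoft` gives a quick yes/no. The `resolver` parameter exists only so the lookup can be swapped out when testing without network access.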
I could list row after row of data from the log files, with countless variations of traffic from various M$ IP addresses that do not identify themselves as bots. All of the fake/spoofed traffic has this in common:
- 65.55.*.* IP address range
- User Agent listed as “Mozilla/4.0” only
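The two traits above are easy to check against the raw logs before changing anything on the site. Below is a rough Python sketch that counts matching hits per client IP. It assumes the field order shown in the log excerpt earlier (client IP in the tenth column, user agent in the eleventh); the function names and the sample lines are made up for illustration and are not from the real logs.

```python
from collections import Counter

# Assumed column positions (0-based), based on the log excerpt in this post:
# date time sitename server-ip method uri-stem uri-query port username client-ip user-agent ...
CLIENT_IP, USER_AGENT = 9, 10

def is_suspect(line):
    """True when a log line has a 65.55.* client IP and a bare Mozilla/4.0 user agent."""
    fields = line.split()
    if line.startswith("#") or len(fields) <= USER_AGENT:
        return False  # skip log header directives and short or empty lines
    return fields[CLIENT_IP].startswith("65.55.") and fields[USER_AGENT] == "Mozilla/4.0"

def count_suspects(lines):
    """Count suspect hits per client IP."""
    return Counter(line.split()[CLIENT_IP] for line in lines if is_suspect(line))

# Made-up sample lines, for illustration only.
sample = [
    "2009-10-04 00:12:34 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.0.1 msnbot/2.0b+(+http://search.msn.com/msnbot.htm) 404 0 0",
    "2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.0.2 Mozilla/4.0 200 0 0",
    "2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.0.2 Mozilla/4.0 200 0 0",
    "2009-10-04 00:12:49 W3SVC1 127.0.0.1 GET /filename - 80 - 10.0.0.5 Mozilla/4.0 200 0 0",
]
print(count_suspects(sample))
```

The self-identified msnbot line and the non-Microsoft visitor are both left alone; only the 65.55.* hits with the bare user agent are counted. If your log format lists its fields in a different order, the two column constants need to change to match.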
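To keep this traffic out of the reports, I added the following ColdFusion check ahead of the Omniture tracking code:
<cfif findNoCase("65.55.", cgi.remote_addr) EQ 1 AND findNoCase("Mozilla/4.0", cgi.http_user_agent) GT 0>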
This IF block checks for the pattern and prevents the Omniture JS code from being loaded if both conditions are met. I put this code in place around noon yesterday and in less than 24 hours I can see that this is working.
I will be keeping a close eye on all the traffic reports to make sure that we are not losing any real traffic data, but I am confident that this will keep the current issue from jacking my data any further.
Below are a few sites I found that helped me in my research:
James Dutton found this bug showing up in Yahoo Web Analytics as well. Read his tips to resolve it!
Yahoo Web Analytics data inflated by Bing and how to fix it.