Bing! You have inflated traffic numbers!

Our #1 Fan!

Over the past couple of months we noticed a very odd trend in the geo-location/city data being saved in Omniture. The great metropolis of Redmond, Washington had become the number 1 city of origin for our web visitors. And this was not a small lead: Redmond sent more than double the traffic of the number 2 city.

Looking for an explanation, I checked the Twitterverse and did a little digging online, but did not have a chance to really dive into the issue until yesterday.

@OmnitureCare pointed me in the right direction with this link to a post on the Bing Forum.

From Bing Forum: I’m getting a ton of hits from IP 65.55.* that appear to be coming from user searches such as referrer “http://www.bing.com/search?q=copper” and user agents similar to “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)”

The swell folks over at Microsoft claim that they are working on a new spider and are still working out the kinks. Well, this “kink” has been mucking with my data and is very annoying. Kink FAIL! I pulled the web log files and searched for all traffic from the IP range 65.55.*.*. Here is what I found:

2009-10-04 00:12:34 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.207.50 msnbot/2.0b+(+http://search.msn.com/msnbot.htm) 404 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.109.1 Mozilla/4.0 200 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.109.1 Mozilla/4.0 200 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.109.1 Mozilla/4.0 200 0 0
2009-10-04 00:12:48 W3SVC1 127.0.0.1 GET /filename - 80 - 65.55.109.1 Mozilla/4.0 200 0 0
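Searching the logs for that range is easy to script. Below is a minimal Python sketch, assuming the default IIS W3C extended field order (date, time, s-sitename, s-ip, cs-method, cs-uri-stem, cs-uri-query, s-port, cs-username, c-ip, cs(User-Agent), sc-status, sc-substatus, sc-win32-status). The log's "#Fields:" header isn't shown above, so check yours before trusting the positions. The helper names parse_line and hits_from_range are hypothetical.

```python
import ipaddress

# Assumed default IIS W3C extended field order; confirm against the
# "#Fields:" header in your own log files.
FIELDS = [
    "date", "time", "s-sitename", "s-ip", "cs-method", "cs-uri-stem",
    "cs-uri-query", "s-port", "cs-username", "c-ip", "cs(User-Agent)",
    "sc-status", "sc-substatus", "sc-win32-status",
]

SUSPECT_RANGE = ipaddress.ip_network("65.55.0.0/16")  # the 65.55.*.* range


def parse_line(line):
    """Split one W3C log line into a dict; None for comments or odd lines."""
    if line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def hits_from_range(lines, network=SUSPECT_RANGE):
    """Yield parsed records whose client IP (c-ip) falls inside network."""
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        try:
            ip = ipaddress.ip_address(record["c-ip"])
        except ValueError:
            continue  # skip malformed client addresses
        if ip in network:
            yield record
```

Passing an open log file (for example `hits_from_range(open(path))`) yields one record per matching request; all five lines shown above would match, since every c-ip sits inside 65.55.0.0/16. Using a real network test rather than a text search means an address like 165.55.1.1 is not picked up by accident.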

The first line above looks OK. The bot clearly identifies itself and does not fire off the tracking JS code. The next few lines look like normal “real” user traffic and do result in the tracking code being executed. I checked the IP address over at http://www.dnsstuff.com, which confirmed that it does indeed belong to Microsoft.

Gee thanks for the spam traffic Microsoft.

I could list row after row of data from the log files showing countless variations of traffic from various M$ IP addresses that do not identify themselves as bots. All of the fake/spoofed traffic has this in common:

  • 65.55.*.* IP address range
  • User Agent listed as “Mozilla/4.0” only
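Before touching production code, the two-part signature can be written as a small predicate and checked against log entries, alongside the "declared bot" case from the first log line. This is a minimal Python sketch; is_bing_spoof and classify are hypothetical names. It reads the second bullet strictly, matching only a user agent that is exactly "Mozilla/4.0". A looser substring test would also catch genuine Internet Explorer visitors, whose agents begin with "Mozilla/4.0 (compatible; MSIE ...)". The trade-off is that the longer MSIE 6.0 agent quoted from the Bing Forum would not match this strict version.

```python
import ipaddress

BING_RANGE = ipaddress.ip_network("65.55.0.0/16")  # 65.55.*.*


def is_bing_spoof(client_ip, user_agent):
    """True when a hit matches both parts of the signature: a 65.55.*.*
    address and a user agent of exactly "Mozilla/4.0"."""
    try:
        in_range = ipaddress.ip_address(client_ip) in BING_RANGE
    except ValueError:
        return False  # not a valid IP address
    return in_range and user_agent.strip() == "Mozilla/4.0"


def classify(client_ip, user_agent):
    """Label a hit the same way as the log review above."""
    if "msnbot" in user_agent.lower():
        return "declared bot"  # identifies itself; tracking JS never fires
    if is_bing_spoof(client_ip, user_agent):
        return "undeclared bot"  # looks like a visitor, inflates the reports
    return "visitor"
```

For example, `classify("65.55.109.1", "Mozilla/4.0")` returns "undeclared bot", while the same agent from 165.55.109.1 is treated as a visitor because the address check is a network test, not a text search.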

Now that I had the “signature” of the bad traffic, I wanted a way to keep it from falsely inflating our city and other visitor data. I dug around for an elegant JavaScript-only solution, but decided the best approach was to leverage the ColdFusion platform we use and simply suppress loading of the s_code file whenever a request matched the pattern above.

<cfif NOT (Left(cgi.REMOTE_ADDR, 6) EQ "65.55." AND FindNoCase("Mozilla/4.0", cgi.HTTP_USER_AGENT) GT 0)>
    <!--- Omniture s_code file is loaded here; skipped when both conditions match --->
</cfif>

This IF block checks for the pattern and prevents the Omniture JS code from being loaded when both conditions are met. I put this code in place around noon yesterday, and less than 24 hours later I can already see that it is working.

Bye Bye Bad Bing Bot Badness

I will be keeping a close eye on all the traffic reports to make sure we are not losing any real traffic data, but I am confident that this will keep the current issue from jacking my data any further.

Below are a few sites I found that helped me in my research:

-Rudi

Updated 10/15/2009:

James Dutton found this bug showing up in Yahoo Web Analytics as well.  Read his tips to resolve it!
Yahoo Web Analytics data inflated by Bing and how to fix it.
