Measuring visitors, page views and actions on your Web site is finally settling down and becoming a manageable task.
As we get accustomed to the tools and the terminology, the prospect of tracking things online is no longer as frightening as it used to be. At least that was the case until November 20, 2003.
On that date, RedEye, a Web analytics vendor in London, released a little study it had done showing that everything you know is wrong. Here's what RedEye did, and what it found.
RedEye has two large clients with lots and lots of visitors: www.asda.com, the UK version of Wal-Mart, and www.willhill.com, an online gambling site (that's legal in the UK).
A majority of customers log in weekly to buy things or, even more often, to place wagers. When customers log in, you know a great deal about them. You know who they are, and through a session ID you know what they're up to. You can then compare their actions to their previous behavior.
Without the login, you have to tie that session ID to something else, typically an Internet Protocol (IP) address or a cookie. We all thought that was fine.
Unfortunately, RedEye discovered that was not fine: “The results are staggering: the IP-based approach overestimated unique users by 7.6 times, whilst a cookie-based approach overestimated unique users by 2.3 times.”
How can this be?
When you log on, your Internet service provider assigns you an IP address—a number. When packets leave your computer asking for a specific URL, they contain that IP address so that the server knows where to send the page back. So far, so good.
But let's say your ISP is AOL. AOL has millions of users and large proxy servers that try to handle all that traffic. Through the magic of load balancing, they don't have to worry about everybody in St. Louis trying to access the Internet through the same machine. Instead, your request is handed over to whichever machine is not so busy, and it's then sent along to the Web site you wanted. That site then sends its page back to you via that less-busy node.
Very nice from a load balancing perspective—but not so nice for Web analytics practitioners.
Server logs may show three different people looking at three different pages when in fact it was you, looking at three pages one after the other. And, yes, the opposite might be true. It's possible for several people to look like one person.
Are we having fun yet?
Here's how that works. You work at Megacorp. Megacorp has a nifty firewall. It takes your page request and sends it out to examplesite.com on your behalf. It wants to be certain that nothing untoward comes back, so it has examplesite.com send the page back to it, the firewall, rather than to you.
The firewall does the same for every other employee at Megacorp, and the curious Webmaster at examplesite.com wonders why there is one person, at one IP address, looking at 24,000 pages per day.
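To see both distortions side by side, here is a minimal sketch in Python. The log records, user names and IP addresses are all made up for illustration (the addresses come from ranges reserved for documentation), and the "user" field stands in for what you would only know from a login.

```python
from collections import namedtuple

# Hypothetical log record: "user" is the real person (known only via login),
# "ip" is the address the Web server actually saw.
Hit = namedtuple("Hit", ["user", "ip", "page"])

# One ISP customer whose three requests leave through three different proxies.
isp_hits = [
    Hit("you", "198.51.100.1", "/home"),
    Hit("you", "198.51.100.2", "/products"),
    Hit("you", "198.51.100.3", "/checkout"),
]

# Three Megacorp employees who all appear as the firewall's single address.
corp_hits = [
    Hit("alice", "203.0.113.9", "/home"),
    Hit("bob", "203.0.113.9", "/home"),
    Hit("carol", "203.0.113.9", "/products"),
]


def count_unique(hits, field):
    """Count the distinct values of one field across a list of hits."""
    return len({getattr(hit, field) for hit in hits})


for label, hits in (("ISP proxies", isp_hits), ("Megacorp firewall", corp_hits)):
    print(label,
          "- visitors by IP:", count_unique(hits, "ip"),
          "- real people:", count_unique(hits, "user"))
```

In the first batch, counting by IP turns one person into three; in the second, it turns three people into one. Add the two batches together and the errors happen to cancel, which is exactly why the problem is easy to miss in the totals.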
“Ahhh,” I hear you say, “but that's what cookies are for, right?”
Sadly, according to the RedEye report, the typical visitor to the two sites it studied uses an average of 2.3 cookies per month. That means the cookie data shows 2.3 times as many people as actually visited.
Between using multiple machines (one at work, one at home, one at the Internet cafe) and occasionally deleting their cookies (I've never met anybody other than net professionals who did), each of us is a cookie glutton. Your data may vary.
So what have we learned?
If you want to really know what visitors are up to on your site, you have to compare your login information to your IP and cookie information and then weight the cookie data for those who do not log in.
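What might that weighting look like? The sketch below is one reading of the idea, not RedEye's actual method: measure how many cookies each logged-in customer racks up, then divide the cookie count for everyone else by that ratio. The function names and sample data are hypothetical, and the big assumption is that visitors who never log in collect cookies at the same rate as those who do.

```python
def cookies_per_person(logged_in_hits):
    """Average number of distinct cookies seen per logged-in person.

    logged_in_hits is a list of (user, cookie_id) pairs taken from
    sessions where the visitor logged in, so the real identity is known.
    """
    cookies_by_user = {}
    for user, cookie_id in logged_in_hits:
        cookies_by_user.setdefault(user, set()).add(cookie_id)
    if not cookies_by_user:
        raise ValueError("need at least one logged-in hit to compute a ratio")
    total_cookies = sum(len(ids) for ids in cookies_by_user.values())
    return total_cookies / len(cookies_by_user)


def estimate_people(anonymous_cookie_count, ratio):
    """Weight a raw cookie count down to an estimate of real people.

    Assumption: anonymous visitors use cookies at the same rate as
    logged-in ones. Your data may vary.
    """
    return anonymous_cookie_count / ratio


# Made-up sample: four customers and the cookies seen under each login.
logged_in_hits = [
    ("alice", "c1"), ("alice", "c2"), ("alice", "c3"),  # work, home, cafe
    ("bob", "c4"), ("bob", "c4"),                       # one machine, two visits
    ("carol", "c5"), ("carol", "c6"),                   # deleted her cookie once
    ("dave", "c7"), ("dave", "c8"), ("dave", "c9"),
]

ratio = cookies_per_person(logged_in_hits)   # 9 cookies / 4 people = 2.25
print(ratio)
print(estimate_people(900, ratio))           # 900 anonymous cookies -> 400.0
```

With this sample, 900 cookies from visitors who never logged in would be reported as roughly 400 people rather than 900. The ratio is only as good as the logged-in sample behind it, so it is worth recomputing regularly rather than treating it as a constant.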
“The report concludes that in most cases, a cookie-based approach combined with appropriate weightings is the only way to ensure that online management information is accurate. The online management information used by the majority of companies is so fundamentally flawed that decisions based around it will be wrong more often than they are right.”
Now why did they have to go and confuse us with the facts?