Having a Web site is no longer a competitive advantage—you know that. Managers are expected to quantify the impact of their Web properties on the bottom line of their businesses. As a result, the Web analytics industry is booming and is predicted to grow to $1 billion by 2006.
In my view, Web analytics will never be an exact science, even though its influence in the decision-making process is growing.
Issues inherent in Web data analysis make it difficult to get accurate, insightful data. This article's purpose is to make you aware of the pitfalls associated with Web data analysis so you can plan for or avoid them.
Here are five ways your data can be skewed by the very technology you seek to leverage.
1. AOL Proxy Servers
AOL proxy servers are a killer for traditional Web site analytics programs. Here is a summary from AOL's own documentation for Webmasters (my explanations in brackets):
When a member [AOL user] requests multiple documents for multiple URLs [Web pages, PDFs, etc.], each request may come from a different proxy server [a different IP address]. Since one proxy server can have multiple members going to one site, Webmasters should not make assumptions about the relationship between members and proxy servers when designing their Web site.
Implication for your business:
If one AOL user views 10 pages on your site, your Web site analytics tool could be misled into thinking that 10 different users came to your site and each viewed only one page.
I know that when I see a lot of visits that load only the homepage, I start considering minor changes to entice people to click further into the site. Incorrect data can lead you to make incorrect decisions.
If you do not account for or fix the AOL proxy server issue, the reports that quantify the number of visitors (or unique users) can be highly inaccurate. Many corporations and some ISPs also use proxy servers, but they rarely pose a problem, because the share of traffic coming through non-AOL proxies is small. AOL, on the other hand, drives up to 50% of the traffic on some sites that I analyze, so it presents a huge problem.
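To make the distortion concrete, here is a minimal sketch (the log rows and field names are invented) of how an IP-based unique-visitor count turns one AOL member into three visitors, while a cookie-based count does not:

```python
# Minimal sketch: why IP-based "unique visitor" counts break behind AOL proxies.
# The log rows below are made up; one AOL member requests three pages, and each
# request arrives from a different proxy IP address.

hits = [
    {"ip": "205.188.116.10",  "cookie": "visitor-42", "page": "/"},
    {"ip": "205.188.116.71",  "cookie": "visitor-42", "page": "/products.html"},
    {"ip": "205.188.116.205", "cookie": "visitor-42", "page": "/contact.html"},
]

# Naive approach: one unique visitor per distinct IP address.
unique_by_ip = len({hit["ip"] for hit in hits})

# If your tool can set a persistent cookie, count distinct cookie IDs instead.
unique_by_cookie = len({hit["cookie"] for hit in hits})

print(unique_by_ip)      # 3 -- looks like three one-page visits
print(unique_by_cookie)  # 1 -- one visitor who viewed three pages
```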
2. Random Spiders
A spider is an automated program designed to gather information from Web pages. Most log file analysis tools recognize when Googlebot (Google), Scooter (AltaVista), Slurp (Inktomi) or any of the other major search engine spiders visit a site. They know that the "visitor" viewing Web pages is actually an automated program, not a legitimate user, and most decent analysis tools automatically filter out the data those known spiders generate.
This is good news! Using a tool that automatically recognizes and filters out automated spiders helps you get closer to reporting, analyzing and making decisions on the correct data.
Here's the bad news: any programmer can create a spider and send it to your site. There are thousands of unknown spiders, and some of them are crawling your site, inflating your Web data even as you read this article. These unknown spiders usually slip past the average Web analytics tool's filters because they don't identify themselves as spiders; they appear to be regular users, as the sketch below illustrates.
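The known-spider filtering itself is usually little more than a user-agent check, which is exactly why it fails on spiders that lie about who they are. A minimal sketch (the log entries and spider list are invented):

```python
# Minimal sketch: filtering known spiders by user-agent string.
# The log entries are invented; real tools keep much longer spider lists.

KNOWN_SPIDERS = ("googlebot", "scooter", "slurp")

log_entries = [
    {"ip": "66.249.66.1", "agent": "Googlebot/2.1",                      "page": "/"},
    {"ip": "10.0.0.7",    "agent": "Mozilla/4.0 (compatible; MSIE 6.0)", "page": "/pricing.html"},
    # An unknown spider masquerading as an ordinary browser slips through:
    {"ip": "192.0.2.50",  "agent": "Mozilla/4.0 (compatible; MSIE 6.0)", "page": "/contact.html"},
]

def is_known_spider(agent: str) -> bool:
    agent = agent.lower()
    return any(name in agent for name in KNOWN_SPIDERS)

human_traffic = [e for e in log_entries if not is_known_spider(e["agent"])]
print(len(human_traffic))  # 2 -- but one of these "humans" is really a spider
```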
Email harvesters are one example of these random spiders. Ever wonder how a spammer got your email address? A popular method is the email harvester, an automated program built to traverse the Web collecting email addresses so they can be added to a database.
Implication for your business:
What are the repercussions to your decision-making processes when a spider hits your site 1,000 times in 30 seconds and doesn't get filtered?
Well, think about it… if you just started an ad campaign or a new marketing initiative and suddenly saw a significant jump in traffic to your site, you might attribute that success to the campaign. In actuality, the increase you saw may have been the result of an unknown, unfiltered spider.
Even worse, if you assumed that your campaign was a success, you might extend the campaign and spend more of your budget on it, essentially throwing money out the window.
(If your Web analytics solution does not rely on reading log files but instead uses a small piece of JavaScript on each page, you may not have this issue. Spiders typically don't execute JavaScript, so they will not register in your Web site analysis reports.)
I recently spoke with a representative at IBM who works with its SurfAid analytics program. He explained that the software uses logic to automatically filter out spiders: if a user loads a given number of pages within a certain period of time, that user can be automatically filtered out. IBM's SurfAid team also keeps track of the growing list of known spiders and updates the software to filter them out.
This is the first program I have seen that recognizes the importance of automatically filtering out suspicious activity that can lead to highly inaccurate data.
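That kind of rate-based logic is easy to sketch. The thresholds and code below are my own rough illustration, not IBM's actual implementation:

```python
# Rough sketch of rate-based spider filtering of the kind described above.
# The thresholds are made up; real products tune them and combine them with
# known-spider lists.

from collections import defaultdict

MAX_PAGES = 100        # hypothetical: more than 100 pages...
WINDOW_SECONDS = 60    # ...within any 60-second window looks automated

def suspicious_visitors(hits):
    """hits: iterable of (visitor_id, timestamp_in_seconds, page)."""
    by_visitor = defaultdict(list)
    for visitor_id, ts, _page in hits:
        by_visitor[visitor_id].append(ts)

    flagged = set()
    for visitor_id, times in by_visitor.items():
        times.sort()
        start = 0
        # Slide a window over the sorted timestamps, flagging heavy bursts.
        for end in range(len(times)):
            while times[end] - times[start] > WINDOW_SECONDS:
                start += 1
            if end - start + 1 > MAX_PAGES:
                flagged.add(visitor_id)
                break
    return flagged

# Example: a "visitor" that loads 1,000 pages in 30 seconds gets flagged.
fake_hits = [("spider-1", t * 0.03, "/page") for t in range(1000)]
print(suspicious_visitors(fake_hits))  # {'spider-1'}
```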
3. Frames
If your site is developed in frames, take your number of page loads and divide it by three. That is roughly how many pages your visitors actually viewed. A framed site typically loads three documents (the frameset plus two frames) for every single page a user sees on screen. Frames solve many site-development problems, but they open up a slew of other issues with tracking and marketing your Web site.
Implication for your business:
Guess what the above scenario can do to your data… triple it! If your site (or a section of it) is developed in frames, the page-view figures reported for that section may be three times what users actually saw.
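One way to compensate in your own analysis is to count only the content documents and discard requests for the frameset and navigation frames. A minimal sketch, with invented frame file names:

```python
# Minimal sketch: correcting page-view counts on a framed site by ignoring
# the frameset and navigation documents. The file names are hypothetical;
# substitute the actual frame URLs used on your site.

FRAME_SHELLS = {"/index.html", "/nav_frame.html"}   # frameset + navigation frame

requested_urls = [
    # One page a user actually sees can produce three requests:
    "/index.html", "/nav_frame.html", "/products.html",
    "/index.html", "/nav_frame.html", "/contact.html",
]

raw_page_loads = len(requested_urls)
content_views = len([u for u in requested_urls if u not in FRAME_SHELLS])

print(raw_page_loads)   # 6 -- what a naive report shows
print(content_views)    # 2 -- what visitors actually saw
```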
4. Flash and Dynamic Sites
Flash is becoming increasingly popular as a Web site development tool. Take a look at my favorite Flash site of all time: https://www.NeoStream.com. You might notice that as you move around the site, the URL bar (where you type in Web addresses) never changes.
Implication for your business:
Although the site is beautiful on the surface, developing a site in this manner will devastate your Web analytics initiatives. It will appear to your analytics tool that the entire site is composed of only one page.
No matter how many different pages a user views, it will always appear as if the homepage is being loaded over and over again. Fortunately, not all Flash and dynamic sites are programmed in this manner, but many still are. For those that are, true analysis can be difficult and sometimes impossible.
Analyzing conversion metrics, ROI for various marketing campaigns, top entry points into your site, user paths through the site, fall-off rates and many other essential Internet business metrics is usually not possible without paying for additional programming changes to correct the one-page-site dilemma.
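To see why, consider what a path report looks like when every screen a visitor sees registers as the same URL. A minimal sketch with made-up session data:

```python
# Minimal sketch: what "user paths" look like when a Flash site logs every
# view as the same URL. The session data is invented.

from collections import Counter

# On a conventional site, each screen has its own URL...
html_session = ["/", "/products.html", "/cart.html", "/checkout.html"]

# ...but on a one-URL Flash site, every screen registers as the homepage.
flash_session = ["/", "/", "/", "/"]

def path_report(sessions):
    return Counter(" -> ".join(s) for s in sessions)

print(path_report([html_session]))
# Counter({'/ -> /products.html -> /cart.html -> /checkout.html': 1})

print(path_report([flash_session]))
# Counter({'/ -> / -> / -> /': 1})  -- entry pages, paths, fall-off: all gone
```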
5. Sharing Secure Certificates
When a user leaves the public area of your site and moves to a secure area (say, where a credit card is processed), the user is sometimes handed off to a very different URL: not a secure page on https://www.mysite.com, but something more like https://secure057.notmysite.net.
When secure transactions happen on someone else's server, where you are "sharing" a secure certificate with others, you do not have access to that server's log data.
Implication for your business:
Once visitors start to buy a product or complete an application or lead form and are handed off to the shared secure site, their activity is tracked there instead of in your own logs. They essentially "disappear" partway through your site's log, giving you an incomplete view of user activity through one of the most important parts of your site: the conversion.
Getting this data from a shared hosting environment could prove costly, and may even be impossible, depending on how inflexible your Web site host is.
Next time: Solutions to the problems.