Tracking Web Site Visitors

by Greg Jorgensen, principal consultant, PDXperts LLC
for Computer Bits magazine, December 2002 issue


If you have a web site, you probably want to know how many people visit it, which pages they look at, and how long they browse. If your site offers searching, you may want to see what your visitors search for and how often their searches succeed (or fail). If you sell on the web, you certainly want to know how many visitors started to buy something but didn't finish the purchase process, and how far they got before giving up.

Web Server Logs And Log Analyzers
You can learn a lot about your site's visitors from the web server logs. Web servers such as Apache and Microsoft's IIS log every request sent to the server. Each log entry records the client's IP address, the date and time of the request, the request string (URL), and the server's status code. Often the entry also includes the user agent string, which identifies the client's browser type and version and usually its operating system name and version. Web servers can usually log the referring URL, which tells you how someone got to your site (e.g. from a search engine).
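To make those fields concrete, here is a minimal Python sketch that splits one log line into its parts. It assumes Apache's "combined" log format; IIS writes the W3C extended format by default, which orders fields differently, so the pattern would need adjusting there. The sample line and the parse_line name are made up for illustration.

```python
import re
from datetime import datetime

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # The request string looks like "GET /products.html HTTP/1.1"
    parts = entry["request"].split()
    entry["url"] = parts[1] if len(parts) > 1 else ""
    return entry

# Hypothetical sample line
sample = ('192.0.2.10 - - [10/Dec/2002:13:55:36 -0800] '
          '"GET /products.html HTTP/1.1" 200 2326 '
          '"http://www.google.com/search?q=widgets" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
print(parse_line(sample)["url"])  # /products.html
```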

Log analyzers process the raw web server log files into summaries and statistics for a person to read. Even simple log analyzers gather a lot of good information from log files: number of visitors, average time spent on the site, most- and least-popular pages, entry pages, browser type and version totals, which search engines or other sites send people to your site, errors visitors encountered, and even hacking attempts. Most log analyzers present results as tables of numbers and graphs; some produce fancy 3-D graphics and Adobe Acrobat files. At the end of this article you'll find a list of free and commercial log analyzers.
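The core of a simple analyzer is just counting. This sketch assumes the log lines have already been parsed into dictionaries with ip, url, status, and referrer keys; the summarize function, the own_host default, and the sample entries are hypothetical. Real analyzers do much more (visitor heuristics, robot detection, graphs), but the idea is the same.

```python
from collections import Counter
from urllib.parse import urlparse

def summarize(entries, own_host="www.example.com"):
    """Build a few basic statistics from parsed log entries (dicts)."""
    entries = list(entries)
    # Successful page requests only
    pages = Counter(e["url"] for e in entries if e["status"] == 200)
    # Referring sites, ignoring empty referrers and links within your own site
    hosts = (urlparse(e["referrer"]).netloc
             for e in entries if e["referrer"] not in ("", "-"))
    referrers = Counter(h for h in hosts if h and h != own_host)
    # Requests that produced client or server errors (404, 500, ...)
    errors = Counter(e["url"] for e in entries if e["status"] >= 400)
    return {
        "hits": len(entries),
        "unique_ips": len({e["ip"] for e in entries}),
        "top_pages": pages.most_common(5),
        "top_referrers": referrers.most_common(5),
        "errors": errors.most_common(5),
    }

entries = [
    {"ip": "192.0.2.10", "url": "/", "status": 200,
     "referrer": "http://www.google.com/search?q=widgets"},
    {"ip": "192.0.2.10", "url": "/products.html", "status": 200,
     "referrer": "http://www.example.com/"},
    {"ip": "198.51.100.7", "url": "/", "status": 200, "referrer": "-"},
    {"ip": "198.51.100.7", "url": "/old-page.html", "status": 404, "referrer": "-"},
]
report = summarize(entries)
```

Note that "unique IPs" is not the same as "unique visitors"; the limits of that approximation are discussed further on.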

If you host your web site at an ISP, they may offer some log analysis tools. Your ISP should give you some way to download your "raw" log files so you can use your own analyzer. If you run your own web server, you'll probably find web server logging turned on by default. Both Apache and IIS let you configure the log file format and where the log files go. Log files usually rotate periodically: the current log file gets renamed or archived, and the web server continues logging into an empty file. Your ISP or system administrator can tell you how often the log files rotate and how long they keep the rotated files. Log analyzers can accept multiple log files as input and process them by date range, so you can run your analysis daily or weekly and save the results; you don't have to keep the raw log files forever.
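Feeding several rotated files to an analysis step is straightforward. The sketch below assumes rotated logs are sometimes gzip-compressed (a common but not universal setup) and that each line carries the bracketed Apache-style timestamp; the file names and the lines_in_range helper are hypothetical.

```python
import gzip
import os
import re
import tempfile
from datetime import date, datetime

# Matches the date part of a timestamp such as [10/Dec/2002:13:55:36 -0800]
STAMP = re.compile(r'\[(\d{2}/\w{3}/\d{4}):')

def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")

def lines_in_range(paths, start, end):
    """Yield lines from all files whose request date falls in [start, end]."""
    for path in paths:
        with open_log(path) as f:
            for line in f:
                m = STAMP.search(line)
                if not m:
                    continue
                day = datetime.strptime(m.group(1), "%d/%b/%Y").date()
                if start <= day <= end:
                    yield line

# Demo: one current log and one compressed, rotated log in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    current = os.path.join(tmp, "access.log")
    rotated = os.path.join(tmp, "access.log.1.gz")
    with open(current, "w") as f:
        f.write('192.0.2.10 - - [02/Dec/2002:09:00:00 -0800] "GET / HTTP/1.1" 200 512\n')
    with gzip.open(rotated, "wt") as f:
        f.write('192.0.2.11 - - [25/Nov/2002:09:00:00 -0800] "GET / HTTP/1.1" 200 512\n')
        f.write('192.0.2.12 - - [30/Nov/2002:09:00:00 -0800] "GET / HTTP/1.1" 200 512\n')
    # One week's worth of lines, oldest file first
    selected = list(lines_in_range([rotated, current],
                                   date(2002, 11, 30), date(2002, 12, 6)))
```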

You can easily read too much into log file analysis reports, especially if the graphs look pretty and the numbers seem impressive. Before dashing off copies for a board meeting, check that your analysis makes sense. Did you exclude your own IP addresses? You probably don't want to count yourself or your employees as visitors. Did you use the right date range? Did you exclude extreme values from the averages? A visitor who went to a meeting in the middle of surfing your site will skew the average time spent. Did you account for robots and spiders (the automated programs that index your site for search engines)? Most log analyzers can identify robots, though rogue robots harvesting email addresses or MP3 files often don't identify themselves honestly. If any of the statistics look wrong, do some checking; mistakes in the log analyzer settings can introduce artifacts into your reports.
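Two of those checks are easy to express in code. This sketch drops hits from an assumed in-house network and from user agents that call themselves robots, then compares the mean and median of some made-up visit lengths. The network, the robot keyword list, and the durations are all hypothetical; as noted above, a keyword list only catches robots that identify themselves.

```python
import ipaddress
from statistics import mean, median

# Hypothetical values: substitute your own office network and robot keywords.
OWN_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")

def is_internal(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in OWN_NETWORKS)

def is_robot(agent):
    # Only catches robots that admit what they are in the user agent string
    agent = agent.lower()
    return any(hint in agent for hint in ROBOT_HINTS)

def visitor_entries(entries):
    """Drop hits from your own addresses and from self-identified robots."""
    return [e for e in entries
            if not is_internal(e["ip"]) and not is_robot(e["agent"])]

entries = [
    {"ip": "192.0.2.15", "agent": "Mozilla/4.0 (compatible; MSIE 6.0)"},  # employee
    {"ip": "203.0.113.5", "agent": "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"},
    {"ip": "203.0.113.9", "agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1)"},
]
visitors = visitor_entries(entries)

# Visit lengths in minutes; one visitor left for a meeting mid-visit
visit_minutes = [2, 3, 4, 5, 95]
avg_mean = mean(visit_minutes)      # pulled far up by the one outlier
avg_median = median(visit_minutes)  # closer to a typical visit
```

The median is one simple way to keep a single extreme value from dominating the "average"; trimming outliers before averaging is another.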

Setting up a log analyzer and correctly configuring it requires some study and experimentation. Some analyzers offer a boggling number of options. If your ISP or system administrator can't help, consult someone who can; you shouldn't make decisions about your web site if you don't trust your log analysis.

Click Paths
As visitors move through your site, clicking links and buttons, they create trails of page views called click paths. Studying click paths can tell you which parts of your site visitors look at and which parts they don't. Click paths can alert you to confusing navigation and dead ends. If your site sells products or asks for registration, you can see whether visitors bail out during the purchase or registration process (in e-commerce jargon, a purchase dropped partway through is an abandoned cart).
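Building click paths amounts to grouping page views by visitor and sorting them by time. This sketch assumes each hit has already been reduced to a (visitor key, timestamp, page) tuple; how you choose the visitor key is the hard part, discussed below. Timestamps are plain integers here only to keep the example short, and the page names are hypothetical.

```python
from collections import Counter, defaultdict

def click_paths(hits):
    """hits: iterable of (visitor_key, timestamp, page).
    Returns {visitor_key: [pages in the order viewed]}."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in hits:
        by_visitor[visitor].append((ts, page))
    return {v: [page for ts, page in sorted(views)]
            for v, views in by_visitor.items()}

hits = [
    ("a", 1, "/"), ("a", 2, "/products.html"), ("a", 3, "/cart.html"),
    ("b", 1, "/"), ("b", 5, "/products.html"),
    ("c", 2, "/"), ("c", 4, "/products.html"), ("c", 6, "/cart.html"),
]
paths = click_paths(hits)

# Most-travelled complete paths, and the pages where visits ended
common = Counter(tuple(p) for p in paths.values()).most_common()
exits = Counter(p[-1] for p in paths.values())
```

A page that shows up often in exits but isn't a natural ending point (a confirmation page, say) may be a dead end worth investigating.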

Log analyzers show high-level click path statistics: how many people hit your login page versus how many got to the next page. The more sophisticated log analyzers can show the most-traveled click paths and even let you drill down to study individual click paths.
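The "login page versus next page" comparison is often called a funnel. Here is a minimal sketch, assuming click paths are already available as lists of pages per visitor; the page names are hypothetical. A visitor counts for a step only if they reached every earlier step in order, with other pages allowed in between.

```python
def reached_in_order(pages, needed):
    """True if the pages in `needed` appear in `pages` in that order."""
    it = iter(pages)
    # `step in it` consumes the iterator up to the match, enforcing order
    return all(step in it for step in needed)

def funnel(paths, steps):
    """For each step, count visitors who got that far; paths: {visitor: [pages]}."""
    counts = []
    for i, step in enumerate(steps):
        needed = steps[: i + 1]
        n = sum(1 for pages in paths.values() if reached_in_order(pages, needed))
        counts.append((step, n))
    return counts

paths = {
    "a": ["/", "/login.html", "/account.html"],
    "b": ["/", "/login.html"],
    "c": ["/login.html", "/", "/account.html"],
    "d": ["/products.html"],
}
result = funnel(paths, ["/login.html", "/account.html"])
```

The same function works for a checkout sequence: list the checkout pages as steps, and the drop between adjacent counts shows where buyers give up.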

Limits Of Log Analysis
The log file limits what a log analyzer can tell you. For example, you can see that lots of visitors use your search page, but you don't know what they searched for, or how many searches found something. You can see that customers abandon their cart during the checkout process, but you don't know what they had in their cart, or how much those abandoned carts represent in lost sales. The log file doesn't include the data you need.

Some factors beyond your control limit log file analysis. Broadband (cable and DSL) providers and large online services such as America Online share a relatively small number of IP addresses among a lot of users (AOL claims 16 million users). Businesses and even home users put multiple computers on a single IP address using Network Address Translation (NAT). Caching and proxy servers installed to improve performance distort request totals and click paths: a cache answers repeat requests from its own copy without contacting your server, and a proxy makes many users look like one. Some log analyzers account for shared IP addresses by looking at the request date and time along with the IP address to identify unique visitors, but as the online population grows, identifying visitors purely through log analysis becomes less reliable.
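One common heuristic of this kind treats hits from the same IP address and user agent as one visit until there's a long enough gap in activity. The sketch below assumes a 30-minute gap, a widely used convention rather than a standard, and simplified user agent strings. It still merges two people behind a NAT address who happen to use the same browser, which is exactly the weakness described above.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)  # assumed inactivity gap that ends a visit

def split_visits(hits, timeout=TIMEOUT):
    """hits: iterable of (ip, agent, when, page).
    Returns a list of visits, each a list of pages in order."""
    visits = []
    last_seen = {}  # (ip, agent) -> (time of last hit, index into visits)
    for ip, agent, when, page in sorted(hits, key=lambda h: h[2]):
        key = (ip, agent)
        prev = last_seen.get(key)
        if prev is None or when - prev[0] > timeout:
            visits.append([page])              # start a new visit
            last_seen[key] = (when, len(visits) - 1)
        else:
            visits[prev[1]].append(page)       # continue the current visit
            last_seen[key] = (when, prev[1])
    return visits

t0 = datetime(2002, 12, 10, 9, 0)
hits = [
    ("203.0.113.9", "MSIE 6.0", t0, "/"),
    # Same NAT address, different browser: treated as a different visitor
    ("203.0.113.9", "Netscape 7.0", t0 + timedelta(minutes=1), "/"),
    ("203.0.113.9", "MSIE 6.0", t0 + timedelta(minutes=5), "/products.html"),
    # Two-hour gap: counted as a new visit
    ("203.0.113.9", "MSIE 6.0", t0 + timedelta(hours=2), "/"),
]
visits = split_visits(hits)
```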

Privacy concerns also affect the accuracy of log file information. Some web browsers let the user suppress the user agent and operating system information (or change it to whatever they like). Users may use software to strip referrer information from their requests, or their company or ISP may do it for them with a proxy server.

Tracking Visitors In Your Code
You can supplement log file analysis with more and better information by programming your site to track and log visitor activity. Whatever web programming technology you use, whether Active Server Pages (ASP), PHP, ColdFusion, JavaServer Pages (JSP), or Perl, you already have the tools you need. You need to add three features to your server-side code: a unique identifier for each visitor, tracking of the activity you care about, and a database to log what you collect.

You have to assign a unique identifier to each visitor so you can track them across requests. The visitor's browser sends the identifier back to your code with each request, in a cookie or as part of the request URL. Many techniques exist for assigning and propagating unique identifiers, but you probably don't have to build one yourself: session management comes built in with the most common server-side web programming tools. In ASP you use the Session object. PHP and ColdFusion have automatic cookie- and URL-based session management. The details vary, but if you don't use session management already, you probably won't have to add much code to enable it.
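To show what built-in session management does under the hood, here is a minimal cookie-based sketch in Python. The cookie name visitor_id and the get_or_assign_id function are hypothetical, and in practice you would use your platform's own session support rather than this. The function reads the incoming Cookie header; if there is no identifier yet, it generates a random one and returns the Set-Cookie value to send with the response.

```python
import secrets
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name

def get_or_assign_id(cookie_header):
    """Return (visitor_id, Set-Cookie header value or None)."""
    cookies = SimpleCookie()
    if cookie_header:
        cookies.load(cookie_header)
    if COOKIE_NAME in cookies:
        # Returning visitor: reuse the identifier the browser sent back
        return cookies[COOKIE_NAME].value, None
    # New visitor: 128 random bits, hard for anyone to guess
    new_id = secrets.token_hex(16)
    out = SimpleCookie()
    out[COOKIE_NAME] = new_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["httponly"] = True  # not readable from page scripts
    return new_id, out[COOKIE_NAME].OutputString()

# First request: no cookie, so the server assigns an ID and sets the cookie
new_id, set_cookie = get_or_assign_id(None)
# Later request: the browser sends the cookie back, so the same ID is reused
same_id, no_header = get_or_assign_id(f"{COOKIE_NAME}={new_id}")
```

A random identifier, rather than a counter, keeps visitors from guessing each other's sessions.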

Once you have session management working, you can track almost anything you want. For example, you can keep track of which pages in the checkout or registration process the customer got to and what they had in their shopping cart when they got there. You can keep track of which of your PDFs they downloaded. And you can log the exact click path for each visitor without worrying about mix-ups caused by shared IP addresses.

You need to log all of the information you collect to a database. You may already have a database server such as SQL Server, Oracle, or MySQL, but if not, you can install the free open source MySQL database. The database lets you store information in a structured format designed for querying in various ways. For example, a table containing session ID, a date/time stamp, and a page name (or URL) can log every visitor's click path. Add a column to hold the number of items in the shopping cart at every page view, and you have very detailed tracking of shopping habits and abandoned carts.
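Here is a sketch of the table just described, with one query that finds abandoned carts. It uses Python's built-in SQLite module in place of MySQL or SQL Server so it runs without a database server; the SQL would need only minor changes for those. The page names, including the confirmation page, and the sample sessions are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL/SQL Server so the sketch runs without a server.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE page_views (
        session_id TEXT NOT NULL,
        viewed_at  TEXT NOT NULL,           -- ISO 8601 date/time stamp
        page       TEXT NOT NULL,
        cart_items INTEGER NOT NULL DEFAULT 0
    )
""")

def log_view(session_id, viewed_at, page, cart_items):
    """Record one page view; call this from every page your code serves."""
    db.execute("INSERT INTO page_views VALUES (?, ?, ?, ?)",
               (session_id, viewed_at, page, cart_items))

log_view("s1", "2002-12-10T09:00:00", "/products.html", 0)
log_view("s1", "2002-12-10T09:02:00", "/checkout/shipping.html", 2)
log_view("s1", "2002-12-10T09:04:00", "/checkout/confirm.html", 2)
log_view("s2", "2002-12-10T09:10:00", "/products.html", 1)
log_view("s2", "2002-12-10T09:12:00", "/checkout/shipping.html", 3)

# Sessions that put something in the cart but never reached the confirmation page
abandoned = db.execute("""
    SELECT session_id, MAX(cart_items)
    FROM page_views
    GROUP BY session_id
    HAVING MAX(cart_items) > 0
       AND SUM(page = '/checkout/confirm.html') = 0
""").fetchall()
```

Joining the same session IDs against your order or product tables would let you put a dollar figure on those abandoned carts, which the log file alone can't do.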

If you offer a search feature on your site, you should log each search request and the number of matches it returned. Seeing what your visitors actually search for can reveal unexpected search terms or frustrated attempts to find something on your site.
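A search log can be as simple as the search term and the match count per request. This sketch keeps the log in a Python list for brevity; on a real site it would be another database table keyed by session ID. The function names and sample searches are hypothetical. Normalizing case and spacing makes repeated searches group together in the report.

```python
from collections import Counter

search_log = []  # stand-in for a database table of searches

def log_search(session_id, term, matches):
    # Normalize so "Blue Gadget" and "blue  gadget" count as the same search
    normalized = " ".join(term.lower().split())
    search_log.append((session_id, normalized, matches))

def failed_searches(log, top=10):
    """Most frequent search terms that found nothing."""
    return Counter(term for _, term, n in log if n == 0).most_common(top)

log_search("s1", "Widgets", 12)
log_search("s2", "blue  gadget", 0)
log_search("s3", "Blue Gadget", 0)
log_search("s4", "return policy", 0)
print(failed_searches(search_log))  # [('blue gadget', 2), ('return policy', 1)]
```

A frequent failed search like "return policy" may mean the page exists but uses different words, or that visitors want something you don't offer yet.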

Implementing these techniques requires programming expertise and experience with web applications. You may not know how to write the code yourself, but now you know what information your site can collect and report.

Log Analyzers
I can't list many log analyzers here, just a few of the most popular. Yahoo! has a long list of free and commercial products.

Analog: Popular, fairly easy to set up, fast, good overall reports but limited click path information. Free. www.analog.cx

Webalizer: Simple, fast. Free. www.mrunix.net/webalizer

Summary: Lots of attractive reports, including some no other analyzer offers. $59 and up. www.summary.net

Urchin: Popular commercial analyzer. Free version available, $695 and up. www.urchin.com

Sawmill: Powerful, friendly, excellent reports. $99 and up. www.sawmill.net

FunnelWeb: Easy to use, fast and thorough, great reports. Free version available, commercial version $995. www.quest.com/funnel_web/analyzer

WebTrends: The high-end standard. Many options and features, including real-time reporting. Pricing varies. www.netiq.com/webtrends