Mastering the Web

Web Tracking Concepts

When most people start building a Web site, they do not consider that some day they might want to track how users come to the site and what those users do once they get there. Tracking your site visitors can show you things like how many page-views you are getting, how many people visit your site, and what browsers those people are using. Over the long term, that tracking data can help you find out whether your site marketing is getting the job done, determine which areas of your site are successes (and which are not), watch how your relative browser share changes over time, and more. However, user tracking has limitations that you must know about when you study site traffic patterns.

Once a site is up and running, we want to know how many people are looking at our pages and how many pages each of those people is looking at. That is when we discover that tracking would be easier if we had spent more time thinking about how to set up the site. Before you can decide what type of analysis to do, you need to know what information is available. Unfortunately, there is not much tracking data you can collect, and what you can get is unreliable. Still, you can gain useful knowledge from what does exist.

Your Web server records information about every request it gets. The information available to you for each request includes:

  • Date and time of the request.
  • The IP address of the requester.
  • What is requested.
  • The server's response code.
  • The referrer (which URL sent this visitor to you).
  • The visitor's browser (also called "user agent").
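
As a rough illustration, the fields listed above can be pulled out of one log entry with a regular expression. This is only a sketch: it assumes the server writes the Apache "combined" log format, which this tutorial does not specify, and the names `LOG_PATTERN` and `parse_line` as well as the sample entry are invented for the example.

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" log format. Adjust the pattern if your
# server writes its log differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict with the fields of one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Example timestamp: 10/Oct/2000:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # The request looks like: GET /index.html HTTP/1.0
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else ""
    return entry

# Made-up sample entry, for illustration only.
sample = ('152.163.199.42 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I)"')
entry = parse_line(sample)
print(entry["ip"], entry["status"], entry["path"], entry["referrer"])
```

Lines that do not match the pattern return `None`, so a caller can skip or report them instead of failing.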

As I mentioned before, this information is inexact, but it is not useless: you can still use it to gain a better understanding of how people use your site.

Let us talk first about the difference between hits and page-views, something that confuses many people. A hit is any request for a file that your server receives. This includes images, sound files, and anything else that may appear on a page. A page-view is a request for the document that contains all these things. Therefore, you will always have many more hits than page-views.

The Web server records data about the requests it receives in a file called the "log file" (or simply the "log"). An analyzer program, like FWTLogstat2, can tell you how many hits your site received, how many were successful and how many were not, how many were page-views and how many were not, and so on.

To count page-views, the program needs some method of differentiating hits that are page-views from those that are not. The main factor taken into account is the type of file served (*.htm, *.html, *.gif, and so on).
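
A minimal sketch of that distinction follows. It is not how FWTLogstat2 is implemented; the set `PAGE_EXTENSIONS`, the function `is_page_view`, and the choice to count directory requests as pages are assumptions made for the example.

```python
import posixpath

# Hypothetical list of extensions treated as pages; adjust it to your site.
PAGE_EXTENSIONS = {".htm", ".html"}

def is_page_view(path):
    """Decide from the requested path whether a hit counts as a page-view."""
    path = path.split("?", 1)[0]      # drop any query string
    if path.endswith("/"):            # assumption: a directory request serves its index page
        return True
    extension = posixpath.splitext(path)[1].lower()
    return extension in PAGE_EXTENSIONS

# Made-up request paths, as they would appear in the log.
requests = ["/index.html", "/images/logo.gif", "/sounds/hello.wav", "/tips/", "/faq.htm"]
hits = len(requests)
page_views = sum(1 for path in requests if is_page_view(path))
print(hits, "hits,", page_views, "page-views")   # prints: 5 hits, 3 page-views
```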

Once you have determined which hits are page-views and which are not, you will want to count how many page-views each of your pages gets individually, and, if you split your site into areas, you may want to determine how many page-views each area gets. FWTLogstat2 can give you the individual page counters directly and, if you have structured your site properly, the sector information as well. Assigning a different path (sub-directory) to each sector lets you get per-sector statistics with no more effort than setting an option.
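
To show why one sub-directory per sector makes this easy, here is a small sketch that counts page-views per page and per sector. The helper `sector_of` and the sample paths are hypothetical, and the rule that the first path component names the sector is an assumption about the site layout.

```python
from collections import Counter

def sector_of(path):
    """Return the first sub-directory of a path, or "/" for pages at the top level."""
    parts = path.strip("/").split("/")
    return parts[0] if len(parts) > 1 else "/"

# Made-up page-view paths, already separated from the other hits.
page_views = ["/index.html", "/perl/intro.html", "/perl/intro.html",
              "/css/basics.html", "/perl/regex.html"]

per_page = Counter(page_views)
per_sector = Counter(sector_of(path) for path in page_views)

for sector, count in per_sector.most_common():
    print(sector, count)
```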

A sound program to get statistics for your Web site should also address the following issues.

  • Page-views by time interval. You can look at how page-views change every hour in the day. This will tell you when people are accessing your site.
  • Page-views by referrer. This information can help you determine where to put your advertising dollars, or which sites to exchange links with.
  • Page-views by visitor browser, and/or browser version.
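
The three breakdowns above can be sketched with simple counters. The records below are made up, and `browser_family` is a deliberately crude, hypothetical classifier; real user-agent strings need more careful handling.

```python
from collections import Counter
from datetime import datetime

# Hypothetical page-view records, already parsed from the log.
page_views = [
    {"time": datetime(2008, 3, 1, 9, 15),
     "referrer": "http://www.example.com/links.html",
     "agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
    {"time": datetime(2008, 3, 1, 9, 40),
     "referrer": "-",
     "agent": "Mozilla/5.0 (Windows; U) Gecko/20080201 Firefox/2.0.0.12"},
    {"time": datetime(2008, 3, 1, 21, 5),
     "referrer": "http://www.example.com/links.html",
     "agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
]

def browser_family(agent):
    """Very rough guess at the browser from the user-agent string (assumption)."""
    if "MSIE" in agent:
        return "Internet Explorer"
    if "Firefox" in agent:
        return "Firefox"
    return "Other"

by_hour = Counter(view["time"].hour for view in page_views)
by_referrer = Counter(view["referrer"] for view in page_views)
by_browser = Counter(browser_family(view["agent"]) for view in page_views)

print(by_hour.most_common())
print(by_referrer.most_common())
print(by_browser.most_common())
```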

Now let us talk about visitors, information that is not delivered by FWTLogstat2. This is because, unfortunately, counting visitors is more difficult than counting page-views.

First, let us get one thing clear: there is absolutely no way to count visitors reliably because you can never be sure who is visiting your site.

Basically, there is only one thing in the log file that you can use to track visitors: the IP address from which they connect to the server. The easiest way to count visitors is simply to count the number of unique IP addresses in your logs. Unfortunately, easiest is not always best: this method is the most inaccurate one available to you, because most people connecting to the Internet get a different IP address every time they connect.

That is because ISPs (Internet service providers) assign addresses dynamically in order to use the limited block of IP addresses given to them more efficiently. When a customer connects, the ISP assigns them an IP address. When they disconnect, the ISP makes that IP address available to another customer.

For example, John connects at 8 a.m. and is given the IP address 152.163.199.42, visits your site, and disconnects. At 10 a.m., Mary connects and she is assigned the same IP address. She visits your site and then disconnects. Later, as you are tallying the unique IP addresses in your logs, you will unknowingly count Mary and John as one visitor. This method becomes increasingly inaccurate if you are examining data over longer time periods.
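
The John and Mary case can be reproduced in a few lines. The person names appear here only to make the undercount visible; a real log has no such field, which is exactly the problem.

```python
# Hypothetical connections: (person, IP address assigned by the ISP).
# The person column is for illustration only; the server log never contains it.
connections = [
    ("John", "152.163.199.42"),   # connects at 8 a.m.
    ("Mary", "152.163.199.42"),   # connects at 10 a.m., same address reassigned
]

visitors_by_ip = len({ip for _, ip in connections})
actual_visitors = len({person for person, _ in connections})

print(visitors_by_ip, "visitor counted;", actual_visitors, "people actually visited")
```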

To overcome this problem, you can use cookies. Define a cookie that will have a unique value for every visitor. Let us call it a user identity (UI). If a person visits you without providing you with a UI (either because he has not visited your site before or because he has set his browser not to accept cookies), calculate a new value and send a cookie along with the page he requested. You will need to maintain a log of your own with the user visits. This approach is used by the program FWTLogstat1.
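
The following sketch shows one way the UI cookie logic described above could look. It is not the code of FWTLogstat1: the cookie name `UI`, the function `identify_visitor`, the one-year lifetime, and the use of a random UUID as the unique value are all assumptions made for the example.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "UI"   # hypothetical cookie name for the user identity

def identify_visitor(cookie_header):
    """Return (user_identity, set_cookie_value) for one request.

    set_cookie_value is None when the visitor already sent a UI cookie;
    otherwise it is the value to send back in a Set-Cookie header.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value, None

    # No UI received: calculate a new value and prepare the cookie to send.
    user_identity = uuid.uuid4().hex
    reply = SimpleCookie()
    reply[COOKIE_NAME] = user_identity
    reply[COOKIE_NAME]["path"] = "/"
    reply[COOKIE_NAME]["max-age"] = 365 * 24 * 60 * 60   # assumed lifetime: one year
    return user_identity, reply[COOKIE_NAME].OutputString()

# First visit: no cookie arrives, so a new UI is issued.
first_ui, set_cookie = identify_visitor("")
# Return visit: the browser sends the cookie back and no new one is needed.
second_ui, nothing_to_set = identify_visitor("UI=" + first_ui)
print(first_ui == second_ui, nothing_to_set)
```

In a real site, each call would also append the user identity and the time of the request to your own visit log.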

Nevertheless, there are still several issues that need discussion. First, many people turn off cookies, so cookies will not account for all of your visitors. Second, cookies can be deleted, so it is possible that a person who visits your site at 8 a.m. will no longer have your cookie when they return at 9 a.m.

Third, when your Web server sends a cookie to a visitor, it is stored on the visitor's machine, so if a person visits your site from home in the morning and visits again from work using another PC, you will log two different cookies. Fourth, the opposite can also occur: several people may use the same machine, in which case you will see only one cookie for all of them.

We should also mention that certain proxy servers might handle cookies in a weird manner. It is possible that a given proxy server will not deliver cookies to the user's machine. Or it might not deliver the correct cookie to the user's machine (it might even deliver some other cookie from its cache). Or it might not send the user's cookie back to your Web server.

All the same, cookies are the most reliable way to count visitors. You should still compare the cookie counts with the statistics obtained from the server log to estimate how large the undercount is.

Troubles with Tracking

Even the best tracking technique has its limitations. I will show here what they are so that you know what problems you will be facing, as well as some possible workarounds. The number of page-views you count is not the actual number of page-views of your site. "How can this be?" you ask. "I'm simply counting records in my Web server's access log." Well, the fact is many requests never make it to your access log.

First, browsers have caches. If a person requests a page from your site and soon requests it again, the browser may not go back to your server to request the page a second time. Instead, it may simply retrieve it from its cache. And you would never know. You can try using "expires" or "no cache" tags to stop browsers from caching your pages, but you can never be sure whether your tags will be honored.

Second, let us say that a user's browser does not retrieve your page from its cache but actually repeats the request to the server. Many ISPs use proxy servers, and proxy servers cache pages just like browsers. If a person using an ISP with a proxy server makes a request, the proxy server first checks its cache. If the page is there, it serves that page to the person, instead of going to your server. And you would also never know.

Again, you can try using the tags I have described above, but there is no guarantee that the proxy server will respect your tags.
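
The same hints can also be sent as HTTP response headers rather than as tags inside the page. The sketch below builds such headers; the function name `no_cache_headers` is invented for the example, and, as with the tags, nothing forces a browser or proxy to obey them.

```python
from email.utils import formatdate

def no_cache_headers():
    """Build response headers that ask browsers and proxies not to cache a page."""
    return [
        ("Cache-Control", "no-cache, no-store, must-revalidate"),
        ("Pragma", "no-cache"),                     # hint for older HTTP/1.0 caches
        ("Expires", formatdate(0, usegmt=True)),    # a date in the past: already expired
    ]

for name, value in no_cache_headers():
    print(name + ": " + value)
```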

Another tracking obstacle is robots, or spiders (affectionately called "bots"). These software programs travel the Web, generally indexing pages for the search engines. Do you mind if your page-view counts include hits from bots? If you do mind, then you had better find a way to ignore these hits. You can create a list of IP addresses to ignore, but with new bots born every day, the list will always be a step behind. Alternatively, you can filter on the requester's user-agent string. This approach is used by the program FWTLogstat2, which lets you delete from your logs the entries produced by the robots in a list. However, nothing keeps a robot's creator from sending whatever string he pleases. FWTLogstat2 takes into consideration only those user-agent names that are seen requesting the special file 'robots.txt'. Yet this presumes that the robot follows a protocol it may not observe.
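
The idea of treating any user agent that asks for 'robots.txt' as a robot can be sketched as follows. This is only an illustration of the idea, not the code of FWTLogstat2; the function `drop_robot_entries` and the sample entries are hypothetical.

```python
def drop_robot_entries(entries):
    """Remove log entries whose user agent was seen requesting /robots.txt.

    Returns the remaining entries and the set of user agents treated as robots.
    """
    robot_agents = {e["agent"] for e in entries if e["path"] == "/robots.txt"}
    kept = [e for e in entries if e["agent"] not in robot_agents]
    return kept, robot_agents

# Made-up, already parsed log entries.
entries = [
    {"path": "/robots.txt", "agent": "ExampleBot/1.0"},
    {"path": "/index.html", "agent": "ExampleBot/1.0"},
    {"path": "/index.html", "agent": "Mozilla/4.0 (compatible; MSIE 6.0)"},
    {"path": "/faq.htm", "agent": "Mozilla/4.0 (compatible; MSIE 6.0)"},
]

kept, robot_agents = drop_robot_entries(entries)
print(len(kept), "entries kept; robots found:", sorted(robot_agents))
```

A robot that never requests 'robots.txt', or that sends a browser-like string, passes through this filter untouched.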

Counting Visitors

So, if you are not able to accurately record every single request, of course you cannot get a full count of your site's visitors. But that is not your only problem.

Bots can also wreak havoc in this situation. If one or more bots hit you, your visitor numbers will not be affected much. But if you calculate page-views per visitor without filtering out bots, your numbers may be skewed, because a single bot can request a large number of pages.

A browser can send your Web server any user-agent string it wants, so whatever reporting you do based on these strings is a matter of trust. Because of compatibility problems between browsers, many pages test which browser is requesting them. To work around such tests, browsers let users change the user-agent string they send. In addition, if one browser caches pages more effectively than another, you will see fewer page-views from the former than from the latter.

Marketers and advertisers love the concept of the visit, i.e., how long a person stays at a site before moving on. Yet this number is impossible to determine using HTTP. Let us say I request a page from site A at noon. Then I request another page from site A at 12:19 p.m. How long was my visit? You can never know for sure. It is possible that I stared at the first page for the full 19 minutes. But I may just as easily have opened another browser window and read the newspaper for the duration of those 19 minutes. Or, I may have gone for a walk. According to the Internet Advertising Bureau, a visit is "a series of page requests by a visitor without 30 consecutive minutes of inactivity." If you are happy with this definition, use it to determine the length of your users' visits.
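
Applying the 30-minute definition to one visitor's requests might look like the sketch below. The function `split_into_visits` and the sample times are made up, and the measured length only runs from the first to the last request of a visit, since the time spent on the last page cannot be known.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)   # inactivity that ends a visit, per the definition above

def split_into_visits(request_times):
    """Group one visitor's request times into visits separated by 30 minutes of inactivity."""
    visits = []
    for moment in sorted(request_times):
        if visits and moment - visits[-1][-1] < TIMEOUT:
            visits[-1].append(moment)     # still within the current visit
        else:
            visits.append([moment])       # first request, or gap too long: new visit
    return visits

# Made-up request times for a single visitor.
times = [
    datetime(2008, 3, 1, 12, 0),
    datetime(2008, 3, 1, 12, 19),
    datetime(2008, 3, 1, 13, 5),
]

visits = split_into_visits(times)
for visit in visits:
    print(len(visit), "page requests, measurable length", visit[-1] - visit[0])
```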

Even if every single piece of this puzzle is confusing, you can assemble a picture of your site traffic. It will not be perfect, but it will provide you with enough information to get an idea of how you are doing and how you can build a better site.

© 1999-2008 Hector Castro -- All rights reserved

www.great-web-info.com