How Log Files Work
August 2, 1999
Every time a file is retrieved from a Web site, the server
software keeps a record of it (assuming that logging is
turned on). The server stores this information in text files
(usually with a .txt or .log extension) called the Access
Log, Error Log and Referrer Log. The log files contain not
only a record of which pages were requested at which times,
but a good bit of information about the people (or other
entities) that requested them.
As you can imagine, log files can get huge very quickly, and
take up an enormous amount of expensive hard drive space at
your hosting service. Therefore, most Web servers are set up
to "rotate" or "cycle" the log files in some way, to make
sure that all the files get saved, but that they don't hang
around on the server. A simple way to do this is to have
the server automatically email a copy of the log files to
somebody periodically. This lucky individual transfers them
to some permanent storage location, and the server
automatically purges the original log files after a certain
amount of time.
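To make the rotate-and-purge idea concrete, here is a minimal sketch in Python. It is not how any particular server does it: the function name rotate_log, the archive directory, and the 30-day default are all assumptions made for illustration, and it moves the log to a dated file instead of emailing it. A real server usually also needs to be told to reopen its log file after a rotation, which this sketch leaves out.

```python
import os
import time
from pathlib import Path


def rotate_log(log_path, archive_dir, keep_days=30, now=None):
    """Move the live log into archive_dir under a dated name, then purge old archives.

    All names and the keep_days default are illustrative assumptions.
    Returns (path of the new archive or None, list of purged file names).
    """
    now = time.time() if now is None else now
    log_path = Path(log_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    rotated = None
    if log_path.exists() and log_path.stat().st_size > 0:
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
        rotated = archive_dir / f"{log_path.name}.{stamp}"
        # os.replace only works within one filesystem; good enough for a sketch.
        os.replace(log_path, rotated)
        log_path.touch()  # leave a fresh, empty log behind

    # Purge archives older than keep_days. This assumes somebody has already
    # copied them to permanent storage, as described above.
    cutoff = now - keep_days * 86400
    purged = []
    for old in sorted(archive_dir.iterdir()):
        if old == rotated or not old.is_file():
            continue
        if old.stat().st_mtime < cutoff:
            old.unlink()
            purged.append(old.name)
    return rotated, purged
```

Calling something like rotate_log("access.log", "archive") once a day would keep about a month of dated files on the server. The important design point is the order: nothing is purged until it is older than the cutoff, which gives you time to copy it somewhere safe.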
If you want to have decent stats for your site, be careful
about keeping your log files organized. It's a pain in the
neck, but worth it: any gap in your data can screw up your
reports, and once it's lost, it's lost.
The wealth of data in the log files is not readily mined with
the naked eye. A raw log file entry looks something like this:
206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] "GET
/studio/drives.html HTTP/1.1" 200 20607
"http://www.webdevelopersjournal.com/studio/hard.html" "Mozilla/4.0
(compatible; MSIE 5.0; Windows 98; DigExt)"
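(In the actual file, each entry is a single line; it's wrapped
here to fit.) As you can see, this entry shows the visitor's IP
address, when the request was made, what page was requested,
whether the request succeeded (status code 200) and how many
bytes were sent, where the visitor came from, and even what
browser and OS they were running. As I'm sure you can also
see, you won't learn much of interest just by looking at the
raw log files. There's page after page of this stuff.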
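To show how a program picks those fields apart, here is a small Python sketch that parses the sample entry above. The pattern assumes the layout seen in that one entry (host, two placeholder fields, timestamp, request, status, size, referrer, browser); the names LOG_PATTERN and parse_entry are made up for this example, and logs in other layouts would need a different pattern.

```python
import re
from datetime import datetime

# One named group per field, in the order they appear in the sample entry.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Month names like "Jul" are parsed using the default (English) locale.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# The sample entry from the article, joined back onto one line.
sample = (
    '206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] '
    '"GET /studio/drives.html HTTP/1.1" 200 20607 '
    '"http://www.webdevelopersjournal.com/studio/hard.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"'
)
entry = parse_entry(sample)
print(entry["path"], entry["status"], entry["size"])
print(entry["referrer"])
```

Returning None for lines that don't match, instead of raising an error, is a deliberate choice here: real logs tend to contain the occasional mangled request, and you usually want to skip those and keep going.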
To get the most out of the data, you need to be able to see
totals for the whole site, and compare the figures over time.
That's where a log analysis software package comes in. These
handy tools range from Getstats (a free Unix program that can
run on your Web server) to various cheap shareware options, to
industrial-strength packages like Marketwave Hit List Pro 4.0
($395 list) or WebTrends Log Analyzer 4.52 ($399 list).
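To give a rough idea of what such a package does under the hood, here is a tiny Python sketch that totals successful requests per page and per day. It is nowhere near a real analysis tool, and it isn't how Getstats or the commercial packages work internally. The function name summarize is an assumption, and every line in SAMPLE_LINES except the first is invented for illustration.

```python
import re
from collections import Counter

# Pull out just the day, the requested path, and the status code.
REQUEST = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] '
    r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def summarize(lines):
    """Count successful (status 200) requests per page and per day."""
    hits_per_page = Counter()
    hits_per_day = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None or match.group("status") != "200":
            continue  # skip unparseable lines and failed requests
        hits_per_page[match.group("path")] += 1
        hits_per_day[match.group("day")] += 1
    return hits_per_page, hits_per_day


# First line is the article's sample entry; the rest are made up.
SAMPLE_LINES = [
    '206.135.203.174 - - [19/Jul/1999:00:00:04 -0600] "GET /studio/drives.html HTTP/1.1" 200 20607 '
    '"http://www.webdevelopersjournal.com/studio/hard.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"',
    '10.0.0.1 - - [19/Jul/1999:00:05:10 -0600] "GET /studio/drives.html HTTP/1.0" 200 20607 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [20/Jul/1999:08:00:00 -0600] "GET /index.html HTTP/1.0" 200 512 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [20/Jul/1999:08:00:01 -0600] "GET /missing.html HTTP/1.0" 404 0 "-" "ExampleBot/1.0"',
]

pages, days = summarize(SAMPLE_LINES)
for path, count in pages.most_common():
    print(f"{count:5d}  {path}")
for day, count in days.items():
    print(f"{count:5d}  {day}")
```

Comparing the per-day totals from one week or month to the next is the "figures over time" part. The real packages add things like visitor sessions, referrer breakdowns, and charts on top of counts like these.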
Basic tools like Getstats can give you almost as much
information as the pricey packages, but customization options
are limited, and results are presented in plain text format.
If you want pretty pictures and graphs for the marketing
department, you'll need something like Hit List or WebTrends.
For a comparative review of these two packages, see a review
from Web Developers Journal.