Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



The Log-Analysis Script - Page 4

December 11, 2001

Now that the hostname lookups are taken care of, it's time to write the log-analysis script. Example 8-2 shows the first version of that script.

Example 8-2: log_report.plx, a web log-analysis script (first version)

#!/usr/bin/perl -w

# log_report.plx

# report on web visitors

use strict;

while (<>) {
  # split one log line into its eleven fields
  my ($host, $ident_user, $auth_user, $date, $time,
    $time_zone, $method, $url, $protocol, $status, $bytes) =
/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;

  # debugging output: one field per line, plus a blank line
  # after each record (from the trailing "\n" element)
  print join "\n", $host, $ident_user, $auth_user, $date, $time,
    $time_zone, $method, $url, $protocol, $status,
    $bytes, "\n";
}

This first version of the script is simple. All it does is read lines via the <> operator (which reads from the files named on the command line, or from standard input if none are given), parse each line into its component pieces, and print the parsed elements for debugging purposes. The line that does the printing is interesting because it uses Perl's join function, which you haven't seen before. The join function is the polar opposite, so to speak, of the split function: its first argument is a string that gets placed between the elements of the list made up of its remaining arguments, and the result is returned as a single scalar. In other words, the Perl expression join '-', 'a', 'b', 'c' returns the string a-b-c. In this case, joining the parsed elements with \n lets us print them as a newline-separated list, and the extra "\n" at the end of the argument list leaves a blank line after each log entry.
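To see join and split side by side, here is a minimal standalone sketch. The variable names are illustrative only; the last join mimics the print statement in log_report.plx, showing why its output has a blank line after each record.

```perl
use strict;
use warnings;

# join places its first argument between the elements of the remaining list.
my $joined = join '-', 'a', 'b', 'c';        # 'a-b-c'

# split goes the other way, breaking a string back into a list.
my @parts = split /-/, $joined;              # ('a', 'b', 'c')

# As in log_report.plx: a trailing "\n" element yields a blank line.
my $record = join "\n", 'GET', '200', "\n";  # "GET\n200\n\n"

print "$joined\n", scalar(@parts), "\n", $record;
```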

The Mammoth Regular Expression

The real juicy part of this script, though, is that giant regular expression used to parse each log file line into its component parts. Here's that part of the script again:

  my ($host, $ident_user, $auth_user, $date, $time,
    $time_zone, $method, $url, $protocol, $status, $bytes) =
/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;

There are a couple of important things to note here. The first is that it is actually fairly tricky to represent this regular expression, which is meant to be on a single line, within the limited width of this book's pages. It's particularly tricky in this case because the spaces between the various elements are important, but it's hard to keep track of those spaces when the expression is broken to fit onto multiple lines. If you are going to test this script yourself, be sure that your version of the expression is all on one line, with a single space character between the right parenthesis that ends the first printed line and the left parenthesis that begins the second. (Or you can just download the example from the book's web site, at http://www.elanus.net/book/, since the downloadable example doesn't feature those problematic line breaks.) You can also use the version of this expression written with the /x modifier, which is described in the accompanying sidebar, "Regular Expression Extensions," instead of the one-line version given here.
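Here is a small sketch of why that line break matters. It runs the expression against a made-up sample log line twice: once written on a single line, and once accidentally split across two lines without /x. In the split version the newline and the leading space become part of the pattern, so nothing matches.

```perl
use strict;
use warnings;

# Made-up sample access-log line, for illustration only.
my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';

# The expression on a single line, as the script needs it.
my @fields = $line =~
  /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;

# The same expression broken across two lines: without /x, the
# newline and the following space are literal parts of the pattern.
my @broken = $line =~ /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?)
 (\S+)" (\S+) (\S+)$/;

print scalar(@fields), " fields from the one-line version\n";
print scalar(@broken), " fields from the broken version\n";
```

The one-line version should capture all eleven fields, while the broken version should capture none.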

Regular Expression Extensions

Putting the /x modifier at the end of a regular expression lets you use regular expression "extensions." This means that you can put whitespace characters (like spaces, tabs, and newlines) into the expression, and Perl will ignore them when trying to make a match. (The one exception is inside a square-bracketed character class, where literal whitespace characters still count.) To match a literal whitespace character outside a character class, you need to precede it with a backslash. You can also embed comments in the expression by preceding them with the hash symbol (#), just as you can with regular Perl statements; each comment runs to the end of its line. Be careful not to put the pattern's closing delimiter (usually /) inside such a comment, since Perl will take it as the end of the pattern. The idea is that you can break your expression across multiple lines and use indentation and comments to make it easier to understand.
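A minimal sketch of those rules, using a made-up request string: unescaped whitespace and comments are dropped, a backslash-space matches one real space, and a space inside a character class still counts.

```perl
use strict;
use warnings;

my $request = 'GET /index.html';   # made-up input

# Under /x the unescaped spaces, the newlines, and the comments are
# ignored, so this is the same pattern as /^(\S+) (\S+)$/.
my ($method, $path) = $request =~ /
    ^ (\S+)     # request method
    \           # backslash-space: one literal space
    (\S+) $     # requested path
/x;

# Inside a character class, a literal space still counts under /x.
my $class_ok = ('a b' =~ /a[ ]b/x) ? 1 : 0;

print "$method|$path|$class_ok\n";   # prints "GET|/index.html|1"
```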

With a substitution expression, by the way, the /x modifier applies only to the search pattern (the first half of the expression). The replacement part (the second half) still treats whitespace and the # sign as literal characters.
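A quick sketch of that asymmetry, with made-up input: the search pattern is spread out and commented, but the replacement text keeps its spaces and its # sign exactly as written.

```perl
use strict;
use warnings;

my $text = 'a-b';   # made-up input

# /x lets us space out and comment the search pattern...
(my $out = $text) =~ s/
    (\w) - (\w)   # two word characters joined by a hyphen
/$2 # $1/x;       # ...but the replacement keeps its spaces and '#'

print "$out\n";   # prints "b # a"
```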

Here's how you might use the /x modifier to represent the regular expression in Example 8-2:

my ($host, $ident_user, $auth_user, $date, $time,
     $time_zone, $method, $url, $protocol, $status,
     $bytes) =

     /             # regexp begins
     ^             # beginning-of-string anchor
     (\S+)         # assigned to $host
     \             # literal space
     (\S+)         # assigned to $ident_user
     \             # literal space
     (\S+)         # assigned to $auth_user
     \             # literal space
     \[([^:]+)     # assigned to $date
     :             # literal :
     (\d+:\d+:\d+) # assigned to $time
     \             # literal space
     ([^\]]+)      # assigned to $time_zone
     \]\ "         # literal string '] "'
     (\S+)         # assigned to $method
     \             # literal space
     (.+?)         # assigned to $url
     \             # literal space
     (\S+)         # assigned to $protocol
     "\            # literal string '" '
     (\S+)         # assigned to $status
     \             # literal space
     (\S+)         # assigned to $bytes
     $             # end-of-string anchor
     /x;           # regexp ends, with x modifier
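If you want to confirm that an /x rewrite really is equivalent to the one-line expression, you can run both against the same line and compare the captures. The sketch below does that with a made-up log line; its /x layout groups several elements per line (an arbitrary choice for brevity), unlike the one-element-per-line version above.

```perl
use strict;
use warnings;

# Made-up sample log line, for illustration only.
my $line = '192.0.2.7 - - [11/Dec/2001:09:15:02 -0500] "GET /book/index.html HTTP/1.1" 200 5120';

my @one_line = $line =~
  /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;

my @extended = $line =~ /
    ^ (\S+) \ (\S+) \ (\S+)        # host, ident user, auth user
    \ \[ ([^:]+) : (\d+:\d+:\d+)   # date and time
    \ ([^\]]+) \] \ "              # time zone, then the '] "' separator
    (\S+) \ (.+?) \ (\S+) "        # method, URL, protocol
    \ (\S+) \ (\S+) $              # status and bytes
/x;

my $same = join("\t", @one_line) eq join("\t", @extended);
print scalar(@extended), " fields; ", ($same ? "same" : "different"), "\n";
```

If both versions agree, this prints "11 fields; same".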

Perl for Web Site Management