Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions




Different Log File Formats - Page 6

December 11, 2001

It's fairly easy to modify this script to accept either the common or the extended log format. We do that by adding a configuration variable near the top of the script that looks like this:

my $log_format = 'common'; # 'common' or 'extended'

Then we modify the part of the script where the regular expression parsing occurs to include some logic to check that $log_format variable, along with a second version of the regular expression to be used on logs that are in the extended format:

  if ($log_format eq 'common') {

    # Common format: 11 fields. The pattern must stay on a single line;
    # without the /x modifier, a line break inside it is matched literally.
    ($host, $ident_user, $auth_user, $date, $time,
      $time_zone, $method, $url, $protocol, $status, $bytes) =
/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/
      or next;

  } elsif ($log_format eq 'extended') {

    # Extended format: the same 11 fields plus the quoted referer and agent.
    ($host, $ident_user, $auth_user, $date, $time,
      $time_zone, $method, $url, $protocol, $status, $bytes,
      $referer, $agent) =
/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/
      or next;
  } else {
    die "unrecognized log format '$log_format'";
  }

I think this probably qualifies as the ugliest block of code in this entire book. This is not the sort of code that anybody wants to have to make sense of more than once, but fortunately, once we get it right, we aren't likely to need to modify it.

Anyway, you'll notice that the new regular expression for extended-format logs has a couple of new chunks at the end, both of which look like "([^"]+)". By now that should be an easy one for you: it means "match a literal double quote, then capture one or more characters that are anything but a double quote, then match another literal double quote." These two new chunks capture into the new $referer and $agent variables that we've added at the end of the parenthetical list being assigned to.

We've also added an else block, which just does a quick sanity check, dying with an error message if the $log_format variable was inadvertently set to an unexpected value.

You may have noticed that there is no my declaration before the list of variables in either branch of the if-elsif construct. That's because declaring those variables as my variables here, inside the curly braces of the if-elsif block, would limit their visibility later on in the while block, where they need to be visible. As a result, we've moved the my declaration for the variables above the if-elsif construct, just after the while (<>) line:

my ($host, $ident_user, $auth_user, $date, $time,
  $time_zone, $method, $url, $protocol, $status, $bytes,
  $referer, $agent);
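
The scoping rule behind that move can be seen in isolation in a small sketch. The variable names $declared_outside and $shadowed and their values are invented for illustration and are not part of the log script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $log_format = 'extended';

# Declared before the if block, so both variables remain visible after it.
my $declared_outside;
my $shadowed = 'unchanged';

if ($log_format eq 'extended') {
    # Assigns to the variable declared above the block.
    $declared_outside = 'assigned in block';

    # A "my" here creates a brand-new variable that exists only
    # until the closing curly brace; the outer $shadowed is untouched.
    my $shadowed = 'assigned in block';
}

print "declared_outside: $declared_outside\n";
print "shadowed: $shadowed\n";
```

After the block, $declared_outside holds the value assigned inside it, while $shadowed still holds its original value. That is the behavior we would get for $host, $referer, and the rest if their my declaration were left inside the if-elsif branches.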

We should also add the $referer and $agent variables to the list of variables that the debugging print statement should print out. This will give us some extra blank lines in the output if our log file is actually in the common format, but that print statement is just a quick debugging tool anyway; the real output that the script produces later will be implemented more intelligently:

print join "\n", $host, $ident_user, $auth_user, $date, $time,
  $time_zone, $method, $url, $protocol, $status,
  $bytes, $referer, $agent, "\n";
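
To try the pieces together outside the full script, here is a minimal, self-contained sketch. The helper subroutines parse_line and debug_dump, the shared $common_fields pattern, and the two sample log lines are all assumptions made up for this illustration; the script itself does the matching inline inside its while (<>) loop. The debugging print here also swaps undefined values for empty strings, which is one possible way to keep the common-format case from triggering "uninitialized value" warnings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The eleven fields shared by both formats (no end-of-line anchor yet).
my $common_fields = qr/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)/;

# Hypothetical helper: return the captured fields for one line,
# or an empty list if the line does not match the chosen format.
sub parse_line {
    my ($line, $log_format) = @_;
    if ($log_format eq 'common') {
        return $line =~ /$common_fields$/;
    } elsif ($log_format eq 'extended') {
        return $line =~ /$common_fields "([^"]+)" "([^"]+)"$/;
    }
    die "unrecognized log format '$log_format'";
}

# Hypothetical helper: build the debugging output, one field per line,
# printing undefined fields as empty strings.
sub debug_dump {
    my @fields = @_;
    return join("\n", (map { defined $_ ? $_ : '' } @fields), "\n");
}

# Made-up sample lines, one per format.
my $common_line   = '192.0.2.1 - - [11/Dec/2001:10:15:30 -0800] "GET /index.html HTTP/1.0" 200 1043';
my $extended_line = $common_line . ' "http://www.example.com/start.html" "Mozilla/4.0 (compatible)"';

for my $case ([$common_line, 'common'], [$extended_line, 'extended']) {
    my ($line, $format) = @$case;
    my ($host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $url, $protocol, $status, $bytes,
        $referer, $agent) = parse_line($line, $format);
    print debug_dump($host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $url, $protocol, $status, $bytes,
        $referer, $agent);
}
```

For the common-format line, $referer and $agent come back undefined, so they show up as the extra blank lines mentioned above. For the extended-format line, the two "([^"]+)" chunks fill them in.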

Perl for Web Site Management

