Weaving Magic With Regular Expressions
July 16, 2001
The Web is made up of large bodies of text. Manipulating,
managing, and organizing these volumes of information is one of
the more complex jobs of a Webmaster. Something as simple as
updating copyrights and dates across a site can be
time-consuming. HTML editors like
Dreamweaver and content management systems like
eGrail have made Web site management easier, but
there are still plenty of small repetitive text replacement tasks
that can be simplified or automated with
Perl.
This is the first article in a series that will show how to
leverage Perl's extraordinary text manipulation capabilities to
save time and make you more effective in managing the
complexities of your Web site.
Introduction
First, I'd like to introduce myself. I've been working with Perl
for the last six years or so, first as a Unix systems
administrator, then as a Webmaster, and later in Web application
development, systems automation, and publishing. I'd like to
think of myself as an experienced Perl programmer, but I still
learn something new almost every day. Perl has helped me solve
most of my computing problems and has probably been my most
prized programming and automation tool. One of the neat things
about writing about Perl is that many of you, the readers, have
something that you can teach me. I would like this column to be a
learning experience for both of us. If you have a practical
solution that you would like to share through this column, please
feel free to send me an e-mail. If you don't quite understand
something that I've said, please let me know. I want to make sure
that WDVL is a valuable
resource that you can draw from to solve the problems that you
encounter on a regular basis. If you would like to know more
about me, read through my
bio
and then feel free to send an e-mail to
eisen@pobox.com.
Perl and Text
Perl seems to have a special relationship with text processing.
This probably comes from its author's background in linguistics.
In short, Perl was built to process text. This is probably why it
has flourished for Web programming, systems administration, and
publishing. There are three primary mechanisms in Perl for
processing text. The first is the Perl regular expression engine.
This is a special pattern-matching language based on the syntax of
sed and awk, two text processing tools that have
been part of Unix for many years. Perl's regular expressions are
both efficient and very powerful.
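To give a feel for the regular expression engine before the detailed discussion, here is a minimal sketch of Perl's two core regex operators: m// (match) and s/// (substitute). The sample text, dates, and copyright line are made up for illustration. The match captures the parts of a date into $1, $2, and $3; the first substitution uses those captures to reorder every MM/DD/YYYY date, and the second updates a copyright year.

```perl
use strict;
use warnings;

# Made-up sample text for illustration only.
my $text = "Updated 07/16/2001. Copyright 2000 Example Corp.";

# Match: capture month, day, and year of the first MM/DD/YYYY date.
if ($text =~ m{(\d{2})/(\d{2})/(\d{4})}) {
    print "Found month $1, day $2, year $3\n";
}

# Substitute: copy the text, then rewrite every MM/DD/YYYY date as YYYY-MM-DD.
# The /g flag applies the substitution to all matches, not just the first.
(my $iso = $text) =~ s{(\d{2})/(\d{2})/(\d{4})}{$3-$1-$2}g;

# Substitute: bump the copyright year; \b anchors the match at word boundaries.
(my $updated = $iso) =~ s/\bCopyright \d{4}\b/Copyright 2001/g;

print "$updated\n";
```

The `(my $copy = $original) =~ s///` idiom leaves the original string untouched, which is handy when you want to compare before and after.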
The second mechanism is the set of text and list processing
functions built into Perl itself. Several of these functions have
no direct equivalent in other popular languages.
They include split(),
shift(), pop(), chomp(),
join(), and splice(). These functions,
along with Perl's dynamic strings, hashes, and arrays,
have saved me countless hours over the years.
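A short sketch shows how several of these built-ins work together on a line of text. The comma-separated list of page names is a hypothetical input. Note that Perl has no built-in slice() function: slices are written with subscript syntax such as @all[0, 1], and splice() is the function that removes or replaces a run of array elements.

```perl
use strict;
use warnings;

# Hypothetical line as it might be read from a file, newline included.
my $line = "index.html,about.html,contact.html\n";

chomp $line;                      # strip the trailing newline
my @pages = split /,/, $line;     # ("index.html", "about.html", "contact.html")

my $first = shift @pages;         # removes and returns "index.html"
my $last  = pop @pages;           # removes and returns "contact.html"

my @all       = ($first, @pages, $last);
my @first_two = @all[0, 1];       # array slice: ("index.html", "about.html")

splice @all, 1, 1, "news.html";   # replace the one element at index 1

my $joined = join ' | ', @all;    # "index.html | news.html | contact.html"
print "First two: @first_two\n";
print "$joined\n";
```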
The third mechanism is the collection of external modules
that you can load into a program as needed. Many of these modules themselves
are built using regular expressions and Perl functions. Examples
include HTML::Parser and
Parse::RecDescent. Modules are usually easier to use
than regular expressions and are typically built to solve a
particular text processing problem, such as parsing HTML files.
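To illustrate how a module packages up a specific text problem, the sketch below uses Text::ParseWords rather than HTML::Parser or Parse::RecDescent. This is a deliberate substitution: Text::ParseWords ships with Perl, so the example runs on a stock installation. It splits a comma-separated record that contains a quoted field, which a plain split() gets wrong. The sample record is made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # bundled with Perl (not HTML::Parser)

# Made-up record: the middle field is quoted because it contains a comma.
my $record = 'Home,"Products, Services",Contact';

# A naive split breaks the quoted field in two.
my @naive = split /,/, $record;

# parse_line($delimiter, $keep, $line): with $keep = 0 the quotes are removed.
my @fields = parse_line(',', 0, $record);

print scalar(@naive), " fields from split: @naive\n";
print scalar(@fields), " fields from parse_line:\n";
print "  [$_]\n" for @fields;
```

The module handles the quoting rules for you, which is exactly the "easier than writing the regular expression yourself" trade-off described above.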
In this series, we will learn how to use all three mechanisms for
solving common text processing problems on the Web. In this
article we will focus on some basic regular expression techniques
for replacing text strings in files.
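As a preview of replacing text strings in files, here is a minimal sketch that applies one substitution to a file and writes the result back. The helper name replace_in_file is hypothetical, the sample file is a throwaway temporary file, and the whole file is read into memory, which is reasonable for typical HTML pages. The replacement is treated as literal text, so capture variables such as $1 are not expanded in it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: apply a substitution to every line of a file and
# write the result back. Returns the number of replacements made.
sub replace_in_file {
    my ($path, $pattern, $replacement) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @lines = <$in>;
    close $in;

    my $count = 0;
    for (@lines) {
        # s///g returns the number of substitutions (false if none).
        $count += s/$pattern/$replacement/g;
    }

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} @lines;
    close $out or die "Cannot close $path: $!";

    return $count;
}

# Demo on a temporary file that is deleted when the program exits.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "<p>Copyright 2000 Example Corp.</p>\n<p>Updated 2000</p>\n";
close $fh;

my $n = replace_in_file($file, qr/\bCopyright 2000\b/, 'Copyright 2001');
print "$n replacement(s) made in $file\n";
```

For quick jobs, Perl's -p and -i command-line switches do the same thing in one line, for example `perl -pi.bak -e 's/Copyright 2000/Copyright 2001/g' *.html`, which also keeps a .bak copy of each original file.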