Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions




Weaving Magic With Regular Expressions

July 16, 2001

The Web is made up of large bodies of text. Manipulating, managing, and organizing these volumes of information is one of the more complex jobs of a Webmaster. Something as simple as updating copyrights and dates across a site can be time-consuming. HTML editors like Dreamweaver and content management systems like eGrail have made Web site management easier, but there are still plenty of small, repetitive text replacement tasks that can be simplified or automated with Perl. This is the first article in a series that will show how to use Perl's extraordinary text manipulation capabilities to save time and manage the complexities of your Web site more effectively.

Introduction

First I'd like to introduce myself. I've been working with Perl for the last six years or so: first as a Unix systems administrator, then as a Webmaster, and later in Web application development, systems automation, and publishing. I'd like to think of myself as an experienced Perl programmer, but I still learn something new almost every day. Perl has helped me solve most of my computing problems and has probably been my most prized programming and automation tool. One of the neat things about writing about Perl is that many of you, the readers, have something that you can teach me. I would like this column to be a learning experience for both of us. If you have a practical solution that you would like to share through this column, please feel free to send me an e-mail. If you don't quite understand something that I've said, please let me know. I want to make sure that WDVL is a valuable resource that you can draw from to solve the problems that you encounter on a regular basis. If you would like to know more about me, read through my bio and then feel free to send an e-mail to eisen@pobox.com.

Perl and Text

Perl seems to have a special relationship with text processing. This probably comes from its author's background in linguistics. In short, Perl was built to process text, which is likely why it has flourished in Web programming, systems administration, and publishing. There are three primary mechanisms in Perl for processing text. The first is the Perl regular expression engine. It implements a special pattern matching language that draws on sed and awk, two text processing tools that have been part of Unix for many years. Perl regular expressions are very efficient and very powerful.
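As a small illustration of the pattern matching language described above, here is a minimal sketch of a match and a substitution. The sample footer text, the company name, and the variable names are made up for this example; they do not come from any real site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text; in practice this would come from a page on your site.
my $footer = "Copyright 2000 Example Corp. All rights reserved.";

# Match: look for the word "Copyright" followed by a four-digit year,
# and capture the year in $1.
my $found_year;
if ($footer =~ /Copyright\s+(\d{4})/) {
    $found_year = $1;
    print "Found copyright year: $found_year\n";
}

# Substitute: work on a copy of the string, keep the captured
# "Copyright " prefix, and replace only the year.
# ${1} is written with braces so that Perl does not read "$12001".
(my $updated = $footer) =~ s/(Copyright\s+)\d{4}/${1}2001/;
print "$updated\n";
```

The `=~` operator binds a string to a pattern, `m//` (written here without the `m`) tests for a match, and `s///` replaces the matched text.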

The second text processing mechanism in Perl is the set of functions built into the language. Several of these had no direct equivalent in other popular languages. They include split(), join(), and chomp() for strings, and shift(), pop(), and splice() for the lists that text is usually broken into. These functions, along with Perl's dynamic strings, hashes, and arrays, have saved me countless hours over the years.
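To give a feel for how these built-in functions combine, here is a short sketch that takes one line of pipe-delimited text apart and puts it back together. The input line and the variable names are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input line, as it might be read from a file (note the newline).
my $line = "home|authoring|languages|perl\n";

# chomp() removes the trailing newline, if there is one.
chomp($line);

# split() breaks the string into a list on a pattern (a literal "|" here).
my @parts = split /\|/, $line;

# shift() removes and returns the first element, pop() the last.
my $first = shift @parts;
my $last  = pop @parts;

# join() glues the remaining elements back into one string.
my $middle = join ' / ', @parts;

print "first=$first last=$last middle=$middle\n";
```

Because strings and arrays grow and shrink as needed, none of this requires declaring sizes or allocating buffers ahead of time.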

The last text processing mechanism in Perl consists of external modules that can be loaded dynamically. Many of these modules are themselves built using regular expressions and Perl functions. Examples include HTML::Parser and Parse::RecDescent. Modules are usually easier to use than regular expressions and are typically built to solve a particular text processing problem, such as parsing HTML files.

In this series, we will learn how to use all three mechanisms for solving common text processing problems on the Web. In this article we will focus on some basic regular expression techniques for replacing text strings in files.


