Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions


Processing Text with Perl Modules - Page 11

September 24, 2001

In the previous article, we learned how to use Perl's built-in routines to perform many common text manipulation functions. In this final article of the series on text processing, we will take a tour through a cornucopia of useful modules that will kick the tar out of some of those arduous text processing tasks.

The Power of CPAN

The Comprehensive Perl Archive Network (CPAN) is a group of servers around the world that provide access to the Perl source code and hundreds of Perl modules contributed by volunteers. CPAN is one of the things I imagine the authors of other languages (like Java) wish they had but don't. Fortunately for us, pre-built modules that bundle up the code and logic for many common tasks are freely available for the taking. See the last page of this article for a list of resources.

Installing Modules

Part of what makes CPAN powerful is the fact that Perl supports it directly with the CPAN.pm module, which has been distributed with the Perl source code for several years now. The module is capable of searching for, downloading, and installing modules directly from CPAN. It will even handle module dependencies where the module you're trying to install requires other modules from CPAN before it can be installed.

On most operating systems, you can install a CPAN module by typing:

perl -MCPAN -e 'install HTML::Parser'

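where HTML::Parser is the name of the module you wish to install. This will automatically find, download, compile, and install the module onto your system. On a Windows command prompt, use double quotes around the install statement instead of single quotes.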

If you are using ActiveState Perl and the module you are installing is available in ActiveState's repository, you can type:

ppm install GD

PPM is a command-line utility that is only available with ActiveState Perl, and not every Perl module on CPAN is available through it. If you're running ActiveState Perl on a Win32 platform, you will also need Visual C++ and nmake installed on your system to build modules from CPAN that are not available through PPM.
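
Whichever route you take, it can be handy to confirm that a module actually loads before relying on it. The following is a minimal sketch, not part of CPAN.pm or PPM: module_version() is a hypothetical helper written for this example, and the module names in the loop are simply the two used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for this example): returns the version
# of an installed module, or undef if the module cannot be loaded.
sub module_version {
    my ($name) = @_;

    # Accept only plain package names such as Foo::Bar before handing
    # the string to eval.
    return unless $name =~ /\A\w+(?:::\w+)*\z/;

    # require dies if the module is missing; eval traps that failure.
    return unless eval "require $name; 1";

    my $version = $name->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(HTML::Entities GD)) {
    my $version = module_version($module);
    print defined $version
        ? "$module $version is installed\n"
        : "$module is not installed\n";
}
```

If a module is reported as not installed, run the install command again and check its output for build errors.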

Making Text HTML Safe

I'm sure most of you have had at least one occasion where you needed to effectively cut and paste a text file into an HTML file. If that text file contained any reserved characters like & or <, you probably had to convert them to HTML-safe entities such as &lt; for < by hand. Or maybe you haven't fixed the text and you now have an invalid HTML document out there on your Web site.

Well, if you find yourself doing this hand tuning on a regular basis, or if you're routinely pasting text into HTML files without checking that it's HTML safe, stop. CPAN has a module called HTML::Entities that does all of the work for you.

The module contains a function appropriately named encode_entities() that encodes all HTML reserved characters. For example, if a variable named $text holds a string that needs to be HTML encoded, you would first add the statement use HTML::Entities; to the top of your script and then type:

encode_entities($text);

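somewhere in the main body of your source code. Called this way, with the return value ignored, the function modifies $text in place. So if $text contained the string "Fred & Barney's Bowling Academy", it would be converted into "Fred &amp; Barney's Bowling Academy". Note that current versions of the module also encode the apostrophe by default, so you may see Barney&#39;s in the output.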
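
The function can be called in a few different ways. The sketch below assumes a current release of HTML::Entities; the sample string and variable names are made up for illustration. It shows the encoded copy you get by using the return value, the optional second argument that limits which characters count as unsafe, and the in-place form described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities;

# Sample string invented for this example.
my $text = "Fred & Barney's <Bowling> Academy";

# Using the return value leaves $text untouched and gives back an encoded copy.
my $copy = encode_entities($text);

# An optional second argument limits which characters are treated as unsafe.
# Here only <, >, & and the double quote are encoded, so the apostrophe stays.
my $minimal = encode_entities($text, '<>&"');

# Calling the function with the return value ignored rewrites $text itself.
encode_entities($text);

print "$copy\n$minimal\n$text\n";
```

Passing an explicit character list is one way to keep apostrophes readable in the output when the text will only ever appear in element content.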

We could also build a simple script that converts an entire file such that we can execute the following on the command-line:

html_encode.pl < sample.txt > newtext.txt

Or in plain English, we feed a text file called sample.txt to the script as input and write the resulting encoded text to newtext.txt. The source of the script would look like the following:

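#!/usr/bin/perl -w
use strict;
use HTML::Entities;

# Read standard input (or any files named on the command line) line by line.
while (<>) {
    encode_entities($_);    # encode the current line in place
    print;                  # write the encoded line to standard output
}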
