Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



Try it out: Rhyming Dictionary - Page 6

February 23, 2001

Let's see one more example of this, where we'll combine looking for matches with looking through the lines in a file:

Imagine yourself as a poor poet. In fact, not just poor, but downright bad - so bad, you can't even think of a rhyme for 'pink'. So, what do you do? You do what every sensible poet does in this situation, and you write the following Perl program:

#!/usr/bin/perl
# rhyming.plx
use warnings;
use strict;
my $syllable = "ink";
while (<>) {
    print if /$syllable$/;
}

We can now feed it a file of words, and find those that end in 'ink':

>perl rhyming.plx wordlist.txt
blink
bobolink
brink
chink
clink
>

For a really thorough result, you'll need to use a file containing every word in the dictionary - be prepared to wait, though, if you do! For the sake of the example, however, any text-based file will do (though it'll help if it's in English). A bobolink, in case you're wondering, is a migratory American songbird, otherwise known as a ricebird or reedbird.

How It Works

With the loops and tests we learned in the last chapter, this program is really very easy:

while (<>) { print if /$syllable$/;}

We've not looked at file access yet, so you may not be familiar with the while(<>){...} construction used here. In this example it opens the file named on the command line (or reads standard input if no file is named) and loops through it one line at a time, placing each line in the special variable $_ - this is what we'll be matching.

Each time a line of the file has been fed into $_, we test to see if it matches the pattern, which is our syllable, 'ink', anchored to the end of the line (with $). If so, we print it out.
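To see the same test without the implicit $_, here is a minimal sketch that applies the pattern to a named variable instead. The rhymes subroutine and the sample word list are made up for illustration; they are not part of rhyming.plx.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: return the lines that end in the given syllable.
sub rhymes {
    my ($syllable, @lines) = @_;
    my @found;
    for my $line (@lines) {
        # The same test as /$syllable$/ in rhyming.plx, but bound to $line
        # with =~ rather than applied to $_ implicitly.
        push @found, $line if $line =~ /$syllable$/;
    }
    return @found;
}

# Each entry ends in a newline, just like a line read from a file.
my @words = ("blink\n", "inkwell\n", "pink\n", "brink\n");
print rhymes("ink", @words);
```

This prints 'blink', 'pink' and 'brink', one per line. 'inkwell' contains 'ink' but does not end with it, so it is skipped.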

The important thing to note here is that perl treats the 'ink' as the last thing on the line, even though $_ still ends with a newline. In a regular expression, $ matches either at the very end of the string or just before a newline that ends the string - we'll look at this behavior in more detail later.
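A short experiment makes the newline behavior concrete. This is only a sketch: the ends_in_ink subroutine is a made-up name, and the test strings are chosen to show where $ will and won't match.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns 1 if /ink$/ matches the text, 0 otherwise.
sub ends_in_ink {
    my ($text) = @_;
    return $text =~ /ink$/ ? 1 : 0;
}

print ends_in_ink("pink"),     "\n";   # no newline at all
print ends_in_ink("pink\n"),   "\n";   # one trailing newline, as read from a file
print ends_in_ink("pink\n\n"), "\n";   # two trailing newlines
print ends_in_ink("pink \n"),  "\n";   # a space before the newline
```

The first two calls print 1 and the last two print 0: $ only looks past a single newline at the very end of the string, and it does not skip other trailing characters such as spaces.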

Shortcuts and Options

This is all very well if we know exactly what we're trying to find, but finding patterns means more than just locating exact pieces of text. We may want to find a three-digit number, the first word on the line, four or more letters all in capitals, and so on.

We can begin to do this using character classes. A character class isn't a single character, but a way of saying that any one of a set of characters is acceptable at that point in the pattern. To specify one, we put the characters we consider acceptable inside square brackets. Let's go back to our matchtest program, using the same test string:

$_ = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

> perl matchtest.plx
Enter some text to find: w[aoi]nder
The text matches the pattern 'w[aoi]nder'.
>

What have we done? We've tested whether the string contains a 'w', followed by either an 'a', an 'o', or an 'i', followed by 'nder'; in effect, we're looking for any of 'wander', 'wonder', or 'winder'. Since the string contains 'wonder', the pattern is matched.
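If you'd rather try the class against several words at once, a sketch like the following will do. The candidate list is invented for the purpose, and grep simply keeps the words for which the match succeeds.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up candidates: three that should match and three that should not.
my @candidates = qw(wander wonder winder wunder wnder woonder);

# The class [aoi] stands for exactly one character, which must be a, o or i.
my @matched = grep { /w[aoi]nder/ } @candidates;

print "@matched\n";
```

This prints 'wander wonder winder'. 'wunder' fails because 'u' is not in the class, and 'wnder' and 'woonder' fail because the class must match exactly one character, no fewer and no more.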

Conversely, we can say that any character is acceptable except those in a given set - we can 'negate the character class'. To do this, the character class should start with a ^, like so:

> perl matchtest.plx
Enter some text to find: th[^eo]
'th[^eo]' was not found.
>

So, we're looking for 'th' followed by something that is neither an 'e' nor an 'o'. But all we have is 'the' and 'thought', so this pattern does not match.
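One point worth checking for yourself is that a negated class still has to match a character; it doesn't mean 'nothing follows'. Here is a rough sketch, where the check subroutine and the extra words 'this' and 'with' are our own additions rather than part of matchtest.plx:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $quote = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

# Hypothetical helper: report whether th[^eo] matches anywhere in the text.
sub check {
    my ($text) = @_;
    return $text =~ /th[^eo]/ ? "found" : "not found";
}

print "quote: ", check($quote), "\n";   # only 'the' and 'thought' contain 'th'
print "this:  ", check("this"), "\n";   # 'th' followed by 'i'
print "with:  ", check("with"), "\n";   # 'th' with nothing after it
```

The quote and 'with' both report 'not found', while 'this' reports 'found'. In 'with' there is no character after 'th' for [^eo] to match, so the pattern fails.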

If the characters you wish to match form a sequence in the character set you're using - ASCII or Unicode, depending on your perl version - you can use a hyphen to specify a range of characters, rather than spelling out the entire range. For instance, the numerals can be represented by the character class [0-9]. A lower case letter can be matched with [a-z]. Are there any numbers in our quote?

> perl matchtest.plx
Enter some text to find: [0-9]
'[0-9]' was not found.
>

You can use one or more of these ranges alongside other characters in a character class, so long as they stay inside the brackets. If you wanted to match a digit and then a letter from 'A' to 'F', you would say [0-9][A-F]. However, to match a single hexadecimal digit, you would write [0-9A-F], or [0-9A-Fa-f] if you wished to include lower-case letters.
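The difference between two classes side by side and two ranges in one class is easy to see with a few test strings. This is a sketch only; classify is a made-up helper. Neither pattern is anchored, so each one asks whether the string contains a match somewhere.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: report which of the two patterns a string matches.
sub classify {
    my ($text) = @_;
    my $pair   = $text =~ /[0-9][A-F]/ ? "yes" : "no";   # a digit, then a letter A-F
    my $single = $text =~ /[0-9A-F]/   ? "yes" : "no";   # any one hexadecimal digit
    return "pair=$pair single=$single";
}

for my $text ("7", "C", "7C", "G") {
    print $text, " -> ", classify($text), "\n";
}
```

Only '7C' satisfies [0-9][A-F], since that pattern needs two characters in that order. '7', 'C' and '7C' all contain a character that [0-9A-F] accepts, and 'G' matches neither pattern.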


