Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions


Repetition - Page 9

March 9, 2001

We've now moved from matching a specific character to a more general type of character, for when we don't know (or don't care) exactly what the character will be. Next, we'll see what happens when we want to talk about a more general quantity of characters: more than three digits in a row, two to four capital letters, and so on. The metacharacters that specify how many times something may repeat are called quantifiers.

Indefinite Repetition

The simplest of these is the question mark. It should suggest uncertainty: something may be there, or it may not. That's exactly what it does. It states that the immediately preceding character or metacharacter (or, as we'll see shortly, parenthesized group) may appear once or not at all. It's a good way of saying that a particular character or group is optional. To match either 'he' or 'she', you can put:

> perl matchtest.plx
Enter some text to find: \bs?he\b
The text matches the pattern '\bs?he\b'.
>
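To try the question mark outside matchtest.plx, here is a minimal standalone sketch. The sample strings are made up for illustration and are not part of the book's test program; each one is checked against the same \bs?he\b pattern.

```perl
use strict;
use warnings;

# Hypothetical sample strings (not taken from matchtest.plx).
my @samples = ('he said', 'she said', 'they said', 'shed a tear');

# s? makes only the single character 's' optional, so the pattern accepts
# 'he' or 'she'; the \b word boundaries reject 'they' and 'shed'.
my %matched;
for my $text (@samples) {
    $matched{$text} = $text =~ /\bs?he\b/ ? 1 : 0;
    print "'$text': ", ($matched{$text} ? 'matches' : 'no match'), "\n";
}
```

Only 'he said' and 'she said' match: inside 'they' and 'shed', the letters 'he' are not a whole word.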

To make a series of characters (or metacharacters) optional, group them in parentheses as before. Did he say 'what the Entish is' or 'what the Entish word is'? Either will do:

> perl matchtest.plx
Enter some text to find: what the Entish (word )?is
The text matches the pattern 'what the Entish (word )?is'.
>

Notice that we had to put the space inside the group. Otherwise, when the optional word is absent, the pattern still asks for two spaces between 'Entish' and 'is' (one before the group and one after it), whereas our text has only one:

> perl matchtest.plx
Enter some text to find: what the Entish (word)? is
'what the Entish (word)? is' was not found.
>
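Here is a small sketch that runs both versions of the pattern against both phrasings. The two sentences are hypothetical stand-ins built around the phrase from the test string; the point is where the space sits relative to the optional group.

```perl
use strict;
use warnings;

# Hypothetical texts: one without the optional word, one with it.
my @texts = ('what the Entish is for yes', 'what the Entish word is for yes');

my $space_in_group = qr/what the Entish (word )?is/;  # skipping 'word ' leaves one space
my $space_outside  = qr/what the Entish (word)? is/;  # skipping 'word' still needs two spaces

my (@in_group, @outside);
for my $text (@texts) {
    push @in_group, $text =~ $space_in_group ? 1 : 0;
    push @outside,  $text =~ $space_outside  ? 1 : 0;
    print "'$text'\n";
    print "  (word )?is => ", ($in_group[-1] ? 'match' : 'no match'), "\n";
    print "  (word)? is => ", ($outside[-1]  ? 'match' : 'no match'), "\n";
}
```

With the space inside the group, both phrasings match; with the space outside, only the one that actually contains 'word' does.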

As well as matching something one or zero times, you can match something one or more times. We do this with the plus sign - to match an entire word without specifying how long it should be, you can say:

> perl matchtest.plx
Enter some text to find: \b\w+\b
The text matches the pattern '\b\w+\b'.
>

In this case, we match the first word available: 'I'.
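A minimal sketch of the same idea in a script. The quoted sentence is an assumption: it stands in for matchtest.plx's test string, which this page doesn't reproduce, and, like the original, it opens with a quote followed by 'I'. Capturing the word with parentheses lets us see exactly what \w+ picked up.

```perl
use strict;
use warnings;

# Assumption: a stand-in for matchtest.plx's test string.
my $text = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

# \w+ needs at least one word character. The engine reports the leftmost
# match it can find, so the first word in the string wins.
my ($first) = $text =~ /\b(\w+)\b/;
print "First word: $first\n";

# With /g in list context, the same pattern collects every word in turn.
my @words = $text =~ /\b(\w+)\b/g;
print scalar(@words), " words: @words\n";
```

The first match is 'I', just as in the transcript; the /g version simply keeps going after each match.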

If, on the other hand, you have something which may appear any number of times but might not be there at all (zero, one, or many times), you need what's called the Kleene star: the * quantifier. So what would you do to find a capital letter after any number of whitespace characters, possibly none, at the start of the string? Match the start of the string, then any number of whitespace characters, then a capital:

> perl matchtest.plx
Enter some text to find: ^\s*[A-Z]
'^\s*[A-Z]' was not found.
>
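To see the star accept zero, one, or many whitespace characters, here is a sketch that runs ^\s*[A-Z] over a few hypothetical inputs, including one that opens with a quote the way the transcript's test string does.

```perl
use strict;
use warnings;

# Hypothetical inputs; "\t" is a tab, which \s also treats as whitespace.
my @texts = ('Hello', '   Hello', "\t Hello", '"Hello', '   hello');

# ^ anchors at the start, \s* allows zero or more whitespace characters,
# and [A-Z] then demands a capital letter.
my @results;
for my $text (@texts) {
    push @results, $text =~ /^\s*[A-Z]/ ? 1 : 0;
    (my $shown = $text) =~ s/\t/\\t/g;    # display the tab as \t
    printf "%-12s %s\n", "[$shown]", $results[-1] ? 'matches' : 'no match';
}
```

The first three match (no whitespace, spaces, a tab and a space); the leading quote and the lowercase 'h' both stop the match.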

Of course, our test string begins with a quote, so the above pattern won't match; but, sure enough, if you take away that first quote, the pattern matches fine. Let's review the three quantifiers:

/bea?t/ Matches either 'beat' or 'bet'
/bea+t/ Matches 'beat', 'beaat', 'beaaat'…
/bea*t/ Matches 'bet', 'beat', 'beaat'…
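The table above can be checked with a short script. In this sketch the patterns are anchored with ^ and $ so that each candidate must match as a whole word; the unanchored forms in the list would also find these letters inside longer text. The candidate words are just illustrations.

```perl
use strict;
use warnings;

my @candidates = qw(bt bet beat beaat beaaat);

# Anchored versions of the three patterns from the list above.
my %pattern_for = (
    'bea?t' => qr/^bea?t$/,
    'bea+t' => qr/^bea+t$/,
    'bea*t' => qr/^bea*t$/,
);

my %matches;
for my $name (sort keys %pattern_for) {
    my $re = $pattern_for{$name};
    $matches{$name} = [ grep { $_ =~ $re } @candidates ];
    print "/$name/ matches: @{ $matches{$name} }\n";
}
```

None of them accepts 'bt', since every pattern requires the 'be' at the front.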

Novice Perl programmers tend to go to town on combinations of dot and star, and the results often surprise them, particularly when it comes to searching-and-replacing. We'll explain the rules of the regular expression matcher shortly, but bear the following in mind:

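A regular expression should hardly ever start or finish with a starred character. Because a starred item is allowed to match nothing at all, a leading or trailing one never decides whether the match succeeds; it only changes how much text is matched.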
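As a quick sketch of why, consider /(a*)/, a pattern made of nothing but a starred item. The inputs are arbitrary examples; the parentheses capture what the star actually matched.

```perl
use strict;
use warnings;

# Because * allows zero repetitions, the engine can succeed at position 0
# of any string by matching nothing at all.
my (@matched, @captured);
for my $text ('banana', 'xyz', '') {
    my $ok = $text =~ /(a*)/ ? 1 : 0;
    push @matched,  $ok;
    push @captured, $ok ? $1 : undef;
    printf "%-9s matched=%d captured='%s'\n",
        "'$text'", $ok, defined $captured[-1] ? $captured[-1] : '';
}
```

Every string "matches", and even for 'banana' the capture is empty: the string starts with 'b', so the leftmost match is zero 'a's at position 0.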

You should also bear in mind that .* and .+ in the middle of a regular expression will match as much of your string as they possibly can while still letting the rest of the pattern match. We'll look more at this 'greedy' behavior later on.
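Here is a small sketch of that greed in action, using a made-up sentence with two quoted words. It shows both a plain match and the kind of search-and-replace that tends to surprise newcomers.

```perl
use strict;
use warnings;

# Hypothetical text containing two quoted words.
my $text = q(He said 'yes' and 'no'.);

# .* runs to the end of the string, then backs off only until the closing
# quote can match, so it stops at the LAST quote, not the next one.
my ($inside) = $text =~ /'(.*)'/;
print "captured: $inside\n";          # captured: yes' and 'no

# The same greed in a substitution replaces both quoted words at once.
(my $copy = $text) =~ s/'.*'/QUOTE/;
print "replaced: $copy\n";            # replaced: He said QUOTE.
```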

Beginning Perl