Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions




Lookaheads and Lookbehinds - Page 18

April 6, 2001

Sometimes you may want to say something along the lines of 'substitute the word "fish" with "cream", but only if the next word is "cake".' You can do this very simply by saying:

s/fish cake/cream cake/

What does this do? The regular expression engine scans the string, looking for a match on "fish cake". On finding one, it substitutes the text "cream cake". Not too bad - it does the job. In this case it's not too big a deal that it has to replace the five characters of " cake" in each match with five identical characters from the substitution string. But it's not hard to see how this sort of inefficiency could really start to bog a program down if we used substitutions excessively.

What we want is a way of putting an assertion into the match - a 'match the text only if the next word is "cake"' clause - without actually matching the assertion itself. Having matched "fish", we really just want to look ahead, to see if it says " cake" (and give the match a thumbs-up if it does), then forget about "cake" altogether.

In life, that's not so easy. Fortunately in Perl we have an operator for just this sort of thing:

/fish(?= cake)/

will match exactly what we want - it looks for "fish", does a positive lookahead on " cake", and matches "fish" only if that succeeds. For example:

#!/usr/bin/perl
# look1.plx
use warnings;
use strict;
$_ = "fish cake and fish pie";
print "Our original order was ", $_, "\n";
s/fish(?= cake)/cream/;
print "Actually, make that ", $_, " instead.\n";

will print

>perl look1.plx

Our original order was fish cake and fish pie
Actually, make that cream cake and fish pie instead.
>
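
To see that the lookahead really does 'forget about "cake"', we can ask Perl how much text the match consumed. The following is only a sketch: the file name and variable names are made up for illustration, and it relies on the standard $& and @+ match variables.

```perl
#!/usr/bin/perl
# look_width.plx (hypothetical file name, not from the original text)
use warnings;
use strict;

my $order = "fish cake and fish pie";
my ($matched, $end) = ("", -1);

if ($order =~ /fish(?= cake)/) {
    $matched = $&;      # the text the pattern actually consumed
    $end     = $+[0];   # offset just past the end of the match
}

print "Matched '$matched', ending at offset $end\n";
print "Text right after the match: '", substr($order, $end, 5), "'\n";
```

The match is just "fish", and " cake" is still sitting untouched immediately after it, which is why the substitution in look1.plx only has to replace four characters.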

We can also look ahead negatively, by using an exclamation mark instead of the equals sign:

/fish(?! cake)/

which will match "fish" only if it is not immediately followed by " cake". If we adapt look1.plx like so:

#!/usr/bin/perl
# look2.plx
use warnings;
use strict;
$_ = "fish cake and fish pie";
print "Our original order was ", $_, "\n";
s/fish(?! cake)/cream/;
print "Actually, make that ", $_, " instead.\n";

then sure enough, it's "fish pie" that gets matched this time and not "fish cake".

>perl look2.plx

Our original order was fish cake and fish pie
Actually, make that fish cake and cream pie instead.
>
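
It's worth being clear about what the negative lookahead tests: the characters that follow, not the next word as such. Here is a small sketch (the sample orders and the %verdict hash are invented for illustration) that tries the same pattern against a few strings:

```perl
#!/usr/bin/perl
# look_neg.plx (hypothetical file name, not from the original text)
use warnings;
use strict;

my @orders = ("fish cake", "fish cakes", "fish pie", "fish");
my %verdict;

for my $order (@orders) {
    # The assertion succeeds whenever " cake" does NOT come next,
    # which includes the case where nothing comes next at all.
    $verdict{$order} = ($order =~ /fish(?! cake)/) ? "matches" : "does not match";
    print "'$order' $verdict{$order}\n";
}
```

"fish cakes" is rejected because the text after "fish" still begins with " cake", while a bare "fish" at the end of the string is accepted because there is nothing after it to rule the match out.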

Lookaheads are very powerful, as you'll soon discover if you experiment a little, particularly when you combine them with less specific expressions built from metacharacters.
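
As one possible experiment, we can swap the literal "fish" for \w+ and collect every word that comes directly before " cake". This is only an illustrative sketch: the order string and the @cakes array are assumptions, not part of the original examples.

```perl
#!/usr/bin/perl
# look_meta.plx (hypothetical file name, not from the original text)
use warnings;
use strict;

my $order = "fish cake, cream cake and fish pie";

# Capture each run of word characters, but only where " cake" follows.
# The /g modifier in list context returns every captured word.
my @cakes = $order =~ /(\w+)(?= cake)/g;

print "Cakes ordered: @cakes\n";
```

Because the lookahead consumes nothing, each " cake" is left in place and the search simply carries on from the end of the captured word.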

However, we may also wish to look at the text preceding a matched pattern. We therefore have a similar pair of lookbehind operators. We now use the < sign to point 'behind' the match, matching "cake" only if "fish" precedes it. So to find all those boring old fish cakes, we use:

/(?<=fish )cake/

but to find all the cream cakes and chocolate cakes, do this:

/(?<!fish )cake/

Let's have fish and chips instead of our fish cakes, and cream slices instead of cream cakes:

   #!/usr/bin/perl
   # look3.plx
   use warnings;
   use strict;
   $_ = "fish cake and cream cake";
   print "Our original order was ", $_, "\n";
   s/(?<=fish )cake/and chips/;
   print "No, wait. I'll have ", $_, " instead\n";
   s/(?<!fish )cake/slices/;
   print "Actually, make that ", $_, ", will you?\n";

>perl look3.plx
Our original order was fish cake and cream cake
No, wait. I'll have fish and chips and cream cake instead
Actually, make that fish and chips and cream slices, will you?
>

One very important thing to note about lookbehind assertions is that they can only handle fixed-width expressions. So while you can use most of the metacharacters, indeterminate quantifiers like +, ?, and * aren't allowed.
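
We can check this restriction without crashing our program by building the patterns from strings and compiling them inside eval. This is a sketch under a couple of assumptions: the variable names are invented, and the exact error message Perl gives for the rejected pattern differs between versions, so the code only tests whether compilation succeeded.

```perl
#!/usr/bin/perl
# look_fixed.plx (hypothetical file name, not from the original text)
use warnings;
use strict;

my $order = "fish cake and cream cake";

# Patterns are kept in strings so that a bad one fails at run time,
# inside eval, rather than stopping the script at compile time.
my $fixed    = '(?<=fish )cake';     # lookbehind is always five characters wide
my $variable = '(?<=fish\s*)cake';   # \s* could match any amount of whitespace

my $fixed_re    = eval { qr/$fixed/ };      # compiled regex on success
my $variable_re = eval { qr/$variable/ };   # undef if Perl rejects the pattern

print "Fixed-width lookbehind compiled: ",
    (defined $fixed_re ? "yes" : "no"), "\n";
print "Unbounded lookbehind compiled: ",
    (defined $variable_re ? "yes" : "no"), "\n";
print "Found a fish cake\n" if defined $fixed_re and $order =~ $fixed_re;
```

When a pattern is refused, the reason is left in $@, which is worth printing while you experiment.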

Beginning Perl

