Lookaheads and Lookbehinds
April 6, 2001
Sometimes you may want to say something along the lines of
'substitute the word "fish" with "cream", but only if the next
word is "cake".' You can do this very simply by saying:
s/fish cake/cream cake/
What does this do? The regular expression engine scans a
referenced string, looking for a match on "fish cake". On finding
one, it substitutes the text "cream cake". Not too bad - it does
the job. In this case it's not too big a deal that it has to
substitute five characters from each match with five identical
characters from the substitution string. It's not hard to see how
this sort of inefficiency could really start to bog a program
down if we used substitutions excessively.
What we want is a way of putting an assertion into the match - a
'match the text only if the next word is "cake"' clause - without
actually matching the assertion itself. Having matched "fish", we
really just want to look ahead, to see if it says " cake" (and
give the match a thumbs-up if it does), then forget about "cake"
altogether.
In life, that's not so easy. Fortunately in Perl we have an
operator for just this sort of thing:
/fish(?= cake)/
will match exactly what we want - it looks for "fish", does a
positive lookahead on " cake", and matches "fish" only if that
succeeds. For example:
#!/usr/bin/perl
# look1.plx
use warnings;
use strict;
$_ = "fish cake and fish pie";
print "Our original order was ", $_, "\n";
s/fish(?= cake)/cream/;
print "Actually, make that ", $_, " instead.\n";
will print
>perl look1.plx
Our original order was fish cake and fish pie
Actually, make that cream cake and fish pie instead.
>
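To confirm that the lookahead really is left out of the match, we can inspect what the engine reports. The following is only a minimal sketch: the variable names are ours, and it relies on Perl's built-in $& (the matched text) and $+[0] (the offset at which the match ended).

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $order = "fish cake and fish pie";

my ($matched, $end) = ("", -1);
if ($order =~ /fish(?= cake)/) {
    $matched = $&;      # the text actually matched
    $end     = $+[0];   # offset just past the end of the match
}
print "Matched '$matched', ending at offset $end\n";
print "Still unmatched: '", substr($order, $end), "'\n";
```

The match is just "fish" and it ends at offset 4, so " cake" is still sitting there untouched for whatever the program does next.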
We can also look ahead negatively, by using an exclamation mark
instead of the equals sign:
/fish(?! cake)/
which will match "fish" only if the following word is not "
cake". If we adapt look1.plx like so:
#!/usr/bin/perl
# look2.plx
use warnings;
use strict;
$_ = "fish cake and fish pie";
print "Our original order was ", $_, "\n";
s/fish(?! cake)/cream/;
print "Actually, make that ", $_, " instead.\n";
then sure enough, it's "fish pie" that gets matched this time and
not "fish cake".
>perl look2.plx
Our original order was fish cake and fish pie
Actually, make that fish cake and cream pie instead.
>
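One detail worth checking is that a negative lookahead also succeeds when there is nothing at all after "fish", since " cake" cannot possibly follow the end of the string. Here is a small sketch that counts the matches; the order string and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $order = "fish cake, fish pie and more fish";

# In list context, /g returns every match; assigning that list
# to () and then to a scalar gives the number of matches.
my $count = () = $order =~ /fish(?! cake)/g;
print "Found $count fish not followed by ' cake'\n";
```

The count is 2: the "fish" in "fish pie" and the "fish" at the very end of the string.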
Lookaheads are very powerful, as you'll soon discover if you
experiment a little, particularly when you start to use less
specific expressions (using metacharacters) with them.
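As a taste of what metacharacters add, here is a sketch that finds out which kinds of cake were ordered without naming any of them in advance. The order string and variable names are our own, chosen for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $order = "fish cake and cream cake and apple pie";

# \w+ matches any word; the lookahead insists that whitespace and
# "cake" come next, but leaves them out of the match itself.
my @kinds = $order =~ /\w+(?=\s+cake)/g;
print "Kinds of cake ordered: @kinds\n";
```

Because the lookahead consumes nothing, each match is just the word in front of "cake", so @kinds ends up holding "fish" and "cream".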
However, we may also wish to look at the text preceding a matched
pattern. We therefore have a similar pair of lookbehind
operators. We now use the < sign to point 'behind' the match,
matching "cake" only if "fish" precedes it. So to find all those
boring old fish cakes, we use:
/(?<=fish )cake/
but to find all the cream cakes and chocolate cakes, do this:
/(?<!fish )cake/
Let's have fish and chips instead of our fish cake, and cream
slices instead of cream cake:
#!/usr/bin/perl
# look3.plx
use warnings;
use strict;
$_ = "fish cake and cream cake";
print "Our original order was ", $_, "\n";
s/(?<=fish )cake/and chips/;
print "No, wait. I'll have ", $_, " instead\n";
s/(?<!fish )cake/slices/;
print "Actually, make that ", $_, ", will you?\n";
>perl look3.plx
Our original order was fish cake and cream cake
No, wait. I'll have fish and chips and cream cake instead
Actually, make that fish and chips and cream slices, will you?
>
One very important thing to note about lookbehind assertions is
that they can only handle fixed-width expressions. So while you
can use most of the metacharacters, indeterminate quantifiers
like ?, *, and + aren't allowed.
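To see the fixed-width restriction in action, we can ask Perl to compile two lookbehind patterns and report which one it accepts. This is a minimal sketch using names of our own choosing. The patterns are kept in strings so that they are compiled at run time, where eval can trap the error. The exact error message differs between Perl versions, so the sketch only checks whether compilation succeeded.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# The first lookbehind has a fixed width of five characters;
# the second uses * and so could be any length.
my @patterns = ('(?<=fish )cake', '(?<=fish *)cake');

my %compiles;
for my $pattern (@patterns) {
    # qr// with an interpolated string compiles now; a bad pattern
    # dies inside eval, leaving $regex undefined.
    my $regex = eval { qr/$pattern/ };
    $compiles{$pattern} = defined $regex ? 1 : 0;
    print "$pattern: ", ($compiles{$pattern} ? "compiles" : "rejected"), "\n";
}
```

The fixed-width pattern compiles, while the one containing * is rejected before any matching is attempted.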
Beginning Perl