Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



POSIX and Unicode Classes - Page 8

February 23, 2001

Perl 5.6.0 introduced a few more character classes into the mix - first, those defined by the POSIX (Portable Operating System Interface) standard, which are therefore also available in a number of other applications. The more common character classes here are:

Shortcut     Expansion                              Description
[[:alpha:]]  [a-zA-Z]                               An alphabetic character.
[[:alnum:]]  [0-9A-Za-z]                            An alphabetic or numeric character.
[[:digit:]]  \d                                     A digit, 0-9.
[[:lower:]]  [a-z]                                  A lower case letter.
[[:upper:]]  [A-Z]                                  An upper case letter.
[[:punct:]]  [!"#$%&'()*+,-./:;<=>?@\[\\\]^_`{|}~]  A punctuation character - note the escaped characters [, \, and ].

The Unicode standard also defines 'properties', which apply to some characters. For instance, the 'IsUpper' property can be used to match any upper-case character, in whichever language or alphabet. If you know the property you are trying to match, you can use the syntax \p{} to match it: for instance, an upper-case character is matched by \p{IsUpper}.
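
As a rough illustration of both kinds of class, the sketch below counts matching characters in a made-up string and then tests a Unicode property against two Cyrillic letters. The helper name count_matches and the sample text are assumptions introduced for this example; they are not part of matchtest.plx.

```perl
use strict;
use warnings;

# Count how many characters in $text match the single-character pattern $re.
# (count_matches is a hypothetical helper written for this sketch.)
sub count_matches {
    my ($text, $re) = @_;
    my $count = () = $text =~ /$re/g;
    return $count;
}

# Hypothetical sample string, used only for this illustration.
my $sample = "Perl 5.6.0, released in 2000!";

my $digits = count_matches($sample, qr/[[:digit:]]/);
my $uppers = count_matches($sample, qr/[[:upper:]]/);
my $puncts = count_matches($sample, qr/[[:punct:]]/);

print "digits: $digits, upper case: $uppers, punctuation: $puncts\n";

# Unicode properties apply to characters outside ASCII as well.
# U+0416 is CYRILLIC CAPITAL LETTER ZHE; U+0436 is its lower-case form.
my $capital_is_upper = "\x{0416}" =~ /\p{IsUpper}/ ? 1 : 0;
my $small_is_upper   = "\x{0436}" =~ /\p{IsUpper}/ ? 1 : 0;

print "capital zhe upper: $capital_is_upper, small zhe upper: $small_is_upper\n";
```

Note that a POSIX class is written inside an ordinary bracketed class, which is why the brackets are doubled: [[:digit:]] rather than [:digit:].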

Alternatives

Instead of giving a series of acceptable characters, you may want to say 'match either this or that'. The 'either-or' operator in a regular expression uses the same symbol as the bitwise 'or' operator, |. So, to match either 'yes' or 'maybe' in our example, we could say this:

> perl matchtest.plx
Enter some text to find: yes|maybe
The text matches the pattern 'yes|maybe'.
>

That's either 'yes' or 'maybe'. But what if we wanted either 'yes' or 'yet'? To get alternatives on part of an expression, we need to group the options. In a regular expression, grouping is always done with parentheses:

> perl matchtest.plx
Enter some text to find: ye(s|t)
The text matches the pattern 'ye(s|t)'.
>

If we had forgotten the parentheses, the pattern would have been 'yes|t', and we would have tried to match either 'yes' or 't'. In this case, we'd still get a positive match, but it wouldn't be doing what we want - we'd get a match for any string with a 't' in it, whether the words 'yes' or 'yet' were there or not.
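
The difference between the grouped and ungrouped forms can be sketched with a short script. The sample strings and the helper name matching are assumptions made up for this illustration, not the text that matchtest.plx searches.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for this illustration.
my @samples = ('yes', 'yet', 'maybe', 'not now');

# Return the samples that a pattern matches.
# (matching is a hypothetical helper written for this sketch.)
sub matching {
    my ($re) = @_;
    return grep { /$re/ } @samples;
}

my @plain     = matching(qr/yes|maybe/);  # 'yes' or 'maybe'
my @grouped   = matching(qr/ye(s|t)/);    # 'yes' or 'yet'
my @ungrouped = matching(qr/yes|t/);      # 'yes', or anything containing a 't'

print "yes|maybe matched: ", join(', ', @plain), "\n";
print "ye(s|t) matched: ",   join(', ', @grouped), "\n";
print "yes|t matched: ",     join(', ', @ungrouped), "\n";
```

With these samples, the ungrouped pattern also picks up 'not now', which contains neither 'yes' nor 'yet', only a 't'.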

You can match either 'this' or 'that' or 'the other' by adding more alternatives:

> perl matchtest.plx
Enter some text to find: (this)|(that)|(the other)
'(this)|(that)|(the other)' was not found.
>

However, in this case, it's more efficient to separate out the common elements:

> perl matchtest.plx
Enter some text to find: th(is|at|e other)
'th(is|at|e other)' was not found.
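
One way to convince yourself that the two forms accept the same text is to run both against a handful of strings and compare the results. The strings below are made up for this sketch; it only checks agreement and does not measure efficiency.

```perl
use strict;
use warnings;

my $long  = qr/(this)|(that)|(the other)/;
my $short = qr/th(is|at|e other)/;

# Hypothetical test strings, not taken from the tutorial's sample text.
my @tests = ('like this', 'not that', 'on the other hand', 'neither one', 'thin');

my $disagreements = 0;
for my $text (@tests) {
    my $m_long  = $text =~ $long  ? 1 : 0;
    my $m_short = $text =~ $short ? 1 : 0;
    $disagreements++ if $m_long != $m_short;
    printf "%-20s long=%d short=%d\n", "'$text'", $m_long, $m_short;
}

print "Patterns disagreed on $disagreements string(s).\n";
```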

You can also nest alternatives. Say you want to match one of these patterns:

  • 'the' followed by whitespace or a letter,
  • 'or'

You might put something like this:

> perl matchtest.plx
Enter some text to find: (the(\s|[a-z]))|or
The text matches the pattern '(the(\s|[a-z]))|or'.
>

It looks fearsome, but break it down into its components. Our two alternatives are:

  • the(\s|[a-z])
  • or

The second part is easy, while the first contains 'the' followed by two alternatives: \s and [a-z]. Hence: either 'the' followed by either a whitespace character or a lower-case letter, or 'or'. We can, in fact, tidy this up a little by replacing (\s|[a-z]) with the less cluttered [\sa-z].

> perl matchtest.plx
Enter some text to find: (the[\sa-z])|or
The text matches the pattern '(the[\sa-z])|or'.
>
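
As a quick check that the nested alternation and the character-class form behave alike, the sketch below runs both against a few strings. The test strings are assumptions chosen for illustration, including some that should fail (a bare 'the', upper-case text, and 'the' followed by a digit).

```perl
use strict;
use warnings;

my $nested = qr/(the(\s|[a-z]))|or/;
my $tidy   = qr/(the[\sa-z])|or/;

# Hypothetical test strings, chosen only for this illustration.
my @tests = ('the end', 'then', 'for', 'the', 'THE END', 'the1');

my ($nested_hits, $tidy_hits) = (0, 0);
for my $text (@tests) {
    my $n = $text =~ $nested ? 1 : 0;
    my $t = $text =~ $tidy   ? 1 : 0;
    $nested_hits += $n;
    $tidy_hits   += $t;
    printf "%-10s nested=%d tidy=%d\n", "'$text'", $n, $t;
}

print "nested matched $nested_hits, tidy matched $tidy_hits\n";
```

Only the first three strings match: 'the' on its own has nothing after it, the patterns are case-sensitive, and a digit is neither whitespace nor a lower-case letter.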
    


