Form Validation
May 10, 1999
Suppose that you have a web page that collects user
registrations -- these registrations store user information,
such as telephone numbers and e-mail addresses, in a simple
database. Of course, for the sake of ethics, we're
assuming that the user has volunteered this information with
upfront knowledge about its ultimate use. In fact,
we'll look at coding this simple registration log in the next
example -- first, though, let's look at validating
the user's input. After all, this registration log isn't
of much use if it contains invalid data.
Telephone numbers are easier to validate than e-mail addresses,
so let's begin there. Assuming a standard North
American telephone number, the format for such a number is an
area code (3 digits) followed by seven digits. Spaces
and dashes should be optional but tolerated by our validator.
The aim in this example is to verify that the telephone
number provided in the user's registration is at least
of a valid format -- obviously we can't be sure the number
isn't fictional, but some validation is better than nothing.
Within the HTML that constructs the registration form there
is a field where the user inputs their telephone number:
Telephone # (including area code):
<input type="text" size="10" name="userphone"><Br>
In our Perl program, we use the CGI object to retrieve the
value of the userphone parameter:
$userphone=$cgiobject->param("userphone");
And we can use a conditional pattern match to assign a true
or false value to a validation variable:
$fieldValid=$userphone=~/^\D*\d{3}\D*\d{3}\D*\d{4}\D*$/;
Yikes! On the left-hand side of the assignment operator is
our validation variable, $fieldValid. This variable
will ultimately receive a true or false value, depending
on the success of the right-hand operation. That right-hand
operation is the now-familiar conditional pattern
match. In this pattern match, the user's phone number ($userphone)
is compared against a somewhat cryptic regexp syntax. The
logic behind our regexp can be eloquently stated as:
"Starting at the beginning of the data there are
zero or more non-digits (the \D character class) followed
by exactly three digits (the \d character class), followed
by zero or more non-digits followed by exactly three
digits, followed by zero or more non-digits, followed by
exactly four digits followed by zero or more non-digits,
followed by the end of the data string." Whew!
This regular expression is tolerant enough to match, for
example, "(555) 555-2222" or "555-555-2222" or "5555552222"
and so on (\D accepts any non-digit, so stray letters are
tolerated as well), but it will reject a phone number with
missing or extra digits.
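A quick way to convince yourself of this behavior is to run a pattern equivalent to the one above against a few sample strings from the command line. The following is only a minimal sketch and not part of register.cgi; the helper name phone_ok and the sample numbers are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in register.cgi): true if the string holds
# exactly ten digits in a 3-3-4 arrangement, with any non-digits around them.
sub phone_ok {
    my ($phone) = @_;
    return $phone =~ /^\D*\d{3}\D*\d{3}\D*\d{4}\D*$/ ? 1 : 0;
}

# Sample inputs invented for this sketch
my @samples = ("(555) 555-2222", "555-555-2222", "5555552222",
               "555-2222", "1-555-555-2222");

foreach my $phone (@samples) {
    printf "%-16s %s\n", $phone, phone_ok($phone) ? "valid" : "invalid";
}
```

The first three samples should be reported as valid; the last two have too few and too many digits, respectively, and should be reported as invalid.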
If this field has validated successfully, we might also
want to apply a substitution to $userphone, so that
all phone numbers reside in the same format in our future
registration log. We can simply strip out all non-digits
from the user's entry:
if ($fieldValid)
{ $userphone=~s/\D//g }
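The test and the clean-up can also be folded into a single step. Here is a minimal sketch, assuming a hypothetical helper named normalize_phone (it does not appear in register.cgi) that returns the ten bare digits on success and an undefined value on failure:

```perl
use strict;
use warnings;

# Hypothetical helper: validate first, then strip every non-digit.
# Returns the 10-digit string, or undef (in scalar context) if the
# format is wrong.
sub normalize_phone {
    my ($phone) = @_;
    return unless defined $phone
        && $phone =~ /^\D*\d{3}\D*\d{3}\D*\d{4}\D*$/;
    $phone =~ s/\D//g;    # "(555) 555-2222" becomes "5555552222"
    return $phone;
}

my $clean = normalize_phone("(555) 555-2222");
print defined $clean ? "stored as $clean\n" : "rejected\n";
```

Whether to keep the two steps separate, as the article does, or combine them like this is largely a matter of taste.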
Thus far we've seen the bits and pieces of form validation.
Let's reconsider the big picture -- we're using form
validation as a precursor to storing a registration log.
So, let's begin building our real-life registration script,
register.cgi, focusing first on the validation code.
register.cgi (preliminary)
#!/usr/bin/perl
use CGI;
#create an instance of the CGI object
$cgiobject = new CGI;
#grab the values submitted by the user
$userphone=$cgiobject->param("userphone");
#output HTML header to web browser
print $cgiobject->header;
#test form validation, output error if necessary
#otherwise proceed to registration log
if ( &validateForm )
{ &registerForm }
else
{ &output_fail }
# subroutine which validates form fields and
#returns a true or false result
sub validateForm
{ $failedFields="";
$formValid=1;
$fieldValid=$userphone=~/^\D*\d{3}\D*\d{3}\D*\d{4}\D*$/;
if ($fieldValid)
{ $userphone=~s/\D//g }
else
{ $failedFields.="Telephone Number,";
$formValid=0 }
return $formValid
}
#subroutine which outputs failure message
#if form does not validate
sub output_fail
{ chop($failedFields);
$resultPage="<html><head>".
"<title>Uh-Oh: Registration Problem</title>".
"</head><body bgcolor=\"white\">".
"<h2>Sadly, there seems to be a problem with your ".
"form submission. Specifically, the following ".
"mandatory fields were filled in improperly:</h2>".
"<Br><h3>$failedFields</h3>".
"<Br>Please go back and try again.".
"</body></html>";
print $resultPage;
}
In looking over the first version of register.cgi, we
cover a fair amount of Perl territory. Notice the
introduction of subroutines -- we use subroutines to
"bundle" a section of code. Subroutines often return
a result, such as true or false, which lets us call the
subroutine from within a conditional statement -- in this
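example, we call the &validateForm subroutine
from within an if statement (the ampersand preceding
a subroutine name is optional when the call is written with
parentheses, as in validateForm(); the bare &validateForm form
also hands the caller's @_ to the subroutine, which is harmless
here). This if statement is the main control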
of program flow: if the form is valid then we proceed to
the registration subroutine, which is still fictional
at this point; if the form is not valid, we output an error
message to the user's browser detailing which field(s)
failed validation.
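This control flow can be tried without a web server. The sketch below is hypothetical: it swaps the CGI input for a hard-coded value and stubs out the registration and failure steps (the real subroutines do more work), so that the if/else dispatch can be run from the command line.

```perl
use strict;
use warnings;

# Stand-in for the value that register.cgi would read from the form
my $userphone = "555-555-2222";

# Returns 1 (true) or 0 (false), so it can sit inside a conditional
sub validateForm {
    return $userphone =~ /^\D*\d{3}\D*\d{3}\D*\d{4}\D*$/ ? 1 : 0;
}

# Stubs invented for this sketch only
sub registerForm { return "registered $userphone" }
sub output_fail  { return "validation failed" }

# Main control of program flow, written with parentheses instead of &
my $result;
if ( validateForm() )
    { $result = registerForm() }
else
    { $result = output_fail() }
print "$result\n";
```

With the sample value shown, the script should print "registered 555-555-2222"; change $userphone to something like "555-2222" to see the failure branch.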
Returning attention to the task at hand, we probably want to
validate other fields in addition to the user's
telephone number. What other fields might we validate? Were
this the Family Feud, and were I a quaintly amorous
Richard Dawson, I'd hereby shout "Survey Says!?"
-- and, ding, the number one answer would be
"e-mail addresses"! So here is the bad news --
e-mail addresses are darn hard to validate. In fact, they
are so difficult to validate that we won't attempt it in
this article, but the Resources section will contain some
links to information on this very matter. In brief,
e-mail address validation is such a trauma because
the valid syntax for an e-mail address is quite flexible, and
too difficult to capture in a single regexp pattern
match.
For simplicity's sake, then, let's say that we will add
validation for the user's name and ZIP code. The name
field must begin with at least one alphabetical character, while
the ZIP code should conform to either the traditional 5-digit
number or the newfangled 5 + 4-digit ZIP. Modifying our
&validateForm subroutine with the proper logic
and regexp comparisons yields the following (the main script
must also fetch $username and $userZIP from the CGI object
with param, just as it does for $userphone):
# subroutine which validates form fields and
#returns a true or false result
sub validateForm
{ $failedFields="";
$formValid=1;
#validate phone number
$fieldValid=$userphone=~/^\D*\d{3}\D*\d{3}\D*\d{4}\D*$/;
if ($fieldValid)
{ $userphone=~s/\D//g }
else
{ $failedFields.="Telephone Number,";
$formValid=0 }
#validate user name
$fieldValid=$username=~/^[a-zA-Z]+/;
unless ($fieldValid)
{ $failedFields.="User Name,";
$formValid=0 }
#validate ZIP code
$fieldValid=$userZIP=~/^\d{5}(-\d{4})?$/;
unless ($fieldValid)
{ $failedFields.="ZIP Code,";
$formValid=0 }
return $formValid
}
Our new, beefier &validateForm subroutine simply
builds on its predecessor. The user name test verifies
that the name begins with at least one alphabetic character
(anything may follow, since the pattern has no end anchor). The ZIP code
test uses a regular expression to allow either a 54321
ZIP code or a 54321-1234 ZIP code. Quick narration
of ZIP regexp logic: "Starting at beginning of data,
there must be 5 digits. The group of characters represented by
one dash followed by four digits may appear zero
or one times, followed by the end of the data."
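To see how strict or loose these two patterns are, they can be exercised on their own. This is a minimal sketch; the helper names name_ok and zip_ok and all sample values are assumptions invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns from validateForm
sub name_ok {
    my ($name) = @_;
    return $name =~ /^[a-zA-Z]+/ ? 1 : 0;    # only the start is checked
}

sub zip_ok {
    my ($zip) = @_;
    return $zip =~ /^\d{5}(-\d{4})?$/ ? 1 : 0;
}

# Sample values invented for this sketch
foreach my $name ("Mary", "R2D2", "42 Smith", "") {
    printf "name %-12s %s\n", "'$name'", name_ok($name) ? "valid" : "invalid";
}
foreach my $zip ("54321", "54321-1234", "5432", "54321-12", "543211234") {
    printf "ZIP  %-12s %s\n", $zip, zip_ok($zip) ? "valid" : "invalid";
}
```

Note that "R2D2" passes the name test because only the leading letter is checked, while the ZIP test accepts just the first two samples.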
At the start of the subroutine we set a flag, the variable
$formValid, to 1 -- meaning that we begin
validation with the assumption that the form is
valid (and that man is basically good). As we validate each
field, if that field should fail, then $formValid
is set to 0, tripping the flag to indicate that there
is an invalid field in the form. We use the $failedFields
variable to simply collect the names of fields
as they fail, for later output to the browser.
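The same flag-and-list idea can be expressed with a table of patterns and an array of failures, which avoids the trailing comma that output_fail has to chop off. This is one possible variation, not the approach register.cgi takes; the %rules table, the failed_fields subroutine, and the sample submission are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical table: one pattern per field label
my %rules = (
    "Telephone Number" => qr/^\D*\d{3}\D*\d{3}\D*\d{4}\D*$/,
    "User Name"        => qr/^[a-zA-Z]+/,
    "ZIP Code"         => qr/^\d{5}(-\d{4})?$/,
);

# Returns the labels of the fields that failed (empty list = form valid)
sub failed_fields {
    my (%input) = @_;
    my @failed;
    foreach my $field (sort keys %rules) {
        my $value = defined $input{$field} ? $input{$field} : "";
        push @failed, $field unless $value =~ $rules{$field};
    }
    return @failed;
}

# Sample submission invented for this sketch
my @failed = failed_fields(
    "Telephone Number" => "555-2222",
    "User Name"        => "Mary",
    "ZIP Code"         => "54321-1234",
);

my $message = @failed ? "Failed: " . join(", ", @failed) : "Form is valid";
print "$message\n";
```

Here an empty list plays the role of $formValid being 1, and join builds the comma-separated list for the error page.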
That, then, sums up the basics of form validation. Needless
to say, there are many types of data that a web
page may request, and validating different sorts of
information often requires different strategies. Typically,
though, regular expression pattern matching is a key
tool in validating user input.