CGI Sucks!
Well, as you might expect, for all its dynamism, CGI was not a holy grail.
In fact, there are a lot of sysadmins out there who would be ecstatic if
CGI were outlawed. CGI simply causes too many problems.
- CGI introduces security holes. Lincoln Stein writes the
following eloquent warning on the problem:
Unfortunately, there's a lot to worry about [when running a web server
with CGI]. The moment you install a Web server at your site,
you've opened a window into your local network that the entire Internet
can peer through. Most visitors are content to window shop, but a few will
try to peek at things you don't intend for public consumption. Others,
not content with looking without touching, will attempt to force the
window open and crawl in.
It's a maxim in system security circles that buggy software opens up
security holes. It's a maxim in software development
circles that large, complex programs contain bugs. Unfortunately, Web
servers are large, complex programs that can (and in
some cases have been proven to) contain security holes.
Furthermore, the open architecture of Web servers allows arbitrary CGI
scripts to be executed on the server's side of the
connection in response to remote requests. Any CGI script installed at
your site may contain bugs, and every such bug is a
potential security hole.
It is one thing to allow any freako on the Internet access to your web
server, when the communication is controlled through the boundaries
defined by HTTP and implemented by web browsers. It is another thing
to allow a stranger access to an unlimited number of applications housed
on the same server through a renegade CGI script.
In the WWW Security FAQ, Stein identifies four overlapping types of risk:
- Private or confidential documents stored in the Web site's document
tree may fall into the hands of unauthorized individuals.
- Private or confidential information sent by the remote user to the
server (such as credit card information) might be intercepted.
- Information about the Web server's host machine might leak through,
giving outsiders access to data that can potentially allow them to
break into the host.
- Bugs can allow outsiders to execute commands on the server's host
machine, allowing them to modify and/or damage the system. This
includes "denial of service" attacks, in which the attackers pummel
the machine with so many requests that it is rendered effectively
useless.
I recommend checking out the following CGI Security sites if you are
interested in getting more detailed information.
- CGI is at the mercy of HTTP.
It is important to note that HTTP only provides for a one-time,
question/answer type of communication. After all, it was defined
primarily for web browsers and web servers to exchange HTML documents.
Thus, by definition, HTTP is not very dynamic.
One-time, question/answer communication works like this: the web
browser and the web server are only connected as long as it takes
for the web browser to send one document request and the web server
to send one requested document. If the browser wants a second
document, it must recontact the server and ask again. Each request is
new: the server maintains no ongoing connection or record of past
exchanges.
While this is very efficient for network traffic (because the
bandwidth is only used when information needs to be exchanged), it
is a big pain in the butt when it comes to CGI, because CGI is about
conversations, not about one-time questions and answers.
Imagine that when talking on the phone you had to hang up and redial
every time you said something and received an answer. Imagine
further that every time you called back you had to go over every
previous exchange before you could get to the next piece. That is the
way web browsers work with web servers, and it makes communication
tough for three reasons.
First, if the client
and server are to maintain information over several exchanges, the CGI
must be responsible for keeping a running transcript of the conversation
so that every time there is a new exchange, the web server can consult
the record of the entire conversation up to that point. This is what
CGI aficionados call "maintaining state". The CGI script must be
able to keep track of certain information like username or the contents
of a virtual shopping cart for every "instance" of a script (1). That
is, there must be a way to tie the current HTTP request to related ones
that have gone on before. Maintaining state is possible with CGI using
hidden variables, by encoding the URL, or by maintaining a state file on
the server; it's just not easy or efficient (2).
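To make "maintaining state" a little more concrete, here is a minimal sketch in Python (standing in for the Perl that a typical CGI script would be written in). The function name handle_request and the field names cart and add are hypothetical, made up for this illustration. The point is only that each call starts with no memory and has to rebuild the conversation from whatever the browser sends back.

```python
from urllib.parse import parse_qs, urlencode


def handle_request(query_string: str) -> str:
    """Simulate one CGI invocation: nothing survives between calls."""
    fields = parse_qs(query_string)
    # Rebuild the cart from the state the browser carried back to us.
    raw = fields.get("cart", [""])[0]
    cart = raw.split(",") if raw else []
    # Apply whatever the user asked for in this one exchange.
    for item in fields.get("add", []):
        cart.append(item)
    # Hand the updated state back to the browser for the next request.
    return urlencode({"cart": ",".join(cart)})


# Three separate requests; the state string is the only thing linking them.
state = handle_request("add=book")
state = handle_request(state + "&add=lamp")
state = handle_request(state + "&add=pen")
print(parse_qs(state)["cart"][0])
```

Notice that the whole cart travels across the network on every exchange. That is the "not easy or efficient" part: the more the conversation accumulates, the more each request has to carry and re-parse.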
Second, every set of question/answers causes the web server to
execute a unique instance of the CGI script. This is pretty
expensive, especially on a high volume web site which may have 100
instances of a CGI script executing at any given moment, each,
perhaps, with its own Perl interpreter (3). Every one of those CGI
scripts takes a little bit of oomph out of the server engine. If we were
not limited to the question/answer format, we would not need to execute
so many instances.
Consider the following CGI application executing....
Client: Hello?
Server: Welcome, what would you like? (CGI script executed once)
Client: I would like a list of products you are selling.
Server: Here is a list. (another one)
Client: I want to purchase this product.
Server: Okay. (yep)
Client: I'm done, can I check out?
Server: Yes, what is your credit card number? (another script)
Client: Here it is.
Server: Thanks. (another instance of the script, which also emails the
results to some store admin) (4)
Yuck, this exchange caused 5 instances of the store script to be
executed, along with 5 Perl interpreters if the CGI script was written in
Perl.
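The one-process-per-exchange cost can be sketched roughly as follows. This is an illustration only, under some loud assumptions: the Python interpreter stands in for the Perl interpreter, the one-line SCRIPT is a made-up stand-in for a real store script, and run_cgi_once is a hypothetical helper, not part of any web server.

```python
import subprocess
import sys

# Hypothetical stand-in for a CGI script: a tiny program that prints a reply.
SCRIPT = "print('Content-Type: text/plain'); print(); print('ok')"


def run_cgi_once() -> str:
    """Start a brand-new interpreter process, as happens for each CGI request."""
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Five exchanges in the store dialog mean five interpreter start-ups.
replies = [run_cgi_once() for _ in range(5)]
print(len(replies), "separate processes were started")
```

Each call pays the full price of launching an interpreter, loading the script, and tearing everything down again, even though the useful work is a single short reply.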
Third, CGI is extremely slow. Every time the client does something,
the CGI script must recreate the entire dialog and execute a new
request. Add a new item to a virtual shopping cart - new request.
Calculate a running total - new request. Submit an order - yet another
request. Each request takes time, since the CGI script must be
executed again and everyone must wait on a busy Internet.
- CGI is ugly. Finally, CGI scripts produce fairly ugly
user-interfaces. Basically, CGI is limited to bland HTML-based forms
and whatever bells and whistles can be provided by surrounding HTML
layout. Thus, no CGI application looks like your swank bootleg
copy of Word.
This may not seem like a big issue at first, but when you start
competing for web hits with multi-million dollar companies, image is
indeed everything. CGI simply cannot compare with web-based
applications which are not limited to HTML.
Well, those are some pretty damning flaws. Like I said, many systems
administrators would love to see CGI fall off the face of the Earth.
Unfortunately for those system administrators, the fact is that CGI has
continued to be the workhorse of the web, powering 90% of the dynamic web
pages out there.
The fact is that CGI, especially CGI/Perl is easy to work with and most
non-technically oriented webmasters out there can get their needs filled,
and filled right away. However amazingly, brand-fantasmagorically
wonderful other technologies sound, they are still vaporware as far as
the average web developer is concerned. Either the ISP does not provide
those technologies, or the learning and development curve is too steep or
expensive. And of course for small applications typical of most
websites, the big guns of C or C++ are just overkill.
CGI, for all its flaws, works, and works pretty darn well if done
carefully. "Intranet" developers with massive budgets can yack all they
want to about servlets and SQL gateways and Server Side Includes and
customized server applications written in Java, but for most "internet"
developers out there, CGI is the only tool available for solving their
problems. And with creativity and care, CGI can also be the right tool.
Footnotes
- You can think of an instance of a script
as a unique and independent version of a generic script. It is called an
"instance" because ten web surfers could all execute a CGI script at the
same time. Though each web surfer would be using the same generic
CGI script, each instance of that script would be personalized to that
web surfer. Thus you may have ten instances of the exact same script
running in parallel on the web server hardware.
- Hidden variables allow you to maintain
state using the HTML "Hidden" form tag. Essentially, you
include information in your HTML form that will not be visible to
the user when they look at the form in their web browser window,
but which will be transferred to the CGI script with the
user-supplied data. The format of the tag looks something like the
following:
<INPUT TYPE = "HIDDEN" NAME = "first_name" VALUE = "selena">
<INPUT TYPE = "HIDDEN" NAME = "last_name" VALUE = "sol">
When the CGI script processes the information which the user enters
into the HTML form, it will also receive the variable "first_name" with
the value of "selena" as well as "last_name" equal to "sol".
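A script usually has to generate these hidden tags itself so that the state rides along to the next request. Here is a minimal sketch of that step in Python; the helper name hidden_fields is hypothetical, and html.escape is used on the assumption that values may contain characters that would otherwise break the tag.

```python
from html import escape


def hidden_fields(state: dict) -> str:
    """Render each name/value pair as a hidden form field."""
    lines = []
    for name, value in state.items():
        # quote=True also escapes double quotes so a value cannot break the tag.
        lines.append(
            '<INPUT TYPE="HIDDEN" NAME="{}" VALUE="{}">'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


print(hidden_fields({"first_name": "selena", "last_name": "sol"}))
```

Keep in mind that "hidden" only means hidden from the rendered page. Anyone can view the page source and change these values before submitting, so the script should treat them like any other user-supplied input.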
If the user is not using a FORM tag to navigate through a site,
the admin can still encode state information in the URL by
using the HTTP standard for URL encoding. For example, the following
hyperlink would send the same info as above to the CGI script.
<A HREF = "http://www.extropia.com/test.cgi?first_name=selena&last_name=sol">click here</A>
Notice that variables to be passed along are listed after the question
mark, name/value pairs are separated by the ampersand sign, and the
variable name and variable values are separated by an equal sign.
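The same name/value format can be built and taken apart programmatically. The sketch below uses Python's standard urllib.parse module for illustration; the variable names are arbitrary, and a Perl CGI script would use its own equivalent routines.

```python
from urllib.parse import parse_qs, urlencode

state = {"first_name": "selena", "last_name": "sol"}

# Build the part of the URL that follows the question mark.
query = urlencode(state)
link = "http://www.extropia.com/test.cgi?" + query
print(link)

# On the receiving side, split the query string back into name/value pairs.
received = parse_qs(query)
print(received)

# Characters with special meaning, such as spaces and ampersands, get escaped.
print(urlencode({"q": "a b&c"}))
```

The last line shows why encoding matters: a raw ampersand inside a value would otherwise be read as the separator between two pairs.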
Finally, the CGI script may write out state information to a file on the
server and then simply pass along the location of the file using one or
both of the above methods. This is best when there is a large amount of
state information.
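A state file approach might look something like the following sketch. Everything named here is an assumption made for illustration: the scratch directory, the JSON file format, and the helpers save_state and load_state. Only the short id would travel in a hidden field or URL; the bulky data stays on the server.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Assumption: a scratch directory stands in for wherever the server keeps state.
STATE_DIR = Path(tempfile.mkdtemp())
HEX_DIGITS = "0123456789abcdef"


def save_state(state: dict) -> str:
    """Write state to a file and return the short id that names it."""
    session_id = uuid.uuid4().hex
    (STATE_DIR / (session_id + ".json")).write_text(json.dumps(state))
    return session_id


def load_state(session_id: str) -> dict:
    """Look up the state file that a later request refers to."""
    # Accept only ids shaped like the ones we generate, so a request
    # cannot trick the script into opening some other file.
    if len(session_id) != 32 or not all(c in HEX_DIGITS for c in session_id):
        raise ValueError("bad session id")
    return json.loads((STATE_DIR / (session_id + ".json")).read_text())


sid = save_state({"cart": ["book", "lamp"], "username": "selena"})
print(load_state(sid))
```

The id check is there because the id comes back from the browser, and, as the security discussion above warns, anything arriving from the outside may have been tampered with.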
By the way, maintaining state can also be achieved using Netscape
Cookies; however, we will not address cookies here because they require
their own article.
- Perl is a fun language to use because it keeps
the nuts and bolts of machine code as invisible as possible. One of the
ways Perl does this is by adding an extra step between you and the
computer. This extra step is called a "Perl interpreter". This
interpreter (which your sysadmin must install) reads a Perl program that
you write and carries it out "on the fly", so you never have to produce
a separate machine-code file yourself. Your script can then be moved to
any other system with a Perl interpreter and, in most cases, be run
without changes. Further, the code can be easily modified and understood.
Unfortunately, in order to run your script, you must also run the
interpreter, and this can be expensive in terms of server resources.
In more intense languages like C or C++, there is no interpreter.
You must use a special "compiler" program to translate your code into
machine code. This affords greater power to your programs since you do
not need to run a separate interpreter when you run your executable, but
it does mean that executables are specific to each operating system and
that the source code is stored separately from the executable code.
- Notice that CGI scripts must be smart enough to
answer all sorts of questions.