Parts of an HTTP Transaction: The Request
September 20, 1999
Underlying the user interface that browsers present are
the network and the protocols that carry requests over the
wire to the servers, or "engines," that process them and
return the various media. The protocol of the web is known as
HTTP, for HyperText Transfer Protocol. HTTP is the underlying
mechanism on which CGI operates, and it directly determines
what you can and cannot send or receive via CGI.
Besides specifying information about the file being
transported, HTTP also defines the phases of a
request/response interaction.
HTTP provides two primary methods for requesting documents:
GET and POST.
The foundation of HTTP/0.9 (the first version of HTTP) was
the definition of the GET method, which a web browser used
to request a specific document.
For example, the following HTTP request would return the
document "index.html" located in the "webdocs" directory
under the web server's document root.
GET /webdocs/index.html CRLF
Notice that the GET request began with the GET keyword,
included a document to retrieve, and ended with a carriage
return and line feed combination.
If you would like, you can try making a GET request by
connecting to your favorite web server and sending the GET
request yourself (as if you were a web browser).
Below is a GET session I cut and pasted from a telnet window.
In this case, I used telnet to contact the web server
"www.extropia.com" and asked for the file "irobot.html" in the
"Scripts/Columns" directory. (Don't forget the two carriage
returns at the end.) The server responded by sending me the
contents of that file (the HTML code you see).
selena: telnet www.extropia.com 80
Trying 206.53.239.130...
Connected to www.extropia.com.
Escape character is '^]'.
GET /Scripts/Columns/irobot.html
<HTML>
<HEAD>
<TITLE>Hello there</TITLE>
</HEAD>
<BODY>
Hello there. My, you are awfully good-looking
to be a web browser!
</BODY>
</HTML>
Connection closed by foreign host.
selena:
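If you would rather script the session than type it into telnet, here is a minimal Python sketch of the same idea. The function names (build_simple_request, fetch) are my own inventions for illustration, and the sketch sends a single CRLF, as in the GET example earlier. Keep in mind that many servers today may refuse a bare HTTP/0.9-style request, so treat this as a sketch rather than something guaranteed to work against any particular host.

```python
import socket


def build_simple_request(path: str) -> bytes:
    """Return an HTTP/0.9-style request: GET keyword, document path, CRLF."""
    return f"GET {path}\r\n".encode("ascii")


def fetch(host: str, path: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the request over a TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_simple_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Hypothetical usage (requires network access and a server that still
# answers HTTP/0.9-style requests):
#   page = fetch("www.extropia.com", "/Scripts/Columns/irobot.html")
#   print(page.decode("latin-1"))
```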
The beauty of web browsers, of course, is that they take care
of the HTTP details so that the user only needs to enter the
URL of the page they want to see. The web browser formulates
the actual GET request, sends it to the web server, receives
the HTML document back, and then displays it according to the
HTML instructions.
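To make the browser's first step a little more concrete, the sketch below turns a URL into the pieces needed for a GET request: the host to connect to, the port, and the request line itself. The function name request_from_url is hypothetical, and defaulting to port 80 is an assumption that holds only for plain http URLs.

```python
from urllib.parse import urlsplit


def request_from_url(url: str):
    """Split a URL into (host, port, request bytes) for a simple GET."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80        # assumption: plain HTTP on the default port
    path = parts.path or "/"       # an empty path means the server root
    if parts.query:
        path += "?" + parts.query  # keep any search parameters with the path
    return host, port, f"GET {path}\r\n".encode("ascii")


host, port, request = request_from_url(
    "http://www.extropia.com/Scripts/Columns/irobot.html"
)
print(host, port, request)
```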
Besides allowing web browsers (or you pretending to be one)
to get documents from a web server, the GET method also gives
a web browser a way to send optional search parameters
(originally used with ISINDEX HTML files).
Search parameters are encoded in a special way that the web
server can deal with.
Encoding works like this:
The URL is differentiated from the
search parameters by a question mark (?). In other words,
a URL generically looks like the following:
http://www.domain.com/dir/file?search parameters
Since you may want to have multiple search
parameters, the GET method specifies that parameters are
separated by placing an ampersand (&) between them.
Thus, the encoded URL above becomes something like the
following:
http://www.domain.com/dir/file?search1&search2&search3
Next, search parameters themselves are
specified as "name/value pairs" separated by an equal sign (=)
such as in the following example that sets the variable "lname"
equal to "Sol" and the variable "fname" equal to
"Selena":
http://www.domain.com/dir/file?lname=Sol&fname=Selena
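The three rules so far (question mark, ampersand, equal sign) are enough to take a URL apart by hand. Here is a small Python sketch that does just that; split_query is a name I made up for illustration, and it deliberately ignores the plus-sign and percent-sign decoding rules.

```python
def split_query(url: str):
    """Return the name/value pairs found after the '?' in a URL."""
    _, _, query = url.partition("?")        # everything after the first '?'
    pairs = []
    for item in query.split("&"):           # parameters are separated by '&'
        if not item:
            continue                        # skip empty pieces such as a trailing '&'
        name, _, value = item.partition("=")  # name and value are separated by '='
        pairs.append((name, value))
    return pairs


print(split_query("http://www.domain.com/dir/file?lname=Sol&fname=Selena"))
```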
Further, any spaces in the encoded
string are replaced by plus signs (+) as in the following
example:
http://www.domain.com/dir/file?name=Selena+Sol&age=28
Finally, most non-alphanumeric characters are replaced by a
percent sign (%) followed by the character's two-digit
hexadecimal code. For example, a single quote character
(') is encoded as %27 and a line break (which is a carriage
return plus a line feed) is encoded as %0D%0A. Thus, we might
see the following example that specifies that the variable
pageName is equal to "Selena Sol's Page":
http://www.domain.com/dir/file?pageName=Selena+Sol%27s+Page
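You rarely have to apply these encoding rules by hand. As an illustration, Python's standard urllib.parse module implements them; the sketch below reproduces the examples above. The helper name build_url is hypothetical. Note that Python leaves a handful of punctuation characters (such as the underscore, period, and hyphen) unescaped.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode


def build_url(base: str, params: dict) -> str:
    """Append URL-encoded name/value pairs to a base URL."""
    return base + "?" + urlencode(params)


# Encoding a single value: spaces become '+', the quote becomes %27.
print(quote_plus("Selena Sol's Page"))   # Selena+Sol%27s+Page
print(quote_plus("\r\n"))                # %0D%0A

# Encoding several name/value pairs at once.
print(build_url("http://www.domain.com/dir/file",
                {"name": "Selena Sol", "age": 28}))

# Decoding reverses the process.
print(unquote_plus("Selena+Sol%27s+Page"))
```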
Though the GET method was very useful, a couple of serious
problems remained.
First, the GET method only allowed a limited amount of data
to be sent as URL-encoded data. The limit came from browser
and server implementations rather than from HTTP itself, and
was commonly around 1024 characters.
If there were too many name/value pairs, some of them would be
clipped and data would get lost.
Further, since the information was sent as part of the URL,
the user could see all of that data. On the one hand,
that made URLs look really
ugly and scary. On the other hand, it meant that the user
got to see all of the inner workings of your CGI input.
This all changed with the development of HTTP/1.0.
The HTTP/1.0 protocol was developed from 1992 to 1996 in
order to satisfy the need to exchange more than simple text
information.
The first major change from the HTTP/0.9 specification was
the use of MIME-like headers in request and response messages.
The next HTTP change was the definition of new request methods:
HEAD and POST.
Let's look at both of these changes in greater depth.
Under HTTP/1.0, an HTTP transaction
consisted of a header followed by an empty line and then some
extra data (the message body).
We have already talked about the header.
The POST method of input was the other important change
brought about by the introduction of HTTP/1.0.
The POST method allowed web browsers to send an essentially
unlimited amount of data to a web server by tacking it on
to an HTTP request, after the request headers, as the message
body.
Typically, the message body would be our old familiar
URL-encoded string, the part that follows the question mark
(?) in a GET URL.
Thus, it would not be strange for a web server to get a
POST request that looked something like the following:
POST /cgi-bin/phone_book.cgi HTTP/1.0
Referer: http://www.somedomain.com/Directory/file.html
User-Agent: Mozilla/1.22 (Windows: I: 32bit)
Accept: */*
Content-type: application/x-www-form-urlencoded
Content-length: 29

name=Selena+Sol&phone=7700404
Notice that the "Content-length" request header is equal to
the number of characters (bytes) in the body of the request.
This is important because a CGI script can use that length to
know exactly how much data to read before parsing the
variables in the body.
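Here is a rough Python sketch of how a script might use the length to read and parse the body. The function name read_form_body is hypothetical, and io.BytesIO stands in for the real input stream. (In an actual CGI program the body typically arrives on standard input and the length in the CONTENT_LENGTH environment variable, but that wiring is left out here.)

```python
import io
from urllib.parse import parse_qsl


def read_form_body(stream, content_length: int) -> dict:
    """Read exactly content_length bytes and decode the name/value pairs."""
    raw = stream.read(content_length)        # read no more than the header promised
    return dict(parse_qsl(raw.decode("ascii")))


# Stand-in for the request body from the example above.
body = io.BytesIO(b"name=Selena+Sol&phone=7700404")
print(read_form_body(body, 29))
```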
Of course, as with the GET method, the user never needs to
deal with the protocol itself. Instead, the browser does all
the work of preparing the POST request headers and body.
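To get a feel for what the browser is doing on your behalf, here is a minimal Python sketch that assembles a POST request like the one above. The name build_post_request is made up for illustration, and the sketch emits only the two content headers; a real browser would send more (Referer, User-Agent, Accept, and so on).

```python
from urllib.parse import urlencode


def build_post_request(path: str, fields: dict) -> bytes:
    """Assemble an HTTP/1.0 POST request with a URL-encoded body."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-type: application/x-www-form-urlencoded\r\n"
        f"Content-length: {len(body)}\r\n"   # must match the body size in bytes
        "\r\n"                               # empty line separates headers from body
    )
    return head.encode("ascii") + body


request = build_post_request(
    "/cgi-bin/phone_book.cgi", {"name": "Selena Sol", "phone": "7700404"}
)
print(request.decode("ascii"))
```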
So the million-dollar question is: how does the browser get
the name/value pairs to put into the HTTP message body?
The answer to that is HTML Forms.
Remember those things from the last section?