|
Like HTML, XML (Extensible Markup Language) is a markup language
that relies on rule-specifying tags and on a tag-processing
application that knows how to deal with those tags.
|
"The correct title of this specification, and the correct full
name of XML, is 'Extensible Markup Language'. 'eXtensible Markup
Language' is just a spelling error. However, the abbreviation
'XML' is not only correct but, appearing as it does in the title
of the specification, an official name of the Extensible Markup
Language. The name and abbreviation were invented by James Clark;
other options under consideration had included MGML (Minimal
Generalized Markup Language), MAGMA (Minimal Architecture For
Generalized Markup Applications), and SLIM (Structured Language
for Internet Markup)" - Extensible Markup Language (XML) 1.0
Specs, The Annotated Version.
|
However, XML is far more powerful than HTML, and the reason is
the "X": XML is extensible. Rather than providing a fixed set of
predefined tags, as HTML does, XML specifies the standards by
which you can define your own markup languages, each with its own
set of tags. In other words, XML is a meta-markup language: it
lets you define an unlimited number of markup languages based on
the standards that XML defines.
"The design goals for XML are:
- XML shall be straightforwardly usable over the Internet.
- XML shall support a wide variety of applications.
- XML shall be compatible with SGML.
- It shall be easy to write programs which process XML documents.
- The number of optional features in XML is to be kept to the
absolute minimum, ideally zero.
- XML documents should be human-legible and reasonably clear.
- The XML design should be prepared quickly.
- The design of XML shall be formal and concise.
- XML documents shall be easy to create.
- Terseness in XML markup is of minimal importance."
Source: Extensible Markup Language (XML) 1.0 Specs, The Annotated
Version.
|
Let's consider a very simple example: a new markup language
called SCLML (Selena's Client List Markup Language). This
language will define tags to represent client contacts and
information about each contact.
The set of tags will be small but expressive. Unlike HTML's <UL>
and <LI>, these tags can be understood immediately just by
reading the document. Because an XML document must have exactly
one root element, the contacts are wrapped in a single
<CLIENTLIST> element.
<CLIENTLIST>
  <CONTACT>
    <NAME>Gunther Birznieks</NAME>
    <ID>001</ID>
    <COMPANY>Bob's Fish Store</COMPANY>
    <EMAIL>gunther@bobsfishstore.com</EMAIL>
    <PHONE>662-9999</PHONE>
    <STREET>1234 4th St.</STREET>
    <CITY>New York</CITY>
    <STATE>New York</STATE>
    <ZIP>10024</ZIP>
  </CONTACT>
  <CONTACT>
    <NAME>Susan Czigany</NAME>
    <ID>002</ID>
    <COMPANY>Netscape</COMPANY>
    <EMAIL>susan@eudora.org</EMAIL>
    <PHONE>555-1234</PHONE>
    <STREET>9876 Hazen Blvd.</STREET>
    <CITY>San Jose</CITY>
    <STATE>California</STATE>
    <ZIP>90034</ZIP>
  </CONTACT>
</CLIENTLIST>
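Because the tags are self-describing, a generic XML parser can already pull the data out of this document by tag name, even though it knows nothing about what SCLML tags mean. The minimal sketch below uses Python's standard `xml.etree.ElementTree` module. It assumes a `<CLIENTLIST>` root element (a name chosen for illustration), since a well-formed XML document must have exactly one root, and `load_contacts` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# The SCLML sample, wrapped in a <CLIENTLIST> root element (an assumed name)
# because a well-formed XML document must have exactly one root element.
SCLML = """<CLIENTLIST>
<CONTACT>
<NAME>Gunther Birznieks</NAME>
<ID>001</ID>
<COMPANY>Bob's Fish Store</COMPANY>
<EMAIL>gunther@bobsfishstore.com</EMAIL>
<PHONE>662-9999</PHONE>
<STREET>1234 4th St.</STREET>
<CITY>New York</CITY>
<STATE>New York</STATE>
<ZIP>10024</ZIP>
</CONTACT>
<CONTACT>
<NAME>Susan Czigany</NAME>
<ID>002</ID>
<COMPANY>Netscape</COMPANY>
<EMAIL>susan@eudora.org</EMAIL>
<PHONE>555-1234</PHONE>
<STREET>9876 Hazen Blvd.</STREET>
<CITY>San Jose</CITY>
<STATE>California</STATE>
<ZIP>90034</ZIP>
</CONTACT>
</CLIENTLIST>"""


def load_contacts(text):
    """Parse SCLML text into a list of {tag name: text} dictionaries."""
    root = ET.fromstring(text)
    contacts = []
    for contact in root.findall("CONTACT"):
        # Each child element's tag name becomes a dictionary key.
        contacts.append({child.tag: child.text for child in contact})
    return contacts


for c in load_contacts(SCLML):
    print(c["NAME"], c["EMAIL"])
```

The parser recovers the structure, but it still has no idea how a `<PHONE>` should be displayed; that is the gap the rest of this section addresses.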
|
Note that XML is not limited to text markup. Its very
extensibility means that it could just as easily be applied to
sound or image markup. A tag such as <EMPHASIZE> might be
displayed as bold text on screen but spoken in a louder voice by
an audio application!
|
What you see above is a very simple "XML document". As you can
see, it looks quite similar to an HTML document. But don't
forget, as we said before, that it is not enough simply to encode
(mark up) the data. For the data to be decoded by someone or
something else, the markup language must follow standard rules
covering:
- The syntax for marking up
- The meaning behind the markup
In other words, a processing application must know what valid
markup is (for example, which tags are allowed) and what to do
with that markup if it is valid. After all, how would a browser
such as Netscape Navigator know what to do with the above
document? What in the world is a <PHONE> tag? Is it a legal tag?
How should it be displayed? Our markup language must somehow
communicate the syntax of the markup so that the processing
application will know what to do with it.
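A generic XML parser illustrates the gap described here. In the minimal sketch below (using Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` is hypothetical), the parser enforces XML's own syntax rules, such as properly nested tags, yet it happily accepts a misspelled `<PHOEN>` tag, because nothing tells it which tag names SCLML allows.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a generic XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A generic parser accepts any tag name, including a misspelled one...
print(is_well_formed("<CONTACT><PHONE>662-9999</PHONE></CONTACT>"))  # True
print(is_well_formed("<CONTACT><PHOEN>662-9999</PHOEN></CONTACT>"))  # True
# ...but rejects markup that breaks XML's own syntax (mismatched tags).
print(is_well_formed("<CONTACT><PHONE>662-9999</CONTACT></PHONE>"))  # False
```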
In XML, the definition of valid markup is handled by a Document
Type Definition (DTD), which communicates the structure of the
markup language. The DTD specifies what it means to be a valid
tag (the syntax for marking up): which elements may appear, what
each one may contain, and in what order.
We'll discuss the details of DTDs later. For now, just get
comfortable with the idea of a DTD as a separate component of
the equation.
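To give a feel for what a DTD expresses, here is a hypothetical DTD for SCLML (shown as comments) together with a hand-coded check of the same rules. This is a sketch under stated assumptions: the `<CLIENTLIST>` root, the field order, and the `validate` function are illustrative, and Python's standard library parses XML but does not validate it against a DTD. A validating parser (for example, the third-party lxml library's `etree.DTD` class) would apply a real DTD instead of code like this.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD for SCLML, shown here only as comments:
#   <!ELEMENT CLIENTLIST (CONTACT*)>
#   <!ELEMENT CONTACT (NAME, ID, COMPANY, EMAIL, PHONE,
#                      STREET, CITY, STATE, ZIP)>
#   <!ELEMENT NAME (#PCDATA)>   plus one such line for each other field
# The function below hand-codes that same content model.

CONTACT_FIELDS = ["NAME", "ID", "COMPANY", "EMAIL", "PHONE",
                  "STREET", "CITY", "STATE", "ZIP"]


def validate(root):
    """Return a list of problems; an empty list means the tree fits the model."""
    problems = []
    if root.tag != "CLIENTLIST":
        problems.append(f"root must be CLIENTLIST, not {root.tag}")
    for contact in root:
        if contact.tag != "CONTACT":
            problems.append(f"unexpected <{contact.tag}> inside CLIENTLIST")
            continue
        # A DTD sequence (NAME, ID, ...) means these children, once each, in order.
        found = [child.tag for child in contact]
        if found != CONTACT_FIELDS:
            problems.append(f"CONTACT has {found}, expected {CONTACT_FIELDS}")
        # (#PCDATA) means text only: no nested elements allowed.
        for child in contact:
            if len(child) > 0:
                problems.append(f"<{child.tag}> must contain only text")
    return problems


fields = "".join(f"<{tag}>x</{tag}>" for tag in CONTACT_FIELDS)
good = ET.fromstring(f"<CLIENTLIST><CONTACT>{fields}</CONTACT></CLIENTLIST>")
bad = ET.fromstring("<CLIENTLIST><CONTACT><PHOEN>662-9999</PHOEN></CONTACT></CLIENTLIST>")
print(validate(good))       # []
print(len(validate(bad)))   # 1
```

Unlike the generic parser, this check catches the misspelled `<PHOEN>`, because it knows which tags the language defines.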
Yet we must communicate the meaning of the markup as well as its
syntax. To specify what valid tags mean, XML documents are also
associated with style sheets, which provide display (GUI)
instructions for a processing application such as a web browser.
A style sheet, the details of which we will discuss later, might
specify display instructions such as:
- Anytime you see a <CONTACT>, display it using a <UL> tag.
Similarly, </CONTACT> tags should be converted to </UL>.
- All <NAME> tags can be replaced with <LI> tags, and </NAME>
tags should be ignored.
- All <EMAIL> tags can be replaced with <LI> tags, and </EMAIL>
tags should be ignored.
etc.
In this example, the style sheet uses HTML to define the
formatting of SCLML. But if the XML document were being processed
by a program other than a web browser, the HTML translation step
might be bypassed.
The processing application combines the logic of the style sheet,
the DTD, and the data in the SCLML document, and displays the
data according to those rules.
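The display rules listed above can be approximated in a few lines of Python. This is a hand-written translation, not a real style sheet language (the W3C's XSLT language is designed for this kind of XML-to-HTML transformation); the rule table, the `render` function, and the `<CLIENTLIST>` wrapper are illustrative assumptions, and tags without a rule (ID, PHONE, and so on) are simply skipped.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical display rules, mirroring the style sheet described above:
# CONTACT becomes a <UL> list; NAME and EMAIL each become an <LI> item.
# Elements with no rule are left out of the output in this sketch.
ITEM_TAGS = ("NAME", "EMAIL")


def render(root):
    """Translate an SCLML tree into an HTML fragment, one <UL> per contact."""
    lines = []
    for contact in root.iter("CONTACT"):
        lines.append("<UL>")
        for child in contact:
            if child.tag in ITEM_TAGS:
                # HTML allows the </LI> end tag to be omitted, so it is dropped,
                # matching the "closing tags should be ignored" rule.
                lines.append(f"<LI>{escape(child.text or '')}")
        lines.append("</UL>")
    return "\n".join(lines)


doc = ET.fromstring(
    "<CLIENTLIST><CONTACT><NAME>Susan Czigany</NAME><ID>002</ID>"
    "<EMAIL>susan@eudora.org</EMAIL></CONTACT></CLIENTLIST>"
)
print(render(doc))
```

For this input the sketch prints a `<UL>` line, one `<LI>` line each for the name and the e-mail address, and a closing `</UL>`. Text is passed through `html.escape` so that characters such as `&` in the data cannot break the generated HTML.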
But wait, isn't this quite complex? Instead of a single HTML
document that defines both the data and the rules for displaying
it, we now have an SCLML document, a DTD, AND a style sheet.
That's three pieces as opposed to just one.
Further, we need a processing agent that can do the work of
putting the DTD, style sheet, and SCLML document together.
Remember, web browsers are built to read a specific markup
language (such as HTML), not just any markup language.
That means we have three documents to pull together plus one
processing program to write or buy. What a mess!
Actually, though there are a few more hurdles to jump in order
to use XML, there are several reasons why all this is worth it.
Let's take a look at them.
|