White Space Is Significant
April 5, 2002
White space consists of one or more space characters, tabs, carriage returns, or line feeds (denoted as #x20, #x9, #xD, and #xA, respectively). In the XML 1.0 Recommendation, white space is symbolized in production rules by a capital "S", with the following definition (see http://www.w3.org/TR/REC-xml#sec-common-syn and http://www.w3.org/TR/REC-xml#sec-white-space):
S ::= (#x20 | #x9 | #xD | #xA)+
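The S production can be transcribed directly into a regular expression. The sketch below is illustrative only: the names XML_S and is_xml_white_space are hypothetical and not part of any XML library. It shows that S requires at least one character and admits only those four characters, so a no-break space (#xA0) is not XML white space even though Python's own str.isspace() accepts it.

```python
import re

# XML 1.0 production S ::= (#x20 | #x9 | #xD | #xA)+ as a regex (hypothetical name).
XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_xml_white_space(text: str) -> bool:
    """True if the whole string matches S: one or more of the four characters."""
    return XML_S.fullmatch(text) is not None


print(is_xml_white_space(" \t\r\n"))  # True
print(is_xml_white_space(""))         # False: S requires at least one character
print(is_xml_white_space("\u00a0"))   # False: no-break space is not in S
print("\u00a0".isspace())             # True: Python's notion of space is broader
```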
In HTML, a sequence of white space characters is collapsed into a single space, and newlines are treated as ordinary spaces. In XML, by contrast, white space in character data is taken literally: the parser passes it through to the application. This means that the following two examples are not equivalent:
<Publication>
<Published>1992</Published>
<Publisher>Harmony Books</Publisher>
</Publication>
<Publication>
<Published>1992</Published>
<Publisher>Harmony
Books</Publisher>
</Publication>
By default, an XML parser reports different content for the Publisher element in these two examples, because in the second example the string "Harmony Books" contains a newline, rather than a space, between the two words. The application that invokes the parser can then treat the white space as significant, ignore it (that is, strip it), or ask the parser to normalize it (collapse it as HTML does).
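To make the difference concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (an expat-based parser). The helper names publisher_text and normalize are illustrative assumptions; normalize is one possible application-side policy that collapses each run of XML white space (the S characters) to a single space and trims the ends.

```python
import re
import xml.etree.ElementTree as ET

# The two Publication examples from the text.
DOC_A = """<Publication>
<Published>1992</Published>
<Publisher>Harmony Books</Publisher>
</Publication>"""

DOC_B = """<Publication>
<Published>1992</Published>
<Publisher>Harmony
Books</Publisher>
</Publication>"""

# Runs of the four XML white space characters.
XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")


def publisher_text(doc: str) -> str:
    """Return the Publisher character data exactly as the parser reports it."""
    return ET.fromstring(doc).find("Publisher").text


def normalize(text: str) -> str:
    """Hypothetical application policy: collapse white space runs, then trim."""
    return XML_S.sub(" ", text).strip(" \t\r\n")


a = publisher_text(DOC_A)
b = publisher_text(DOC_B)
print(repr(a))                        # 'Harmony Books'
print(repr(b))                        # 'Harmony\nBooks'
print(a == b)                         # False: the parser kept the newline
print(normalize(a) == normalize(b))   # True once the application normalizes
```

The parser itself does not collapse anything; whether "Harmony Books" and "Harmony\nBooks" count as the same value is a decision the application makes after parsing.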
Comments
Comments in XML look just like they do in HTML. They begin with the character sequence "<!--" and end with the sequence "-->". The parser ignores what appears between them, except to verify that the comment is well-formed.
<Publication>
<Published>1992</Published>
<!-- This appears to be the second edition. -->
<Publisher>Harmony Books</Publisher>
</Publication>
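As a rough illustration of "the parser ignores what appears between them", the sketch below parses the example above with Python's xml.etree.ElementTree. By default the comment is checked and then discarded; the insert_comments option of TreeBuilder (available in Python 3.8 and later) is one way an application can opt in to seeing comments.

```python
import xml.etree.ElementTree as ET

DOC = """<Publication>
<Published>1992</Published>
<!-- This appears to be the second edition. -->
<Publisher>Harmony Books</Publisher>
</Publication>"""

# Default behavior: the comment is verified, then dropped from the tree.
default_root = ET.fromstring(DOC)
print([child.tag for child in default_root])  # ['Published', 'Publisher']

# Opt-in (Python 3.8+): ask the tree builder to keep comments as nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_root = ET.fromstring(DOC, parser=parser)
comments = [child.text for child in kept_root if child.tag is ET.Comment]
print(comments)  # [' This appears to be the second edition. ']
```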
In XML, however, comments are subject to several restrictions:
- Comments cannot contain the double hyphen combination "--" anywhere except as part of the comment's start and end delimiters. Thus, this comment is illegal, because the extra hyphen before "-->" forms a second "--":
<!-- illegal comment --->
- Comments cannot be nested. This means you need to take care when commenting out a section that already contains comments.
- Comments cannot precede the XML declaration, because that part of the prolog must appear at the very start of the document; not even white space may come before it.
- Comments are not permitted inside a start or end tag. They can appear only outside other markup: between tags (as if they were content) or wrapped around tags.
- Comments can be used to make the parser ignore blocks of elements, provided that the document is still well-formed XML once the commented-out block is effectively removed.
- Parsers are not required to make comments available to the application, so don't use them to pass data to an application; use processing instructions, discussed next.
- Comments are also permitted in the DTD, as discussed in chapter 4.
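Several of the rules above can be checked against a real parser. The following sketch feeds one small test document per rule to Python's expat-based xml.etree.ElementTree and reports whether each is accepted. The documents and labels are illustrative assumptions, and a different conforming parser may word its errors differently, but it must reject the same documents.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the expat-based parser accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# One small test document per comment rule (labels are illustrative).
CASES = {
    "double hyphen inside a comment": "<a><!-- illegal comment ---></a>",
    "nested comments": "<a><!-- outer <!-- inner --> --></a>",
    "comment before the XML declaration": '<!-- c --><?xml version="1.0"?><a/>',
    "comment inside a start tag": "<a <!-- c --> ></a>",
    "commented-out element": "<a><!-- <b/> --><c/></a>",
}

results = {label: is_well_formed(doc) for label, doc in CASES.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the last case, where a whole element is commented out and the remainder is still well-formed, is accepted.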
Processing Instructions
Processing instructions (often abbreviated as PIs) are directives intended for an application other than the XML parser. Whereas parsers may discard comments, they are required to pass processing instructions on to the application.
The general syntax for a PI is:
<?targetApplication applicationData ?>
Here, targetApplication is the name (any XML Name) of the application that should receive the instruction, and applicationData is an arbitrary string that does not contain the end delimiter "?>". Often applicationData consists of name/value pairs that resemble attributes and their values, but there is no requirement concerning its format. Aside from the delimiters "<?" and "?>", which must appear exactly as shown, the main restrictions are that there can be no space between the initial question mark and the target, that white space must separate the target from any data, and that the target cannot be xml in any combination of upper and lower case, since that name is reserved. Some examples follow.
<?xml-stylesheet type="text/xsl" href="foo.xsl" ?>
<?MortgageRateHandler rate="7%" period="30 years" ?>
<?javaApp class="MortgageRateHandler" ?>
<?javaApp This is the data for the MortgageRateHandler, folks! ?>
<?acroread file="mortgageRates.pdf" ?>
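To see how a parser splits a PI into target and data, the sketch below wraps some of the examples above in a small document and collects them with a SAX ContentHandler from Python's standard library. The root element name Rates and the class name PICollector are hypothetical. It also checks that a space between "<?" and the target is rejected.

```python
import xml.sax


class PICollector(xml.sax.ContentHandler):
    """Record the (target, data) pair of every PI the parser reports."""

    def __init__(self):
        super().__init__()
        self.pis = []

    def processingInstruction(self, target, data):
        # data runs from after the white space following the target up to "?>";
        # strip() removes the trailing space written before "?>".
        self.pis.append((target, data.strip()))


# Hypothetical wrapper document around PIs from the text.
DOC = b"""<?xml-stylesheet type="text/xsl" href="foo.xsl" ?>
<Rates>
<?MortgageRateHandler rate="7%" period="30 years" ?>
<?javaApp This is the data for the MortgageRateHandler, folks! ?>
</Rates>"""

collector = PICollector()
xml.sax.parseString(DOC, collector)
for target, data in collector.pis:
    print(f"{target!r}: {data!r}")

# A space between "<?" and the target is not well-formed.
try:
    xml.sax.parseString(b"<r><? javaApp x ?></r>", xml.sax.ContentHandler())
    space_rejected = False
except xml.sax.SAXParseException:
    space_rejected = True
print(space_rejected)  # True
```

Note that the free-form data of the last PI is delivered as an opaque string; interpreting it is left entirely to the receiving application.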
Processing instructions are not part of the document's element structure, so they may appear almost anywhere, except before the XML declaration or inside a CDATA section. The parser's responsibility is merely to pass the PI and its data on to the application. Because the same XML document may be processed by several applications, some applications will ignore a given PI and simply pass it down the chain; the PI is acted on only by the application for which it is intended (that is, for which it has meaning).
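The chain described above can be sketched in a few lines. This is a hypothetical design, not a standard API: PIChain is a SAX handler that offers every PI to each PIApplication in turn, and each application acts only on the targets it recognizes. The application names are illustrative; the PIs come from the examples in the text.

```python
import xml.sax


class PIApplication:
    """Hypothetical application that acts only on PIs whose target it recognizes."""

    def __init__(self, name, targets):
        self.name = name
        self.targets = set(targets)
        self.handled = []

    def receive(self, target, data):
        if target in self.targets:
            self.handled.append((target, data.strip()))
        # PIs with any other target are ignored; the chain carries on regardless.


class PIChain(xml.sax.ContentHandler):
    """SAX handler that offers every reported PI to each application in turn."""

    def __init__(self, apps):
        super().__init__()
        self.apps = apps

    def processingInstruction(self, target, data):
        for app in self.apps:
            app.receive(target, data)


DOC = b"""<?xml-stylesheet type="text/xsl" href="foo.xsl" ?>
<Rates>
<?javaApp class="MortgageRateHandler" ?>
<?acroread file="mortgageRates.pdf" ?>
</Rates>"""

styler = PIApplication("styler", ["xml-stylesheet"])
java_app = PIApplication("java", ["javaApp"])
xml.sax.parseString(DOC, PIChain([styler, java_app]))
print(styler.handled)    # [('xml-stylesheet', 'type="text/xsl" href="foo.xsl"')]
print(java_app.handled)  # [('javaApp', 'class="MortgageRateHandler"')]
```

The acroread PI reaches both applications but is acted on by neither, which is exactly the "pass it down the chain" behavior: a PI with no interested recipient is harmless.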
Although an XML declaration looks like a processing instruction because it is wrapped in the delimiters "<?" and "?>", it is not considered a PI. It is simply an XML declaration: a one-of-a-kind, optional piece of markup that, when present, must appear at the very start of the document.
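One way to observe this distinction is with Python's SAX parser, sketched below under the assumption of the standard expat-based reader. The declaration is consumed by the parser and never reported through processingInstruction, and markup that looks like a declaration anywhere other than the start of the document is rejected rather than treated as a PI. PICounter is a hypothetical name.

```python
import xml.sax


class PICounter(xml.sax.ContentHandler):
    """Collect the target of every processing instruction the parser reports."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def processingInstruction(self, target, data):
        self.targets.append(target)


DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="foo.xsl" ?>
<Publication/>"""

counter = PICounter()
xml.sax.parseString(DOC, counter)
print(counter.targets)  # ['xml-stylesheet']: the declaration is not a PI

# A declaration inside the document is an error, not a PI with target "xml".
try:
    xml.sax.parseString(b'<Publication><?xml version="1.0"?></Publication>',
                        xml.sax.ContentHandler())
    misplaced_rejected = False
except xml.sax.SAXParseException:
    misplaced_rejected = True
print(misplaced_rejected)  # True
```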
The target of a processing instruction can be the name of a notation (defined in chapter 4). For example, given this notation declaration:
<!NOTATION AcrobatReader SYSTEM "/usr/local/bin/acroread">
The corresponding PI would be:
<?AcrobatReader file="Readme.pdf" size="75%" ?>
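An application could use such a notation to find out which program a PI refers to. The sketch below is one hypothetical way to do this with Python's SAX API: a handler that is both a ContentHandler and a DTDHandler records each NOTATION declaration's system identifier and then resolves PI targets against it. The root element name Readme and the class name NotationPIHandler are assumptions, and matching PI targets to notation names is an application convention rather than something the parser does for you.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler


class NotationPIHandler(ContentHandler, DTDHandler):
    """Resolve PI targets against NOTATION declarations seen in the DTD."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.notations = {}  # notation name -> system identifier
        self.resolved = []   # (system identifier, PI data) pairs

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = systemId

    def processingInstruction(self, target, data):
        if target in self.notations:
            self.resolved.append((self.notations[target], data.strip()))


DOC = b"""<?xml version="1.0"?>
<!DOCTYPE Readme [
<!NOTATION AcrobatReader SYSTEM "/usr/local/bin/acroread">
]>
<Readme><?AcrobatReader file="Readme.pdf" size="75%" ?></Readme>"""

handler = NotationPIHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.setDTDHandler(handler)
parser.parse(io.BytesIO(DOC))
print(handler.resolved)  # [('/usr/local/bin/acroread', 'file="Readme.pdf" size="75%"')]
```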
XML Family of Specifications: A Practical Guide