Entity References
April 5, 2002
Entity references are markup
that the parser replaces with character data. In HTML, there are
hundreds of predefined character entities, including the Greek
alphabet, math symbols, and the copyright symbol. There are only
five predefined entity references in XML, however, as shown in
Table 3-2.
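Table 3-2 Predefined Entity References
| Character | Entity Reference | Decimal Representation | Hexadecimal Representation |
|-----------|------------------|------------------------|----------------------------|
| <         | &lt;             | &#60;                  | &#x3C;                     |
| >         | &gt;             | &#62;                  | &#x3E;                     |
| &         | &amp;            | &#38;                  | &#x26;                     |
| "         | &quot;           | &#34;                  | &#x22;                     |
| '         | &apos;           | &#39;                  | &#x27;                     |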
We've already seen how entity references can be used as content.
They can also appear within attribute values. According to Table
3-2,
<CD title="Brooks &amp; Dunn&apos;s Greatest Hits" />
is equivalent to the decimal representation:
<CD title="Brooks &#38; Dunn&#39;s Greatest Hits" />
and to the hexadecimal representation:
<CD title="Brooks &#x26; Dunn&#x27;s Greatest Hits" />
However, the next line is illegal because the ampersand
("&") must be escaped by using either the entity
reference or one of its numeric representations:
<CD title="Brooks & Dunn's Greatest Hits" />
This is because the ampersand and less-than characters are special
cases: the parser treats them as the start of an entity reference
and of a tag, respectively.
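
The equivalence of the three legal forms, and the rejection of the unescaped ampersand, can be checked with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# The three legal spellings of the same attribute value.
variants = [
    '<CD title="Brooks &amp; Dunn&apos;s Greatest Hits" />',
    '<CD title="Brooks &#38; Dunn&#39;s Greatest Hits" />',
    '<CD title="Brooks &#x26; Dunn&#x27;s Greatest Hits" />',
]

# After parsing, every reference has been replaced by its character,
# so the set collapses to a single distinct title.
titles = {ET.fromstring(v).get("title") for v in variants}


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The literal ampersand makes this element not well-formed.
illegal = '<CD title="Brooks & Dunn\'s Greatest Hits" />'

print(titles)
print(is_well_formed(illegal))
```

The set comprehension is a compact way to show that the parser reports one and the same string for all three spellings.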
Note: You are required to use the predefined entities
&lt; and &amp; to escape the
characters < and & in all cases
other than when these characters are used as markup delimiters,
or in a comment, a processing instruction, or a CDATA section.
In other words, the literal "<"
and "&" characters can appear only
as markup delimiters, or within a comment, a processing
instruction, or a CDATA section.
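
When XML is generated by a program, the escaping rule in the note is usually applied by a library rather than by hand. The sketch below assumes Python's standard xml.sax.saxutils module, whose escape() and quoteattr() functions handle character data and attribute values; the sample strings and variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = "AT&T < IBM"
raw_title = "Brooks & Dunn's Greatest Hits"

# escape() replaces &, < and > in character data with entity references.
safe_text = escape(raw_text)

# quoteattr() escapes the value and wraps it in suitable quotes,
# so the result can be placed directly after the "=" sign.
safe_attr = quoteattr(raw_title)

document = f"<CD title={safe_attr}>{safe_text}</CD>"

# Parsing the generated markup recovers the original strings.
element = ET.fromstring(document)
print(document)
print(element.get("title"), "|", element.text)
```

Building the markup through these functions means a stray "&" or "<" in the input data cannot make the output ill-formed.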
Listing 3-3 illustrates the use of all five predefined character
entities, several decimal representations of Greek letters, and
the three legal variations of the Brooks & Dunn example. If
we run this through an XML parser, we can verify that it is
well-formed; we did not use the literal ampersand or the literal
less-than before the word StockWatch. Figure 3-1 shows how this
example looks in Internet Explorer, which renders the characters
that are represented by the entities. It also confirms that the
three Brooks & Dunn variations are equivalent.
Listing 3-3 Examples of Predefined Entities and Greek Letters
(predefined-entities.xml)
<?xml version="1.0" standalone="yes"?>
<Predefined>
<Test>The hot tip from today&apos;s &lt;StockWatch&gt; column is:
&quot;AT&amp;T stock is doing better than
Ralph Spoilsports Motors&apos; stock.&quot;
</Test>
<PS>Now, wasn&apos;t that as easy as &#928;?
Or &#945;, &#946;, &#947;?</PS>
<CD title="Brooks &amp; Dunn&apos;s Greatest Hits" />
<CD title="Brooks &#38; Dunn&#39;s Greatest Hits" />
<CD title="Brooks &#x26; Dunn&#x27;s Greatest Hits" />
</Predefined>

FIGURE 3-1 Predefined entities displayed in Internet Explorer
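
The decimal and hexadecimal representations used for the Greek letters are simply the characters' Unicode code points. As a rough sketch, assuming Python and its standard xml.etree.ElementTree module, the references can be computed and then round-tripped through a parser; the function names decimal_ref and hex_ref are hypothetical.

```python
import xml.etree.ElementTree as ET


def decimal_ref(ch):
    """Hypothetical helper: decimal character reference for one character."""
    return f"&#{ord(ch)};"


def hex_ref(ch):
    """Hypothetical helper: hexadecimal character reference for one character."""
    return f"&#x{ord(ch):X};"


# Capital pi, then lowercase alpha, beta, gamma.
greek = "\u03a0\u03b1\u03b2\u03b3"

decimal = [decimal_ref(c) for c in greek]
hexadecimal = [hex_ref(c) for c in greek]

# A parser turns the references back into the original characters.
round_trip = ET.fromstring(f"<PS>{''.join(decimal)}</PS>").text

print(decimal)
print(hexadecimal)
```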
HTML (and therefore XHTML) includes three large sets of predefined
entities: Latin1, Special, and Symbols. You can pull these into
your XML document using external entities, covered in Chapter 4.
The files containing the entities are:
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
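
Until such entities are declared, a name like &copy; is undefined in XML and the document is not well-formed. The sketch below does not load the .ent files listed above; it shows a different workaround, assuming Python's standard html.entities table of HTML 4 entity names, in which a named entity is rewritten as a numeric reference that needs no declaration. The helper names numeric_ref and parses are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def numeric_ref(name):
    """Hypothetical helper: decimal reference for an HTML entity name."""
    return f"&#{name2codepoint[name]};"


def parses(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &copy; is predefined in HTML but not in XML.
undeclared = "<p>&copy; 2002</p>"

# The numeric form needs no entity declaration.
rewritten = f"<p>{numeric_ref('copy')} 2002</p>"

print(parses(undeclared))
print(parses(rewritten))
```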
XML Family of Specifications: A Practical Guide