Start and End Tags Must Match
April 5, 2002
Every start tag must have a corresponding end tag to properly
delimit the content of the element the tags represent. The start
and end tags are indicated exactly as they are in HTML, with
"<" denoting the beginning of a start
tag and "</" indicating the beginning
of the end tag. The end delimiter of each tag is
">".
<ElementName>content</ElementName>
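As a minimal sketch of how a parser enforces this rule, the following uses Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` is an assumption made for illustration; other parsers behave similarly but word their errors differently.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejects the document outright; there is no recovery.
        return False


matched = "<ElementName>content</ElementName>"
missing_end = "<ElementName>content"  # start tag with no end tag

print(is_well_formed(matched))      # True
print(is_well_formed(missing_end))  # False
```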
Empty Elements
An exception to the rule about start and end tags is the case in
which an element has no content. Such empty elements
convey information simply by their presence or possibly by their
attributes, if any. Examples from XHTML 1.0 include:
<br />
<hr />
<img src="someImage.gif" width="100" height="200" alt="Some Image" />
An empty element begins like a start
tag but terminates with the sequence
"/>". Optional white space may be used
before the two terminating characters. This author prefers to
include a space to emphasize empty elements. The space before
"/>" is necessary for XHTML 1.0 to be
handled correctly by older browser versions. Of course, it's also
possible to specify an empty element by using regular start and
end tags, and the parser treats this as equivalent to the use of
empty-element notation.
<img src="someImage.gif" width="100" height="200" alt="Some Image"></img>
Note that just like in HTML (or more
appropriately, XHTML), an empty element is often used as a
separator, such as <br /> and
<hr />, or to indicate by its
presence a particular piece of data, or to convey metadata by
its attributes. If the term empty element seems strange to you
when attributes are involved, just think in terms of the content
of the element. There is no content, even when there are
attributes, which is why it's called empty.
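The equivalence of the two notations for an empty element can be checked with a short sketch, again assuming Python's standard `xml.etree.ElementTree` parser. The `summarize` helper is hypothetical and exists only to compare what the parser reports for each form.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<img src="someImage.gif" alt="Some Image" />')
long_form = ET.fromstring('<img src="someImage.gif" alt="Some Image"></img>')


def summarize(element):
    """Collect what the parser reports: name, attributes, text, child count."""
    return (
        element.tag,
        dict(element.attrib),
        element.text or "",  # normalize "no text" to an empty string
        len(element),        # number of child elements
    )


# Both forms yield an element with attributes but no content.
print(summarize(short_form) == summarize(long_form))  # True
print(len(short_form))                                # 0
```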
Proper Nesting of Start and End Tags
No overlapping of start and end tags from different elements is
permitted. Although this might seem like an obvious requirement,
HTML as implemented by major browsers is considerably more
forgiving and recovers from improper tag overlap. Correct nesting
looks like this:
<OuterElement>
<InnerElement>inner content</InnerElement>
</OuterElement>
An example of improper nesting is:
<OuterElement>
<InnerElement>inner content</OuterElement>
</InnerElement>
Believe it or not, most browsers recover from this type of error
in HTML, but they cannot and will not in XML or any language based
on XML syntax. The improper nesting example results in either one
or two fatal errors, with a message similar to this (depending on
the parser):
Fatal error: end tag '</OuterElement>' does not match start tag. Expected
'</InnerElement>'
Fatal error: end tag '</InnerElement>' does not match start tag. Expected
'</OuterElement>'
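To see this behavior with a real parser, here is a small sketch that feeds the improperly nested example to Python's standard `xml.etree.ElementTree` module. The wording of the error differs from the sample messages above, and this particular parser stops at the first error rather than reporting two.

```python
import xml.etree.ElementTree as ET

improper = (
    "<OuterElement>"
    "<InnerElement>inner content</OuterElement>"
    "</InnerElement>"
)

try:
    ET.fromstring(improper)
    error = None
except ET.ParseError as exc:
    # Well-formedness errors are fatal: no element tree is produced.
    error = exc

print("rejected" if error is not None else "accepted")
print(error)  # parser-specific message with a line and column position
```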
Parent, Child, Ancestor, Descendant
The notion of the root element and the proper nesting rules lead
us to some conclusions and terminology about the hierarchy of
elements that are invariant across all XML documents. The terms
ancestor and descendant are not used in the XML 1.0 Recommendation,
but they certainly are in the DOM, XSLT, XPath, and so on.
- An element is a child of exactly one parent, which is the element
  that contains it.
- A parent may have more than one child.
- Immediate children and also children of a child are descendants
  of the parent.
- An element is an ancestor of all its descendants.
- The root is the ancestor of all elements.
- Every element is a descendant of the root.
- Every element has exactly one parent, except the root, which has
  no parent.
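These relationships can be explored with a brief sketch using Python's standard `xml.etree.ElementTree` module. The element names (`Root`, `Parent`, `Child`, `GrandChild`) and the helpers `ancestors` and `descendants` are assumptions made for illustration; ElementTree elements do not store a parent reference, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Root><Parent><Child><GrandChild/></Child><Child/></Parent></Root>"
)

# Map every element to the single element that contains it.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(element):
    """Names of all ancestors, nearest first, ending at the root."""
    chain = []
    while element in parent_of:
        element = parent_of[element]
        chain.append(element.tag)
    return chain


def descendants(element):
    """Names of all descendants in document order."""
    # iter() yields the element itself first, so skip it.
    return [e.tag for e in element.iter() if e is not element]


grandchild = doc.find("./Parent/Child/GrandChild")
parent = doc.find("Parent")

print(ancestors(grandchild))  # ['Child', 'Parent', 'Root']
print(descendants(parent))    # ['Child', 'GrandChild', 'Child']
print(doc in parent_of)       # False: the root has no parent
```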
Attribute Values Must Be Quoted
In HTML (but not in XHTML), we are permitted to be inconsistent
in the use of quotation marks to delimit the values of attributes.
Generally, single-word values do not require quotes in HTML. For
example, both of these are acceptable and equivalent in HTML:
<IMG SRC=someImage.gif>
<IMG SRC="someImage.gif">
In XML (and in XHTML), however, we are not allowed to be so
cavalier about quotes. All attribute values must be quoted, even
if there are no embedded spaces.
<img src="someImage.gif" />
<img src='someImage.gif' />
<img src="someImage.gif" width="34" height="17"/>
Notice that either single or double quotes may be used to delimit
the attribute values. Of course, if the attribute value contains
double quotes, then you must either use single quotes as the
delimiter or escape the embedded quotes as &quot;, and vice versa
(with &apos; for embedded single quotes).
<Book title="Tudor's Guide to Paris" />
<Object width='5.3"' height='7.1"' />
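The quoting rules above can be confirmed with a short sketch, assuming Python's standard `xml.etree.ElementTree` parser. The helper `parse_or_none` is hypothetical and simply turns a parse failure into `None`.

```python
import xml.etree.ElementTree as ET


def parse_or_none(text):
    """Return the parsed element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


unquoted = parse_or_none("<img src=someImage.gif />")
double_quoted = parse_or_none('<img src="someImage.gif" />')
single_quoted = parse_or_none("<img src='someImage.gif' />")

# Mixing delimiters lets a value contain the other kind of quote.
book = parse_or_none('<Book title="Tudor\'s Guide to Paris" />')
# The predefined entity &quot; is the alternative to switching delimiters.
escaped = parse_or_none('<Object width="5.3&quot;" />')

print(unquoted is None)                                      # True
print(double_quoted.get("src") == single_quoted.get("src"))  # True
print(book.get("title"))                                     # Tudor's Guide to Paris
print(escaped.get("width"))                                  # 5.3"
```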
Elements and Attributes Are Case-Sensitive
XML Family of Specifications: A Practical Guide
White Space Is Significant