Elements, Tags, Attributes and Content
March 29, 2002
To understand XML syntax, we must first be familiar with
several basic terms from HTML (and SGML) terminology. XML
syntax, however, differs in some important ways from both HTML
and SGML, as we’ll see.
Elements are the essence of document structure. They represent
pieces of information and may or may not contain nested
elements that represent even more specific information,
attributes, and/or textual content. In our employee directory
example from chapter 2 (Figure 2-1), some of the elements were
Employees, Employee, Name, First, Last, Project,
and PhoneNumbers.
Tags are the way elements are indicated or marked up in a
document. For each element (see note 1), there is typically a
start tag that begins with "<" (less than) and ends with
">" (greater than), and an end tag that begins with
"</" and ends with ">". Some of the
start tags in our example were <Employees>,
<Employee>, <Name>, and so forth. The corresponding
end tags for these elements were </Employees>,
</Employee>, and </Name>.
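The pairing of start and end tags can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the fragment and the names Ann and Lee are made up for illustration and are not the Figure 2-1 data.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the employee directory example;
# the element names come from the text, the data is invented.
well_formed = (
    "<Employees>"
    "<Employee><Name><First>Ann</First><Last>Lee</Last></Name></Employee>"
    "</Employees>"
)

root = ET.fromstring(well_formed)
# iter() walks the tree in document order, starting with the root element.
tags = [element.tag for element in root.iter()]

# Here the </Name> end tag is missing, so </Employee> does not match
# the most recently opened start tag and the parser rejects the input.
broken = "<Employees><Employee><Name></Employee></Employees>"
try:
    ET.fromstring(broken)
    broken_parsed = True
except ET.ParseError:
    broken_parsed = False
```

In this sketch, tags lists every element whose start tag was matched by an end tag, while broken_parsed stays False because the second fragment is not well-formed.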
If an element has one or more attributes, they must appear
between the "<" and ">" delimiters of the start tag. Attributes
are qualifying pieces of information that add detail and further
define an instance of an element. They are typically details
that the language designer feels do not need to be nested
elements themselves; the assumption is that the attributes will
generally be accessed less often than the elements that contain
them, but this tends to be application dependent (see note 2).
In our employee example, the only element that had an attribute
was Employee, and the attribute was sex, with two kinds of
instances:
<Employee sex="male">
or
<Employee sex="female">.
Each attribute has a value, the quoted text to the
right of the equal sign. In the preceding examples, the values
of the two instances of the sex attribute are “male” and
“female”. Although in this case the value is a single word,
values can be any amount of text, enclosed in single or double
quotes. HTML permits attributes that do not require values
(e.g., the selected attribute to denote a default choice in a
form, as in <OPTION selected>), but this so-called
attribute minimization is expressly not permitted in XML.
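As a rough illustration of these attribute rules, the sketch below again assumes Python's standard xml.etree.ElementTree parser. The helper parses_as_xml is a hypothetical name introduced here, and the selected="selected" spelling is the usual way of writing the minimized HTML attribute out in full.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are both accepted around attribute values.
double_quoted = ET.fromstring('<Employee sex="male"></Employee>')
single_quoted = ET.fromstring("<Employee sex='female'></Employee>")
values = (double_quoted.get("sex"), single_quoted.get("sex"))


def parses_as_xml(text):
    """Hypothetical helper: report whether the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style attribute minimization: an attribute name with no value.
minimized_ok = parses_as_xml("<OPTION selected>Choice</OPTION>")
# The same attribute written with an explicit value.
explicit_ok = parses_as_xml('<OPTION selected="selected">Choice</OPTION>')
```

Here minimized_ok comes out False and explicit_ok comes out True, which matches the rule that every XML attribute needs a quoted value.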
Content is whatever an element contains. Sometimes
element content is simply text. In other cases, elements contain
nested elements; the inner (child) elements are called the
content of the outer (parent) element. For example, in this
fragment:
<Address>
<Street>123 Milky Way</Street>
<City>Columbia</City>
<State>MD</State>
<Zip>20777</Zip>
</Address>
“123 Milky Way” is the text content of the Street element,
“Columbia” is the text content of the City element, and Street,
City, State, and Zip are all nested element content of the
parent Address element. Taken together, the text content of
Address is “123 Milky Way Columbia MD 20777”. (The space
preceding each of the last three words is due to new lines, as
we’ll see.)
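The distinction between text content and element content can be inspected programmatically. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

address_xml = """<Address>
<Street>123 Milky Way</Street>
<City>Columbia</City>
<State>MD</State>
<Zip>20777</Zip>
</Address>"""

address = ET.fromstring(address_xml)

# Text content of a single child element.
street_text = address.find("Street").text

# Element content of the parent: the names of its direct children.
child_names = [child.tag for child in address]

# All text inside Address, including the new lines between the children.
raw_text = "".join(address.itertext())

# Collapsing each run of whitespace to one space gives the combined string.
normalized = " ".join(raw_text.split())
```

Note that raw_text still holds the line breaks that sit between the child elements; only after whitespace is collapsed does it read as a single line of text.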
Notice that the content of Zip is the text string “20777”. Why
do we not say that this is a number or, better yet, an example
of some zip code datatype (constrained to either the valid
five-digit or five-plus-four-digit ddddd-dddd values for zip
codes)? Because there is nothing about the Zip element that
conveys that its content is numeric! We could, however, denote
the element’s datatype explicitly by means of an attribute.
<Zip type="integer">20777</Zip>
We’ll eventually see how an alternative to DTDs called XML
Schema makes data typing easier and far more flexible.
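To make the point concrete, here is a small sketch of how an application might act on such a type attribute itself. It assumes Python's standard xml.etree.ElementTree and re modules; looks_like_zip and ZIP_PATTERN are hypothetical names, and the pattern is only one plausible reading of the five-digit and five-plus-four-digit forms mentioned above.

```python
import re
import xml.etree.ElementTree as ET

zip_element = ET.fromstring('<Zip type="integer">20777</Zip>')

# The parser hands back the content as a string regardless of the attribute.
content = zip_element.text

# The type attribute is only a hint; the application must choose to act on it.
value = int(content) if zip_element.get("type") == "integer" else content

# Assumed pattern: five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def looks_like_zip(text):
    """Hypothetical check for the ddddd or ddddd-dddd forms."""
    return ZIP_PATTERN.fullmatch(text) is not None
```

All of the conversion and checking here lives in application code, which is the kind of work a schema language is meant to take over.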
Another possibility, called mixed content, was illustrated in
the section “Document-Centric vs. Data-Centric” in chapter 2, in
which both text and element content may appear as the content of
a parent element. We’ll see how to handle this in chapter 4.
-
1. With the exception of something called an empty element, as
we will soon discuss.
2. This is a tremendous oversimplification. See “Elements vs.
Attributes: Guidelines,” in chapter 4.
XML Family of Specifications: A Practical Guide
XML Document Structure