Elements and Attributes Are Case-Sensitive
April 5, 2002
Unlike HTML, which is case-insensitive (as is the SGML
metalanguage of which HTML is an application), XML is strictly
case-sensitive, and so is every application of XML
(e.g., XSLT, MathML, SVG, and so forth, plus any languages you
create). The following element names are therefore all distinct and
are in no way related to one another in XML:
price
Price
PRICE
The case-sensitive nature of XML often confuses novices. Be
sure to remember this when doing string comparisons in code.
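As a minimal sketch of what this means for string comparisons, the following Python fragment uses the standard library's xml.etree.ElementTree module. The sample document and the two helper function names are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three sibling elements whose names differ only in case.
DOC = "<catalog><price>10</price><Price>20</Price><PRICE>30</PRICE></catalog>"


def find_texts(xml_text, name):
    """Return the text of each child of the root whose name equals `name` exactly."""
    root = ET.fromstring(xml_text)
    return [child.text for child in root if child.tag == name]


def find_texts_ignoring_case(xml_text, name):
    """Case-folded comparison; XML itself never treats these names as equal."""
    root = ET.fromstring(xml_text)
    return [child.text for child in root if child.tag.lower() == name.lower()]
```

An exact comparison for "price" matches only the first child. The case-folded helper matches all three, which is appropriate only if your own application deliberately chooses to treat case variants as the same name.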
The W3C's Extensible HyperText Markup Language (XHTML) recasts
HTML in XML syntax. In XHTML, all elements and attributes have
lowercase names, such as:
body
h1
img
href
Notice that this is not merely a convention; it is an absolute
requirement. An XHTML document that contains capital letters in
element or attribute names is simply invalid, even though
uppercase or mixed-case names such as BODY, Body, or even bOdY
would be perfectly acceptable in HTML.
Uppercase Keywords
Since XML is case-sensitive, it should not be surprising that
certain special words must appear in a particular case. In general,
the keywords that relate to DTDs (e.g., DOCTYPE, ENTITY, CDATA,
ELEMENT, ATTLIST, PCDATA, IMPLIED, REQUIRED, and FIXED) must be all
uppercase. On the other hand, the various strings used in the XML
declaration (e.g., xml, version, standalone, and encoding) must appear
in all lowercase.
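One way to see these keyword rules in action is to hand a parser the wrong case and watch it refuse the document. The sketch below assumes Python's standard xml.etree.ElementTree module (a non-validating parser); the function name is_well_formed is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Lowercase pseudo-attribute in the XML declaration and uppercase DOCTYPE keyword.
GOOD_DECLARATION = '<?xml version="1.0"?><a/>'
GOOD_DOCTYPE = "<!DOCTYPE a><a/>"

# The same documents with the case of one keyword changed.
BAD_DECLARATION = '<?xml VERSION="1.0"?><a/>'
BAD_DOCTYPE = "<!doctype a><a/>"
```

Under these assumptions, is_well_formed accepts the first two strings and rejects the last two, even though each pair differs only in the case of a single keyword.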
Case Conventions or Guidelines
When creating your own XML vocabulary, it would be desirable if
there were conventions to explain the use of uppercase, lowercase,
mixed case, underscores, and hyphens. Unfortunately, no such
conventions exist in XML 1.0. It is a good idea to adopt your own
conventions and to apply them consistently, at least across your
project, but ideally throughout your entire organization.
For example, for element names I prefer using what is often called
CamelCase because the initial letter of each word in a multiword
name is uppercase and all others are lowercase, creating humps
like a camel's back. (It's also sometimes called TitleCase because
it resembles the title of a book.) For example:
<DiscountPrice rate="20%" countryCode="US" />
Note that for attributes, I also use CamelCase, except that the first
word always begins with a lowercase letter, as in "countryCode".
In fact, the terms UpperCamelCase
(as I use for elements) and lowerCamelCase (as I use for
attributes) are often used to make this distinction clearer.
One reason that I favor this convention is that in any context
(including documentation), it's easy to distinguish elements from
attributes.
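Because XML itself does not enforce any naming convention, a project that adopts one may want to check it mechanically. The following is a rough sketch of such a check in Python, not an established tool. The function name convention_violations and both regular expressions are assumptions made for illustration; they test only the case of the first letter and the absence of hyphens and underscores, and they assume the document uses no namespaces.

```python
import re
import xml.etree.ElementTree as ET

# Assumed patterns: a leading uppercase (elements) or lowercase (attributes)
# letter, followed only by letters and digits.
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")


def convention_violations(xml_text):
    """List (kind, name) pairs for names that break the assumed convention."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        if not UPPER_CAMEL.fullmatch(elem.tag):
            problems.append(("element", elem.tag))
        for attr in elem.attrib:
            if not LOWER_CAMEL.fullmatch(attr):
                problems.append(("attribute", attr))
    return problems
```

Applied to the DiscountPrice example above, the function reports no violations. An element named DISCOUNT-PRICE with a country-code attribute would be reported on both counts.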
It would be just as reasonable, however, to use all uppercase
letters for elements, all lowercase for attributes, and a hyphen
to separate multipart terms, as in the following example, or even
to use all uppercase for both elements and attributes.
<DISCOUNT-PRICE rate="20%" country-code="US" />
As stated earlier, for XHTML, the W3C elected to use all lowercase
letters. The most important thing is to pick a convention for your
project (or your company) and to be consistent across developers
and applications.
We've seen UpperCamelCase for elements and lowerCamelCase for
attributes in the employee example: Employee with its sex attribute,
Address, PhoneNumbers, and so on. The following fragment from
the W3C's SOAP 1.2 Part 2 Adjuncts Working Draft
(http://www.w3.org/TR/2001/WD-soap12-part2-20011002/#N4008D)
illustrates its use of UpperCamelCase for element names and
lowerCamelCase for attributes, as well as for namespace prefixes.
<env:Body >
<m:GetLastTradePrice
env:encodingStyle="http://www.w3.org/2001/09/soap-encoding"
xmlns:m="http://example.org/2001/06/quotes" >
<m:symbol>DEF</m:symbol>
</m:GetLastTradePrice>
</env:Body>
Root Element Contains All Others
There must be exactly one root element, also known as the
document element, which is the ancestor of all other
elements. That is, all other elements are nested within the root
element. The root's children, together with their descendants,
make up the content of the root. Recall that the name of the
root element is given in the DOCTYPE line if a DTD
is referenced (either an external or internal one). We also noted
that this document element must be the first element the parser
encounters (after the XML prolog, which does not contain elements).
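The single-root rule can be observed with any conforming parser. The sketch below assumes Python's standard xml.etree.ElementTree module; root_name is a hypothetical helper, and the sample documents are illustrative only.

```python
import xml.etree.ElementTree as ET


def root_name(xml_text):
    """Parse the document and return the name of its root (document) element."""
    return ET.fromstring(xml_text).tag


# One root element that contains everything else: accepted.
ONE_ROOT = '<?xml version="1.0"?><Employee><Address/></Employee>'

# Two top-level elements: the parser raises ET.ParseError.
TWO_ROOTS = "<Employee/><Employee/>"
```

Calling root_name(ONE_ROOT) returns "Employee", whereas root_name(TWO_ROOTS) raises ET.ParseError because the second top-level element is not nested within the first.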
A somewhat surprising aspect, at least to this author, is that
the XML Recommendation does not preclude a recursive root! In
other words, it is possible for a root element to be defined in
a DTD as containing itself. Although this is not common, it is
worth noting. For example, in NASA's IML DTD, we allowed that
the root element Instrument could contain other
Instrument children. (The DTD syntax shown here is
formally described in chapter 4.)
<!ELEMENT Instrument (Instrument | Port | CommandProcedureSet)* >
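To make the recursion concrete, here is a small sketch in Python that parses a hypothetical instance document shaped like the content model above and measures how deeply Instrument elements nest. The sample document and the function name max_nesting are invented for illustration. Note also that xml.etree.ElementTree does not validate against a DTD, so this shows only that such a document is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: Instrument elements nested inside Instrument elements.
DOC = """<Instrument>
  <Port/>
  <Instrument>
    <CommandProcedureSet/>
    <Instrument/>
  </Instrument>
</Instrument>"""


def max_nesting(elem, name):
    """Largest number of `name` elements on any path from elem down to a leaf."""
    deepest_child = max((max_nesting(child, name) for child in elem), default=0)
    return deepest_child + (1 if elem.tag == name else 0)


root = ET.fromstring(DOC)
depth = max_nesting(root, "Instrument")
```

For this sample, the root is itself an Instrument that contains an Instrument that contains another, so depth is 3.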
XML Family of Specifications: A Practical Guide