PHP XML
March 1, 2002
At the simplest level, XML is structured text. We are
surrounded by structured text. This book is structured
text, as it contains chapters, sub-headings, and
paragraphs. A letter is structured text, as it typically
contains a date, salutation, and paragraphs. Each section
of a book or letter is defined by the structure of that
document. To make the structure visible, each of these
sections can be marked up with tags (similar to HTML).
The tags used to mark up the document are the basis of
XML, which is a way of writing structured text in a
common format that both humans and machines can read.
XML can be used to describe any kind of structured text,
including other markup languages. There are over a dozen markup
languages based on XML. They are used to describe everything
from graphics to mathematical equations. To try to keep this
chapter under control, we are going to look at only the parts of
XML that are implemented in PHP and are used to read and write
an XML file. Specifically, we will look at a basic XML
file, a way of addressing parts of an XML file (XPath), and a
simplified XML format called SML.
To learn more about the different XML standards, take a look at
the World Wide Web Consortium (W3C). The W3C is the organization
that manages Web standards such as XML. It is responsible
for issuing and maintaining the XML family of specifications and
recommendations. For more information, visit its web site:
http://www.w3.org/.
In this chapter we will look at:
- The basics of XML, SML, and XPath
- XML as a datastore and programmatic interaction
- The PHP APIs (SAX, DOM, and PRAX) that allow interaction
with an XML document
- Examples of the APIs in action
- The Sablotron XSL support for PHP
At the time of writing, XML support within
PHP is still considered experimental. As a result, the
behavior of the code can be unexpected and
inconsistent.
Overview of XML
Like an HTML document, an XML document has tags and data. Unlike
HTML, XML tags can be named almost anything, as long as the name
starts with a letter, an underscore, or a colon. For example, <B>,
<Bb>, and <f5gt6g> are all valid (start) XML tags, but only <B>
in the preceding list is valid HTML. A name that starts with a
digit, such as <4f5gt6g>, is not a valid XML name. Like an HTML
document, an XML document can have data between the start and end
tags, for example, <B>text</B> and <Bb>some text</Bb>. In XML the
combined start tag, data, and end tag are referred to as an element.
This figure shows the different parts of an XML element:
An element, consisting of one start tag and one closing tag,
optional attributes, optional character data content,
and sub-elements (child nodes), is considered a node. In an
element there are start and end tags, for example,
<first> and </first>. The name in the end tag must match the
name in the start tag, and names are case-sensitive. The element
can be a container for other elements or it can contain character
data. An attribute is part of an element, for example,
<first id="4">, where id="4" is the attribute and first is the
name of the element. An attribute is similar to an entry in an
associative array in that both are key-value pairs.
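To make the parts of an element concrete, here is a minimal sketch that reads the name, attribute, and character data of a single element. It assumes a current PHP installation with the DOM extension (the DOMDocument class), which is not necessarily the API this chapter goes on to describe. The sample element is built from the names used above: first and id="4".

```php
<?php
// Sample element assumed for illustration: name "first", attribute id="4".
$xml = '<first id="4">some text</first>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// The root element of the parsed document.
$element = $doc->documentElement;

$name = $element->nodeName;        // name of the element
$id   = $element->getAttribute('id'); // value of the id attribute
$text = $element->textContent;     // character data inside the element

// Collect the attributes into an associative array of key-value pairs.
$pairs = [];
foreach ($element->attributes as $attr) {
    $pairs[$attr->name] = $attr->value;
}

echo $name, "\n"; // first
echo $id, "\n";   // 4
echo $text, "\n"; // some text
```

Copying the attributes into $pairs shows the key-value nature of attributes directly: the attribute name is the key and the attribute value is the value.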
An XML document is judged against two characteristics:
- Well-formed: An XML file is considered well-formed
if all tags are closed, all elements are nested properly,
and all attribute values are enclosed in quotes.
- Valid: A valid document is one that is
well-formed and also complies with a referenced Document
Type Definition (DTD) or schema.
Now let's look at what all these concepts look like
in XML files.
The following XML is not well-formed and not valid:
<root>
<title>
<name>some text</title>
<name>
- The <root> tag was not closed, and must be.
- The tags <title> and <name> are nested incorrectly.
The <name> element should have been closed before the
closing </title> tag.
- There's no DTD so the sample can't be validated.
The following XML is well-formed but it is not valid:
<root>
<title>
<name>some text</name>
</title>
</root>
- The <root> tag is closed.
- The tags <title> and <name> are nested correctly.
- There's no DTD so the sample can't be validated.
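Well-formedness can also be checked mechanically. The sketch below assumes PHP 7 or later with the DOM and libxml extensions; isWellFormed is a hypothetical helper name, not a PHP function. It relies on DOMDocument::loadXML() returning false when the parser cannot recover from an error, such as a mismatched or unclosed tag.

```php
<?php
// Hypothetical helper: true when $xml parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok  = $doc->loadXML($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

// Sample with a mismatched </title> and unclosed <name> and <root> tags.
$broken = "<root>\n<title>\n<name>some text</title>\n<name>";

// Sample with every tag closed and properly nested.
$fixed = "<root>\n<title>\n<name>some text</name>\n</title>\n</root>";

var_dump(isWellFormed($broken)); // bool(false)
var_dump(isWellFormed($fixed));  // bool(true)
```

This check says nothing about validity. A document can pass it and still fail validation, or have no DTD to validate against at all.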
The following XML is well-formed and valid:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ELEMENT name (#PCDATA)>
<!ELEMENT root (title)>
<!ELEMENT title (name)>
]>
<root>
<title>
<name>some text</name>
</title>
</root>
- The <root> tag is closed.
- The tags <title> and <name> are nested correctly.
- There's a DTD so the sample can be validated.
Note that the XML file doesn't need to include the
DTD. The DTD can be referenced instead by replacing the
<!DOCTYPE root [ ]> block with <!DOCTYPE root SYSTEM "root.dtd">,
as long as the element declarations inside the block are moved to
root.dtd.
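Validation against an internal DTD can be sketched in the same way. This example assumes PHP 7 or later with the DOM and libxml extensions and uses DOMDocument::validate(); the helper name isValid and the second, deliberately invalid document are made up for illustration.

```php
<?php
// Hypothetical helper: true when $xml is well-formed and valid against its DTD.
function isValid(string $xml): bool
{
    // Collect parser and validation errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $doc   = new DOMDocument();
    $valid = $doc->loadXML($xml) && $doc->validate();

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $valid;
}

// The DTD from the sample above, shared by both test documents.
$dtd = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ELEMENT name (#PCDATA)>
<!ELEMENT root (title)>
<!ELEMENT title (name)>
]>
XML;

// Follows the DTD: root contains title, title contains name.
$validXml = $dtd . "\n<root>\n<title>\n<name>some text</name>\n</title>\n</root>";

// Well-formed, but root contains name directly, which the DTD does not allow.
$invalidXml = $dtd . "\n<root>\n<name>some text</name>\n</root>";

var_dump(isValid($validXml));   // bool(true)
var_dump(isValid($invalidXml)); // bool(false)
```

The second document is still well-formed, so it loads without error. Only the validate() step rejects it, which is the distinction between the two characteristics.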
Professional PHP4 Programming