XSLT and Alternatives (Cont'd) - Page 3
October 19, 2001
Dealing with an input tree instead of an input document also
gives you an important advantage that XML developers get from DOM
trees: at any given point in your processing, the whole document
is available to you. If your program sees a word in the
document’s first paragraph that’s defined in the glossary at the
end, it can go to the glossary to pull out the term’s definition.
If you used an event-driven model such as the Simple API for XML
(SAX) instead of a tree-based model like the one XSLT uses, your
program would process each XML element as it read the element in.
To check information near the end of your document while reading
an element near the beginning, you would need to create and keep
track of data structures in memory yourself, which makes your
processing more complicated.
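The difference between the two models can be sketched in Python, using the standard library's xml.dom.minidom as a stand-in for a tree-based processor and xml.sax for the event-driven one. This is only an illustration under stated assumptions: the sample document and the element names term, glossary, and entry are hypothetical, not taken from the book.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document: a term used early, defined in a glossary at the end.
DOC = (
    "<doc>"
    "<para>An <term>axis</term> selects nodes.</para>"
    "<glossary><entry name='axis'>A direction through the tree.</entry></glossary>"
    "</doc>"
)

def lookup_with_tree(xml_text):
    """Tree-based: the whole document is in memory, so go straight to the glossary."""
    dom = minidom.parseString(xml_text)
    definitions = {
        entry.getAttribute("name"): entry.firstChild.data
        for entry in dom.getElementsByTagName("entry")
    }
    return {
        term.firstChild.data: definitions.get(term.firstChild.data)
        for term in dom.getElementsByTagName("term")
    }

class GlossaryHandler(xml.sax.ContentHandler):
    """Event-driven: terms arrive before their definitions, so the handler
    has to remember what it has seen so far."""

    def __init__(self):
        super().__init__()
        self.pending_terms = []   # terms seen, still waiting for a definition
        self.definitions = {}     # filled in only when the glossary is reached
        self._current = None      # "term" or "entry" while inside one
        self._entry_name = None
        self._buffer = []

    def startElement(self, name, attrs):
        if name in ("term", "entry"):
            self._current = name
            self._buffer = []
            if name == "entry":
                self._entry_name = attrs.get("name")

    def characters(self, content):
        if self._current:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            text = "".join(self._buffer)
            if name == "term":
                self.pending_terms.append(text)
            else:
                self.definitions[self._entry_name] = text
            self._current = None

def lookup_with_events(xml_text):
    handler = GlossaryHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    # Terms can be matched to definitions only after the whole parse is done.
    return {term: handler.definitions.get(term) for term in handler.pending_terms}

print(lookup_with_tree(DOC))
print(lookup_with_events(DOC))
```

Both functions return the same mapping, but the event-driven version needs its own bookkeeping (pending_terms, definitions, and a text buffer) to get there.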
TIP When a discussion of XSLT issues talks
about a source tree and a result tree, you can think of these
trees as temporary representations of your input and output
documents.
Not all nodes of a document tree are element nodes. A diagram of
the tree that would represent this document
<?xml-stylesheet href="article.xsl" type="text/xsl"?>
<article>
<!-- here is a comment -->
<title author="bd">Sample Document</title>
<para>My 1st paragraph.</para>
<para>My 2nd paragraph.</para>
</article>
shows that there are nodes for elements, attributes, processing
instructions, comments, and the text within elements. (There are
also nodes for namespaces, but this document declares no
namespaces of its own.)
Figure 1.4 A document tree with several
different node types
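The node types named above can also be listed with a short script. The sketch below uses Python's xml.dom.minidom, which builds a DOM tree rather than XSLT's own data model, so treat it as an approximation: minidom keeps attributes outside an element's list of children and has no namespace nodes at all. The names KIND and walk are illustrative.

```python
from collections import Counter
from xml.dom import minidom, Node

SAMPLE = """<?xml-stylesheet href="article.xsl" type="text/xsl"?>
<article>
<!-- here is a comment -->
<title author="bd">Sample Document</title>
<para>My 1st paragraph.</para>
<para>My 2nd paragraph.</para>
</article>"""

KIND = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def walk(node, depth=0):
    """Yield (depth, kind, detail) for every node below `node`, attributes included."""
    for child in node.childNodes:
        kind = KIND.get(child.nodeType, "other")
        if child.nodeType == Node.ELEMENT_NODE:
            yield depth, kind, child.tagName
            # minidom stores attributes separately, so list them explicitly.
            for name, value in child.attributes.items():
                yield depth + 1, "attribute", f"{name}={value}"
            yield from walk(child, depth + 1)
        elif child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            yield depth, kind, child.target
        else:
            # Text and comment nodes; repr() makes line breaks visible as '\n'.
            yield depth, kind, repr(child.data)

nodes = list(walk(minidom.parseString(SAMPLE)))
for depth, kind, detail in nodes:
    print("  " * depth + f"{kind}: {detail}")

counts = Counter(kind for _, kind, _ in nodes)
print(dict(counts))
```

Printing text with repr() shows which text nodes contain nothing but a line break.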
It’s easy to match up the parts of the tree with the parts of the
corresponding document, except that it might appear that the
document has too many "text" nodes. The tree diagram shows text
between the comment and the title element, and text
between the two para elements; where is this text in the
document? You can’t see this text, but it’s there: it’s the line
breaks (carriage returns) that separate those components of the
document.
If the two para elements had been written as one line,
like this,
<para>My 1st paragraph.</para><para>My 2nd paragraph.</para>
no text node would exist between those two elements, and you
wouldn’t see a text node between them in the tree diagram. (See
section 6.11, "Whitespace: preserving and controlling," page 229
for more on this.)
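A quick way to confirm the whitespace behavior is to parse both layouts and look for a text node between the two para elements. This is a minimal sketch using Python's xml.dom.minidom; the helper names is_para and text_between_paras are illustrative, and the article wrapper is trimmed down to just the two paragraphs.

```python
from xml.dom import minidom, Node

TWO_LINES = (
    "<article><para>My 1st paragraph.</para>\n"
    "<para>My 2nd paragraph.</para></article>"
)
ONE_LINE = (
    "<article><para>My 1st paragraph.</para>"
    "<para>My 2nd paragraph.</para></article>"
)

def is_para(node):
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == "para"

def text_between_paras(xml_text):
    """Return the content of each text node that sits directly between two para elements."""
    children = minidom.parseString(xml_text).documentElement.childNodes
    found = []
    # Slide a three-node window over the children: para, text, para.
    for before, node, after in zip(children, children[1:], children[2:]):
        if node.nodeType == Node.TEXT_NODE and is_para(before) and is_para(after):
            found.append(node.data)
    return found

print(text_between_paras(TWO_LINES))  # one text node holding the line break
print(text_between_paras(ONE_LINE))   # no text node at all
```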
You also might wonder why, if article is the root element
of the document, it’s not the root node of the tree. According to
XSLT’s view of the data (its "data model"), the root element is a
child of a predefined root node of the tree (shown as a slash in
the diagram) because it may have siblings. In the example above,
the processing instruction is not inside the article
element, but before it. It is therefore not a child of the
article element, but its sibling. Representing both the
processing instruction and the article element as the
children of the tree’s root node makes this possible.
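The sibling relationship can be checked with a DOM parser as well. In the sketch below, Python's xml.dom.minidom Document node plays roughly the role of XSLT's root node; the two data models differ in their details, so this is an analogy, not a demonstration of an XSLT processor. The function name top_level_nodes is illustrative.

```python
from xml.dom import minidom, Node

SAMPLE = """<?xml-stylesheet href="article.xsl" type="text/xsl"?>
<article>
<!-- here is a comment -->
<title author="bd">Sample Document</title>
<para>My 1st paragraph.</para>
<para>My 2nd paragraph.</para>
</article>"""

def top_level_nodes(xml_text):
    """Return (kind, name) for each child of the node that sits above the root element."""
    doc = minidom.parseString(xml_text)
    result = []
    for child in doc.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            result.append(("processing instruction", child.target))
        elif child.nodeType == Node.ELEMENT_NODE:
            result.append(("element", child.tagName))
        else:
            result.append(("other", child.nodeName))
    return result

doc = minidom.parseString(SAMPLE)
pi, article = doc.childNodes

print(top_level_nodes(SAMPLE))
print(pi.nextSibling is article)   # the two nodes are siblings
print(article.parentNode is doc)   # the root element has a parent above it
```

The processing instruction and the article element share the same parent, which is why the root element cannot itself be the top of the tree.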
XSLT Quickly