The SAX Model
March 1, 2002
SAX is referred to as an event-driven model or system. A SAX
parser reads an XML document and, as it encounters the different
parts of an element, calls different functions. Which functions
are called, and what they do, depends on how the programmer sets
up the code.
There are three main parts of an element that the parser
interacts with: the start tag, the end tag, and the data between
the tags. For example, when a parser encounters a start tag, like
<City>, the parser runs a function that renders the tag with any
defined HTML, like <td>. That is, SAX can parse XML documents to
substitute HTML for XML tags. Also, an XML document can be
analyzed while parsing it with SAX to produce a subset, before
transforming it to XML or HTML.
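The substitution described above can be sketched as follows. This is a minimal illustration, not the book's own example code: the $tagMap array, the sample <Cities> document, and the choice of <tr> and <td> are assumptions made for the sketch, and the closures require a newer PHP than PHP 4.

```php
<?php
// Hypothetical mapping from XML element names to HTML tags.
$tagMap = ['Cities' => 'tr', 'City' => 'td'];
$html = '';

$parser = xml_parser_create();
// PHP upper-cases element names by default; turn that off so 'City' matches.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Start-tag handler: emit the mapped HTML opening tag.
    function ($parser, $name, $attrs) use (&$html, $tagMap) {
        if (isset($tagMap[$name])) {
            $html .= '<' . $tagMap[$name] . '>';
        }
    },
    // End-tag handler: emit the mapped HTML closing tag.
    function ($parser, $name) use (&$html, $tagMap) {
        if (isset($tagMap[$name])) {
            $html .= '</' . $tagMap[$name] . '>';
        }
    }
);

// Character data handler: copy the text between tags, escaped for HTML.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$html) {
        $html .= htmlspecialchars($data);
    }
);

$xml = '<Cities><City>Oslo</City><City>Lima</City></Cities>';
xml_parse($parser, $xml, true);

echo $html, "\n"; // prints <tr><td>Oslo</td><td>Lima</td></tr>
```

The handlers append to a shared string rather than echoing directly, which makes it easy to inspect or post-process the result once parsing has finished.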
This means that while parsing, an event occurs when SAX
encounters a pre-defined part of an XML document. Each of these
events is defined with a handler. There are a total of seven
different kinds of events within SAX that PHP supports with
defined handlers:
| PHP Function to Set Handler | Event Description |
|---|---|
| xml_set_element_handler() | Used to define the events for working with or 'handling' the start and end tags within XML. Element events are issued whenever the XML parser encounters start or end tags. There are separate handlers for start tags and end tags. |
| xml_set_character_data_handler() | Used to define the event for 'handling' character data, for example, swapping HTML for a start tag. Character data is roughly all the non-markup contents of XML documents, including the whitespace between tags. Note that the XML parser does not add or remove any whitespace; it is up to the programmer to decide whether the whitespace is significant. |
| xml_set_processing_instruction_handler() | PHP programmers should be familiar with processing instructions (PIs) already. <?php ?> is a processing instruction, where PHP is called the 'PI target'. The handling of these is application-specific, except that all PI targets starting with "XML" are reserved. |
| xml_set_default_handler() | This handler is the default and should always be used. It will be called for each piece of XML that doesn't have a set handler. The structure is similar to the switch structure in PHP, where this is the default case. |
| xml_set_unparsed_entity_decl_handler() | This handler will be called when an unparsed (NDATA) entity is found in the XML. |
| xml_set_notation_decl_handler() | This handler is called when a notation is found in the XML document. |
| xml_set_external_entity_ref_handler() | This handler is called when the XML parser finds a reference to an external parsed general entity. This can be a reference to a file or URL. For a demonstration of the external entity example, refer to: http://www.php.net/manual/en/ref.xml.php#example.xml-external-entity |
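As an illustration of how one of these handlers is registered, here is a minimal sketch of a processing instruction handler. The 'render' PI target and the sample document are made up for the example; the handler simply records the target and data it receives.

```php
<?php
// Collects [target, data] pairs for every processing instruction seen.
$seen = [];

$parser = xml_parser_create();
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$seen) {
        $seen[] = [$target, $data];
    }
);

// 'render' is a hypothetical, application-specific PI target.
$xml = '<doc><?render mode="fast"?><p>text</p></doc>';
xml_parse($parser, $xml, true);

foreach ($seen as $pi) {
    echo 'PI target: ', $pi[0], ', data: ', $pi[1], "\n";
}
```

What the application does with the target and data is up to the programmer; a real handler would typically switch on the target and ignore targets it does not recognize.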
The SAX API doesn't allow for writing XML. So, here we will work
with the three handlers that can read an XML file:
- xml_set_element_handler()
- xml_set_character_data_handler()
- xml_set_default_handler()
Using PHP's SAX Support
Support for SAX is built into PHP by default in the form of the
Expat extension. Expat allows programmers to create XML parsers
and parse XML, either from strings or files. Expat is a
non-validating parser: it does not validate XML against a DTD,
although it does report well-formedness errors.
If the XML file is not well-formed, SAX will process as much of
the file as it can, up to the point of error. When it encounters
an error, Expat will report it with an error message like this:
XML error: mismatched tag at line 4
The exact error message depends on how you set up error handling
in your code. There are over twenty different error codes that
can be returned.
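One way to produce a message in that format is sketched below. The function name parseOrExplain() is hypothetical, and the type declarations assume a newer PHP than PHP 4; the xml_* calls are the standard ones for reading the error code, its description, and the line number. The exact wording of the description depends on the underlying parser library.

```php
<?php
// Hypothetical helper: returns null if $xml parses cleanly,
// or a formatted error message if it does not.
function parseOrExplain(string $xml): ?string
{
    $parser = xml_parser_create();

    // xml_parse() returns 1 on success and 0 on failure.
    if (xml_parse($parser, $xml, true)) {
        return null;
    }

    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// The closing </a> does not match the open <b>, so this is not well-formed.
$message = parseOrExplain("<a>\n<b>\n</a>");
if ($message !== null) {
    echo $message, "\n";
}
```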
SAX does not write XML. To write XML, two classes are available:
- xmlwriterclass (http://freshmeat.net/projects/xmlwriterclass/)
  by Manuel Lemos – writes well-formed valid XML documents to the
  browser
- XMLFile by Chris Monson – writes well-formed valid XML
  documents to a file
Alternatively you can write your own custom code to write an
array to the file system formatted as XML.
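A minimal sketch of such custom code might look like this. The arrayToXml() function, the sample data, and the record.xml file name are all hypothetical, and the sketch assumes a flat array whose keys are already valid XML element names.

```php
<?php
// Hypothetical helper: turns a flat array of name => value pairs into an
// XML string. Keys are assumed to be valid XML element names.
function arrayToXml(array $row, string $root): string
{
    $xml = "<?xml version=\"1.0\"?>\n<$root>\n";
    foreach ($row as $name => $value) {
        // Escape &, <, > and quotes so the output stays well-formed.
        $text = htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES);
        $xml .= "  <$name>$text</$name>\n";
    }
    return $xml . "</$root>\n";
}

$xml = arrayToXml(['City' => 'Oslo', 'Firm' => 'Smith & Sons'], 'Record');

// Write the result to a file in the system temp directory.
file_put_contents(sys_get_temp_dir() . '/record.xml', $xml);
```

Nested arrays, attributes, and validation of element names are left out; a real implementation would need to handle them, or use one of the classes listed above.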
Verifying XML Support
Professional PHP4 Programming