PHP XML Expat Parser

The Expat parser is an event-based parser built into PHP.

Unlike SimpleXML, which builds a tree of the entire document in memory, Expat processes its input piece by piece. Each time it encounters an opening tag, a run of character data, or a closing tag, it calls a handler function you define. Because the parser never needs the whole document in memory at once, memory use stays low even for very large XML files. This makes it well suited to files that are too big to load in one go.
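The low memory footprint comes from feeding Expat the data in chunks rather than as one string. The following is a minimal sketch of that pattern. The php://temp stream filled with generated <item> elements stands in for a large file, and the 4096-byte chunk size and $itemCount counter are illustrative choices, not requirements. With a real file you would open its path with fopen() instead.

```php
<?php
// Sketch: feed XML to Expat in fixed-size chunks so only a small
// piece of the input is held in memory at a time.
// php://temp stands in for a large file (illustration only).
$stream = fopen('php://temp', 'r+');
fwrite($stream, '<items>' . str_repeat('<item>x</item>', 1000) . '</items>');
rewind($stream);

$itemCount = 0;

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$itemCount) {
        // Case folding is on by default, so names arrive uppercased.
        if ($name === 'ITEM') {
            $itemCount++;
        }
    },
    function ($parser, $name) {}
);

while (!feof($stream)) {
    $chunk = fread($stream, 4096); // read up to 4 KB at a time
    // The third argument tells Expat whether this is the final chunk.
    if (!xml_parse($parser, $chunk, feof($stream))) {
        throw new RuntimeException(
            'XML error: ' . xml_error_string(xml_get_error_code($parser))
        );
    }
}
fclose($stream);

echo "Found $itemCount items\n";
```

Only the counter grows with the document; the parser itself sees one chunk per call, so the same loop works whether the file is a few kilobytes or several gigabytes.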


How Event-Based Parsers Work

Consider the following XML fragment: <from>Jani</from>

The Expat parser generates three separate events:

  1. Start element: <from> (Triggers a "Start Handler" function)
  2. Character data: Jani (Triggers a "Data Handler" function)
  3. End element: </from> (Triggers an "End Handler" function)
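To make the event order visible, the sketch below records each callback in an array instead of acting on it. The $events array is a hypothetical log used only for illustration. Note that the tag name is reported as FROM, because PHP's parser upper-cases element names by default. For longer text, Expat may deliver the character data across several data-handler calls.

```php
<?php
// Sketch: log every event Expat fires for <from>Jani</from>.
// $events is a hypothetical log used only to show the order.
$events = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start: $name";   // Start Handler
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end: $name";     // End Handler
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "data: $data";    // Data Handler
    }
);

// true marks this string as the complete (final) input.
xml_parse($parser, '<from>Jani</from>', true);

print_r($events);
```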

Building an Expat Parser

To use Expat, we must initialize the parser, define our handler functions, and then feed the XML data into it.

Expat Parsing Example

<?php
// 1. Initialize the XML parser
$parser = xml_parser_create();

// 2. Define the Start Element Handler
function startHandler($parser, $element_name, $element_attrs) {
    echo "Starting Tag: " . $element_name . "\n";
}

// 3. Define the End Element Handler
function endHandler($parser, $element_name) {
    echo "Ending Tag: " . $element_name . "\n";
}

// 4. Define the Character Data Handler
function charHandler($parser, $data) {
    // Ignore whitespace-only data
    if (trim($data) != '') {
        echo " Data: " . $data . "\n";
    }
}

// 5. Register handlers
xml_set_element_handler($parser, "startHandler", "endHandler");
xml_set_character_data_handler($parser, "charHandler");

// 6. Parse the data (true = this is the final chunk of input)
$xmlData = "<note><to>Tove</to><from>Jani</from></note>";
xml_parse($parser, $xmlData, true);

// 7. Free the parser (this call has no effect as of PHP 8.0)
xml_parser_free($parser);
?>
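The example above ignores the $element_attrs argument, prints upper-case tag names (PHP enables case folding by default), and does not check whether parsing succeeded. The sketch below addresses all three, assuming you want tag names exactly as written. The sample XML, its lang attribute, and the $langs array are made up for illustration. The input is deliberately missing its closing </note> tag, so xml_parse() returns 0 and the error is reported.

```php
<?php
// Sketch: keep original name case, read attributes, report errors.
// The XML, the lang attribute and $langs are illustrative only.
$parser = xml_parser_create();
// Case folding is on by default; turn it off to keep names as written.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$langs = [];
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$langs) {
        // $attrs is an array mapping attribute names to values.
        if ($name === 'note' && isset($attrs['lang'])) {
            $langs[] = $attrs['lang'];
        }
    },
    function ($parser, $name) {}
);

// Missing </note>, so parsing fails once the input is marked final.
$xmlData = '<note lang="en"><to>Tove</to>';
$ok = xml_parse($parser, $xmlData, true); // returns 1 or 0

if (!$ok) {
    echo sprintf(
        "XML error: %s at line %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}
```

Handlers for the elements that were read before the error still run, which is typical of event-based parsing: work already done is not undone when a later part of the input turns out to be malformed.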


Exercise


Why is the Expat parser better suited for very large XML files?