XmlReader and XmlWriter
- XmlReader Navigation
- Reading Attributes
- Movement and Advanced Reads
- Creation Using XmlWriter
- Validating Your Work
- Summary
The previous article in this series discussed the DOM API for XML data access in the .NET Framework. The DOM models an XML document as a tree of nodes kept in memory while a client is using the document. The top part of Figure 1 shows how the DOM parser produces a tree of nodes. The client access methods allow forward and backward movement, much like the client-side static cursor in ADO data access on the Microsoft platform. The cost of this capability is the memory needed to keep every node resident. This becomes a real problem when a document grows too large to hold in memory efficiently.
Figure 1 - DOM vs. SAX vs. XmlReader
The SAX API was one of the first attempts in the XML community to solve the problem of parsing large XML documents. The SAX parser uses a streaming model that moves through a document piece by piece and doesn't retain a copy of the nodes once they are traversed. A client implements a set of interfaces that are called by the parser when it passes over nodes in the document. Methods are called for the document, elements, attributes, and other types of nodes. This type of API is called a "push model" because the document information is pushed from the parser to the client. The middle part of Figure 1 illustrates the SAX approach to parsing.
The downside of this approach is that keeping track of the current context of an element or attribute in a complex document can be difficult. The client code must provide some sort of state model to guide its actions. Another limitation is that the client must implement all interface methods, and the methods are always called by the parser, regardless of whether the client is interested in them.
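The state-tracking burden described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `IPushHandler` interface, the `ShipCityHandler` class, and the `PushDemo.Parse` driver are hypothetical names invented here, not .NET or SAX APIs, and the driver uses an XmlTextReader internally purely as a stand-in for a real push parser. The inline document is a cut-down, hypothetical version of a purchase order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical callback interface in the spirit of SAX; not a .NET API.
public interface IPushHandler
{
    void StartElement(string name);
    void EndElement(string name);
    void Characters(string text);
}

// The client has to carry its own state to know where it is in the document.
public class ShipCityHandler : IPushHandler
{
    private readonly Stack<string> path = new Stack<string>();
    public string ShipCity = null;

    public void StartElement(string name) { path.Push(name); }
    public void EndElement(string name) { path.Pop(); }

    public void Characters(string text)
    {
        // This callback fires for every text node, wanted or not.
        // The stack is the only way to tell which element the text belongs to.
        string[] open = path.ToArray(); // innermost open element first
        if (open.Length >= 2 && open[0] == "City" && open[1] == "ShipToAddress")
        {
            ShipCity = text;
        }
    }
}

public class PushDemo
{
    // Stand-in for a push parser: it owns the loop and calls the handler.
    public static void Parse(TextReader input, IPushHandler handler)
    {
        XmlTextReader reader = new XmlTextReader(input);
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    string name = reader.Name;
                    bool empty = reader.IsEmptyElement;
                    handler.StartElement(name);
                    // Empty elements produce no EndElement node, so close them here.
                    if (empty) handler.EndElement(name);
                    break;
                case XmlNodeType.Text:
                    handler.Characters(reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    handler.EndElement(reader.Name);
                    break;
            }
        }
        reader.Close();
    }

    public static string FindShipCity(string xml)
    {
        ShipCityHandler handler = new ShipCityHandler();
        Parse(new StringReader(xml), handler);
        return handler.ShipCity;
    }

    public static void Main()
    {
        string xml =
            "<Order>" +
            "<BillToAddress><City>Raleigh</City></BillToAddress>" +
            "<ShipToAddress><City>Charlotte</City></ShipToAddress>" +
            "</Order>";
        Console.WriteLine(FindShipCity(xml));
    }
}
```

Even for this one lookup the handler needs a stack of open elements, and it still receives every callback for the parts of the document it does not care about.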
Microsoft recognized the state of parsers in the XML world and designed an efficient XML API that was also simple to use. Microsoft borrowed a concept from the "firehose" cursor used in ADO. The XmlReader abstract class describes an interface that provides a read-only, forward-only cursor model over the nodes in an XML document. It is called a "pull model" because the client pulls the data from the XmlReader a piece at a time and has control of moving the cursor along the document. The bottom part of Figure 1 contrasts the access methods of the XmlReader with those of the DOM and the SAX models.
XmlReader Navigation
The interface of XmlReader is more akin to an ADO recordset than the XML DOM. The XmlReader class has properties and methods that return the values and move the cursor across the document content. You can access value, node type, and namespace information quickly and easily using the properties shown in the following list.
- Value
- NodeType
- LocalName
- Prefix
- NamespaceURI
- Depth
- EOF
- HasAttributes
- AttributeCount
A key thing to remember is that an XmlReader has to be positioned on a node before its values can be read. In fact, to read the first node of the document, you must first call the Read method to position the cursor on it. Each subsequent call to Read moves the cursor to the next node in document order.
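A minimal sketch of that positioning rule follows. The `ReaderPositioning` class, the `FirstNode` helper, the inline document, and the `urn:example:po` namespace URI are assumptions made for illustration; only the XmlTextReader members themselves come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;

public class ReaderPositioning
{
    // Summarizes the node the reader lands on after the first Read call.
    public static string FirstNode(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Until Read is called the cursor is not on any node,
            // so the properties have nothing meaningful to report.
            reader.Read();
            return reader.NodeType + "|" + reader.LocalName + "|" +
                   reader.Prefix + "|" + reader.NamespaceURI + "|" + reader.Depth;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Hypothetical document and namespace URI, for illustration only.
        string xml = "<po:Order xmlns:po=\"urn:example:po\"><Number>1001</Number></po:Order>";
        Console.WriteLine(FirstNode(xml));
    }
}
```

The same properties can be read again after every later Read call; they always describe whichever node the cursor currently sits on.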
Because the XmlReader class is abstract, concrete derived classes provide the actual functionality. Three main implementations are available to clients in the .NET Framework: XmlTextReader, XmlNodeReader, and XmlValidatingReader. XmlTextReader operates over a document in its serialized text form. XmlNodeReader operates over a set of nodes that have already been parsed into the XmlNode format used by the XmlDocument class, described in the previous article. XmlValidatingReader builds on XmlTextReader and adds an extra layer that validates the document being read against an XML schema.
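Because client code can be written against the abstract XmlReader type, the same routine works over any of these implementations. The sketch below assumes a hypothetical `CountElements` helper and a small inline document; it exercises XmlTextReader and XmlNodeReader only and leaves XmlValidatingReader out.

```csharp
using System;
using System.IO;
using System.Xml;

public class ReaderKinds
{
    // Counts element nodes using only members of the abstract XmlReader class.
    public static int CountElements(XmlReader reader)
    {
        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical document, for illustration only.
        string xml = "<Order><Number>1001</Number><City>Charlotte</City></Order>";

        // XmlTextReader parses the serialized text directly.
        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        Console.WriteLine("Text reader elements: {0}", CountElements(textReader));
        textReader.Close();

        // XmlNodeReader walks nodes that an XmlDocument has already parsed.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeReader nodeReader = new XmlNodeReader(doc);
        Console.WriteLine("Node reader elements: {0}", CountElements(nodeReader));
        nodeReader.Close();
    }
}
```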
Listing 1 brings back the purchase order XML document from the previous article. The examples that follow use it to illustrate the XmlReader.
Listing 1: PO.xml
<?xml version="1.0" encoding="utf-8" ?>
<po:PurchaseOrder xmlns:po="http://michalk.com/XmlDOM/PO.xsd"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Number>1001</Number>
  <OrderDate>8/12/01</OrderDate>
  <BillToAddress>
    <Street>101 Main Street</Street>
    <City>Charlotte</City>
    <State>NC</State>
    <ZipCode>28273</ZipCode>
  </BillToAddress>
  <ShipToAddress>
    <Street>101 Main Street</Street>
    <City>Charlotte</City>
    <State>NC</State>
    <ZipCode>28273</ZipCode>
  </ShipToAddress>
  <LineItem Name="Computer Desk" Description="Wood desk for computer"
      SKU="12345A123" Price="499.99" Qty="1" />
</po:PurchaseOrder>
Our first pass at XmlTextReader parsing, shown in Listing 2, walks every node in the document. The Read method visits nodes in document order: the XML declaration, elements, text, and end elements. Attributes are not visited by Read, which is why the attributes of LineItem do not appear in the output. The loop that traverses the document uses the return value from Read to tell it when to stop, because Read returns false once no nodes remain. We could have tested the EOF property to accomplish the same thing. The resulting output is shown in Listing 3.
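For comparison, here is a sketch of the two loop styles side by side. The `EofLoop` class, its two counting helpers, and the tiny inline document are assumptions for illustration. Note that the EOF variant needs a priming Read before the loop, since EOF only becomes true after a Read call has run off the end of the document.

```csharp
using System;
using System.IO;
using System.Xml;

public class EofLoop
{
    // Loop controlled by the return value of Read.
    public static int CountWithReturnValue(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        int nodes = 0;
        while (reader.Read())
        {
            nodes++;
        }
        reader.Close();
        return nodes;
    }

    // Equivalent loop controlled by the EOF property.
    public static int CountWithEof(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        int nodes = 0;
        reader.Read();            // priming read positions the cursor on the first node
        while (!reader.EOF)
        {
            nodes++;
            reader.Read();        // advance; EOF turns true when nothing is left
        }
        reader.Close();
        return nodes;
    }

    public static void Main()
    {
        // Hypothetical document, for illustration only.
        string xml = "<a><b/></a>";
        Console.WriteLine(CountWithReturnValue(xml));
        Console.WriteLine(CountWithEof(xml));
    }
}
```

The return-value form is shorter and avoids the priming read, which is likely why the listing uses it.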
Listing 2: XmlTextReader Default Navigation Code
using System;
using System.Xml;

namespace XmlDemos
{
    public class ReaderNavDefault
    {
        public static void Main()
        {
            // load the PO document
            XmlTextReader reader = new XmlTextReader("PO.xml");

            // ignore the whitespace in the document
            reader.WhitespaceHandling = WhitespaceHandling.None;

            Console.WriteLine("Walking nodes in document..\n");

            // read each node in the document
            // using the default traversal path
            while (reader.Read())
            {
                // call routine for printing out spacing
                PrintDepth(reader.Depth);

                // display node information
                Console.WriteLine("Name:{0} Type:{1} Value:{2}",
                    reader.Name, reader.NodeType, reader.Value);
            }

            // release the underlying file once the walk is finished
            reader.Close();
        }

        // writes one space per level of nesting
        public static void PrintDepth(int depth)
        {
            for (int i = 0; i < depth; i++)
                Console.Write(" ");
        }
    }
}
Listing 3: XmlTextReader Default Navigation Output
Walking nodes in document..

Name:xml Type:XmlDeclaration Value:version="1.0" encoding="utf-8"
Name:po:PurchaseOrder Type:Element Value:
 Name:Number Type:Element Value:
  Name: Type:Text Value:1001
 Name:Number Type:EndElement Value:
 Name:OrderDate Type:Element Value:
  Name: Type:Text Value:8/12/01
 Name:OrderDate Type:EndElement Value:
 Name:BillToAddress Type:Element Value:
  Name:Street Type:Element Value:
   Name: Type:Text Value:101 Main Street
  Name:Street Type:EndElement Value:
  Name:City Type:Element Value:
   Name: Type:Text Value:Charlotte
  Name:City Type:EndElement Value:
  Name:State Type:Element Value:
   Name: Type:Text Value:NC
  Name:State Type:EndElement Value:
  Name:ZipCode Type:Element Value:
   Name: Type:Text Value:28273
  Name:ZipCode Type:EndElement Value:
 Name:BillToAddress Type:EndElement Value:
 Name:ShipToAddress Type:Element Value:
  Name:Street Type:Element Value:
   Name: Type:Text Value:101 Main Street
  Name:Street Type:EndElement Value:
  Name:City Type:Element Value:
   Name: Type:Text Value:Charlotte
  Name:City Type:EndElement Value:
  Name:State Type:Element Value:
   Name: Type:Text Value:NC
  Name:State Type:EndElement Value:
  Name:ZipCode Type:Element Value:
   Name: Type:Text Value:28273
  Name:ZipCode Type:EndElement Value:
 Name:ShipToAddress Type:EndElement Value:
 Name:LineItem Type:Element Value:
Name:po:PurchaseOrder Type:EndElement Value: