Don Xml's Grok This

The home of Don Demsak

DonXml's All Things Techie

Not the Way to Introduce XmlTextReader

Thom Robbins is a great guy, but unfortunately for him he has bumped into one of my major pet peeves, Viral Coding Examples, with his Introducing the XmlTextReader post. It really isn’t his fault, since the code he uses is very similar to the code example in the XmlTextReader.Read() documentation, and I complained about that code to the System.Xml team at the MVP Summit. I promised to write something up on it, and Thom’s post finally got me to do it, months after I made that promise.

The problem is in the structure of the code:

Dim xmlFileStream As New FileStream("cust.xml", FileMode.Open)
Dim xmlRead As New XmlTextReader(xmlFileStream)

While xmlRead.Read
    xmlRead.MoveToContent()
     If xmlRead.HasValue Then
         MsgBox(xmlRead.Value)
     End If
End While

xmlRead.Close()

xmlFileStream.Close()

At first glance the code looks perfectly fine. But we know full well that some developer new to System.Xml will use this code as a template for bigger things, so we have an obligation to make it easy for them to adapt the code without causing “strange” errors.

Problem #1

No explicit setting of the WhitespaceHandling option. Unless the developer is already familiar with XmlTextReader (which they shouldn't need to be, since this is an introduction), they would not know that the default is WhitespaceHandling.All, which causes the reader to return all Whitespace and SignificantWhitespace nodes. Those nodes will definitely confuse the developer. So after the declaration of the xmlRead variable you should set the WhitespaceHandling property.

xmlRead.WhitespaceHandling = WhitespaceHandling.None

At least now the developer realizes that there is a property for WhitespaceHandling, and will/can change it as needed.
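
To see what the setting changes, here is a minimal sketch that counts the whitespace nodes the reader reports under each option. The module name WhitespaceDemo, the helper CountWhitespaceNodes, and the sample string are made up for illustration; only the XmlTextReader members come from System.Xml.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module WhitespaceDemo
    ' Hypothetical helper: counts the Whitespace and SignificantWhitespace
    ' nodes the reader reports for the given XML under the given setting.
    Function CountWhitespaceNodes(xml As String, handling As WhitespaceHandling) As Integer
        Dim count As Integer = 0
        Using reader As New XmlTextReader(New StringReader(xml))
            reader.WhitespaceHandling = handling
            While reader.Read()
                If reader.NodeType = XmlNodeType.Whitespace OrElse
                   reader.NodeType = XmlNodeType.SignificantWhitespace Then
                    count += 1
                End If
            End While
        End Using
        Return count
    End Function

    Sub Main()
        ' Indented sample document (an assumption, not the cust.xml from the post).
        Dim xml As String = String.Join(Environment.NewLine, New String() {
            "<ROOT>",
            " <LEVEL1>",
            "  <LEVEL2>text</LEVEL2>",
            " </LEVEL1>",
            "</ROOT>"})

        ' With All, the line breaks and indentation show up as nodes of their own.
        Console.WriteLine("All:  " & CountWhitespaceNodes(xml, WhitespaceHandling.All))
        ' With None, the reader does not report them at all.
        Console.WriteLine("None: " & CountWhitespaceNodes(xml, WhitespaceHandling.None))
    End Sub
End Module
```

The loop here only counts nodes and never calls a method that moves the cursor on its own, so the plain While reader.Read() form is safe in this one case.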

Problem #2

Implicit Control of Reads in While Loops. My biggest problem with the code examples used to introduce XmlTextReader has to do with the While xmlRead.Read loops. Although it looks harmless, a while loop that executes a Read at the beginning (or the end) of the looping structure lets bugs creep into the code, because other methods on the XmlTextReader also move the cursor that points to the current node. If all you do is execute Reads via the while loop, you are perfectly fine. But once you add code that moves the cursor from within the while loop, you run the risk of skipping nodes accidentally.

Here’s a great example.  You have an XML stream that looks like this:

<ROOT>
 <LEVEL1>
  <LEVEL2>1st Level2 text node</LEVEL2>
  <LEVEL2>2nd Level2 text node </LEVEL2>  
 </LEVEL1>
</ROOT>

And you want to print out the contents of the LEVEL2 elements, so you modify the standard code example to look like this:

While xmlRead.Read()
    xmlRead.MoveToContent()
    If xmlRead.IsStartElement() Then
        ' Element names are case-sensitive, so match the casing used in the XML.
        If xmlRead.Name = "LEVEL2" Then
            MsgBox(xmlRead.ReadInnerXml())
        End If
    End If
End While

And you know what, it works fine. But say the XML stream does not have all that pretty whitespace, or the developer took my advice and set the WhitespaceHandling property (in this case to None). Now the code doesn’t work: it had a bug all along, and the whitespace was hiding it.

How is that? The ReadInnerXml method leaves the cursor on the first node past the EndElement. With the nicely formatted stream (and WhitespaceHandling set to All), that node is a whitespace node. When the while loop fires the Read method, the cursor moves to the next node, which is the next StartElement. (If it were not, MoveToContent would move the cursor forward to the next content node, which is any non-whitespace text, CDATA, Element, EndElement, EntityReference, or EndEntity node.) All is well.

Without whitespace nodes, ReadInnerXml leaves the cursor directly on the next node, which in this case is the StartElement of the second LEVEL2. Then the while loop fires the Read method and moves the cursor past that StartElement to its text node. When we enter the loop body the If condition is not met, and the whole element is skipped.
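
The skipped element is easy to reproduce. The following is a sketch, not the downloadable test project: the module name SkippedNodeDemo and the helper ReadLevel2Implicit are hypothetical, MsgBox is swapped for a list so the result can be inspected, and the name comparison uses "LEVEL2" to match the casing of the sample XML.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module SkippedNodeDemo
    ' Hypothetical helper: runs the implicit-Read loop and returns the inner
    ' XML of every LEVEL2 element the loop actually visits.
    Function ReadLevel2Implicit(xml As String, handling As WhitespaceHandling) As List(Of String)
        Dim found As New List(Of String)
        Using reader As New XmlTextReader(New StringReader(xml))
            reader.WhitespaceHandling = handling
            While reader.Read()
                reader.MoveToContent()
                If reader.IsStartElement() AndAlso reader.Name = "LEVEL2" Then
                    ' ReadInnerXml leaves the cursor on the node after </LEVEL2>;
                    ' the Read in the While condition then steps past that node.
                    found.Add(reader.ReadInnerXml())
                End If
            End While
        End Using
        Return found
    End Function

    Sub Main()
        Dim xml As String = String.Join(Environment.NewLine, New String() {
            "<ROOT>",
            " <LEVEL1>",
            "  <LEVEL2>1st Level2 text node</LEVEL2>",
            "  <LEVEL2>2nd Level2 text node</LEVEL2>",
            " </LEVEL1>",
            "</ROOT>"})

        ' Whitespace nodes absorb the extra Read, so both elements are found.
        Console.WriteLine(ReadLevel2Implicit(xml, WhitespaceHandling.All).Count)
        ' No whitespace nodes: the extra Read lands inside the second element.
        Console.WriteLine(ReadLevel2Implicit(xml, WhitespaceHandling.None).Count)
    End Sub
End Module
```

Same document, same loop, and the only thing that changed is the whitespace setting: the first call visits both LEVEL2 elements, the second visits only the first one.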

So, what is a better example for an introduction to the XmlTextReader?  Explicitly control when a Read is executed.

xmlRead.WhitespaceHandling = WhitespaceHandling.None
' Prime the loop with the first Read; from here on, every Read is explicit.
Dim continueReading As Boolean = xmlRead.Read()
While continueReading
    If xmlRead.IsStartElement() Then
        If xmlRead.Name = "LEVEL2" Then
            ' ReadInnerXml already moves the cursor past </LEVEL2>, so no Read here.
            MsgBox(xmlRead.ReadInnerXml())
        Else
            continueReading = xmlRead.Read()
        End If
    Else
        continueReading = xmlRead.Read()
    End If
End While

Now we have explicit control over when a Read is executed, and in the case of rogue methods that leave your cursor on the next node (one that you haven’t tested yet), you can skip the Read for that pass through the loop.
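
As a rough check that explicit Reads behave the same with and without whitespace nodes, here is a sketch that wraps the loop in a function and runs it under both settings. The module name ExplicitReadDemo, the helper ReadLevel2Explicit, and the flag name are illustrative assumptions, and the results are collected in a list instead of being shown with MsgBox.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module ExplicitReadDemo
    ' Hypothetical helper: reads every LEVEL2 element, calling Read only on
    ' passes where nothing else has already advanced the cursor.
    Function ReadLevel2Explicit(xml As String, handling As WhitespaceHandling) As List(Of String)
        Dim found As New List(Of String)
        Using reader As New XmlTextReader(New StringReader(xml))
            reader.WhitespaceHandling = handling
            Dim continueReading As Boolean = reader.Read()
            While continueReading
                ' IsStartElement calls MoveToContent itself, so whitespace is skipped here.
                If reader.IsStartElement() AndAlso reader.Name = "LEVEL2" Then
                    ' The cursor is already on the node after </LEVEL2>: do not Read.
                    found.Add(reader.ReadInnerXml())
                Else
                    continueReading = reader.Read()
                End If
            End While
        End Using
        Return found
    End Function

    Sub Main()
        Dim xml As String = String.Join(Environment.NewLine, New String() {
            "<ROOT>",
            " <LEVEL1>",
            "  <LEVEL2>1st Level2 text node</LEVEL2>",
            "  <LEVEL2>2nd Level2 text node</LEVEL2>",
            " </LEVEL1>",
            "</ROOT>"})

        ' Both settings should now report both LEVEL2 elements.
        Console.WriteLine(ReadLevel2Explicit(xml, WhitespaceHandling.All).Count)
        Console.WriteLine(ReadLevel2Explicit(xml, WhitespaceHandling.None).Count)
    End Sub
End Module
```

The loop still terminates after a ReadInnerXml that reaches the end of the stream, because the next pass falls into the Else branch and Read returns False.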

If you want, you can download a fully functional example with 5 different test cases.

Published Monday, November 29, 2004 7:01 PM by donxml

Comments

Don Demsak said:

Anyone care to point out the bit of viral coding example that I purposely left in the final example? Ed over at SharpLogic noticed an infinite loop bug that I missed in my haste to convert C# to VB.Net (didn't set the Continue in the else statements).
November 29, 2004 9:33 PM

Jiho Han said:

Err, I don't know about viral coding but could you be missing some angle brackets in your XML stream? Or am I missing something?
November 30, 2004 6:09 AM

Don Demsak said:

Thanks. I had it fixed on my other blog site, and thought I fixed it here, but it seems that I had to switch to HTML View to fix the error.
November 30, 2004 6:21 AM

Chris Nevill said:

This was quite useful. I'm wondering... why isn't there an IsEndElement? Is there a way of telling?
February 5, 2007 4:00 AM

Keith said:

Thanks don, this was just what I was looking for to solve my innerXML problem
November 23, 2007 3:10 AM

About donxml

I’m an independent consultant, specializing in .Net solutions architecture, based out of New Jersey who also doubles as an evangelist for XML, Domain Driven Design, enterprise architecture and .Net. I do not work for Microsoft, the W3C or any other big company that you may know of (at least not yet). I’ve been an indie for over ten years, and although I’ve been tempted a couple of times to take a job with companies like Microsoft, I haven’t found something better than my current situation. I work mostly with the large pharmaceuticals that are based here in New Jersey, and usually find myself on long term contracts. Definitely not the prototypical indie consultant, but it lets me dedicate time to my non-income generating activities like the developer community stuff, plus financing open source projects like XPathmania and MVP-XML. If you would like to talk to me about doing some contract work, just contact me via the contact page. My rates vary widely, depending on lots of different variables, but mostly distance from Jersey, and type of work. Plus, I’ve been known to donate some of my code for various projects.