Don Xml's Grok This

The home of Don Demsak

DonXml's All Things Techie

Waking Up From a DOM Induced Coma

I couldn’t really decide on a title for this post.  Originally it was going to be “A Guide to the XML Parser API Wars”, but I thought that might be a little too negative.  I was looking for something that would catch the average reader’s eye, but also put a positive spin on a hotly debated topic.  If you have read this far, then it worked, and don’t stop reading until the end of this post, since (IMHO) this is a very important topic (well, for .Net developers, and others interested in XML).

I know I haven’t been posting as much high quality stuff over recent weeks, but it is due to time constraints (buying another house will do that to you), and not for a lack of interest.  Plus, there’s a ton of behind the scenes stuff going on, and the gory details would be boring, but the final result will (hopefully) blow you away.

Back to the original topic, waking up from a DOM induced coma: what do I really mean by this statement?  Well, as all good XML developers know, there are two main (could also be called “standard”) methods to parse your XML: using the DOM, which loads the whole document tree into memory and is (relatively) easy to use, or via SAX, which is extremely fast and doesn’t have a large memory footprint, but requires a bit of (repetitive) coding.  Since the majority of .Net (and previously VB/MSXML) developers didn’t need the performance of SAX, we typically used the DOM method to parse XML in our applications.  When Microsoft rolled out .Net, they didn’t include a SAX parser, but something just as fast (and just as complicated to code against) called the XmlReader, which is basically the pull-style equivalent of the push-style SAX parser.  So .Net developers still had basically two ways to parse XML, and most developers still used the DOM.  If you wanted to parse some XML, you used the DOM.  If you had to worry about memory or performance, you used one of the XmlReaders.  Life was good, and as developers we fell into a DOM induced coma.
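To make the three styles concrete, here is a rough sketch that extracts the same attribute values with a DOM, a push (SAX) parser, and a pull parser. It uses Python's standard library rather than the .Net classes, so `minidom`, `xml.sax`, and `pulldom` are only stand-ins for XmlDocument, a SAX parser, and XmlReader; the sample document and function names are made up for illustration.

```python
import xml.sax
from xml.dom import minidom, pulldom

# Hypothetical sample document, used only for this illustration.
XML = "<orders><order id='1'>widget</order><order id='2'>gadget</order></orders>"


def ids_via_dom(text):
    """DOM style: build the whole tree in memory, then query it."""
    doc = minidom.parseString(text)
    return [el.getAttribute("id") for el in doc.getElementsByTagName("order")]


class OrderHandler(xml.sax.ContentHandler):
    """SAX (push) style: the parser calls back into this handler."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "order":
            self.ids.append(attrs["id"])


def ids_via_sax(text):
    handler = OrderHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.ids


def ids_via_pull(text):
    """Pull style: the caller drives the loop and asks for the next event."""
    ids = []
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            ids.append(node.getAttribute("id"))
    return ids
```

All three return the same list for the sample input. The difference is who owns the control flow and how much of the document sits in memory at once.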

But then, slowly (very slowly), word got out about this newfangled XPathNavigator method for parsing XML in .Net.  It was a cursor-model style of parsing XML that, although it still had to load the entire XML document into memory, was a much faster parsing method than the DOM, and could be used in a lot of cases where the DOM was being used.  The down side was that it was read only, but hey, so is the XmlReader, and XPathNavigator was easier to use than an XmlReader.  Slowly, a few developers started to wake up, and started using the XPathNavigator instead of the DOM.  Then, at the 2003 PDC, it was announced that the next version of .Net was going to contain an updateable XPathNavigator, and the folks that woke up started to chant “The DOM is dead”.  But still, the majority slept.
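As a loose illustration of why querying an in-memory document with XPath feels easier than hand-walking nodes, here is a small sketch. It is not XPathNavigator: Python's `xml.etree.ElementTree` is not a cursor API and supports only a subset of XPath, so treat it purely as an analogue. The sample document and the `open_items` name are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
XML = """<orders>
  <order id="1" status="open"><item>widget</item></order>
  <order id="2" status="shipped"><item>gadget</item></order>
  <order id="3" status="open"><item>gizmo</item></order>
</orders>"""


def open_items(text):
    """Load the document into memory, then select nodes with one path expression."""
    root = ET.fromstring(text)
    # One expression replaces nested loops and if-statements over child nodes.
    return [item.text for item in root.findall("./order[@status='open']/item")]
```

For the sample input, `open_items(XML)` picks out the items of the two open orders. The whole document is still loaded first, which is exactly the trade-off described above.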

The developers who did wake up to the news of new XML parsing methods started to communicate with others that had awakened from their slumber, started to share stories, and began dreaming of new ways to parse XML.  Dare Obasanjo wrote an article in the October 2003 XML Journal, Can One Size Fit All – Exploring the Possibility of One API for XML Processing, which spoke about the different ways to parse XML, and suggested that maybe, just maybe, we can create one base API for processing XML, and then build others on top of that.  Then others started to play with newer XML processing models.  Daniel Cazzulino started prototyping something he called Xml Streaming Events, which gave the performance of XmlReader (or SAX) but still let you use XPath-like statements.  Oleg Tkachenko explained his ForwardXPathNavigator idea, which is similar to XPathNavigator (and Daniel’s XSE), but only allows for forward-only XPath statements.  Finally, it came back full circle to Dare, when he mentioned the Introduction to Sequential XPath paper (from way back in May of 2001) by Arpan Desai, another MS Program Manager.
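The common thread in these ideas is matching a path expression against a stream instead of a loaded tree. Here is a minimal sketch of that technique, assuming a very restricted path syntax (a plain absolute path, no predicates or wildcards). It is not Daniel's XSE, Oleg's ForwardXPathNavigator, or SXPath; the `stream_select` name and the sample data are hypothetical, and Python's `iterparse` stands in for a streaming reader.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
XML_BYTES = (
    b"<orders>"
    b"<order><item>widget</item></order>"
    b"<order><item>gadget</item><note>rush</note></order>"
    b"</orders>"
)


def stream_select(source, path):
    """Yield the text of each element whose chain of ancestors equals `path`.

    `path` is a simple absolute path such as "/orders/order/item".
    Only forward matching is possible, since earlier content is discarded.
    """
    target = path.strip("/").split("/")
    stack = []  # names of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == target:
                yield elem.text
            stack.pop()
            # Discard the element's content once handled; the emptied
            # element itself stays attached to its parent.
            elem.clear()
```

Usage would look like `list(stream_select(io.BytesIO(XML_BYTES), "/orders/order/item"))`. Because nothing behind the current position is kept, expressions that look backwards (parent or preceding axes) cannot be answered, which is why these proposals restrict themselves to forward-only XPath.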

But Dare dropped a bomb that should eventually wake the rest from the DOM induced coma: the XPathReader for processing XML.  It is very similar to some classes that are currently only available in BizTalk 2004 (which Oleg did mention in this post), and combines the best of Daniel’s and Oleg’s ideas, along with Arpan’s Sequential XPath (aka SXPath).  The very cool thing is that it isn’t part of the .Net framework, but will be open source, so we will not have to wait until Whidbey (or whatever release it may have made it into) before we can use it.  The down side is that Dare hasn’t released it yet, but will release it as part of the MSDN Xml Dev Center rollout.  Why do I think this is so cool?  Well, it looks like .Net finally hit critical mass.  The developer community is large enough, and educated enough, to be able to dramatically impact one of the core tenets of .Net, XML.  It is something that the folks in Microsoft that believed in .Net way back in the beginning should be proud of.

Published Thursday, February 26, 2004 8:42 AM by donxml

Comments

Daniel Cazzulino said:

Having public discussion of the future of XML APIs in .NET is a very good thing.
My bet is that opensource developments will probably be better than MS's, just because an OS community can move faster, make breaking changes MS can't afford, and quickly respond to users' demand because users are usually the very same developers.
I'm not holding my breath on the XPathReader, however. I believe MS APIs have to be WAY more open, flexible and pluggable throughout in order to be of real use in advanced scenarios. Otherwise, what looks like a "hhuuuhhh" feature may end up falling short for both an advanced developer (too simple/inflexible) and a Mort developer (a.k.a. "Joe Blow", too complex/I want DOM).
So, in this regard, I believe SUN is doing a good job at concentrating on pluggable and standard interfaces and specifications, and letting whoever wants to take the time to implement custom stuff.
I don't want to "new XmlTextReader". I want some app/system-wide factory to take care of creating the appropriate parser implementation for me based on declarative configuration, and I want my code to always work against a single unified interface/base class.
Changing the parser shouldn't mean I have to change my working app code. If MS provides the appropriate abstractions, it wouldn't even be necessary to rely on some implementation-specific feature such as XmlTextReader.GetRemainder that is not part of the abstract contract defined by XmlReader.
February 26, 2004 6:35 PM

TrackBack said:

"We Love XPathReader" Petition
February 27, 2004 7:58 AM

About donxml

I’m an independent consultant, specializing in .Net solutions architecture, based out of New Jersey, who also doubles as an evangelist for XML, Domain Driven Design, enterprise architecture and .Net. I do not work for Microsoft, the W3C or any other big company that you may know of (at least not yet). I’ve been an indie for over ten years, and although I’ve been tempted a couple of times to take a job with companies like Microsoft, I haven’t found something better than my current situation. I work mostly with the large pharmaceuticals that are based here in New Jersey, and usually find myself on long term contracts. Definitely not the prototypical indie consultant, but it lets me dedicate time to my non-income generating activities like the developer community stuff, plus financing open source projects like XPathmania and MVP-XML. If you would like to talk to me about doing some contract work, just contact me via the contact page. My rates vary widely, depending on lots of different variables, but mostly distance from Jersey, and type of work. Plus, I’ve been known to donate some of my code for various projects.