February 03, 2006
Parsing XML in C#, Part II
Mark M. Baker
Should I migrate code that uses MSXML to .NET and use the managed System.Xml namespace or stick with the COM based MSXML? (Part II)
Last time we began looking at the following C# 2.0 code, which shows how to use MSXML to load and process some XML, relying on COM Interop to handle the negotiation between our managed C# code and the native MSXML COM code:
using System;
using System.Collections.Generic;
using System.Text;

namespace XmlTester
{
    class XmlWithMsXml
    {
        static void Main(string[] args)
        {
            // A regular C# string literal cannot span lines,
            // so the XML text is built by concatenation.
            string xmlText = "<?xml version=\"1.0\"?>" +
                "<basket><fruit>apple</fruit><fruit>pear</fruit>" +
                "<fruit>orange</fruit></basket>";
            try
            {
                // create an MSXML6 parser
                MSXML2.DOMDocument60 msxml =
                    new MSXML2.DOMDocument60();
                msxml.async = false;
                msxml.resolveExternals = false;

                // load XML
                msxml.loadXML(xmlText);
                if (msxml.parseError.errorCode != 0)
                {
                    // some kind of parsing error found..
                    throw new Exception("parsing error: " +
                        msxml.parseError.reason);
                }

                // query the XML
                MSXML2.IXMLDOMNodeList nodes =
                    msxml.selectNodes("/basket/fruit");
                foreach (MSXML2.IXMLDOMNode node in nodes)
                {
                    Console.WriteLine(node.text);
                }
            }
            catch (Exception e)
            {
                // an exception happened.
                Console.WriteLine(e.Message);
            }
        }
    }
}
So we've created the parser and set the properties that we care about, in particular that we want the parser to load the XML synchronously. Next, we load the XML text into the parser via a call to loadXML. Interestingly, the XML parser reports errors discovered during parsing via the parseError property, which is itself another object containing the actual error information. In this case, we test for a non-zero error code; if we have one, we stop processing by throwing an exception, which the catch block at the bottom reports.
An interesting aspect of this code for former COM developers is the lack of calls to AddRef/Release when using the parser, particularly in the error handler. Since .NET has generated an Interop layer for the imported COM DLL, the layer handles this quietly on behalf of the developer. So there's no concern that the MSXML COM object might never get deleted due to bad reference counting on the part of the developer. Just another example of how .NET lets you worry about your application logic rather than component housekeeping.
The last bit of code asks for the set of nodes in the XML that match the XPath expression "/basket/fruit", which means "select every fruit element that is a child of the root basket element." The text of each matching node is then just written out for this example. Not too bad.
A final word about MSXML before we move on to the System.Xml .NET namespace. One compelling reason to move to MSXML6 from the older MSXML3/4 versions is that MSXML6 is now conformant with several W3C specifications: XML 1.0, XML Schema 1.0, XPath 1.0 and XSLT 1.0. If you want conformance with the standards, MSXML6 is the way to go if you're sticking with this COM-based technology.
Now, we'll take a look at the same code but this time using the .NET framework itself, specifically the System.Xml namespace:
using System;
using System.Text;
using System.Xml;

namespace XmlTester
{
    class XmlWithManagedXml
    {
        static void Main(string[] args)
        {
            // A regular C# string literal cannot span lines,
            // so the XML text is built by concatenation.
            string xmlText = "<?xml version=\"1.0\"?>" +
                "<basket><fruit>apple</fruit><fruit>pear</fruit>" +
                "<fruit>orange</fruit></basket>";
            try
            {
                // load and parse the XML; throws on malformed input
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.LoadXml(xmlText);

                // query the XML
                XmlNodeList nodes =
                    xmlDoc.SelectNodes("/basket/fruit");
                foreach (XmlNode node in nodes)
                {
                    Console.WriteLine(node.InnerText);
                }
            }
            catch (XmlException e)
            {
                // the XML could not be parsed.
                Console.WriteLine(e.Message);
            }
            catch (Exception e)
            {
                // an exception happened.
                Console.WriteLine(e.Message);
            }
        }
    }
}
The first thing to notice is that we now import the System.Xml namespace with a using directive. This isn't required, of course, but it saves us from having to fully qualify the names of the various System.Xml classes.
Moving down through the code, we see that instead of an MSXML2.DOMDocument60 object, we create an XmlDocument object. This is the top-level XML class and is similar to the DOMDocument60 class in most ways. However, it is notable that the async property of MSXML2.DOMDocument60 is not provided in XmlDocument. Instead of the document providing support for this, the XmlDocument object can use an XmlReader object that reads the text a chunk at a time. This is another area where System.Xml differs from MSXML6: there are a great number of helper classes that can be used to offload tasks and that work together to manage things such as the text, schemas, etc.
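To make the XmlReader idea a little more concrete, here is a minimal sketch that pulls the fruit names out of the same basket document one node at a time, and then hands a reader to XmlDocument.Load. The class and method names (ReaderSketch, ReadFruit, LoadDocument) are made up for illustration, and setting XmlResolver to null is only my assumption about the closest managed counterpart to resolveExternals = false.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class ReaderSketch
{
    // Hypothetical helper: builds a reader over an in-memory string.
    static XmlReader CreateReader(string xmlText)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Assumption: roughly comparable to resolveExternals = false in MSXML.
        settings.XmlResolver = null;
        return XmlReader.Create(new StringReader(xmlText), settings);
    }

    // Streams through the XML and collects the text of every <fruit> element.
    public static List<string> ReadFruit(string xmlText)
    {
        List<string> fruit = new List<string>();
        using (XmlReader reader = CreateReader(xmlText))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "fruit")
                {
                    // Reads the text and leaves the reader on the node after </fruit>,
                    // so we must not call Read() again in this branch.
                    fruit.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return fruit;
    }

    // Builds a full in-memory document from the same kind of reader.
    public static XmlDocument LoadDocument(string xmlText)
    {
        using (XmlReader reader = CreateReader(xmlText))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }

    static void Main()
    {
        string xmlText = "<?xml version=\"1.0\"?>" +
            "<basket><fruit>apple</fruit><fruit>pear</fruit>" +
            "<fruit>orange</fruit></basket>";

        foreach (string name in ReadFruit(xmlText))
        {
            Console.WriteLine(name);
        }
        Console.WriteLine(LoadDocument(xmlText).DocumentElement.Name);
    }
}
```

The streaming version never builds a tree, which is why it suits large inputs; the XmlDocument version is the one to reach for when you want to run XPath queries afterwards.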
The portion of the code that loops across the set of nodes is remarkably similar to the MSXML6 approach. The method SelectNodes is called with a standard XPath expression. However, instead of reading each node's value from the text property, we read the InnerText property.
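For readers who want to poke at the query side a bit more, the following sketch runs a few XPath variations against the same basket document. XPathSketch and Basket are hypothetical names used only for this illustration; SelectNodes, SelectSingleNode and InnerText are the standard XmlNode members.

```csharp
using System;
using System.Xml;

static class XPathSketch
{
    // Hypothetical helper: loads the sample basket used throughout this article.
    public static XmlDocument Basket()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<basket><fruit>apple</fruit><fruit>pear</fruit>" +
            "<fruit>orange</fruit></basket>");
        return doc;
    }

    static void Main()
    {
        XmlDocument doc = Basket();

        // Absolute path: fruit elements that are children of the root basket.
        Console.WriteLine(doc.SelectNodes("/basket/fruit").Count);   // 3

        // Descendant shortcut: fruit elements at any depth in the document.
        Console.WriteLine(doc.SelectNodes("//fruit").Count);         // 3

        // Positional predicate: XPath positions start at 1, not 0.
        XmlNode second = doc.SelectSingleNode("/basket/fruit[2]");
        Console.WriteLine(second.InnerText);                         // pear

        // InnerText on a parent concatenates the text of all its descendants.
        Console.WriteLine(doc.DocumentElement.InnerText);            // applepearorange
    }
}
```

Note that SelectSingleNode returns null when nothing matches, so real code would want to check for that before touching InnerText.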
Finally, you may have noticed that after loading the XML via LoadXml(), we do not test for errors. Instead of requiring the caller to verify that the load (really, the parse) succeeded, XmlDocument throws an exception to indicate that an error has occurred. The typical exception is System.Xml.XmlException, and you can see from the code fragment that it is one of the exceptions we handle specifically.
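To see the exception-based approach in action, here is a small sketch that feeds LoadXml one well-formed and one malformed document. ParseErrorSketch and TryParse are hypothetical names; LineNumber, LinePosition and Message are properties of XmlException, and they fill roughly the role that parseError.reason plays in the MSXML version.

```csharp
using System;
using System.Xml;

static class ParseErrorSketch
{
    // Returns null when the text parses, otherwise a description of the failure.
    public static string TryParse(string xmlText)
    {
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xmlText);
            return null;
        }
        catch (XmlException e)
        {
            // XmlException reports where in the input the parser gave up.
            return "line " + e.LineNumber + ", position " + e.LinePosition +
                ": " + e.Message;
        }
    }

    static void Main()
    {
        // Well-formed: every start tag has a matching end tag.
        Console.WriteLine(
            TryParse("<basket><fruit>apple</fruit></basket>") ?? "parsed OK");

        // Malformed: <fruit> is never closed, so LoadXml throws.
        Console.WriteLine(
            TryParse("<basket><fruit>apple</basket>") ?? "parsed OK");
    }
}
```

Wrapping the call this way is a design choice for the demo only; in application code you may prefer to let the XmlException propagate to a handler that knows what to do with it.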
This has been a brief comparison of the two technologies with some examples. At the present time, MSXML6 is the best release of the COM-based technology, and it fully supports the related W3C specifications. System.Xml is the .NET implementation of the same specifications. However, it is likely that at some point Microsoft will deprecate MSXML6 (or whatever version it is at that point) in favor of the managed implementation. If you have code that uses MSXML6 right now and it works, keep it. But also keep your eye on System.Xml and start the transition to it as time permits.
Whether you are moving to .NET from Win32 or have been using .NET since 1.0, I encourage you to listen to .NET Cast, an Internet radio show that I host and produce for CMP Media. Hear from the people behind .NET and key experts in the industry working with it. For those of you with RSS readers, you can find the feed at http://syndication.sdmediagroup.com/feeds/public/cmp_podcast_dotnet.xml.
Mark M. Baker is the Chief of Research & Development at BNA Software, located in Washington, D.C.