FREE Subscription to Dr. Dobb’s Digest: Same Great Content, New Digital Edition
January 17, 2006
Parsing XML in C#

Mark Baker
Should I migrate code that uses MSXML to .NET and use the managed System.Xml namespace or stick with the COM based MSXML?

Few applications written today lack XML support, as the technology and its acceptance continue to expand. The many proprietary text-based file formats, from old Win16-style .INI files (name-value pairs) to comma-delimited (CSV) files to row-based tagged formats (where each row's format is signaled by a row tag), are increasingly being replaced by standard XML (and its cousins XSD and XSLT). This is great news for developers working with text formats, particularly if they need to integrate with third parties either within their organization or outside it: the copious documentation that such esoteric formats often require can largely be dispensed with. It's amazing how quickly the discussion between two developers who are interacting for the first time about a file storage format turns into something like this:

Developer 1: “So what is your file format?”
Developer 2: “We use XML and have an XSD for it, too.”
Developer 1: “Okay, now for my next question...”

If you've been using XML for some time now, you've already seen the benefits of the technology and this is old news.

In the pre-.NET world, the common Microsoft XML parsers were the MSXML family: most recently MSXML3 and MSXML4, and now, with .NET 2.0 being released, MSXML6. These COM-based components implement the DOM (Document Object Model) as defined by the W3C. Most Windows applications built on Microsoft technology rely heavily on these parsers for their XML support.

In the .NET world, Microsoft reimplemented XML support within the System.Xml namespace to also be compliant with the W3C specification. Because the parsers follow a well-defined specification, most XML that worked with the earlier MSXML parsers will work fine with the .NET managed parser.
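To make the shared DOM model concrete, here is a minimal sketch (my own illustration, not code from this column) that walks a document with System.Xml's XmlDocument. The properties it relies on, DocumentElement, ChildNodes, and NodeType, correspond to the W3C DOM's documentElement, childNodes, and nodeType, which MSXML also exposes. The DomWalker class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

class DomWalker
{
    // Collects element names in document order by recursively walking
    // the W3C-style DOM tree that XmlDocument builds.
    public static List<string> ElementNames(string xmlText)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlText);               // parse the string into a DOM tree
        List<string> names = new List<string>();
        Walk(doc.DocumentElement, names);   // start at the root element
        return names;
    }

    static void Walk(XmlNode node, List<string> names)
    {
        // Record elements only; text nodes are visited but not recorded.
        if (node.NodeType == XmlNodeType.Element)
        {
            names.Add(node.Name);
        }
        foreach (XmlNode child in node.ChildNodes)   // childNodes in the W3C DOM
        {
            Walk(child, names);
        }
    }

    static void Main()
    {
        string xml = "<basket><fruit>apple</fruit><fruit>pear</fruit></basket>";
        foreach (string name in ElementNames(xml))
        {
            Console.WriteLine(name);   // basket, fruit, fruit
        }
    }
}
```

Because both parsers model the same tree of element and text nodes, code written against one DOM usually maps onto the other with mostly mechanical renaming.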

However, because .NET supports working with COM-based technologies through COM Interop (essentially a managed interface to unmanaged code), developers face a dilemma: should existing code written for the older MSXML parsers be migrated to System.Xml, and should new code continue to use MSXML, particularly given the investment already made in learning MSXML's rich COM API?

For the foreseeable future, MSXML provides a stable and known platform for interacting with XML in .NET. For developers who wish to continue to use it for the time being, this is a perfectly acceptable approach. Having said that, you need to appreciate that Microsoft is aggressively moving towards a pure managed code environment and future versions of Microsoft Windows such as Blackcomb will run as pure .NET environments with unmanaged code running in a type of “compatibility box.” For those of you who moved from DOS to Windows back in the early 1990s, you know what I'm referring to here.

Whether you choose MSXML or System.Xml as your XML technology is a matter of personal preference (for now). Down the road, you will certainly need to learn the System.Xml namespace and how its API differs from MSXML's, even if you elect to keep using MSXML in the meantime. To get an idea of some of these differences, we're going to examine code that performs the same set of tasks first with MSXML and then with System.Xml.

The following C# 2.0 code loads, parses, and queries a small block of XML using MSXML6:

using System;
using System.Collections.Generic;
using System.Text;

namespace XmlTester
{
    class XmlWithMsXml
    {
        static void Main(string[] args)
        {
            // a regular C# string literal cannot span lines,
            // so the XML is built by concatenation
            string xmlText = "<?xml version=\"1.0\"?>" +
                "<basket><fruit>apple</fruit><fruit>pear</fruit>" +
                "<fruit>orange</fruit></basket>";
            try
            {
                // create an MSXML6 parser
                MSXML2.DOMDocument60 msxml =
                    new MSXML2.DOMDocument60();
                // parse synchronously; MSXML is asynchronous by default
                msxml.async = false;
                // do not resolve external DTDs or entities
                msxml.resolveExternals = false;
                // load XML
                msxml.loadXML(xmlText);
                if (msxml.parseError.errorCode != 0)
                {
                    // some kind of parsing error found
                    throw new Exception("parsing error: " +
                        msxml.parseError.reason);
                }
                // query the XML with XPath
                MSXML2.IXMLDOMNodeList nodes =
                    msxml.selectNodes("/basket/fruit");
                foreach (MSXML2.IXMLDOMNode node in nodes)
                {
                    Console.WriteLine(node.text);
                }
            }
            catch (Exception e)
            {
                // an exception happened (parse error or COM failure)
                Console.WriteLine(e.Message);
            }
        }
    }
}

Let's take a look at this code in more detail. First, though, I need to mention how the MSXML6 parser was imported into this project. To use a COM component such as MSXML in a .NET application, you must first add a reference to it: open the References node under your project in Solution Explorer, browse to the MSXML DLL in your Windows\System32 folder, and select it. In my case, I have MSXML3, MSXML4, and MSXML6 on my computer, so I selected MSXML6. When the COM reference is added, Visual Studio generates an interop assembly that lets managed code call the component as though it were managed, even though it really is unmanaged.

The first few lines of the Main method create an MSXML6 parser. Although I could have added a “using MSXML2;” directive at the top of the file, I chose to leave it out and fully qualify the names of the MSXML types I'm working with, for clarity. MSXML developers will recognize this code from unmanaged applications. For example, I have to set async to false so that the parser loads the XML synchronously, because MSXML is asynchronous by default.
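System.Xml handles both loading and error reporting differently. The minimal sketch below (the ParseCheck class and TryParse method are hypothetical names) uses XmlDocument.LoadXml, which parses synchronously, so there is no async flag to turn off. Instead of setting an error code that the caller must check, as MSXML's parseError does, it throws XmlException, whose LineNumber and LinePosition play roughly the role of parseError.line and parseError.linepos.

```csharp
using System;
using System.Xml;

class ParseCheck
{
    // Returns null if xmlText is well-formed; otherwise a message built
    // from the XmlException that LoadXml throws.
    public static string TryParse(string xmlText)
    {
        XmlDocument doc = new XmlDocument();
        try
        {
            // LoadXml is synchronous: it returns only after parsing finishes.
            doc.LoadXml(xmlText);
            return null;
        }
        catch (XmlException e)
        {
            // Parse errors arrive as exceptions rather than an error code.
            return "parsing error at line " + e.LineNumber +
                   ", position " + e.LinePosition + ": " + e.Message;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryParse("<basket><fruit>apple</fruit></basket>") ?? "ok");
        // Mismatched end tag, so a parsing error is reported.
        Console.WriteLine(TryParse("<basket><fruit>apple</ruit></basket>") ?? "ok");
    }
}
```

The design difference matters when porting: MSXML code that forgets to check parseError silently continues with an empty document, while the System.Xml version cannot proceed past a malformed document without handling the exception.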

Next time, we'll continue looking at this code and begin looking at a System.Xml version of it.
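As a rough preview, the same load, parse, and query task might look like the following with System.Xml. This is a minimal sketch of my own, not the code from the next installment; the XmlWithSystemXml class and GetFruit method are hypothetical. XmlDocument.SelectNodes takes the same XPath expression as MSXML's selectNodes, and InnerText stands in for MSXML's node.text.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

class XmlWithSystemXml
{
    // Loads the XML, then runs the same XPath query as the MSXML example.
    public static List<string> GetFruit(string xmlText)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlText);   // throws XmlException on malformed XML
        List<string> result = new List<string>();
        XmlNodeList nodes = doc.SelectNodes("/basket/fruit");
        foreach (XmlNode node in nodes)
        {
            result.Add(node.InnerText);   // roughly MSXML's node.text
        }
        return result;
    }

    static void Main()
    {
        string xmlText = "<?xml version=\"1.0\"?>" +
            "<basket><fruit>apple</fruit><fruit>pear</fruit>" +
            "<fruit>orange</fruit></basket>";
        try
        {
            foreach (string fruit in GetFruit(xmlText))
            {
                Console.WriteLine(fruit);   // apple, pear, orange
            }
        }
        catch (XmlException e)
        {
            Console.WriteLine("parsing error: " + e.Message);
        }
    }
}
```

No COM reference or interop assembly is needed here, since System.Xml ships with the .NET Framework itself.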

Whether you are moving to .NET from Win32 or have been using .NET since 1.0, I encourage you to listen to .NET Cast, an Internet radio show that I host and produce for CMP Media. Hear from the people behind .NET and from key industry experts working with it. If you use an RSS reader, you can find the feed at http://syndication.sdmediagroup.com/feeds/public/cmp_podcast_dotnet.xml.


Mark M. Baker is the Chief of Research & Development at BNA Software located in Washington, D.C.
Do you have a Windows development question? Send it to dotnetcast@sunburstsoftware.com.

