January 01, 2002
Taming the XML Beast, Part II

webreview.com: Taming the XML Beast, Part II


XML Resources

Taming the XML Beast, Part I: The first part of this article, with a helpful resource list included.

XML School: From beginner to advanced, lots of helpful tutorials, links, and community.

MSDN XML Tutorial: The Microsoft Developer Network provides comprehensive developer information on XML.

Projectcool XML Developer Zone: Comprehensive tutorials, updates, and resources.

Last week, I introduced the basics of XML: I discussed how XML is about data rather than presentation, and I covered the basics of XML syntax, including elements and attributes. This week, we'll look more deeply at XML: its strict syntax rules, how you can gain control over your XML documents using document type definitions (DTDs), and how to add style using style sheets.

Well-Formed XML Documents

In order for an XML document to be properly written—in XML jargon, well formed—the document must adhere to specific rules. I'll introduce the general rules here, and then follow up with details.

  • The document should begin with an XML declaration that identifies it as an XML document.
  • An XML document must have exactly one root element.
  • All nonempty elements must have an opening and a closing tag.
  • An empty element is terminated with a forward slash preceding the "greater-than" bracket of the tag.
  • Tags may be nested, but they must not overlap.
  • All attribute values must be quoted.
  • There are five predefined character entities.
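One way to see these rules in action is to hand a few small fragments to a parser. The sketch below uses Python's standard xml.etree.ElementTree module (our choice for illustration; any non-validating XML parser would behave the same way), which raises ParseError for input that is not well formed. The helper name is_well_formed and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Each fragment follows, or breaks, one of the rules listed above.
samples = {
    "well formed": "<name><last_name>Shmoe</last_name></name>",
    "two root elements": "<name/><name/>",
    "unclosed element": "<p>Some text",
    "overlapping tags": "<b><i>text</b></i>",
    "unquoted attribute": "<table width=400/>",
}

for label, fragment in samples.items():
    print(f"{label:20} {is_well_formed(fragment)}")
```

Note that the first fragment is accepted even without an XML declaration: the declaration is recommended, not required.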

Let's look at these rules in detail.

The XML Declaration

When writing an XML document, you should include the XML declaration at the very top of the document. If present, it must come first; nothing, not even whitespace, may precede it:

<?xml version="1.0"?>

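The <? and ?> delimiters mark the start and end of the declaration. They are the same delimiters used for processing instructions, although the XML declaration itself is technically not a processing instruction. version is required whenever the declaration is present, and 1.0 is the version of XML in general use.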
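Because the declaration is optional, a program reading XML should not assume it is there. As a sketch, Python's built-in expat binding reports the declaration, when present, through its XmlDeclHandler callback: the version string, the encoding (None if omitted), and the standalone flag (1 for yes, 0 for no, -1 if omitted). The function name read_declaration is our own.

```python
from xml.parsers import expat

def read_declaration(text):
    """Return (version, encoding, standalone) from the XML declaration, or None if absent."""
    found = []
    parser = expat.ParserCreate()
    # Called at most once, and only if the document starts with an XML declaration.
    parser.XmlDeclHandler = lambda version, encoding, standalone: found.append(
        (version, encoding, standalone))
    parser.Parse(text, True)  # True marks this as the final (only) chunk of input
    return found[0] if found else None

print(read_declaration('<?xml version="1.0"?><address_book/>'))  # ('1.0', None, -1)
print(read_declaration('<address_book/>'))                       # None
```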

A Unique Root Element

XML documents must contain exactly one root element. A familiar example is XHTML, which is an application of XML: its root is the <html> element (XHTML, like all XML, is case-sensitive, and its element names are lowercase). The root element may appear only once in a document, and the rest of the page contents are contained inside it.

Close All Non-Empty Elements

In HTML, many people insert <p> tags to start new paragraphs without closing them with the </p> end tag. While HTML accepts this, in XML a non-empty element must begin with an opening tag and end with a closing tag. Think of non-empty elements as containers with a bottom and a lid. You open the container with a start tag, fill it with data and other elements, and then close it with a closing tag.

Terminate Empty Tags

Not all elements contain content. For example, the HTML <br> element is considered "empty." Empty elements in HTML don't use an end tag because there isn't any content to enclose. XML, however, requires an explicit indicator that the element is complete. In XML, you insert a forward slash before the > sign, for example, <br/>. (Writing <br></br> is also well-formed XML and means the same thing.) Some older browsers fail to recognize the element when the slash directly follows its name, so it's legal, and recommended when writing documents for Web browsers, to put a space between the element name and the forward slash: <br />.
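To a parser, the ways of writing an empty element are interchangeable. In this Python sketch (using the standard xml.etree.ElementTree module, chosen only for illustration), <br/>, <br /> and <br></br> all produce the same element with no children and no text. The HTML-style <br> is rejected, and ElementTree's own serializer writes empty elements with the space before the slash.

```python
import xml.etree.ElementTree as ET

for form in ["<br/>", "<br />", "<br></br>"]:
    elem = ET.fromstring(form)
    # Same tag, no child elements, no text content in every case.
    print(f"{form:10} tag={elem.tag} children={len(elem)} text={elem.text}")

try:
    ET.fromstring("<br>")          # HTML style: the element is never closed
except ET.ParseError as err:
    print("<br> rejected:", err)

# ElementTree writes empty elements in the browser-friendly form.
print(ET.tostring(ET.Element("br"), encoding="unicode"))   # <br />
```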

Nesting Symmetry

An element's content may include other elements, but you must close each start tag with its corresponding end tag in the reverse of the order in which the tags were opened (first opened, last closed). To return to the container analogy, you can put containers inside a container, but you have to close each inner container with its lid before you can close the outer one:

An incorrect example:


 <name>
   <last_name>Shmoe<first_name>
   </last_name>Joe</name>
 </first_name>

A well-formed example:


  <name>
   <last_name>Shmoe</last_name>
   <first_name>Joe</first_name>
  </name>
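The "first opened, last closed" rule is exactly what a stack enforces: push each start tag, and require every end tag to match the tag on top. The Python sketch below is a deliberately simplified checker for illustration only. Its regular expression is our own hypothetical pattern; it ignores comments, CDATA sections and processing instructions, and cannot cope with a > inside an attribute value, so use a real parser in practice.

```python
import re

# Hypothetical, simplified tag pattern: captures an optional leading "/" (end tag),
# the element name, and an optional trailing "/" (empty element).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")

def tags_nest_properly(text):
    """Check that every end tag closes the most recently opened element."""
    open_tags = []                      # stack of currently open elements
    for end, name, empty in TAG.findall(text):
        if empty:                       # <br/> opens and closes in one tag
            continue
        if not end:
            open_tags.append(name)      # start tag: push it
        elif not open_tags or open_tags.pop() != name:
            return False                # end tag doesn't match the innermost open tag
    return not open_tags                # anything still open is an error

incorrect = "<name><last_name>Shmoe<first_name></last_name>Joe</name></first_name>"
correct = "<name><last_name>Shmoe</last_name><first_name>Joe</first_name></name>"
print(tags_nest_properly(incorrect), tags_nest_properly(correct))  # False True
```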

Quoted Attribute Values

Unlike HTML attributes, attribute values in XML must be quoted, either by a pair of double quotes or a pair of single quotes.

Unacceptable non-quoted attribute:
<table width=400>

Correctly quoted attribute:
<table width="400"> or <table width='400'>
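Either quote style yields the same attribute value, and switching to the other style is a handy way to include quote characters in a value. A short check with Python's xml.etree.ElementTree (used here only for illustration):

```python
import xml.etree.ElementTree as ET

double = ET.fromstring('<table width="400"/>')
single = ET.fromstring("<table width='400'/>")
print(double.get("width"), single.get("width"))   # 400 400

# Single quotes around the value let it contain double quotes unescaped.
note = ET.fromstring("""<note text='She said "hi"'/>""")
print(note.get("text"))                           # She said "hi"

try:
    ET.fromstring("<table width=400/>")           # unquoted: not well formed
except ET.ParseError as err:
    print("rejected:", err)
```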

Character Entities

Whenever the XML parser encounters certain characters, such as < and &, it interprets them as markup. So to use these symbols in your text content, you have to use their entity references. Most HTML developers are familiar with the non-breaking space entity, &nbsp;. XML predefines only five character entities, and &nbsp; is not one of them:

 Entity      Character   Meaning
 &gt;        >           greater than
 &lt;        <           less than
 &amp;       &           ampersand
 &apos;      '           apostrophe
 &quot;      "           double quote
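In practice, you rarely type these references by hand. A library escapes text when writing XML and resolves the references when reading it. Here is a sketch with Python's standard library: xml.sax.saxutils.escape escapes &, < and >, and the parser turns the references back into characters. &nbsp; is rejected because XML does not predefine it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Tom & Jerry <cartoon>"
escaped = escape(raw)            # & -> &amp;, < -> &lt;, > -> &gt;
print(escaped)                   # Tom &amp; Jerry &lt;cartoon&gt;

# Parsing resolves the predefined entity references back to characters.
title = ET.fromstring(f"<title>{escaped}</title>")
print(title.text)                # Tom & Jerry <cartoon>

# &apos; and &quot; are predefined too.
print(ET.fromstring("<q>&apos;hi&apos; &quot;there&quot;</q>").text)

try:
    ET.fromstring("<p>a&nbsp;b</p>")   # HTML's &nbsp; is not predefined in XML
except ET.ParseError as err:
    print("rejected:", err)
```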

A Well-Formed Addressbook: XML Document Sample

With these basic rules under our belt, let's examine a well-formed XML document:


<?xml version="1.0" standalone="yes"?>
<address_book>
  <listing>
    <name>
      <last_name>Shmoe</last_name>
      <first_name>Joe</first_name>
    </name>
    <address>
      <street>1313 Mockingbird Lane</street>
      <city>Beverly Hills</city>
      <state>CA</state>
      <zip>90210</zip>
    </address>
  </listing>
</address_book>
The very first line is the XML declaration; its optional standalone="yes" attribute tells the parser that the document does not rely on any external markup declarations. Following the XML declaration is the root element, <address_book>. The root element appears only once, and everything else in the document is contained within it. An XML document can also contain other processing instructions, which appear outside the root element, after the XML declaration, somewhat like the contents of an HTML document's <head> section. Each subsequent element begins with a start tag, contains some content (either data or nested elements), and ends with a closing tag.

Figure 1: XML structure in IE5.
Save your document with the .xml extension and load it into a browser that supports XML (Figure 1). Different browsers display a plain XML file differently. In IE5, you get a nicely structured view of your document: click the - (minus) sign to collapse an element, or the + (plus) sign to expand it again. If your document isn't well formed, you will receive an error message explaining the problem.
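Outside a browser, you can get a similar structural view from a parser. This Python sketch (our own helper, outline, built on the standard xml.etree.ElementTree module) prints each element indented under its parent, roughly mirroring the collapsible tree IE5 shows. It uses a shortened copy of the address book.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the address book sample (street and state omitted).
ADDRESS_BOOK = """<?xml version="1.0" standalone="yes"?>
<address_book>
  <listing>
    <name>
      <last_name>Shmoe</last_name>
      <first_name>Joe</first_name>
    </name>
    <address>
      <city>Beverly Hills</city>
      <zip>90210</zip>
    </address>
  </listing>
</address_book>"""

def outline(elem, depth=0):
    """Yield one indented line per element, showing the text of leaf elements."""
    text = (elem.text or "").strip()
    yield "  " * depth + elem.tag + (f": {text}" if text else "")
    for child in elem:
        yield from outline(child, depth + 1)

print("\n".join(outline(ET.fromstring(ADDRESS_BOOK))))
```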

Valid XML Documents

A well-formed XML document may be fine on its own, but to make real use of XML, you'll want to define rules for your documents' structure. These rules describe which elements your XML must contain, the order of those elements, and what each element may contain. You define them in a DTD (document type definition). When an XML document follows both the basic rules for well-formedness and the rules of its DTD, it is said to be a valid XML document.

Why bother with a DTD? Say I want to share my address book with a friend who wants to merge my data into his own address book. To do that, we must both use the same tag set so the data can be used in exactly the same way, which means agreeing on a DTD that works for both of us. That's another great advantage of XML: with an accepted DTD, different parties can share and exchange data regardless of the application used to process it.

Writing DTDs can be a fairly complex process, so I'm providing just a small sample below. The more detail you want your DTD to capture, the longer and more involved it becomes. You can learn more about writing DTDs at the tutorial sites listed in the sidebar of this article.


<!ELEMENT  address_book (listing+)    >
<!ELEMENT  listing   (name, address)   >
<!ELEMENT  name   (last_name, first_name) >
<!ELEMENT  last_name  (#PCDATA)    >
<!ELEMENT  first_name  (#PCDATA)    >
<!ELEMENT  address   (street, city, (state|province), zip) >
<!ELEMENT  street   (#PCDATA)    >
<!ELEMENT  city   (#PCDATA)    >
<!ELEMENT  state   (#PCDATA)    >
<!ELEMENT  province  (#PCDATA)    >
<!ELEMENT  zip    (#PCDATA)    >
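Each element of the XML document is explicitly declared with its name, followed in parentheses by the content it may contain. The address_book element may contain one or more listing elements, as denoted by the plus sign (+). Child elements separated by commas must appear in exactly the order listed, and the vertical bar in (state|province) means that either a state or a province element may appear. #PCDATA (parsed character data) means the element contains only text. To associate a document with a DTD, you add a document type declaration between the XML declaration and the root element, for example <!DOCTYPE address_book SYSTEM "address_book.dtd">, where the file name is just an example.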
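Each content model in the DTD is essentially a regular expression over the names of an element's children. To illustrate that idea, not to replace a real validating parser, the Python sketch below hand-translates the address book DTD into regular expressions and checks a parsed document against them. CONTENT_MODELS and validate are hypothetical names; the sketch does not check attributes or stray text inside element-only content.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hand translation of the DTD above. Each content model becomes a
# regular expression over the child element names, each followed by a space.
# (#PCDATA) elements map to None: text only, no child elements.
CONTENT_MODELS = {
    "address_book": r"(listing )+",
    "listing": r"name address ",
    "name": r"last_name first_name ",
    "address": r"street city (state |province )zip ",
    "last_name": None, "first_name": None, "street": None,
    "city": None, "state": None, "province": None, "zip": None,
}

def validate(elem, errors=None):
    """Collect content-model violations for elem and all of its descendants."""
    errors = [] if errors is None else errors
    children = "".join(child.tag + " " for child in elem)
    if elem.tag not in CONTENT_MODELS:
        errors.append(f"<{elem.tag}> is not declared")
    elif CONTENT_MODELS[elem.tag] is None:
        if children:
            errors.append(f"<{elem.tag}> may contain only text")
    elif not re.fullmatch(CONTENT_MODELS[elem.tag], children):
        errors.append(f"<{elem.tag}> has children '{children.strip()}'")
    for child in elem:
        validate(child, errors)
    return errors

# first_name and last_name are in the wrong order here.
doc = ET.fromstring(
    "<address_book><listing>"
    "<name><first_name>Joe</first_name><last_name>Shmoe</last_name></name>"
    "<address><street>1313 Mockingbird Lane</street><city>Beverly Hills</city>"
    "<state>CA</state><zip>90210</zip></address>"
    "</listing></address_book>")
print(validate(doc))   # ["<name> has children 'first_name last_name'"]
```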

While IE5 checks that XML documents are well formed and checks the syntax of your DTD, it doesn't validate documents against the DTD. To do that, you would need to install a third-party validating XML parser.

Adding a Little Style

IE5's default display of XML is great for outlining the tree structure of a document, but it isn't really how you'll want to present your data to users. XML only defines the structure of your data; remember, XML is just about the data. If you want to control how your data is presented on a page, you'll need a style sheet.

Figure 2: XML and CSS.
If you already use CSS (Cascading Style Sheets) with HTML documents, you are familiar with separating presentation from document structure. An XML document has no <head> section, so instead of embedding styles you link to an external style sheet with the xml-stylesheet processing instruction, placed after the XML declaration, for example <?xml-stylesheet type="text/css" href="addressbook.css"?> (the file name is only an example). CSS is simple enough to use, and most contemporary browsers can render at least some CSS. However, when it comes to XML, CSS has its limitations: it lets you style your XML elements as they appear, but it cannot reorder, filter, or otherwise process them. So while you can certainly use CSS to design your page, a style language that can also process your data was created: XSL, the Extensible Stylesheet Language. XSL is itself written in XML and adds formatting to your structure. It is, however, a little more complex to learn than CSS.

Here's a sample CSS style sheet for my address book:



address_book { font-family: sans-serif; }

name {
  display: block;
  font-size: 14pt;
  font-weight: bold;
  color: #800000;
}

address { font-size: 10pt; }

street { display: block; }

CSS should be familiar to most people. The only item that needs further explanation is display: block. In terms of formatting, elements are rendered either "inline," as a run of characters within a line (like the HTML anchor element), or as "blocks," which appear as separate blocks on the page (like the HTML <p> element). Since XML itself declares no formatting whatsoever, all the data would otherwise run together in one long line. So we style certain elements as block-level, which in effect starts each of them on a new line.
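An XML document points to its style sheet with the xml-stylesheet processing instruction, which sits in the prolog before the root element. As a sketch, Python's standard xml.dom.minidom keeps such instructions as ProcessingInstruction nodes, so a program can find which style sheets a document asks for. The helper stylesheet_links and the file name addressbook.css are hypothetical.

```python
from xml.dom import Node, minidom

# Hypothetical document; "addressbook.css" is only an example file name.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="addressbook.css"?>
<address_book>
  <listing/>
</address_book>"""

def stylesheet_links(text):
    """Return the data of each xml-stylesheet processing instruction at document level."""
    doc = minidom.parseString(text)
    return [node.data for node in doc.childNodes
            if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
            and node.target == "xml-stylesheet"]

print(stylesheet_links(DOC))   # ['type="text/css" href="addressbook.css"']
```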

Here's a sample XSL style sheet for my address book:


<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Match the document root and build the whole HTML page -->
<xsl:template match="/">
  <html>
  <head>
    <title>Address Book</title>
  </head>
  <body bgcolor="#ffffff">
    <h1 align="center">My Address Book</h1>

    <!-- One heading and address paragraph per listing -->
    <xsl:for-each select="address_book/listing">
      <h2><xsl:value-of select="name/last_name"/>, <xsl:value-of select="name/first_name"/></h2>
      <p><xsl:value-of select="address/street"/><br/>
        <xsl:value-of select="address/city"/>,
        <xsl:value-of select="address/state | address/province"/>
        <!-- Whitespace-only text between tags is dropped, so add the space explicitly -->
        <xsl:text> </xsl:text>
        <xsl:value-of select="address/zip"/>
      </p>
    </xsl:for-each>

  </body>
  </html>
</xsl:template>

</xsl:stylesheet>

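In this XSL sample, we use XSL to define an HTML-based template. Within this template, we've added a page title and header. Then, for each listing the style sheet finds in our address_book, it pulls out the name and address data and inserts them in the appropriate places in our template. To apply the style sheet, reference it from the XML document with a processing instruction such as <?xml-stylesheet type="text/xsl" href="addressbook.xsl"?>, placed after the XML declaration (the file name is just an example).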
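Python's standard library has no XSLT processor, so the sketch below is not XSL. It is a rough procedural analogue, using our own hypothetical render function, meant only to make the for-each and value-of steps concrete: loop over every listing, look up each path, and drop the text into an HTML fragment.

```python
import xml.etree.ElementTree as ET
from html import escape

ADDRESS_BOOK = """<address_book>
  <listing>
    <name><last_name>Shmoe</last_name><first_name>Joe</first_name></name>
    <address>
      <street>1313 Mockingbird Lane</street><city>Beverly Hills</city>
      <state>CA</state><zip>90210</zip>
    </address>
  </listing>
</address_book>"""

def render(xml_text):
    """Build the HTML body fragment that the XSL template describes."""
    root = ET.fromstring(xml_text)
    parts = ['<h1 align="center">My Address Book</h1>']
    # Like <xsl:for-each select="address_book/listing">: one block per listing.
    for listing in root.findall("listing"):
        def value_of(path):
            # Like <xsl:value-of select="...">: text of the first match, or "".
            return escape(listing.findtext(path, default=""))
        parts.append(f"<h2>{value_of('name/last_name')}, {value_of('name/first_name')}</h2>")
        parts.append(f"<p>{value_of('address/street')}<br/>"
                     f"{value_of('address/city')}, {value_of('address/state')} "
                     f"{value_of('address/zip')}</p>")
    return "\n".join(parts)

print(render(ADDRESS_BOOK))
```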

Figure 3: XML and XSL.

There! We now have a well-formed XML document styled with XSL!

XML & Beyond

Of course, there's a lot more to XML than I've covered in this article. There are XML-based technologies for building XML applications, such as XSLT (XSL Transformations) and XLink, which allows you to make any element a hyperlink.

Then there are XML applications designed to enhance Web development, such as SVG (Scalable Vector Graphics) and WML (Wireless Markup Language). Other XML vocabularies have been created as industry-standard data structures, such as MathML for describing mathematical notation, or VoiceXML for making Internet information available by phone and voice.

But just because XML is far-reaching doesn't mean it's difficult to understand. Learning how to code XML is really nothing more than remembering a few syntax rules and knowing your data intimately. Once you've mastered XML syntax and your data, the possibilities of what you can create using XML are endless!


Bonnie is a technical writer who designs and develops Web sites and creates system documentation.
