
XML in the Real World


September 2000: Thinking Objectively: XML in the Real World

Extensible Markup Language, better known as XML, is currently one of the most over-hyped terms in the computer industry. XML is a data-formatting standard that is a no-frills version of Standard Generalized Markup Language. That’s it. Similar to Java’s write once, run anywhere (WORA) philosophy, XML promotes a simple, write once, publish everywhere (WOPE) approach. Many predict XML will become the data-sharing Esperanto—although it remains to be seen whether it could also share the fate of Esperanto as an elegant but ultimately forgotten solution. How can you apply XML? What issues must be addressed to use XML effectively? What can your organization do to improve its chances of success?

Because XML is designed for document exchange, it enables both electronic commerce across the Internet and enterprise application integration (EAI) within your intranet. XML can also be used as a data storage format: You can write XML documents as a simple file on a disk, as a blob in a relational database or as a full-fledged XML object in an objectbase. Furthermore, XML has the potential to enable a generic approach to outputting data using technologies such as Extensible Stylesheet Language (XSL), which define the rendering of XML documents to output devices such as browsers and printers.

The primary barrier to e-commerce is not the reliability or security of the Internet; rather, it is the ability to share information effectively. For example, in business-to-business (B2B) e-commerce, a large number of documents need to be exchanged between organizations, including orders, order status queries, invoices and detailed product information. Without an agreed-upon approach to these documents, B2B e-commerce cannot occur. Similar standards are required for business-to-consumer (B2C) e-commerce, particularly if consumers conduct their transactions through specific e-commerce software such as electronic shopping agents, often called "shopbots," that scour the net looking for products. A common way of describing a product for sale would facilitate this kind of shopping on the Internet, increasing sales on Web sites that implement the standard.

To make XML work for e-commerce, and for that matter EAI, the parties involved must agree on both the format and the semantics of the XML documents that they intend to share. Luckily, the format of an XML document can be easily defined and enforced by a standard Document Type Definition (DTD). A DTD precisely defines the elements that may appear in an XML document and thus can be used by an XML parser to validate that document. Ideally, the tags defined by a DTD (tags that identify an element within an XML document) follow a common naming convention and conform to the XML Namespaces standard (http://www.w3.org/TR/REC-xml-names). Using complete English words in capitals, such as "AUTHOR" and "SHIPPING-ADDRESS," is common, although I prefer mixed upper and lower case, such as "Author" and "ShippingAddress," for improved readability. Several vertical industry standards for XML documents are currently being defined; for more information see the sidebar. The good news is that XML formatting issues are fairly straightforward.

Figure 1. An XML Document for an Ice Cream Cone Order


<ConeOrder>
  <Cone>Plain</Cone>
  <Scoop>Chocolate</Scoop>
  <Scoop>Vanilla</Scoop>
</ConeOrder>
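As a rough illustration of what "defining the format" means for a document like Figure 1, here is a minimal Python sketch. The DTD text is my own guess at what such a definition might look like; the column does not give one. Python's standard library parser does not validate against DTDs, so the function below is a hand-written stand-in for the single content model (Cone, Scoop+), not a real validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for the order in Figure 1 (not from the article).
CONE_ORDER_DTD = """\
<!ELEMENT ConeOrder (Cone, Scoop+)>
<!ELEMENT Cone (#PCDATA)>
<!ELEMENT Scoop (#PCDATA)>
"""

def matches_cone_order_model(xml_text):
    """Hand-written check equivalent to the content model (Cone, Scoop+)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed XML
    if root.tag != "ConeOrder":
        return False
    tags = [child.tag for child in root]
    # Exactly one Cone first, followed by one or more Scoop elements.
    if len(tags) < 2 or tags[0] != "Cone":
        return False
    if any(tag != "Scoop" for tag in tags[1:]):
        return False
    # (#PCDATA): Cone and Scoop hold text only, no nested elements.
    return all(len(child) == 0 for child in root)

FIGURE_1 = (
    "<ConeOrder><Cone>Plain</Cone>"
    "<Scoop>Chocolate</Scoop><Scoop>Vanilla</Scoop></ConeOrder>"
)
print(matches_cone_order_model(FIGURE_1))
```

In practice you would hand the DTD to a validating parser rather than write checks like this by hand; the sketch only shows how little the format rules say about what the values mean.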

The bad news is that the real problem lies within the semantics of the information contained within an XML document. Consider an Internet start-up that wants to sell ice cream cones over the Internet. A customer visits its Web site and indicates the type of ice cream cone(s) that she wants, and the system contacts potential vendors to determine whether they can supply such a cone. The system displays the potential offerings, the customer picks the cone that she wants and the chosen vendor delivers the ice cream to the customer. Luckily, a well-defined XML schema standard for ice cream-related e-commerce exists that every organization involved has adopted. The XML document representing the request to the vendors includes an element indicating the type of cone (sugar cone, waffle cone and so on) as well as an ordered collection of elements, each of which indicates the flavor of a single scoop. Semantics becomes important because each vendor needs to fill in the elements of the XML document in a similar way. For example, a customer asks for a plain cone with a chocolate scoop and a vanilla scoop. This request would be written as the XML document in Figure 1 and transmitted to the ice cream vendors, who would then respond with what they could provide; however, this only works if the values that could be put into the XML document are predefined. For example, "Chocolate" is a valid value for a flavor, but "Purple" and "1701" are not. Each vendor doesn’t have to offer each of the defined flavors: one vendor may choose not to sell chocolate ice cream at all, whereas another vendor may carry a wide range of chocolate flavors. However, each vendor must understand the request that is sent to them. In short, the type (integer, string, currency and so forth) needs to be agreed upon for each element, as well as the potential values that each element can take.
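To make the point about predefined values concrete, here is a small Python sketch of a semantic check on an order. The vocabularies are assumptions for illustration: the cone types echo the ones named above, and "Strawberry" is my own addition. A real industry standard would publish these lists.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical agreed-upon vocabularies (assumed, not from any standard).
VALID_CONES = {"Plain", "Sugar", "Waffle"}
VALID_FLAVORS = {"Chocolate", "Vanilla", "Strawberry"}

def semantic_errors(order_xml: str) -> List[str]:
    """Return one message for every value outside the agreed vocabulary."""
    root = ET.fromstring(order_xml)
    errors = []
    for cone in root.findall("Cone"):
        if cone.text not in VALID_CONES:
            errors.append(f"unknown cone type: {cone.text!r}")
    for scoop in root.findall("Scoop"):
        if scoop.text not in VALID_FLAVORS:
            errors.append(f"unknown flavor: {scoop.text!r}")
    return errors

good = "<ConeOrder><Cone>Plain</Cone><Scoop>Chocolate</Scoop><Scoop>Vanilla</Scoop></ConeOrder>"
bad = "<ConeOrder><Cone>Plain</Cone><Scoop>Purple</Scoop><Scoop>1701</Scoop></ConeOrder>"
print(semantic_errors(good))
print(semantic_errors(bad))
```

Both documents are equally well-formed and have the same structure; only the shared vocabulary tells them apart.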

Figure 2. An XML Document for a Vendor's Response


<ConeOfferings>
  <Vendor>WWW.I-ceCream.com</Vendor>
  <ShippingTime>1 hour or the next one's free</ShippingTime>

  <Cone>Plain</Cone>
  <Scoop>Double Chocolate</Scoop>
  <Scoop>Vanilla</Scoop>
  <Price>1.25</Price>
  <Tax>0.00</Tax>
  <ShippingCost>0.00</ShippingCost>
  <Currency>USD</Currency>

  <Cone>Plain</Cone>
  <Scoop>Mocha Chocolate</Scoop>
  <Scoop>Vanilla</Scoop>
  <Price>1.50</Price>
  <Tax>0.00</Tax>
  <ShippingCost>0.00</ShippingCost>
  <Currency>USD</Currency>

  <Cone>Plain</Cone>
  <Scoop>Ultra Chocolate</Scoop>
  <Scoop>Vanilla</Scoop>
  <Price>1.50</Price>
  <Tax>0.00</Tax>
  <ShippingCost>0.00</ShippingCost>
  <Currency>USD</Currency>
</ConeOfferings>

Figure 2 illustrates a potential response from a vendor, also an XML document, and once again semantics are an issue: the receiving software needs to know that the information identified by the "Price" tags represents a floating point number, because all data within an XML document are actually strings. The semantics of the flavors indicated in the response wouldn’t matter, because the system would simply display them to the customer, trusting that they are meaningful to her. Efforts to define the semantics of XML documents are currently under way and are described at http://www.w3.org/TR/NOTE-xml-schema-req. The fundamental issue for e-commerce is for disparate software to interoperate effectively: XML supported by defined semantics addresses this need.
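The following Python sketch shows what the receiving software might do with a response shaped like Figure 2 (shortened here to two offerings). Two things are my assumptions rather than anything the figure states: that each offering begins at a Cone element, and that monetary values are better converted with Decimal than with binary floating point, to avoid rounding surprises.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

RESPONSE = """\
<ConeOfferings>
  <Vendor>WWW.I-ceCream.com</Vendor>
  <Cone>Plain</Cone>
  <Scoop>Double Chocolate</Scoop>
  <Scoop>Vanilla</Scoop>
  <Price>1.25</Price>
  <Tax>0.00</Tax>
  <ShippingCost>0.00</ShippingCost>
  <Currency>USD</Currency>
  <Cone>Plain</Cone>
  <Scoop>Mocha Chocolate</Scoop>
  <Scoop>Vanilla</Scoop>
  <Price>1.50</Price>
  <Tax>0.00</Tax>
  <ShippingCost>0.00</ShippingCost>
  <Currency>USD</Currency>
</ConeOfferings>
"""

MONEY_TAGS = {"Price", "Tax", "ShippingCost"}

def parse_offerings(xml_text):
    """Group the flat element list into one dict per offering.

    Assumes every offering starts at a Cone element.
    """
    offerings = []
    current = None
    for child in ET.fromstring(xml_text):
        if child.tag == "Cone":
            current = {"Cone": child.text, "Scoops": []}
            offerings.append(current)
        elif current is None:
            continue  # Vendor and the like apply to the whole response
        elif child.tag == "Scoop":
            current["Scoops"].append(child.text)
        elif child.tag in MONEY_TAGS:
            current[child.tag] = Decimal(child.text)  # string to exact decimal
        else:
            current[child.tag] = child.text
    return offerings

def total_cost(offering):
    """Sum the monetary fields, which only works once they are numbers."""
    return offering["Price"] + offering["Tax"] + offering["ShippingCost"]

offers = parse_offerings(RESPONSE)
cheapest = min(offers, key=total_cost)
```

Nothing in the document itself says that "Price" is a number or that a Cone element opens a new offering; both are conventions the sender and receiver have to share.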

The second major use of XML is in enterprise application integration, the unrestricted sharing of data and business processes among any connected software applications within your organization. XML is a primary enabler of data-level EAI because XML is platform independent and thus can be used to share data between disparate systems. The major issues that you need to overcome with EAI are typically not technological, but instead are often people- or process-related: The politics of convincing the owners of the various systems within your organization to open them up to others and the process of analyzing the legacy systems are far more complex than defining the XML structures to be shared between the systems. Never underestimate the effort required to analyze a legacy system. The documentation is rarely current, and you often are unable to even find people who understand the system, particularly now that many organizations have shed their Year 2000 staff. Furthermore, due to years of "developer abuse," the source data may be convoluted: Data elements will be used for multiple purposes; several elements will be used together for a single purpose; the data will be highly de-normalized; and the data will often be inaccurate. The biggest part of your development effort will be to write code that "cleanses" the source data and writes it into XML format. You will also need to take input XML documents, parse them, and "uncleanse" the data so it can be understood by the legacy system. To learn more about EAI, read David S. Linthicum’s Enterprise Application Integration (Addison-Wesley, 2000) as well as his series of three articles published in Software Development (Apr., June and Sept. 1999) and recently reprinted in The Unified Process Elaboration Phase (Ambler and Constantine, CMP Books, 2000).
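To give a feel for the "cleansing" code described above, here is a minimal Python sketch. The legacy record layout is entirely hypothetical: a space-padded name, a CODE field that does double duty as a status letter plus a region number, and a birth date stored as YYYYMMDD. Real legacy data will be messier.

```python
import xml.etree.ElementTree as ET

# Hypothetical meaning of the first character of the legacy CODE field.
STATUS_NAMES = {"A": "Active", "I": "Inactive"}

def cleanse_customer(record):
    """Turn one hypothetical legacy row into a clean XML element."""
    customer = ET.Element("Customer")
    # Legacy NAME is upper-cased and padded with spaces.
    ET.SubElement(customer, "Name").text = record["NAME"].strip().title()
    # Legacy CODE serves two purposes: status letter, then region number.
    code = record["CODE"].strip()
    ET.SubElement(customer, "Status").text = STATUS_NAMES.get(code[:1], "Unknown")
    ET.SubElement(customer, "Region").text = str(int(code[1:]))
    # Legacy BDATE is YYYYMMDD; emit an ISO date under a descriptive tag.
    raw = record["BDATE"]
    ET.SubElement(customer, "BirthDate").text = f"{raw[:4]}-{raw[4:6]}-{raw[6:]}"
    return customer

legacy_row = {"NAME": "DOE, JANE           ", "CODE": "A042", "BDATE": "19700131"}
xml_text = ET.tostring(cleanse_customer(legacy_row), encoding="unicode")
print(xml_text)
```

The "uncleansing" direction is the mirror image: parse the incoming XML and pack the values back into whatever shape the legacy system expects.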

Impedance Mismatch

XML documents could be used as a format to persist data within your organization. Unfortunately, XML doesn’t fit well in organizations that use relational databases (RDBs) as their primary storage mechanism. As with objects and RDBs, there is an impedance mismatch between XML and RDBs: XML is based on the traversal of trees, whereas RDBs are based on the joining of data. Yes, you could store XML documents as blobs within your RDB, extract them when needed and process them, but you effectively lose most of the benefits of your RDB with this approach because you cannot use it to manipulate the information within an XML document.
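The alternative to the blob approach is to map the tree onto tables, which is where the mismatch shows up. The Python sketch below does this for the order from Figure 1, using an in-memory SQLite database; the table and column names are my own invention for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER = (
    "<ConeOrder><Cone>Plain</Cone>"
    "<Scoop>Chocolate</Scoop><Scoop>Vanilla</Scoop></ConeOrder>"
)

def store_order(conn, order_xml):
    """Map the tree onto two tables; document order becomes a column."""
    root = ET.fromstring(order_xml)
    cur = conn.execute(
        "INSERT INTO cone_order (cone) VALUES (?)", (root.findtext("Cone"),)
    )
    order_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO scoop (order_id, position, flavor) VALUES (?, ?, ?)",
        [(order_id, i, s.text) for i, s in enumerate(root.findall("Scoop"), start=1)],
    )
    return order_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cone_order (id INTEGER PRIMARY KEY, cone TEXT NOT NULL);
    CREATE TABLE scoop (
        order_id INTEGER NOT NULL REFERENCES cone_order(id),
        position INTEGER NOT NULL,
        flavor   TEXT NOT NULL
    );
""")
order_id = store_order(conn, ORDER)

# Reading the order back takes a join rather than a tree walk.
rows = conn.execute(
    "SELECT o.cone, s.position, s.flavor FROM cone_order o "
    "JOIN scoop s ON s.order_id = o.id WHERE o.id = ? ORDER BY s.position",
    (order_id,),
).fetchall()
print(rows)
```

Even for a three-element document, the nesting and ordering that XML gives you for free must be spelled out as keys and a position column, and every new level of nesting means another table and another join.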

Luckily, the impedance mismatch between XML and objects is smaller; with XML, you traverse a tree of data while you traverse collections of objects with the object paradigm. I suspect that XML might be the long-awaited "killer app" for objectbases, also known as object databases (ODBs). For example, eXcelon Corporation, formerly known as Object Design, is a leading vendor of XML and e-commerce offerings, and its product line is based on an ODB it calls ObjectStore. As readers of this column know, I have not been a great supporter of ODBs in the past. While the technology is fantastic, it has been greatly overshadowed in the marketplace by relational approaches. Perhaps XML will move ODBs out of the shadows and into the mainstream persistence market.

Finally, you may decide to use XML as a major component of your output strategy. One of the thorniest problems that developers face when building systems that are meant to work on a variety of platforms is how to render output to various formats. Sometimes information needs to be rendered to a browser. Sometimes it needs to be rendered to a personal laser printer, to a heavy-duty printer in another building or to a text-based line printer. Perhaps your information needs to be rendered to a fax machine, to an image file, or simply to the screen of a workstation. The secret is to find a way to do this in a generic manner, and Extensible Stylesheet Language (XSL) promises to do just that.

The basic idea is that an XSL program accepts as input an XML document and then renders it to a specific output device, effectively acting as a wrapper to that device. XSL offers great potential but in practice it is hard to work with: Its syntax is difficult and it seems to promote spaghetti code that is difficult to maintain and to enhance. You’ll find that your organization needs to invest in training, mentoring, coding standards or guidelines, and even improvements to your software process to be effective with XSL. Be that as it may, many developers are using XSL effectively in mission-critical applications.
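The underlying idea, one XML document with a separate renderer per output device, can be sketched without XSL at all. The Python below is a plain-code stand-in, not XSL and not a substitute for it; the device names and output layouts are assumptions chosen for illustration.

```python
import html
import xml.etree.ElementTree as ET

ORDER = (
    "<ConeOrder><Cone>Plain</Cone>"
    "<Scoop>Chocolate</Scoop><Scoop>Vanilla</Scoop></ConeOrder>"
)

def render_text(root):
    """Plain text, as for a line printer."""
    lines = [f"Cone: {root.findtext('Cone')}"]
    for i, scoop in enumerate(root.findall("Scoop"), start=1):
        lines.append(f"  Scoop {i}: {scoop.text}")
    return "\n".join(lines)

def render_html(root):
    """An HTML fragment, as for a browser."""
    items = "".join(
        f"<li>{html.escape(scoop.text)}</li>" for scoop in root.findall("Scoop")
    )
    return f"<h1>{html.escape(root.findtext('Cone'))} cone</h1><ol>{items}</ol>"

# Hypothetical device names mapped to their renderers.
RENDERERS = {"printer": render_text, "browser": render_html}

def render(order_xml, device):
    """Parse once, then hand the tree to the renderer for the device."""
    return RENDERERS[device](ET.fromstring(order_xml))

print(render(ORDER, "printer"))
print(render(ORDER, "browser"))
```

An XSL stylesheet plays the role of one of these renderer functions, with the advantage that it is data rather than application code and can be swapped without redeploying the system.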

Make That Change

What does your organization need to do to be successful using XML? First, it must be prepared to invest in training and education. Just like any new technology, XML has a learning curve that must be overcome: There’s a reason why Charles F. Goldfarb and Paul Prescod’s The XML Handbook, 2nd Edition (Prentice Hall, 2000) is over a thousand pages in length. Second, set standards for and then invest in tools. You’ll need XML editors, parsers and management systems. There are many XML tools available online, most of them free or trial versions that you can download; select a core suite of tools and use them consistently. Third, as indicated earlier, set naming conventions. An XML tag should say what the information is, not what it looks like: "BirthDate" is a good tag name but "YYYYMMDD" is not. Fourth, set metadata conventions. To accurately describe the schema of an XML document you need to identify its format (likely using a DTD) as well as what each element represents. To describe an element, you need to know its purpose, its type (for example, string, currency, integer and so on), its valid values, its invariants, its default value and any appropriate formatting rules. This is all standard persistence and data modeling stuff, something your 50-year-old data modelers have likely been doing since long before your 22-year-old XML developers were even born. Fifth, invest the time to understand the XML-based standards mentioned in this article, such as XML Namespaces, XML Schema and the Simple Object Access Protocol (SOAP, http://search.ietf.org/internet-drafts/draft-box-http-soap-01.txt).

XML holds great promise; it could truly enable a write once, publish everywhere (WOPE) mindset within the software development community in which XML documents are used on any platform regardless of the document’s source. This will promote e-commerce, EAI, data storage and potentially even generic output capabilities within organizations. However, never forget that XML is merely a way to format data. You still need competent, highly skilled staff who follow an accepted software process and work together effectively to be successful. XML isn’t magic; it takes work to make XML work for you.

XML and Related Resources on the Web

World Wide Web Consortium (W3C) www.w3.org Manages the development of the XML standards.
Organization for the Advancement of Structured Information Standards (OASIS) www.oasis-open.org An industry consortium with a large number of XML publications and resources.
XML.ORG www.xml.org An XML resource page sponsored by OASIS.
Microsoft BizTalk Schemas www.biztalk.org XML schemas, information and development tools.
FinXML www.finxml.org Definition of XML schemas for data interchange among financial institutions.
Product Data Markup Language (PDML) www.pdit.com/pdml Definition of XML schemas for interchange of product information among commercial and government systems.
RosettaNet www.rosettanet.org A nonprofit industry consortium for the definition of e-commerce standards, including XML DTDs.
CommerceNet www.commerce.net An international organization that promotes the growth of business on the Internet. They offer a registry service for XML-based e-commerce.

