Dr. Dobb's is part of the Informa Tech Division of Informa PLC


XQuery: A Flexible Query Language for XML



Alex Cheng is Engineering Director at Ipedo, where he oversees the design and development of native XML database technology.


XML is a flexible markup language capable of representing information in diverse data sources, ranging from application data to structured and semi-structured documents to relational and object-oriented database output. As more information is stored, exchanged, and presented using XML, the ability to intelligently query XML-based information across diverse data sources becomes increasingly important. Developers need a flexible query language designed to be applicable across all types of XML data sources.

One XML query language, XPath, has been proposed as a standard for selecting data contained within a single XML document. However, the language itself is not flexible enough to handle many sophisticated querying scenarios that developers are accustomed to in querying against relational data using SQL. In addition to the ability to select and join XML data across diverse sources, developers need to concisely specify the output format of the selection, which is usually different from the format of the original data sources. Another proposed standard, XQuery, provides this. In a nutshell, XQuery allows you to both specify what you are selecting and designate what the output format should look like, all in the same query.

Brief History of XQuery

XQuery started its life as Quilt, which in turn was influenced by a number of other query languages, including XPath, XQL, XML-QL, SQL, OQL, Lorel, and YATL. The W3C XML Query Working Group was created in 1999 and has since produced specifications defining XQuery syntax and semantics. Many of the original contributors to the predecessor languages eventually became members of the Working Group. In 2000, the Working Group proposed several major working drafts, including requirements for XQuery, data model, use cases, and query algebra.

In June 2001, the latest working draft of the XQuery 1.0 specification was released by the W3C XML Query Working Group in collaboration with the W3C XSL Working Group. To support many powerful features of XQuery, several significant enhancements were proposed for XPath 2.0. While XPath has limitations as a query language for XML, it does provide foundation capabilities relied on by XQuery. The proposed working draft for the XQuery 1.0 and XPath 2.0 Data Model was also released in June 2001. The XML Query Formal Semantics specification [XQFS] replaced the XML Query Algebra specification. The latest additions to the list of specifications under the Working Group include an XML syntax for XQuery, called XQueryX (no public draft), and a Functions and Operators document, which describes the functions and operators on the XML Schema datatypes defined in XML Schema Part 2. All the specifications produced by the W3C XML Query Working Group are works in progress at this time and, as such, are subject to change.

XQuery Data Model

The XQuery data model defines the information contained in the input to and output from an XQuery processor. It is based on the XML information set with additional features including, among other things, support for XML Schema data types. In essence, an instance of the XQuery data model may consist of nodes, simple values, or sequences. A node can be one of the following eight types:

  • Document node
  • Element node
  • Attribute node
  • Namespace node
  • Comment node
  • Processing Instruction node
  • Text node
  • Reference node

A simple value in the data model can be of one of the nineteen primitive data types defined in XML Schema Part 2: Datatypes, namely xsd:string, xsd:boolean, xsd:decimal, xsd:float, xsd:double, xsd:duration, xsd:dateTime, xsd:time, xsd:date, xsd:gYearMonth, xsd:gYear, xsd:gMonthDay, xsd:gMonth, xsd:gDay, xsd:hexBinary, xsd:base64Binary, xsd:anyURI, xsd:QName, and xsd:NOTATION.

A sequence can be thought of as a flat list of nodes and/or values. It replaces the node-set in XPath 1.0, which could only contain a set of unique nodes. A sequence can thus model an instance of the data model representing a collection of documents, a node-set à la XPath, or a list of simple values. Another important feature of the sequence is that the data model does not make a distinction between a single node (or value) and a sequence containing that single node.
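To make the "flat list" idea concrete, here is a minimal Python sketch (Python, not XQuery). The helper name seq and the choice of a tuple to model a sequence are assumptions made for illustration only. It mimics three properties described above: sequences never nest, duplicates and order are preserved (unlike an XPath 1.0 node-set), and wrapping a sequence in another sequence changes nothing.

```python
def seq(*items):
    """Build a flat sequence, modeled here as a tuple (hypothetical helper)."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            # A sequence placed inside a sequence is spliced in, not nested.
            flat.extend(seq(*item))
        else:
            flat.append(item)
    return tuple(flat)


nested = seq(1, seq(2, 3), seq(), "four")   # inner sequences are flattened
with_duplicates = seq("a", "a", "b")        # duplicates and order survive
rewrapped = seq(seq("book"))                # same as seq("book")
```

In this model, seq("book") and seq(seq("book")) compare equal, which is the closest a tuple-based sketch can come to the rule that a single item and a one-item sequence are the same thing.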

The XQuery Language: A Quick Introduction

XQuery is a functional language in which a query is represented as an expression. The query expressions can be arbitrarily nested, similar to how expressions can be nested within a SQL expression. An XQuery expression leverages the capabilities of XML by allowing both specification of what is being selected and designation of the output format. In this way, it provides some additional conveniences for the developer.

An easy way to understand XQuery features is to go through an actual example. Example 1 is a fragment of a sample XML included in the Use Cases document of the XQuery Working Group. Let’s call the input data “books.xml”. Say we want to select the books published by Addison-Wesley after 1991, including their year and title. We want the resulting output document to look like this:

<bib>
    <book year="1994">
        <title>TCP/IP Illustrated</title>
    </book>
</bib>

The XQuery that generates the result is shown in Example 2. To generate the result, the work essentially involves scanning through all books in “books.xml” and searching for every book where the publisher is “Addison-Wesley” and the published year is greater than 1991. Once a book is found, the query extracts the year and title information and discards all other tags. This query uses some common XQuery expressions, including path expressions, element and attribute constructors, and a FLWR expression.

Path Expressions

XQuery path expressions are based on XPath 1.0 path expressions. A path expression provides a way to address specific parts of an XML document by providing a path to the content of interest in the document tree. For example, in the preceding query, to refer to all book elements, which are children of the bib root element in the document “books.xml”, we use the path expression:

    document("books.xml")/bib/book

Here, the document function returns the root node of the document “books.xml”, and the path continues down the tree from the root element bib to all book elements that are children of the bib element.
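For readers who want to experiment without an XQuery processor, the same navigation can be approximated with Python's standard xml.etree.ElementTree module. This is only an analogue, not XQuery, and the BOOKS_XML sample below is a hypothetical stand-in for “books.xml” rather than the article's Example 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml (not the article's Example 1).
BOOKS_XML = """<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher>
  </book>
</bib>"""

root = ET.fromstring(BOOKS_XML)    # the <bib> root element
books = root.findall("book")       # direct <book> children, like /bib/book
titles = [b.findtext("title") for b in books]
```

Note that ElementTree starts from the root element rather than from a document node, so the bib step is implicit in root.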

Element and Attribute Constructors

The result contains two elements: <bib> and <book>. To construct these elements, the elements <bib>...</bib> and <book>...</book> are written directly into the body of the query itself. Braces “{” and “}” are used to separate literal text content from any sub-expressions inside the element. The sub-expressions enclosed within the braces will be evaluated by the XQuery processor.

In a similar fashion, attribute constructors are specified by in-lining a sub-expression within braces, like:

    <book year={ $b/@year }>

FLWR (FOR-LET-WHERE-RETURN) Expressions

FLWR (pronounced “flower”) expressions provide a SQL-like syntax for extracting data, performing selections on the extracted data, and returning results in a structure of the user's choice. The syntax resembles SQL in that clauses such as FOR, LET, and WHERE play roles similar to SELECT, WHERE, and so on, in SQL. Consider the clauses in the FLWR expression in the sample query:

 FOR $b IN document("books.xml")/bib/book

The FOR clause iterates over the sequence of nodes returned by the path expression, binding the variable $b to each of the book nodes in turn. The behavior is similar to a for loop in a Unix shell. Once the variable is bound, you can further access any of its sub-parts by using XPath expressions. For instance, $b/@year returns the “year” attribute of the <book> node and $b/title returns its <title> child element. If you do not want the enclosing tag, you would have to specify $b/title/text().
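The binding behavior can be imitated in Python with an ordinary loop. The sketch below uses xml.etree.ElementTree and a hypothetical sample document (not the article's Example 1). It shows the difference between taking the whole title element and taking only its text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml (not the article's Example 1).
BOOKS_XML = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="2000"><title>Data on the Web</title></book>
</bib>"""

root = ET.fromstring(BOOKS_XML)

rows = []
for b in root.findall("book"):          # like FOR $b IN .../bib/book
    year = b.get("year")                # like $b/@year (ElementTree returns a string)
    title_elem = b.find("title")        # like $b/title: the whole <title> element
    title_text = b.findtext("title")    # like $b/title/text(): content without the tag
    rows.append((year, title_elem.tag, title_text))
```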

WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1991

The WHERE clause selects those book nodes whose publisher child element equals the string “Addison-Wesley” and whose year attribute is greater than 1991.

RETURN
    <book year={ $b/@year }>
    { $b/title }
    </book>

The RETURN clause, in this FLWR expression, constructs book elements, including the publishing year as an attribute and the title as a child element.
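Putting the three clauses together, the following Python sketch mirrors the sample query step by step using xml.etree.ElementTree. It is an analogue for experimentation, not XQuery. The function name and the BOOKS_XML sample are hypothetical; the sample is a stand-in chosen so that exactly one book matches, as in the expected output shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml (not the article's Example 1).
BOOKS_XML = """<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher>
  </book>
</bib>"""


def addison_wesley_after_1991(xml_text):
    """Python analogue of the sample FLWR query (hypothetical helper)."""
    source = ET.fromstring(xml_text)
    result = ET.Element("bib")                          # <bib> element constructor
    for b in source.findall("book"):                    # FOR $b IN .../bib/book
        if (b.findtext("publisher") == "Addison-Wesley"
                and int(b.get("year")) > 1991):         # WHERE ... AND ...
            book = ET.SubElement(result, "book", year=b.get("year"))  # RETURN <book year=...>
            ET.SubElement(book, "title").text = b.findtext("title")   # { $b/title }
    return result


xml_out = ET.tostring(addison_wesley_after_1991(BOOKS_XML), encoding="unicode")
```

The loop needs an explicit int() conversion for the year comparison, whereas the query compares the attribute with 1991 directly. That is one of the conveniences a typed data model gives the query language.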

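The syntax for a FLWR expression can be expressed concisely as follows, paraphrasing the grammar in the working draft (brackets mark the optional clause, and + means one or more):

    (FOR $var IN expr | LET $var := expr)+
    [WHERE expr]
    RETURN expr

The FOR and LET clauses generate a list of tuples of bound expressions.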

As you can see, each expression term, expr, can itself be replaced by another FLWR expression, thus giving it the expressive power to contain nested queries within queries.

Additional XQuery Features

The SORTBY clause is used to impose order on sequences. For example, to retrieve all book nodes in the preceding sample document ordered by their title, we can use the following expression:

document("books.xml")/bib/book SORTBY (title)

In this example, the path expression document("books.xml")/bib/book is evaluated first and the resulting sequence of nodes is ordered by the SORTBY clause. For each node in the sequence, the ordering expression is evaluated (which is a path expression title, the path being relative to the book node in the sequence), and the sequence is ordered based on its value.
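The two-step evaluation (select first, then order by a value computed per node) corresponds closely to sorting with a key function. The Python sketch below, which uses xml.etree.ElementTree and a hypothetical sample document, is an analogue rather than XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml (not the article's Example 1).
BOOKS_XML = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="2000"><title>Data on the Web</title></book>
</bib>"""

root = ET.fromstring(BOOKS_XML)

# Step 1: evaluate the path expression to get the sequence of <book> nodes.
books = root.findall("book")

# Step 2: order the sequence by the value of the relative path "title".
# default="" keeps the sort well defined if a book has no <title>.
by_title = sorted(books, key=lambda b: b.findtext("title", default=""))

sorted_titles = [b.findtext("title") for b in by_title]
```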

NAMESPACE declarations are used to declare namespace-prefix mappings for namespaces referred to in an XQuery query module. They can also be used to declare a default namespace for an XQuery module. For example, the following query returns all books in the namespace identified by the URI “http://www.bibliophile.com”:

NAMESPACE booklovers="http://www.bibliophile.com"
document("books.xml")//booklovers:book
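The point that a prefix is only a local name for a URI can be seen in a Python analogue as well. In the sketch below (xml.etree.ElementTree, with a hypothetical sample document), the document uses the prefix bl while the lookup uses booklovers. The match succeeds because both map to the same URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one namespaced book and one book in no namespace.
NS_XML = """<bib xmlns:bl="http://www.bibliophile.com">
  <bl:book><title>Namespaced</title></bl:book>
  <book><title>Plain</title></book>
</bib>"""

root = ET.fromstring(NS_XML)

# Prefix-to-URI mapping, playing the role of the NAMESPACE declaration.
ns = {"booklovers": "http://www.bibliophile.com"}

# Like document("books.xml")//booklovers:book
matches = root.findall(".//booklovers:book", ns)
matched_titles = [b.findtext("title") for b in matches]
```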

XQuery comes with a core library of functions that are defined in the Functions and Operators specification. The document function is an example of one such function. In addition to this core library, XQuery allows users to define their own functions. Example 3 illustrates the syntax for defining functions. This example defines a function that takes a string as an argument, and returns a sequence of book nodes that have a matching title. The return type is defined to be of type book_seq, which is assumed to have been defined in the schema “books.xsd” included with the Schema clause.
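As a rough Python analogue of such a function (not the XQuery in Example 3), the sketch below takes a title string and returns the list of matching book elements. The function name books_by_title, the List[ET.Element] return annotation standing in for book_seq, and the sample data are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical stand-in for books.xml (not the article's Example 1).
BOOKS_XML = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="2000"><title>Data on the Web</title></book>
</bib>"""


def books_by_title(root: ET.Element, title: str) -> List[ET.Element]:
    """Return every <book> child of root whose <title> text equals title."""
    return [b for b in root.findall("book") if b.findtext("title") == title]


root = ET.fromstring(BOOKS_XML)
found = books_by_title(root, "Data on the Web")
missing = books_by_title(root, "No Such Book")
```

A title with no match yields an empty list, just as a query function would return an empty sequence.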

Example Scenario

To better understand XQuery, let’s consider a scenario for which we need to devise queries. Assume a supply-chain management scenario in which a number of suppliers provide parts catalogs, each using a different XML schema. An order management system uses a common XML format to represent information about available parts from all the suppliers.

XQuery is used to query across all supplier data sources and transform the query results to the common XML format used by the order management system. The order management system delivers the aggregated XML data to the order entry applications on the web, PDAs, and wireless devices.

Assume two catalog files from different suppliers, Alex and Bob. These suppliers use different schemas for their catalogs. We need to write queries that extract catalog information from sources with different schemas and consolidate it into a catalog with a common schema. A couple of example supplier catalogs and the resulting consolidated catalog are shown in Figure 1. The XQuery query that returns the consolidated catalog element is shown in Example 4.
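The consolidation step can be sketched in Python to show its shape. Everything in this sketch is hypothetical: the two supplier formats, the common format, and the function name are invented for illustration and are not the schemas of Figure 1 or the query of Example 4.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplier formats (not the catalogs in Figure 1).
ALEX_XML = '<catalog><part id="A1"><name>Bolt</name><price>0.25</price></part></catalog>'
BOB_XML = '<parts><item sku="B7" description="Nut" cost="0.10"/></parts>'


def consolidate(alex_xml, bob_xml):
    """Map both hypothetical supplier formats onto one common <part> format."""
    merged = ET.Element("catalog")
    # Alex stores name and price as child elements.
    for part in ET.fromstring(alex_xml).findall("part"):
        ET.SubElement(merged, "part", supplier="Alex", id=part.get("id"),
                      name=part.findtext("name"), price=part.findtext("price"))
    # Bob stores everything as attributes, under different names.
    for item in ET.fromstring(bob_xml).findall("item"):
        ET.SubElement(merged, "part", supplier="Bob", id=item.get("sku"),
                      name=item.get("description"), price=item.get("cost"))
    return merged


catalog = consolidate(ALEX_XML, BOB_XML)
parts = [p.attrib for p in catalog]
```

Each loop plays the role of one FOR clause over one source, and each SubElement call plays the role of an element constructor in the RETURN clause.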

Conclusion

With the proliferation of XML content, developers need a means to query XML to find needed information. SQL is only relevant for the relational data model, so a new query language with similar capabilities but for a different kind of data was needed. After years of work by the W3C, the XQuery spec meets this need.

References

All the specifications produced by the W3C Query working group are works in progress at this time, and are as such subject to change.

XQuery 1.0 and XPath 2.0 Data Model, W3C Working Draft, www.w3.org/TR/query-datamodel/

XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0, W3C Working Draft, www.w3.org/TR/xquery-operators/

XML Information Set, www.w3.org/TR/xml-infoset/

Quilt - XML Query language specification at Don Chamberlin’s home page, www.almaden.ibm.com/cs/people/chamberlin/quilt_euro.html

XQuery 1.0 Formal Semantics, W3C Working Draft, www.w3.org/TR/query-algebra/

XQuery 1.0: An XML Query Language, W3C Working Draft, www.w3.org/TR/xquery/

XML Schema Part 2: Datatypes, the XML Schema specification describing datatype definitions for XML document schemas, www.w3.org/TR/xmlschema-2/

