Alex Cheng is Engineering Director at Ipedo, where he oversees the design and development of native XML database technology.
XML is a flexible markup language capable of representing information in diverse data sources, ranging from application data to structured and semi-structured documents to relational and object-oriented database output. As more information is stored, exchanged, and presented using XML, the ability to intelligently query XML-based information across diverse data sources becomes increasingly important. Developers need a flexible query language designed to be applicable across all types of XML data sources.
One XML query language, XPath, is a W3C Recommendation for selecting data contained within a single XML document. However, the language itself is not flexible enough to handle many of the sophisticated querying scenarios that developers are accustomed to when querying relational data with SQL. In addition to the ability to select and join XML data across diverse sources, developers need to concisely specify the output format of the selection, which usually differs from the format of the original data sources. Another proposed standard, XQuery, provides this. In a nutshell, XQuery allows you both to specify what you are selecting and to designate what the output should look like, all in the same query.
Brief History of XQuery
XQuery started its life as Quilt, which in turn was influenced by a number of other query languages, including XPath, XQL, XML-QL, SQL, OQL, Lorel, and YATL. The W3C XML Query Working Group was created in 1999 and has since produced specifications defining XQuery syntax and semantics. Many of the original contributors to the predecessor languages eventually became members of the Working Group. In 2000, the Working Group published several major working drafts, covering the XQuery requirements, the data model, use cases, and the query algebra.
In June 2001, the W3C XML Query Working Group, in collaboration with the W3C XSL Working Group, released the latest working draft of the XQuery 1.0 specification. To support many powerful features of XQuery, several significant enhancements were proposed for XPath 2.0: while XPath has limitations as a query language for XML, it provides foundational capabilities that XQuery relies on. The working draft for the XQuery 1.0 and XPath 2.0 Data Model was also released in June 2001. The XQuery 1.0 Formal Semantics specification [XQFS] replaced the XML Query Algebra specification. The latest additions to the Working Group's list of specifications include an XML syntax for XQuery, called XQueryX [No public draft], and a Functions and Operators document, which describes the functions and operators on the datatypes defined in XML Schema Part 2. All the specifications produced by the W3C XML Query Working Group are works in progress at this time and, as such, are subject to change.
XQuery Data Model
The XQuery data model defines the information contained in the input to and output
from an XQuery processor. It is based on the XML information set with additional
features including, among other things, support for XML Schema data types. In
essence, an instance of the XQuery data model may consist of nodes, simple
values, or sequences. A node can be one of the following eight types:
A simple value in the data model can belong to any of the nineteen primitive data types defined in XML Schema Part 2: Datatypes, namely xsd:string, xsd:boolean, xsd:decimal, xsd:float, xsd:double, xsd:duration, xsd:dateTime, xsd:time, xsd:date, xsd:gYearMonth, xsd:gYear, xsd:gMonthDay, xsd:gMonth, xsd:gDay, xsd:hexBinary, xsd:base64Binary, xsd:anyURI, xsd:QName, and xsd:NOTATION.
A sequence can be thought of as a flat list of nodes and/or values. It replaces the node-set of XPath 1.0, which could contain only a set of unique nodes. A sequence can thus model an instance of the data model representing a collection of documents, a node-set à la XPath, or a list of simple values. Another important feature of sequences is that the data model makes no distinction between a single node (or value) and a sequence containing only that node (or value).
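The sequence properties just described (flatness, no deduplication, and no difference between an item and a one-item sequence) can be imitated with Python tuples. This is a toy model for intuition only, not an implementation of the XQuery data model; the helper name `make_sequence` is hypothetical.

```python
from typing import Any, Tuple


def make_sequence(*items: Any) -> Tuple[Any, ...]:
    """Build a flat sequence: nested sequences (tuples) are spliced in, never nested."""
    result = []
    for item in items:
        if isinstance(item, tuple):
            # A sequence inside a sequence contributes its members, not itself.
            result.extend(make_sequence(*item))
        else:
            # A single node or value is treated as a one-item sequence.
            result.append(item)
    return tuple(result)


# Nesting disappears, and the empty sequence contributes nothing.
print(make_sequence(1, make_sequence(2, 3), make_sequence(), 4))  # (1, 2, 3, 4)
# A single value and a sequence containing only that value are the same thing.
print(make_sequence(5) == make_sequence(make_sequence(5)))  # True
# Unlike an XPath 1.0 node-set, duplicates and order are preserved.
print(make_sequence("a", "b", "a"))  # ('a', 'b', 'a')
```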
The XQuery Language: A Quick Introduction
XQuery is a functional language in which a query is represented as an expression. Query expressions can be nested arbitrarily, much as subqueries can be nested within a SQL statement. An XQuery expression exploits the structure of XML by letting you specify both what is being selected and the format of the output in a single query, so the developer does not need a separate step to reshape the results.
An easy way to understand XQuery features is to work through an actual example. Example 1 is a fragment of the sample XML data included in the Use Cases document of the XML Query Working Group; let's call the input data books.xml. Say we want to select the books published by Addison-Wesley after 1991, including their year and title. In the resulting output document, each selected book should appear as a book element, with its year as an attribute and its title as a child element, all wrapped in a single bib element.
The XQuery query that generates the result is shown in Example 2. Conceptually, the query scans all books in books.xml and selects every book whose publisher is Addison-Wesley and whose published year is greater than 1991. For each selected book, it extracts the year and title information and discards all other tags. The query uses several common XQuery expressions: path expressions, element and attribute constructors, and a FLWR expression.
XQuery path expressions are based on XPath 1.0 path expressions. A path expression provides a way to address specific parts of an XML document by giving a path to the content of interest in the document tree.
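Element and attribute constructors build new XML as part of the query result: element tags are written directly into the query, and attribute values can be computed by in-lining a sub-expression within braces.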
FLWR (pronounced "flower") expressions provide a SQL-like syntax for extracting data, performing selections on the extracted data, and returning results in a structure of the user's choice. The resemblance to SQL lies in the clauses: FOR and WHERE play roles similar to the FROM and WHERE clauses of SQL, LET binds a variable to the result of an expression, and RETURN plays a role similar to SELECT. In the sample query, the FOR clause iterates over the sequence of nodes returned by a path expression and binds a variable to each node, the WHERE clause keeps only the nodes that meet the selection conditions, and the RETURN clause constructs the output for each remaining node.
The SORTBY clause is used to impose order on sequences.
NAMESPACE declarations are used to declare namespace-prefix mappings for namespaces referred to in an XQuery query module. They can also be used to declare a default namespace for an XQuery module.
XQuery comes with a core library of functions that are defined in the Functions and Operators specification.
Example Scenario
To better understand XQuery, let's consider a scenario for which we need to devise queries. Assume a supply-chain management scenario in which a number of suppliers provide parts catalogs, each using a different XML schema. An order management system uses a common XML format to represent information about available parts from all the suppliers.
XQuery is used to query across all supplier data sources and transform the query results to the common XML format used by the order management system. The order management system delivers the aggregated XML data to the order entry applications on the web, PDAs, and wireless devices.
Assume two catalog files from different suppliers, Alex and Bob. These suppliers use different schemas for their catalogs. We need to write queries that extract catalog information from sources with different schemas and consolidate it into a catalog with a common schema. A couple of example supplier catalogs and the resulting consolidated catalog are shown in Figure 1.
The XQuery query that returns the consolidated catalog element is shown in Example
4.
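To make the consolidation idea concrete, here is a small Python sketch using `xml.etree.ElementTree`. Everything schema-specific in it is an assumption: the element names for Alex's and Bob's catalogs, the common catalog format, and the function name `consolidate` are invented for illustration and are not the ones in Figure 1 or Example 4. The shape mirrors the idea of querying each source through its own schema and emitting every result in one shared format.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogs: Alex keeps the part ID in an attribute, Bob in an element.
ALEX_XML = """<catalog>
  <item sku="A-1"><name>Bolt</name><cost>0.10</cost></item>
</catalog>"""
BOB_XML = """<parts>
  <part><id>B-7</id><description>Nut</description><price>0.05</price></part>
</parts>"""


def consolidate(alex: ET.Element, bob: ET.Element) -> ET.Element:
    """Map both supplier formats onto one hypothetical common format."""
    catalog = ET.Element("catalog")
    for item in alex.findall("item"):  # query written against Alex's schema
        ET.SubElement(catalog, "part", supplier="Alex", id=item.get("sku"),
                      name=item.findtext("name"), price=item.findtext("cost"))
    for part in bob.findall("part"):  # query written against Bob's schema
        ET.SubElement(catalog, "part", supplier="Bob", id=part.findtext("id"),
                      name=part.findtext("description"),
                      price=part.findtext("price"))
    return catalog


merged = consolidate(ET.fromstring(ALEX_XML), ET.fromstring(BOB_XML))
print(ET.tostring(merged, encoding="unicode"))
```

Adding a third supplier would mean one more loop with its own field mapping, while the output format stays unchanged.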
Conclusion
With the proliferation of XML content, developers need a means to query XML to find the information they need. SQL is relevant only to the relational data model, so a new query language with similar capabilities for a different kind of data was needed. After years of work by the W3C, the XQuery specification, still a working draft, is designed to meet this need.
References
XQuery 1.0 and XPath 2.0 Data Model, W3C Working Draft, www.w3.org/TR/query-datamodel/
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0, W3C Working Draft, www.w3.org/TR/xquery-operators/
XML Information Set, www.w3.org/TR/xml-infoset/
Quilt: XML query language specification, Don Chamberlin's home page, www.almaden.ibm.com/cs/people/chamberlin/quilt_euro.html
XQuery 1.0 Formal Semantics, W3C Working Draft, www.w3.org/TR/query-algebra/
XQuery 1.0: An XML Query Language, W3C Working Draft, www.w3.org/TR/xquery/
XML Schema Part 2: Datatypes, the XML Schema language specification of the datatypes used in XML schema definitions, www.w3.org/TR/xmlschema-2/
For the sample data fragment, the desired output document of the query looks like this:
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>
Path Expressions
In the sample query, to refer to all book elements that are children of the bib root element in the document books.xml, we use the path expression:
document("books.xml")/bib/book
Here, the document function returns the root node of the document books.xml, and the path continues down the tree from the root element bib onto all book elements that are children of the bib element.
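ElementTree's limited XPath support can mirror these path steps. This is an analogy, not XQuery: parsing a string stands in for the document function, and because ElementTree has no separate document node, the parsed root element already is bib. The sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml.
BOOKS_XML = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="2000"><title>Another Title</title></book>
</bib>"""

# Roughly document("books.xml"): parsing returns the root element, bib, directly.
bib = ET.fromstring(BOOKS_XML)

# /bib/book: the book elements that are children of the bib root element.
books = bib.findall("book")
print([b.get("year") for b in books])  # ['1994', '2000']

# A descendant step, as in //title, searches the whole subtree instead.
titles = [t.text for t in bib.findall(".//title")]
print(titles)  # ['TCP/IP Illustrated', 'Another Title']
```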
Element and Attribute Constructors
The result contains two kinds of elements: <bib> and <book>. To construct these elements, the tags <bib>...</bib> and <book>...</book> are written directly into the body of the query itself. Braces { and } are used to separate literal text content from any sub-expressions inside the element; the XQuery processor evaluates the sub-expressions enclosed within the braces. Attribute constructors work in a similar fashion, with a sub-expression in-lined within braces, like:
<book year={ $b/@year }>
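The same construction can be imitated with ElementTree, where each literal tag becomes an `Element` or `SubElement` call and each braced sub-expression becomes an ordinary Python expression evaluated during construction. The source book element, standing in for $b, is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source node playing the role of $b.
b = ET.fromstring(
    '<book year="1994"><title>TCP/IP Illustrated</title>'
    '<publisher>Addison-Wesley</publisher></book>'
)

# <bib> ... </bib>: a literal element constructor becomes a new Element.
bib = ET.Element("bib")
# <book year={ $b/@year }>: the attribute value is computed from $b.
book = ET.SubElement(bib, "book", year=b.get("year"))
# { $b/title }: the evaluated sub-expression (a title element) becomes content.
book.append(copy.deepcopy(b.find("title")))

print(ET.tostring(bib, encoding="unicode"))
# <bib><book year="1994"><title>TCP/IP Illustrated</title></book></bib>
```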
FLWR (FOR-LET-WHERE-RETURN) Expressions
FOR $b IN document("books.xml")/bib/book
The FOR clause iterates over the sequence of nodes returned by the path expression, binding the variable $b to each of the book nodes returned. The behavior is similar to that of a for loop in a Unix shell. Once the variable is bound, you can access any of its sub-parts by using XPath expressions. For instance, $b/@year returns the year attribute of the <book> node, and $b/title returns its <title> child element. If you do not want the enclosing tag, specify $b/title/text() instead.
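For readers who know Python, ElementTree offers rough counterparts to the sub-part accesses just described. The mapping is approximate: $b/@year yields an attribute node whereas `get()` returns a plain string, and $b/title/text() yields a text node whereas `findtext()` returns a string. The one-book document is hypothetical.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    '<bib><book year="1994"><title>TCP/IP Illustrated</title></book></bib>'
)

for b in bib.findall("book"):          # ~ FOR $b IN .../bib/book
    year = b.get("year")               # ~ $b/@year (string value of the attribute)
    title_elem = b.find("title")       # ~ $b/title (the element, tags included)
    title_text = b.findtext("title")   # ~ $b/title/text() (just the text)
    print(year)                                          # 1994
    print(ET.tostring(title_elem, encoding="unicode"))   # <title>TCP/IP Illustrated</title>
    print(title_text)                                    # TCP/IP Illustrated
```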
WHERE $b/publisher = "Addison-Wesley" AND $b/@year > 1991
The WHERE clause selects those book nodes whose publisher child element contains the string Addison-Wesley and whose year attribute is greater than 1991.
RETURN
  <book year={ $b/@year }>
    { $b/title }
  </book>
The RETURN clause in this FLWR expression constructs book elements, with the publishing year as an attribute and the title as a child element.
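FOR and LET clauses generate a list of tuples of bound variables, and the WHERE and RETURN clauses are evaluated for each tuple. Each expression term in a FLWR expression, such as the expression following IN or RETURN, can itself be replaced by another FLWR expression, which gives FLWR expressions the expressive power to contain queries within queries.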
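Nesting can be pictured with a Python analogy in which the output expression of one comprehension contains another comprehension, much as a RETURN clause may contain a nested FLWR expression. The grouping task (titles per publisher) and the data are hypothetical and are not taken from the sample query.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: three books from two publishers.
bib = ET.fromstring("""<bib>
  <book><title>A</title><publisher>P1</publisher></book>
  <book><title>B</title><publisher>P2</publisher></book>
  <book><title>C</title><publisher>P1</publisher></book>
</bib>""")
books = bib.findall("book")

# Outer query: each distinct publisher, in order of first appearance.
publishers = list(dict.fromkeys(b.findtext("publisher") for b in books))

# The outer "return" part contains an inner query over the same books.
titles_by_publisher = {
    p: [b.findtext("title") for b in books if b.findtext("publisher") == p]
    for p in publishers
}
print(titles_by_publisher)  # {'P1': ['A', 'C'], 'P2': ['B']}
```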
Additional XQuery Features
The following expression uses SORTBY to retrieve all book nodes in the sample document ordered by their title:
document("books.xml")/bib/book SORTBY (title)
In this example, the path expression document("books.xml")/bib/book is evaluated first, and the resulting sequence of nodes is ordered by the SORTBY clause. For each node in the sequence, the ordering expression (here the path expression title, relative to the book node) is evaluated, and the sequence is ordered based on its value.
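The two-step evaluation just described (evaluate the path, then evaluate the ordering expression relative to each node and order by its value) maps naturally onto a key-based sort. This Python sketch uses `sorted()` with a key function standing in for the relative path title; the sample titles are hypothetical.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book><title>Zebra Patterns</title></book>
  <book><title>Algebra Basics</title></book>
  <book><title>Monads Explained</title></book>
</bib>""")

# 1. Evaluate the path expression: all book children of bib.
books = bib.findall("book")
# 2. For each node, evaluate the ordering expression (title, relative to
#    that book) and order the sequence by the resulting value.
ordered = sorted(books, key=lambda b: b.findtext("title"))
print([b.findtext("title") for b in ordered])
# ['Algebra Basics', 'Monads Explained', 'Zebra Patterns']
```

Python's `sorted()` is stable, so in this sketch books with equal titles keep their document order.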
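The following query maps the prefix booklovers to the namespace URI http://www.bibliophile.com and returns all books in that namespace:
NAMESPACE booklovers="http://www.bibliophile.com"
document("books.xml")//booklovers:book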
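ElementTree's `findall` accepts a prefix-to-URI mapping that plays the same role as the NAMESPACE declaration: the prefix is local to the query, and only the URI has to match the document. In this hypothetical document the bibliophile namespace is deliberately bound to a different prefix, bl; the `.//` step corresponds to //.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one book in the bibliophile namespace, one without.
DOC = """<bib xmlns:bl="http://www.bibliophile.com">
  <bl:book><bl:title>In the namespace</bl:title></bl:book>
  <book><title>No namespace</title></book>
</bib>"""

root = ET.fromstring(DOC)

# Counterpart of the NAMESPACE declaration: a query-local prefix for the URI.
ns = {"booklovers": "http://www.bibliophile.com"}

# Like //booklovers:book: every descendant book element in that namespace.
found = root.findall(".//booklovers:book", ns)
print([b.findtext("booklovers:title", namespaces=ns) for b in found])
# ['In the namespace']
```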
The document function used in the sample query belongs to the core function library defined in the Functions and Operators specification. In addition to this core library, XQuery allows users to define their own functions. Example 3 illustrates the syntax for defining functions. This example defines a function that takes a string as an argument and returns a sequence of book nodes that have a matching title. The return type is declared to be book_seq, which is assumed to have been defined in the schema books.xsd included with the Schema clause.
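The function described for Example 3 (a title string in, the matching book nodes out) can be sketched in Python. The name `books_with_title`, the exact string-equality match, and the use of a list in place of the book_seq type are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def books_with_title(bib: ET.Element, title: str) -> List[ET.Element]:
    """Return the book children of bib whose title text equals `title`.

    The list plays the role of the book_seq return type; an empty list
    corresponds to an empty sequence.
    """
    return [b for b in bib.findall("book") if b.findtext("title") == title]


bib = ET.fromstring("""<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="2000"><title>Another Title</title></book>
</bib>""")

print([b.get("year") for b in books_with_title(bib, "TCP/IP Illustrated")])  # ['1994']
print(books_with_title(bib, "Missing"))  # []
```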