March 30, 2002
The Languages of the Semantic Web
Uche Ogbuji
XML-based description formats like RDF and DAML+OIL are slowly fulfilling our hopes for a more meaningful, helpful Internet.
June 2002
To create the Web as we know it, Tim Berners-Lee put aside much of the existing research on hypertext technologies and built a simple system that was easy to understand, use, and maintain. This simplification became an important factor in the Web's rapid growth. Despite this success, the realities of information management are exposing some problems of simplification. While the Web continues to be useful for retrieving information from individuals or organizations of close collaborators, it is much harder to use if you want to gain a broad understanding of a particular subject. For example, while we can visit the Burton snowboards Web site to find out what products the company offers, read about its corporate policies and philosophies, and even browse a selection of links about snowboarding in general, it's much more difficult to find a wider perspective on snowboarding as an industry and interest. It's even harder to bind together the many Web sites that discuss snowboarding.
This is where the Semantic Web comes in. The Semantic Web is a vision of a next-generation network that lets content publishers provide notations designed to express a crude "meaning" of the page, instead of merely dumping arbitrary text onto a page. Autonomous agent software can then use this information to organize and filter data to meet the user's needs.
Since the current Web succeeded, there has been much effort to refactor it along these lines. Proponents of this goal often refer to it as the Intelligent Web. For those who focus on the problem of how to express the context, or the semantics, of content in distributed systems like the Web, this goal is called the Semantic Web. Even though this next-generation Web has yet to become a reality, much of the current work on the Semantic Web centers on a variety of technologies that are already in widespread, practical use.
One in particular is the Resource Description Framework (RDF), which lets content creators express structured metadata statements about resources identified by URIs.
Limits of Today's Web
With the current state of the Web, there are only two real methods of gaining broader information about documents. The first is to use a directory or portal site, and thus rely on human editors to scour the Web and appropriately categorize pages and their associated links. Such portals are the heroes of today's Web. After all, the most effective information management tool on Earth is still the human librarian, and probably will be for years to come. The problem is that directories take tremendous effort to maintain. Finding new links, updating old ones, and maintaining the database technology add to a portal's administrative burden and operating costs.
Search engines are the alternative. Good search engines pay special attention to metadata in the pages that they spider and add to their index databases. In the simplest case, this metadata might take the form of content in the page's metadata tags. Search engines take less human effort on the content management end, but they require a frightfully large resource investment. It's also very difficult to produce valuable indices efficiently. It's no secret that even some of the most advanced search engines are so primitive that queries often turn up an unmanageable number of poorly differentiated hits. A user who tries to finely craft his or her search to zero in on a point risks filtering out potentially relevant search results.
The Web needs to support something in between portals and search engines. Of course, until there's a server as sophisticated as HAL 9000 (but, hopefully, not as neurotic), we probably won't be able to completely replace the human portal editor with a computer program. But if we could provide standardized means for Web publishers to catalog and classify their own content, then we could develop more effective agents that work on this substrate of better-organized information.
The result of having better standard metadata would be a Web where users and agents could directly tap the latent information in linked and related pages. This would help free us from having to scour for information site by site, and from relying on portals and search engines. It wouldn't be hard to outfit each user with personal portal generators and search agents tailored to their particular interests, needs, and constraints. These agents might even be configured to learn and respond to personal details with the help of artificial intelligence techniques.
The Semantic Web's Challenges
It's fine to talk about enabling each Web publisher to properly place content in context, but there are several problems to overcome before any such initiative will gain critical mass.
Semantic Web proponents are looking to XML and RDF to meet these challenges. XML would let a publisher use markup that differentiates a catalog entry of a snowboard product from an independent review of the same item. However, this method relies on custom tags, and agents need a way to grasp the "meaning" in such tags, a facility called semantic transparency. Web metadata is the key to providing it. Because of its importance, the W3C developed RDF as a standard for Web metadata.
Inside RDF
RDF is quite simple at its core, though it can get hairy in short order. It is a model of statements made about resources. A resource is anything with an associated URI. In practice, it's most often a document on the Web, but it can be anything to which people have agreed to assign a URI. In this way, one could even use RDF to make statements about abstractions like peace, or even imaginary entities like Gandalf the Wizard.
RDF's statements are hardly as complex as those we use in natural language. They have a uniform structure of three parts: predicate, subject, and object. For example:
The author [predicate] of The Lord of the Rings [subject] is J.R.R. Tolkien [object].
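The three-part statement above can be sketched in a few lines of Python. This is only an illustration of the model, not an RDF library: the `Statement` type, the `describe` helper, and both `example.org` URIs are hypothetical names invented for the sketch.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement: a subject, a predicate, and an object."""
    subject: str    # URI of the resource being described
    predicate: str  # URI naming the property of that resource
    obj: str        # the property's value (here, a plain string literal)


# Hypothetical URIs, made up for this sketch.
LOTR = "http://example.org/books/the-lord-of-the-rings"
AUTHOR = "http://example.org/terms/author"

statement = Statement(subject=LOTR, predicate=AUTHOR, obj="J.R.R. Tolkien")


def describe(s: Statement) -> str:
    """Render a statement in the 'predicate of subject is object' shape."""
    return f"The <{s.predicate}> of <{s.subject}> is {s.obj!r}."


print(describe(statement))
```

Because every statement has the same shape, software that knows nothing about books or authors can still store, compare, and filter statements like this one.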
This simplicity and uniformity make RDF's statements generic. They can be used to encode the above natural-language statement, as well as, say, an object-oriented model. For example, if you had written a class called
RDF lets you express such statements in a formal way that software agents can read and act on. It lets us express a collection of statements as a graph, as a series of (subject, predicate, object) triples, or even in XML form. The first form is the most convenient for communication between people, the second for efficient processing, and the third for flexible communication with agent software.
If a portal were to create a directory of snowboarding sites, it could use such an RDF/XML document to help RDF-enabled agents and tools better understand the information that the sites offer. Example 1 is loosely based on the format used by the Open Directory Project (www.dmoz.org), a community effort to build a universal Web site directory.
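Separately from Example 1, the relationship between the triple, graph, and XML forms can be sketched in Python using only the standard library. This is a rough illustration under stated assumptions, not the Open Directory format: the `example.org` site URI, the `ex` vocabulary, and the helper names `to_graph` and `to_rdf_xml` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms/"  # hypothetical vocabulary for this sketch

SITE = "http://example.org/sites/snowboard-shop"  # hypothetical resource URI

# Triple form: a flat list of (subject, predicate, object) tuples.
triples = [
    (SITE, EX_NS + "title", "Snowboard Shop"),
    (SITE, EX_NS + "topic", "Snowboarding"),
]


def to_graph(triples):
    """Graph form: map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def to_rdf_xml(triples):
    """XML form: one rdf:Description per subject, one child element per property."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for subject, edges in to_graph(triples).items():
        desc = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject}
        )
        for predicate, obj in edges:
            # This sketch only handles predicates from the hypothetical vocabulary.
            if not predicate.startswith(EX_NS):
                raise ValueError(f"unsupported predicate: {predicate}")
            prop = ET.SubElement(desc, f"{{{EX_NS}}}{predicate[len(EX_NS):]}")
            prop.text = obj
    return ET.tostring(root, encoding="unicode")


print(to_rdf_xml(triples))
```

The same two statements appear in all three forms; only the packaging changes, which is why tools can convert freely between them.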
The first document element,
The first child is an
The
Next comes another property element,
Of course, it would be best for the community if each Web page could maintain its own metadata. The RDF specification provides a convention for people to place RDF within HTML pages. Example 2 illustrates how the maintainers of various snowboarding sites might use RDF to do this for their own pages.
The empty