This article is in four parts:

The Web is many things to many people but, for publishers and authors, it is another medium comparable to print, radio, and TV. Don't get me wrong: I recognize that the Internet has unique characteristics, but its reach is comparable to that of other popular media. As proof, look at initiatives by existing publishers to offer their content online (visit www.informit.com), the emergence of new publishers (such as www.earthweb.com), and, of course, the growing involvement of authors (such as my own www.marchal.com). Furthermore, a growing number of companies that are not necessarily publishers use their Web sites to distribute information, articles, and reports (such as developer.iplanet.com).

However, the medium is still young and changing. At the peak of the rivalry between Microsoft and Netscape, the so-called "browser war," Web fashion was changing every six months. We are now enjoying more stability, but, mark my words, the browser war is about to start again with new actors. And this time, it will be more painful for the underprepared.

According to the W3C, non-desktop browsers might account for as much as 75% of all surfers by 2002. Non-desktop browsers include mobile phones, PDAs (such as the PalmPilot), and WebTV. Most of these devices simply won't use HTML. During the browser war, designers could at least rely on some level of commonality between the two major browsers. This won't be the case anymore because mobile phones use a special language, Wireless Markup Language (WML), which is incompatible with HTML.

What to do? Should content providers (publishers, authors, and companies) limit themselves to either HTML or WML? Should they support both formats? Should they prepare for even more formats? Developing original content (articles, books, reports, and so on) is expensive. To offset the cost, content owners want to distribute their content as widely as possible.
Ideally, it should not matter whether the reader uses a PC, a mobile phone, or another device. In this chapter, we will see how XML helps address this challenge. As you know, XML's roots are in the publishing industry, and that heritage guarantees that there is no lack of quality tools for publishing problems.

Architecture

Webmasters typically edit their Web sites with an HTML editor. The major disadvantage of this approach is that it freezes the site: to change the presentation, you must manually re-edit every page. It's possible, but it's a lot of work.

The XML solution is to separate authoring from publishing. The author writes the document in XML and, while doing so, ignores presentation. She instead adopts an XML vocabulary that focuses on the organization of the document: sections, titles, abstracts, and more.

Publishing the document then simply requires converting it into HTML, WML, or another popular format. Fortunately, this can be automated because the original XML document is structure-rich. The operative word here is automated. For medium to large sites, it is more cost-effective to automate publishing. Rewriting a couple of pages by hand is feasible; for a hundred pages, however, it is too expensive.

Figure 4.1 illustrates how we'll apply these principles in this chapter.
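The separation of authoring from publishing can be sketched in a few lines. The example below is only an illustration, not the chapter's implementation: it uses plain Python functions in place of the XSLT style sheets the chapter relies on, and the Article, Title, and Para element names, the to_html and to_wml function names, and the sample text are all assumptions made for the sketch. The WML output is reduced to a single card and omits the DOCTYPE declaration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical structure-only source: no fonts, colors, or layout.
SOURCE = """<Article>
  <Title>XML Publishing</Title>
  <Para>Write the content once, in XML.</Para>
  <Para>Publish it to HTML &amp; WML.</Para>
</Article>"""


def to_html(doc):
    """Render the article as a minimal HTML page."""
    title = escape(doc.findtext("Title", default=""))
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in doc.findall("Para"))
    return (f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1>{paras}</body></html>")


def to_wml(doc):
    """Render the same article as a single-card WML deck."""
    # The title ends up in an attribute, so double quotes are escaped too.
    title = escape(doc.findtext("Title", default=""), {'"': "&quot;"})
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in doc.findall("Para"))
    return f'<wml><card id="main" title="{title}">{paras}</card></wml>'


# One source document, two automatically generated publications.
doc = ET.fromstring(SOURCE)
html_page = to_html(doc)
wml_deck = to_wml(doc)
print(html_page)
print(wml_deck)
```

Changing the presentation of every page then means editing one conversion function rather than re-editing each page by hand, which is the point of automating the publishing step.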
The three main elements are as follows:
XML Stylesheet Language

To publish XML documents we will use XSL, the Extensible Stylesheet Language. More specifically, we will use XSLT (XSL Transformations). XSLT is a scripting language optimized for conversion between XML documents. In that respect it differs from earlier style sheet languages, such as CSS (Cascading Style Sheets), or word processor style sheets. CSS describes how each element should be presented onscreen: which font, which color, which size, and more. XSLT transforms the XML document into another XML document. It goes much further than simple presentation instructions. In fact, XSLT can completely reorganize a document and, for example, add a table of contents or delete a section.

How does that help? The trick is to transform a structure-rich XML document into a format that contains display instructions, such as HTML or WML. A browser (or another viewer) can then render the second document onscreen or on paper.

What display format should you use? The following are some popular options:
The XSLT standard is available online at www.w3.org/TR/xslt.

XML Vocabulary

XML does not define any vocabulary. It is up to developers to create vocabularies for their applications. For this application, we have two realistic options.

The first option is to use DocBook (www.docbook.org) or another standard SGML/XML vocabulary for documents. DocBook is particularly attractive because it is widely used and well supported. However, DocBook is so rich that it is too complicated for such a simple project.

The second option, and the one we'll adopt in this chapter, is to create our own vocabulary, one that is simple and limited to only the tags we need. Listing 4.1 illustrates the vocabulary we'll use in this chapter. As you can see, it is almost trivial: it's just a list of news items.

Listing 4.1 index.xml

<?xml version="1.0"?>
<News>
  <URL>http://localhost:8080/publish/index</URL>
  <Item>
    <Title>Applied XML Solutions</Title>
    <Author>Benoît Marchal</Author>
    <Abstract>A new intermediate/advanced book for XML
      developers.</Abstract>
    <Para>Learn advanced XML programming with Applied XML
      Solutions. This hands-on teaching book is filled with
      practical examples.</Para>
    <Para>Applied XML Solutions is a great complement to XML by
      Example.</Para>
  </Item>
  <Item>
    <Title>Jetty</Title>
    <Author>Greg Wilkins</Author>
    <Abstract>Open Source Java Server.</Abstract>
    <Para>Jetty is a powerful, open-source Java web server. It
      supports standard Java servlets making it the ideal
      development environment.</Para>
    <Para>Jetty is also highly-configurable which helps custom
      developments.</Para>
  </Item>
  <Item>
    <Title>Hypersonic SQL</Title>
    <Author>Thomas Müller</Author>
    <Abstract>Open Source SQL Database.</Abstract>
    <Para>Hypersonic SQL is an open source database that
      supports the JDBC API.</Para>
    <Para>Hypersonic SQL is efficient and can run in three
      modes: in-memory, standalone or client/server. This
      provides lots of flexibility when writing
      software.</Para>
  </Item>
</News>
The list starts with a URL that points to the server where the document resides. The W3C suggests using the xml:base attribute for this purpose, but it turns out that Xalan, the XSLT processor I use, has a problem with the xml namespace, so I use a URL element as a workaround:

<URL>http://localhost:8080/publish/index</URL>

Each item has a title, author, abstract, and list of paragraphs:

<Item>
  <Title>Applied XML Solutions</Title>
  <Author>Benoît Marchal</Author>
  <Abstract>A new intermediate/advanced book for XML
    developers.</Abstract>
  <Para>Learn advanced XML programming with Applied XML
    Solutions. This hands-on teaching book is filled with
    practical examples.</Para>
  <Para>Applied XML Solutions is a great complement to XML by
    Example.</Para>
</Item>

Figure 4.2 illustrates the structure.
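To make the structure concrete, here is a minimal sketch that reads a document in this vocabulary and lists each item. It is written in Python with the standard xml.etree.ElementTree module purely for illustration; the chapter itself processes the document with XSLT. The load_news function name, the dictionary keys, and the shortened two-item sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of index.xml (two items, trimmed paragraphs), for illustration.
INDEX_XML = """<?xml version="1.0"?>
<News>
  <URL>http://localhost:8080/publish/index</URL>
  <Item>
    <Title>Jetty</Title>
    <Author>Greg Wilkins</Author>
    <Abstract>Open Source Java Server.</Abstract>
    <Para>Jetty is a powerful, open-source Java web server.</Para>
  </Item>
  <Item>
    <Title>Hypersonic SQL</Title>
    <Author>Thomas Müller</Author>
    <Abstract>Open Source SQL Database.</Abstract>
    <Para>Hypersonic SQL is an open source database that
      supports the JDBC API.</Para>
    <Para>Hypersonic SQL is efficient and can run in three
      modes: in-memory, standalone or client/server.</Para>
  </Item>
</News>"""


def load_news(xml_text):
    """Return the base URL and a list of items, one dict per <Item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("Item"):
        items.append({
            "title": item.findtext("Title"),
            "author": item.findtext("Author"),
            "abstract": item.findtext("Abstract"),
            # An item holds a list of paragraphs; collapse the line breaks
            # and indentation that wrap the text in the source file.
            "paras": [" ".join((p.text or "").split())
                      for p in item.findall("Para")],
        })
    return root.findtext("URL"), items


base_url, news = load_news(INDEX_XML)
for entry in news:
    print(f"{entry['title']} by {entry['author']}: "
          f"{len(entry['paras'])} paragraph(s)")
```

Because every piece of information sits in its own element, the code never has to guess where a title ends or an abstract begins; the same property is what lets a style sheet convert the document automatically.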
How can you develop such a format? When should you use existing formats (such as DocBook) rather than develop your own? Unfortunately, there are no hard rules that you can follow to guarantee success.

As you develop your XML vocabulary, remember that a good vocabulary achieves a reasonable compromise between two opposite goals: on the one hand, it must mark up as much information as possible; on the other hand, it must be simple.

It is important to mark up as much data as is realistically possible because the markup drives the transformation to HTML, WML, and other formats. If something has not been marked up, transforming it will be difficult (or outright impossible).

Yet, as you define the vocabulary, be realistic. If you provide too many tags and too many options, you will confuse authors. This is particularly true if authors don't use the format regularly. A format that is too complex can be dangerous because it gives the false impression that we're creating quality documents, whereas, in fact, authors usually ignore most of the markup.

I am sure you have already encountered a database with a complex table organization. In most cases, developers have misused it and retrieving useful information is difficult. The same could happen with a markup vocabulary that is too complex.

Tip - Consider using an XML editor, as introduced in the previous chapter, to guide authors.

Copyright Sams Publishing. All rights reserved.
