XML Content Syndication: Part 3
Multiple Formats From One XML File
"Applied XML Solutions," a new book from Benoît Marchal, shows professional developers how to apply XML to a variety of real-world applications. These include using XML as a scripting substitute and using XSLT to facilitate communication between incompatible systems. Here we present the third part of the chapter devoted to content syndication: producing HTML, WML and RSS from XML.
December 6, 2000
Extract published courtesy of Sams Publishing.
The servlet has three style sheets from which to choose: one for HTML, one
for WML, and one for RSS. It analyzes the request from the browser to decide
which style sheet to apply and to which document.
The servlet analyzes the request to select the XML document. A generic request
to the servlet, one that names no document, returns the default document. Other
requests point to a specific document, such as index, which is really
the index.xml document.
As we will see, to select an XML document, the servlet essentially discards
the extension and replaces it with .xml.
To decide which style sheet to apply, the servlet studies the request headers,
in which the browser passes a lot of information. The servlet is particularly
interested in the Accept field, which lists the MIME types
that the browser recognizes.
The servlet iterates over the MIME types looking for a known type: text/html
for HTML or text/vnd.wap.wml for WML. This is enough for most requests.
However, if it fails, the servlet looks at the extension: .html
selects HTML, and .wml selects WML.
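The selection order described above can be sketched roughly as follows. This is a minimal illustration, not the servlet's actual code: the class name FormatSelector, the Format enum, and the select() method are hypothetical, and returning null for an unrecognized request is an assumption.

```java
import java.util.Locale;

public class FormatSelector {
    /** The two formats discussed so far (hypothetical enum). */
    public enum Format { HTML, WML }

    /**
     * Scans the Accept header for a known MIME type first and falls back
     * to the file extension only when the header yields nothing.
     * Returns null when neither source identifies a format (an assumption).
     */
    public static Format select(String acceptHeader, String path) {
        if (acceptHeader != null) {
            // Accept is a comma-separated list; entries may carry parameters
            // such as ";q=0.8", which this sketch simply ignores.
            for (String entry : acceptHeader.split(",")) {
                int semi = entry.indexOf(';');
                String type = (semi >= 0 ? entry.substring(0, semi) : entry)
                        .trim().toLowerCase(Locale.ROOT);
                if (type.equals("text/html")) return Format.HTML;
                if (type.equals("text/vnd.wap.wml")) return Format.WML;
            }
        }
        if (path != null) {
            // Fallback: the extension of the requested path
            String lower = path.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".html")) return Format.HTML;
            if (lower.endsWith(".wml")) return Format.WML;
        }
        return null;
    }
}
```

With this ordering, a request for /index.html that carries an Accept header of text/vnd.wap.wml still selects WML, because the header is consulted before the extension.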
Warning - If you are used to file extensions, this algorithm might
be confusing. Why bother with MIME types? Why not look at the extension the
user requested? It is important to recognize that, on the Internet, file extensions
are not very important.
Browsers and servers rely on MIME type to decide what a file is. This algorithm
reflects their bias. In fact, many URLs have no extension at all. For example,
point your browser to http://www.w3.org/Icons/WWW/w3c_home.
This URL returns the W3C logo in the best format for your browser: text, HTML,
or graphics.
Notice that the servlet first analyzes the Accept header. I have found
that it is more reliable than the extension. For example, a mobile phone user
might accidentally type an address of the form http://localhost:8080/publish/index.html,
even though he should actually be requesting the WML document.
Even if visitors can make mistakes, the browser is always right. The Accept
header field is the most reliable source of information available.
Unfortunately, RSS requires a special procedure. It appears that portals don't
set the Accept header properly. To work around this, the servlet gives
higher priority to the .rss extension.
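One way to picture this workaround is as a check that runs before everything else. The sketch below is hypothetical: the class RssCheck, its method names, and the use of plain strings for format names are assumptions made for illustration, and formatFromAccept stands for whatever the Accept-based selection has already produced.

```java
import java.util.Locale;

public class RssCheck {
    /** True when the requested path ends in .rss, ignoring case. */
    public static boolean isRssRequest(String path) {
        return path != null && path.toLowerCase(Locale.ROOT).endsWith(".rss");
    }

    /**
     * Gives the .rss extension priority over the Accept header:
     * an .rss path always selects RSS, otherwise the Accept-based
     * choice passed in by the caller is kept.
     */
    public static String choose(String path, String formatFromAccept) {
        return isRssRequest(path) ? "rss" : formatFromAccept;
    }
}
```

For instance, choose("/index.rss", "html") yields "rss" even though the Accept header suggested HTML, which is the behavior a portal with a poorly set header needs.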
Tip - This servlet supports only one style sheet per format: one
HTML style sheet, one WML style sheet, and one RSS style sheet.
However, for some applications, having different style sheets might be beneficial.
For example, you might use a different "My Netscape"-branded style
sheet when the visitor is coming from My Netscape. This style sheet would
display a Netscape logo.
You can learn how to add this option through skins in Chapter 8 of the book, "Organize
Teamwork Between Developers and Designers."
Given the flexible approach we have chosen, the getDocPath() method must
do a lot of work to convert the URL into the proper XML file. The URL can be
pointing to a file with or without an extension, and getDocPath() will
turn it into a path to an .xml file:
protected String getDocPath(String path)
{
   // no document requested: use the default one
   if(null == path || path.trim().equals("/"))
      path = "index";
   File file = new File("doc",path);
   path = file.getAbsolutePath();
   if(-1 != file.getName().lastIndexOf('.'))
      // there's a dot in the filename: drop the extension
      path = path.substring(0,path.lastIndexOf('.'));
   return path + ".xml";
}
Applying the Style Sheet
Applying the style sheet is the responsibility of the style() method,
which uses Xalan, the Apache XSL processor.
Warning - No standard API, comparable to SAX or DOM, exists
for XSL processors. Currently, the API is specific to each processor. Therefore,
this method works only with Xalan.
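As an illustration of the styling step itself, here is a minimal sketch. It does not use the Xalan-specific API this chapter relies on. Instead it assumes javax.xml.transform (JAXP), the transformation API bundled with current JDKs, which postdates the situation described in the warning. The class name StyleSketch and the choice to pass the document and style sheet as strings are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StyleSketch {
    /**
     * Applies an XSLT style sheet to an XML document, both given as
     * strings, and returns the result as a string.
     */
    public static String style(String xml, String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Compile the style sheet into a Transformer
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            // Run the transformation and collect the output in memory
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers of this sketch need no checked exception
            throw new IllegalStateException("Styling failed", e);
        }
    }
}
```

A servlet would more likely read the document and the style sheet from files and write the result to the response; strings are used here only to keep the sketch self-contained.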
The style() method manages a small cache. Most documents will be called
again and again, so it is more cost-effective to apply the style sheet once
and store the result until the next request.
The cache is very simple and effective. After styling, style() stores
the result in a hash table whose key is a combination of the filename,
the style sheet, and a timestamp for the XML document.
Although this method is simple, it can be very costly. It runs the risk of
the cache growing indefinitely, consuming all the memory. Therefore, when the
cache contains 10 documents, style() empties it.
The cache is emptied every 10 documents, not every 10 requests. If visitors
make a thousand requests to a single document, that document is styled only
once and served from the cache for the next 999 requests.
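The caching policy described above can be sketched as follows. This is an illustration under stated assumptions, not the servlet's code: the class StyleCache, its get() signature, the Supplier standing in for the actual styling call, and the exact moment at which the cache is emptied (just before an eleventh entry would be added) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class StyleCache {
    // Limit taken from the text: the cache is emptied at 10 documents
    private static final int MAX_ENTRIES = 10;

    private final Map<String, String> cache = new HashMap<>();

    /**
     * Returns the styled result for a document, running the styler only
     * when no entry exists for this filename, style sheet, and timestamp.
     */
    public synchronized String get(String fileName, String styleSheet,
                                   long lastModified, Supplier<String> styler) {
        // The key combines filename, style sheet, and document timestamp
        String key = fileName + '|' + styleSheet + '|' + lastModified;
        String result = cache.get(key);
        if (result == null) {
            // Crude bound on memory: drop everything once the cache is full
            if (cache.size() >= MAX_ENTRIES) {
                cache.clear();
            }
            result = styler.get();
            cache.put(key, result);
        }
        return result;
    }

    /** Number of entries currently cached. */
    public synchronized int size() {
        return cache.size();
    }
}
```

Because the timestamp is part of the key, a document that has been modified gets a new key and is styled again, while repeated requests for an unchanged document never reach the styler.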