WebDevelopersJournal.com: Tips on Web Page Design, HTML and Graphics

XML Content Syndication: Part 3

Multiple Formats From One XML File

"Applied XML Solutions," a new book from Benoît Marchal, shows professional developers how to apply XML to a variety of real-world applications. These include using XML as a scripting substitute and using XSLT to facilitate communication between incompatible systems. Here we present the third part of the chapter devoted to content syndication: producing HTML, WML and RSS from XML.
December 6, 2000
Extract published courtesy of Sams Publishing.

This article is in four parts:

Styling on Demand

Listing 4.4 is a servlet that takes XML documents and style sheets and returns HTML, WML, or RSS documents.

Listing 4.4 Publish.java

package com.psol.publish;

import java.io.*;
import java.util.*;
import org.xml.sax.*;
import javax.servlet.*;
import javax.servlet.http.*;
import org.apache.xalan.xslt.*;

public class Publish
  extends HttpServlet
{
  protected final static String
   HTML_STYLESHEET = "stylesheet/html.xsl",
   WML_STYLESHEET = "stylesheet/wml.xsl",
   RSS_STYLESHEET = "stylesheet/rss.xsl";
  protected Dictionary cache = new Hashtable();

  protected String getDocPath(String path)
  {
   if(null == path || path.trim().equals("/"))
     path = "index";
   File file = new File("doc",path);
   path = file.getAbsolutePath();
   if(-1 != file.getName().lastIndexOf('.'))
     // there's a dot in the filename
     path = path.substring(0,path.lastIndexOf('.'));
   return path + ".xml";
  }

  protected void style(String document,
            String stylesheet,
            OutputStream output)
   throws IOException, SAXException
  {
   // periodically cleans the cache
   if(cache.size() > 10)
     cache = new Hashtable();
   File file = new File(document);
   String key = document +
          stylesheet +
          Long.toString(file.lastModified());
   ByteArrayOutputStream cached =
    (ByteArrayOutputStream)cache.get(key);
   if(null == cached)
   {
     cached = new ByteArrayOutputStream();
     XSLTProcessor processor =
      XSLTProcessorFactory.getProcessor();
     XSLTInputSource source =
      new XSLTInputSource(document);
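     // close the style sheet stream even if processing fails
     InputStream styleStream = new FileInputStream(stylesheet);
     try
     {
      XSLTInputSource styleSheet =
        new XSLTInputSource(styleStream);
      XSLTResultTarget target = new XSLTResultTarget(cached);
      processor.process(source,styleSheet,target);
     }
     finally
     {
      styleStream.close();
     }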
     cache.put(key,cached);
   }
   cached.writeTo(output);
  }

  protected long getLastModified(HttpServletRequest request)
  {
   File file = new File(getDocPath(request.getPathInfo()));
   // read the warning in File.lastModified() but it's
   // the best thing we have :-(
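   // File.lastModified() returns 0 for a missing file, but -1
   // tells HttpServlet that the modification time is unknown
   return file.exists() ? file.lastModified() : -1;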
  }

  public void doGet(HttpServletRequest request,
           HttpServletResponse response)
   throws ServletException, IOException
  {
   String document = request.getPathInfo();
   String styleSheet = null;
   if(null != document &&
     document.endsWith(".rss"))
   {
     response.setContentType("text/xml");
     styleSheet = RSS_STYLESHEET;
   }
   String accept = request.getHeader("Accept");
   if(null != accept &&
     null == styleSheet)
   {
     StringTokenizer acceptTok =
      new StringTokenizer(accept,",",false);
     while(acceptTok.hasMoreTokens())
     {
      String mimeType = acceptTok.nextToken().trim();
      if(mimeType.equals("text/html"))
      {
        response.setContentType("text/html");
        styleSheet = HTML_STYLESHEET;
        break;
      }
      else if(mimeType.equals("text/vnd.wap.wml"))
      {
        response.setContentType("text/vnd.wap.wml");
        styleSheet = WML_STYLESHEET;
        break;
      }
     }
   }
   if(null == styleSheet)
   {
     if(null !=document &&
      document.endsWith(".wml"))
     {
      response.setContentType("text/vnd.wap.wml");
      styleSheet = WML_STYLESHEET;
     }
     else
     {
      response.setContentType("text/html");
      styleSheet = HTML_STYLESHEET;
     }
   }
   try
   {
     style(getDocPath(document),
        styleSheet,
        response.getOutputStream());
   }
   catch(SAXException e)
   {
     Exception ex = e.getException() != null ?
            e.getException() : e;
     response.sendError(
      HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
      ex.getMessage());
   }
  }

}

Selecting the Right Style Sheet

The servlet has three style sheets from which to choose: one for HTML, one for WML, and one for RSS. It analyzes the request from the browser to decide which style sheet to apply and to which document.

Requests can take the following forms:

http://localhost:8080/publish
http://localhost:8080/publish/index
http://localhost:8080/publish/index.wml
http://localhost:8080/publish/index.rss

The servlet analyzes the request to select the XML document. The first request is a generic request to the servlet, so it returns the default document (index). The other requests explicitly name a document, index, which is really the index.xml file.

As we will see, to select an XML document, the servlet essentially discards the extension and replaces it with .xml.

To decide which style sheet to apply, the servlet studies the request headers, in which the browser passes a lot of information. A typical request looks similar to the following:

GET /publish/index.html HTTP/1.1
User-Agent: Mozilla/4.5 [en] (Win98; U)
Host: localhost:8080
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: fr-BE,fr,en
Accept-Charset: iso-8859-1,*,utf-8
Extension: Security/Remote-Passphrase

This request contains a lot of useful information, and the servlet is particularly interested in the Accept field. Accept lists the MIME types that the browser recognizes.

The servlet iterates over the MIME types looking for a known type: text/html for HTML or text/vnd.wap.wml for WML. This is enough for most requests. If neither type is listed, however, the servlet falls back on the extension: .wml selects WML, and anything else, including .html, selects HTML.
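The Accept scan can be tried outside a servlet container. The sketch below uses the same StringTokenizer loop as doGet(); the class name AcceptScan is hypothetical. As an assumption beyond the original, it also strips media-type parameters such as ;q=0.8 before comparing, whereas the servlet compares whole tokens, so text/html;q=0.8 would not match there.

```java
import java.util.StringTokenizer;

// Hypothetical helper mirroring the Accept-header loop in doGet().
class AcceptScan {
    static final String HTML = "text/html";
    static final String WML = "text/vnd.wap.wml";

    // Returns the first known MIME type listed in the Accept header,
    // or null if neither HTML nor WML appears (the servlet then
    // falls back on the file extension).
    static String firstKnownType(String accept) {
        if (accept == null)
            return null;
        StringTokenizer tok = new StringTokenizer(accept, ",", false);
        while (tok.hasMoreTokens()) {
            String type = tok.nextToken().trim();
            // Assumption: ignore parameters such as ";q=0.8";
            // the original servlet compares the whole token.
            int semi = type.indexOf(';');
            if (semi != -1)
                type = type.substring(0, semi).trim();
            if (type.equals(HTML) || type.equals(WML))
                return type;
        }
        return null;
    }

    public static void main(String[] args) {
        // The Accept header from the sample request above.
        System.out.println(firstKnownType(
            "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*"));
        // prints null
        System.out.println(firstKnownType("text/vnd.wap.wml, text/html;q=0.8"));
        // prints text/vnd.wap.wml
    }
}
```

Note that the sample request shown earlier lists neither text/html nor text/vnd.wap.wml, so for that browser the extension check decides the format.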


Warning - If you are used to file extensions, this algorithm might be confusing. Why bother with MIME types? Why not look at the extension the user requested? It is important to recognize that, on the Internet, file extensions are not very important.

Browsers and servers rely on MIME types to decide what a file is. This algorithm reflects their bias. In fact, many URLs have no extension, such as the following:

http://www.marchal.com/

For another example, point your browser to http://www.w3.org/Icons/WWW/w3c_home. This URL returns the W3C logo in the best format for your browser: text, HTML, or graphics.


Notice that the servlet first analyzes the Accept header. I have found that it is more reliable than the extension. For example, a mobile phone user might accidentally type an address of the form http://localhost:8080/publish/index.html, even though he should actually be requesting the WML document.

Even if visitors can make mistakes, the browser is always right. The Accept header field is the most reliable source of information available:

public void doGet(HttpServletRequest request,
         HttpServletResponse response)
  throws ServletException, IOException
{
  String document = request.getPathInfo();
  String styleSheet = null;
  if(null != document &&
   document.endsWith(".rss"))
  {
   response.setContentType("text/xml");
   styleSheet = RSS_STYLESHEET;
  }
  String accept = request.getHeader("Accept");
  if(null != accept &&
   null == styleSheet)
  {
   StringTokenizer acceptTok =
     new StringTokenizer(accept,",",false);
   while(acceptTok.hasMoreTokens())
   {
     String mimeType = acceptTok.nextToken().trim();
     if(mimeType.equals("text/html"))
     {
      response.setContentType("text/html");
      styleSheet = HTML_STYLESHEET;
      break;
     }
     else if(mimeType.equals("text/vnd.wap.wml"))
     {
      response.setContentType("text/vnd.wap.wml");
      styleSheet = WML_STYLESHEET;
      break;
     }
   }
  }
  if(null == styleSheet)
  {
   if(null !=document &&
     document.endsWith(".wml"))
   {
     response.setContentType("text/vnd.wap.wml");
     styleSheet = WML_STYLESHEET;
   }
   else
   {
     response.setContentType("text/html");
     styleSheet = HTML_STYLESHEET;
   }
  }
  // ... the style sheet is then applied (see Listing 4.4)
}

Unfortunately, RSS requires special handling. Portals appear not to set the Accept header properly, so the servlet checks for the .rss extension first, before it looks at the Accept header.
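Taken together, the selection order in doGet() is: an .rss extension wins outright, then the first HTML or WML type in the Accept header, then a .wml extension, and finally HTML as the default. The sketch below restates that order as a pure function so it can be exercised without a servlet container; the enum and class names are hypothetical, and the Accept check uses exact token matches, as the original does.

```java
// Hypothetical restatement of the style-sheet selection order in doGet().
enum OutputFormat { HTML, WML, RSS }

class FormatChooser {
    static OutputFormat choose(String pathInfo, String accept) {
        // 1. Portals send no useful Accept header, so .rss is checked first.
        if (pathInfo != null && pathInfo.endsWith(".rss"))
            return OutputFormat.RSS;
        // 2. The browser's Accept header: the first known type wins.
        if (accept != null) {
            for (String token : accept.split(",")) {
                String type = token.trim();
                if (type.equals("text/html"))
                    return OutputFormat.HTML;
                if (type.equals("text/vnd.wap.wml"))
                    return OutputFormat.WML;
            }
        }
        // 3. Fall back on the extension: .wml selects WML...
        if (pathInfo != null && pathInfo.endsWith(".wml"))
            return OutputFormat.WML;
        // 4. ...and anything else selects HTML.
        return OutputFormat.HTML;
    }

    public static void main(String[] args) {
        // A phone user who typed index.html still receives WML.
        System.out.println(choose("/index.html", "text/vnd.wap.wml, image/gif")); // WML
        System.out.println(choose("/index.rss", "text/html"));                   // RSS
        System.out.println(choose("/index.wml", "image/gif, */*"));              // WML
        System.out.println(choose(null, null));                                  // HTML
    }
}
```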


Tip - This servlet supports only one style sheet per format—one HTML style sheet, one WML style sheet, and one RSS style sheet.

However, for some applications, having different style sheets might be beneficial. For example, you might use a different "My Netscape"-branded style sheet when the visitor is coming from My Netscape. This style sheet would display a Netscape logo.

You can learn how to add this option through skins in Chapter 8 of the book, "Organize Teamwork Between Developers and Designers."
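The book's own solution is the skins mechanism in Chapter 8. As a rough illustration only, one way to tell that a visitor is coming from a portal is the HTTP Referer header (in the servlet, request.getHeader("Referer")). The sketch below assumes that My Netscape pages are served from my.netscape.com and that a branded style sheet named stylesheet/html-netscape.xsl exists; both are hypothetical. Browsers do not always send a Referer, so the default style sheet must remain the fallback.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical per-portal selection of the HTML style sheet,
// based on the Referer header. Names and host are assumptions.
class SkinChooser {
    static final String HTML_STYLESHEET = "stylesheet/html.xsl";
    // Hypothetical branded style sheet for visitors from My Netscape.
    static final String NETSCAPE_STYLESHEET = "stylesheet/html-netscape.xsl";

    static String htmlStylesheetFor(String referer) {
        if (referer != null) {
            try {
                String host = new URI(referer).getHost();
                // Assumption: My Netscape pages come from this host.
                if (host != null && host.equalsIgnoreCase("my.netscape.com"))
                    return NETSCAPE_STYLESHEET;
            } catch (URISyntaxException e) {
                // Malformed Referer: fall through to the default.
            }
        }
        return HTML_STYLESHEET;
    }
}
```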


Given the flexible approach we have chosen, the getDocPath() method must do a lot of work to convert the URL into the proper XML file. The URL may point to a file with or without an extension; getDocPath() turns it into a path to an .xml file in the doc directory:

protected String getDocPath(String path)
{
  if(null == path || path.trim().equals("/"))
   path = "index";
  File file = new File("doc",path);
  path = file.getAbsolutePath();
  if(-1 != file.getName().lastIndexOf('.'))
   // there's a dot in the filename
   path = path.substring(0,path.lastIndexOf('.'));
  return path + ".xml";
}
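To see what getDocPath() produces for the four sample URLs, here is a standalone copy of its logic. It is a sketch: the class name DocPathDemo is hypothetical, and it keeps paths relative (getPath() instead of getAbsolutePath()) so the result does not depend on the working directory. It also assumes the servlet is mapped at /publish/*, so that a container reports no path info for http://localhost:8080/publish and /index, /index.wml, and /index.rss for the other URLs.

```java
import java.io.File;

// Hypothetical standalone copy of getDocPath(), using relative paths.
class DocPathDemo {
    static String docPath(String pathInfo) {
        String path = pathInfo;
        // No path info, or just "/", selects the default document.
        if (path == null || path.trim().equals("/"))
            path = "index";
        File file = new File("doc", path);
        String result = file.getPath();
        // Strip the extension if the file name has one.
        if (file.getName().lastIndexOf('.') != -1)
            result = result.substring(0, result.lastIndexOf('.'));
        return result + ".xml";
    }

    public static void main(String[] args) {
        String[] samples = { null, "/index", "/index.wml", "/index.rss" };
        for (String s : samples)
            System.out.println(s + " -> " + docPath(s));
    }
}
```

On a Unix-like system, every sample maps to doc/index.xml: the extension only selects the style sheet, never the document.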

Applying the Style Sheet

Applying the style sheet is the responsibility of the style() method, which uses Xalan, the Apache XSL processor:

protected void style(String document,
           String stylesheet,
           OutputStream output)
  throws IOException, SAXException
{
  // periodically cleans the cache
  if(cache.size() > 10)
   cache = new Hashtable();
  File file = new File(document);
  String key = document +
        stylesheet +
        Long.toString(file.lastModified());
  ByteArrayOutputStream cached =
   (ByteArrayOutputStream)cache.get(key);
  if(null == cached)
  {
   cached = new ByteArrayOutputStream();
   XSLTProcessor processor =
     XSLTProcessorFactory.getProcessor();
   XSLTInputSource source =
     new XSLTInputSource(document);
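   // close the style sheet stream even if processing fails
   InputStream styleStream = new FileInputStream(stylesheet);
   try
   {
    XSLTInputSource styleSheet =
      new XSLTInputSource(styleStream);
    XSLTResultTarget target = new XSLTResultTarget(cached);
    processor.process(source,styleSheet,target);
   }
   finally
   {
    styleStream.close();
   }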
   cache.put(key,cached);
  }
  cached.writeTo(output);
}

Warning - No standard API similar to SAX or DOM exists for XSL processors; currently, each processor has its own API. Therefore, this method works only with Xalan.
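Since then, JAXP has standardized a transformation API, javax.xml.transform (often called TrAX), which ships with the JDK. The sketch below shows how the styling step might look with that API instead of Xalan's XSLTProcessor. It is a swapped-in technique, not the book's code, and it transforms in-memory strings so it runs without any files; the class name TraxStyle is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: applying a style sheet through the standard JAXP (TrAX) API
// rather than a processor-specific one.
class TraxStyle {
    static byte[] style(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
        // Buffer the result, as style() does before caching it.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<doc><title>Hello</title></doc>";
        String xsl =
            "<xsl:stylesheet version='1.0' "
          + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:value-of select='doc/title'/></xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(new String(style(xml, xsl), StandardCharsets.UTF_8));
        // prints Hello
    }
}
```

In a servlet, new StreamSource(new File(document)) would replace the string reader. Compiling each style sheet once with TransformerFactory.newTemplates() and calling newTransformer() on the resulting Templates for each request avoids re-parsing the style sheet.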


The style() method manages a small cache. Most documents will be requested again and again, so it is more cost-effective to apply the style sheet once and store the result for subsequent requests.

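The cache is simple and effective. After styling, style() stores the result in a hash table, keyed on a combination of the document's filename, the style sheet, and the document's last-modified timestamp. Because the timestamp is part of the key, editing an XML document produces a new key, so an outdated result is never served for it. Changes to a style sheet, however, are not detected this way, because the style sheet's timestamp is not part of the key.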
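The effect of putting the timestamp in the key can be checked directly. The sketch below builds the key the same way style() does (document + stylesheet + lastModified) and shows that "editing" the document, simulated here with setLastModified() on a temporary file, yields a different key, so the old entry is simply never looked up again. The class name and the temporary file are illustrative only.

```java
import java.io.File;
import java.io.IOException;

// Builds a cache key the same way style() does: document path,
// style sheet path, and the document's last-modified time.
class CacheKeyDemo {
    static String key(String document, String stylesheet) {
        File file = new File(document);
        return document + stylesheet + Long.toString(file.lastModified());
    }

    public static void main(String[] args) throws IOException {
        File doc = File.createTempFile("index", ".xml");
        doc.deleteOnExit();
        doc.setLastModified(1000000000000L);
        String before = key(doc.getPath(), "stylesheet/html.xsl");
        // Simulate an edit to the XML document, one minute later.
        doc.setLastModified(1000000060000L);
        String after = key(doc.getPath(), "stylesheet/html.xsl");
        // Different keys mean style() would miss the cache and restyle.
        System.out.println(before.equals(after));
    }
}
```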

This simplicity has a cost: left unchecked, the cache could grow indefinitely and consume all available memory. Therefore, once the cache holds more than 10 entries, style() empties it on its next call.
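Emptying the whole cache is the simplest possible bound, but it throws away popular documents along with rarely used ones. A common alternative, not the book's approach, is a least-recently-used (LRU) cache. java.util.LinkedHashMap supports this through its access-order constructor and removeEldestEntry(); the sketch below caps the cache at a fixed number of entries. LinkedHashMap is not synchronized, so the sketch wraps it with Collections.synchronizedMap, since a servlet serves requests concurrently. The class name LruCache is hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded LRU cache, as an alternative to clearing
// the whole Hashtable once it grows too large.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the most recent end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put(); evicts the least recently used entry.
        return size() > maxEntries;
    }

    // Servlets handle requests concurrently, so wrap the map.
    static <K, V> Map<K, V> synchronizedCache(int maxEntries) {
        return Collections.synchronizedMap(new LruCache<K, V>(maxEntries));
    }

    public static void main(String[] args) {
        Map<String, String> cache = synchronizedCache(2);
        cache.put("a", "styled a");
        cache.put("b", "styled b");
        cache.get("a");             // "a" is now the most recently used
        cache.put("c", "styled c"); // evicts "b", the least recently used
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```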

The limit counts cached entries, not requests. If visitors make a thousand requests for a single document, that document is styled only once and served from the cache for the remaining 999 requests.



Copyright Sams Publishing. All rights reserved.
