XML concentrates on the structure of the information in a file and not its appearance. To view XML documents we need to format or style them. In practice, this often means converting the XML document to HTML.

Here we'll concentrate on XSLT, a subset of XSL. XSLT is a language used to specify the transformation of XML documents. It takes an XML document and transforms it into another XML document. The HTML conversion is simply a special case of XML transformation.

To run the examples in this article, you need an XSL processor - a software component that implements the XSL standard. We'll use LotusXSL (version 0.19.1), which is available at no charge from www.alphaworks.ibm.com. Like most XML tools, LotusXSL is written in Java. Although you don't have to program in Java to use it, you must install either a Java Run-time Environment (JRE) or a Java Development Kit (JDK) on your computer. You can download a Java environment from Sun at java.sun.com.

Basic XSLT

I publish a monthly e-zine, Pineapplesoft Link. Every month, I email the e-zine to subscribers and I post a copy on my Web site. That's two formats to support - text and HTML. XML and XSL help because they enable me to write the document in one format (XML) and automatically create distribution copies in text and HTML. And because the styling is applied automatically, it's easy to change the layout of my Web site. All that's required is a change to the style sheet. In a world of changing Web fashions, this is a major advantage.

Here is an abbreviated version of an article from Pineapplesoft Link that discussed XML style sheets. It's formatted in XML and clearly demonstrates the data structure, with different types of data enclosed within different tags.

<?xml version="1.0"?>

A Simple Style Sheet

Our goal is to convert the XML document into HTML. The style sheet that follows is an example of how to accomplish this. We'll look at the individual elements in detail later on, but first here's the entire style sheet.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet
  <xsl:output method="html"/>
  <xsl:template match="/">
  <xsl:template match="section/title">
  <xsl:template match="article/title">
  <xsl:template match="url">
  <xsl:template match="url[@protocol='mailto']">
  <xsl:template match="p">
  <xsl:template match="abstract | date | keywords | copyright"/>
</xsl:stylesheet>

The style sheet is applied with LotusXSL, as explained previously. From the DOS prompt, change to the document directory and type the following command:

NOTE: The LotusXSL processor won't work unless you have installed a Java run-time. If there is an error message similar to "Exception in thread "main" java.lang.NoClassDefFoundError", either the classpath is incorrect (you might have to adapt it) or you typed an incorrect class name for LotusXSL (com.lotus.xsl.Process).

The parameters are self-explanatory: in is the document file (XML file), out is the result file (HTML file), and xsl is the XSL file. The HTML parameter forces the processor to respect HTML syntax (for example, <BR> instead of <BR/>).

If everything goes well, there is now a new HTML file, 19990101_xsl.html, in the document directory. You can view a screenshot of the HTML file by following this link.

Those Elements in Detail

Stylesheet Element

The style sheet is itself an XML document (XSL designers decided that XML was the best syntax for a style sheet). It describes the tree of the source document, the tree of the resulting document, and how to transform one into the other. To confuse matters further, the top-level element is also referred to as stylesheet:

<xsl:stylesheet

Because the style sheet contains elements from different documents, namespaces (the prefixes before the element names) are used to organize these elements:

- The xsl namespace is used for the XSL vocabulary. Its URI must be http://www.w3.org/1999/XSL/Transform
- The resulting document has another namespace. In this case, the default namespace is attached to HTML 4.0.
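As a rough sketch of the namespace declarations just described, the opening of such a style sheet might look like the following. The xsl URI is the one stated in the text. The version attribute and the HTML 4.0 namespace URI are assumptions added for illustration, because the original listing is not reproduced here.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch only, not the article's original listing. -->
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/TR/REC-html40">
  <!-- xmlns:xsl binds the xsl prefix to the XSL vocabulary (URI taken from the text). -->
  <!-- xmlns sets the default namespace used by result elements such as P or A;
       the HTML 4.0 URI shown here is an assumed placeholder. -->

  <!-- xsl:output and the templates would go here. -->

</xsl:stylesheet>
```

Every element written with the xsl: prefix is an instruction to the processor, while unprefixed elements fall into the default namespace and are copied to the resulting tree.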
Immediately after the xsl:stylesheet element comes the xsl:output element. xsl:output tells the XSL processor that we want to create an HTML document (other options are XML and text).

<xsl:output method="html"/>

Template Elements

The bulk of the style sheet is a list of templates. The following code transforms the title of a section into an HTML paragraph with the text in italic.

<xsl:template match="section/title">

So the output in our example becomes:

<P><I>Styling</I></P>

A template has two parts: the match attribute, which is a path to the elements in the source tree that the template applies to, and the content of the template, which lists what to insert in the resulting tree.
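Both parts of a template, the match attribute and the content, can be seen in this sketch of the section/title rule. The body is an assumption inferred from the output shown above (<P><I>Styling</I></P>), not the article's original listing, and the fragment would have to sit inside an xsl:stylesheet element to run.

```xml
<!-- Sketch only: the template body is assumed, inferred from the output shown in the text. -->
<xsl:template match="section/title">
  <!-- Part 1: the match attribute above selects title elements whose parent is a section. -->
  <!-- Part 2: the content below is what gets written to the resulting tree. -->
  <P><I><xsl:apply-templates/></I></P>
</xsl:template>
```

Here xsl:apply-templates hands the children of the title (its text) back to the processor, which is how the title text ends up inside the I element.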
More on Paths

The syntax for XML paths is similar to file paths. XML paths start from the root of the document and list elements along the way. Elements are separated by the "/" character. The root of the document is "/". The root is a node that sits before the top-level element. It represents the document as a whole.

Here is an example. The following four paths match respectively the title of the article (XML Style Sheets), the keywords of the article, the top-most article element, and all sections in the article. Note that the last path matches several elements in the source tree.

/article/title
/article/keywords
/article
/article/section

Note also that "/" points to the immediate children of a node. Therefore /article/title selects the main title of the article (XML Style Sheets) but not all the titles below the article element. It won't select the section titles.

To select all the descendants of a node, use the "//" sequence. /article//title selects all the titles in the article. It selects the main title and the section titles.

In the style sheet, most paths don't start at the root. XSL incorporates the notion of a current element. Paths in the match attribute can be relative to the current element. Again, this is similar to regular file systems. Double-clicking the accessories folder in the c:\program files folder moves to the c:\program files\accessories folder, not to c:\accessories.

If the current element is an article, then title matches /article/title, but if the current element is a section, title matches one of the /article/section/title elements.

To match any element, use the wildcard character "*". The path /article/* matches any direct child of article, such as title, keywords, and so on.

It is possible to combine paths in a match with the "|" character, such as title | p, which matches title or p elements.

Matching on Attributes

Paths can match on attributes, too. The following template applies only to "mailto" URLs.
<xsl:template match="url[@protocol='mailto']">

This gives the following output in our earlier example:

<A href="mailto:bmarchal@pineapplesoft.com">bmarchal@pineapplesoft.com</A>

It matches <url protocol="mailto">bmarchal@pineapplesoft.com</url>, which has a protocol attribute with the value "mailto", but it does not match <url>http://www.w3.org/Style</url>. The more generic url path matches the latter element.

url[@protocol] matches url elements that have a protocol attribute, no matter what its value is. It matches <url protocol="http">www.w3.org/Style</url> but it does not match <url>http://www.w3.org/Style</url>.

Following the Processor

Let's follow the XSL processor for the first few templates in the style sheet. After loading the style sheet and the source document, the processor positions itself at the root of the source document. It looks for a template that matches the root and it immediately finds:

<xsl:template match="/">

Because the root sits before the top-level element, it is ideal for creating the top-level element of the resulting tree. For HTML, this means it creates the HEAD and BODY tags.

When it encounters xsl:apply-templates, the processor moves to the first child of the current node. The first child of the root is the top-level element, that is, the article element.

The style sheet defines no template for article, but the processor can match article against a built-in template. Built-in templates are not defined in the style sheet. They are predefined by the processor.

<xsl:template match="* | /">

The built-in template forces the processor to load the first child of article, that is, the title element. The following template matches:

<xsl:template match="article/title">

Note that the processor matches on a relative path because the current node is article. It creates a paragraph in the HTML document. xsl:apply-templates loads title's children. The first and only child of title is a text node.
The style sheet has no rule to match text, but there is another built-in template that copies the text into the resulting tree.

<xsl:template match="text()">

The title's text has no children, so the processor cannot go to the next level. It backtracks to the article element and moves to the next child: the date element. This element matches the last template.

<xsl:template match="abstract | date | keywords | copyright"/>

This template generates no output in the resulting tree and stops processing for the current element.

The processor backtracks again to article and processes its other children: copyright, abstract, keywords, and section. Copyright, abstract, and keywords match the same rule as date and generate no output in the resulting tree. But the subsequent section element matches the default template, so the processor moves to its children, the title and p elements.

The processor continues to match rules with nodes until it has exhausted all the nodes in the original document.

Creating Nodes in the Resulting Tree

Sometimes it is useful to compute the value or the name of new nodes. The following template creates an HTML anchor element that points to the URL. The anchor has two attributes. The first one, TARGET, is specified directly in the template. However, the processor computes the second attribute, HREF, when it applies the rule.

This gives the following output in our earlier example:

<A target="_blank" href="http://www.w3.org/Style">http://www.w3.org/Style</A>

Customized Views

XML lets Web designers organize a document by structure, so that changing a document's appearance is a simple matter of changing the definition of an element once, and letting the changes ripple through an entire file - or a huge Web site.

In the future, more people will turn to specialized devices to view the Web. Already WebTV has achieved some success. Mobile phones and PDAs, such as the popular PalmPilot, will be increasingly used for Web browsing.
The way pages are displayed has to be changed for these smaller devices. One solution may be to use XHTML, a simplified, XML-based version of HTML. XSL will make it easy to manage the diversity of browsers and platforms by maintaining the document source in XML and converting to the appropriate XHTML subset with XSLT.

Benoît Marchal is a software engineer and author based in Namur, Belgium. He has been working extensively on Java and XML. He also likes teaching. Ben runs his own software company, Pineapplesoft.