XML to XHTML Transformations with XSLT Processors
Extensible Stylesheet Language Transformations (XSLT) is a powerful language for manipulating the data in Extensible Markup Language (XML) documents. To manipulate that data, in other words to transform the XML document, you use an XSLT stylesheet, which contains the rules you've set up for the transformation.
In this article, I'll explain how to use XSLT processors to transform XML documents into XHTML, a reformulation of HTML as an XML application. The World Wide Web Consortium (W3C) introduced XHTML to succeed HTML, but neither XSLT 1.0 nor the XSLT 1.1 working draft has any special support for XML-to-XHTML transformations; that support is slated for XSLT 2.0. You can still create XHTML documents with XSLT 1.0 processors, however.
To illustrate, I'll be using an XML document called planets.xml that stores data about the planets Mercury, Venus, and Earth, such as their mass, length of their day, density, distance from the sun, and so on. The XSLT stylesheet I'll be using to transform planets.xml is called planets.xsl.
Outputting XHTML
More on XHTML
If you want to learn more about XHTML, take a look at my book Inside XML (© 2001 New Riders Publishing, ISBN 0-7357-1020-1). Or you can go to the source: the W3C XHTML 1.0 recommendation at http://www.w3.org/TR/xhtml1/, as well as the XHTML 1.1 recommendation at http://www.w3.org/TR/xhtml11/.
In addition to making sure your document follows the XHTML rules (no minimized, or standalone, attributes; all attribute values quoted; lowercase element and attribute names; an end tag for every start tag, or an empty-element tag such as <br/>; a well-formed XML document overall; and so on), the main task is to make sure that a <!DOCTYPE> declaration appears in the result document.
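Several of these rules (quoted attribute values, no minimized attributes, a matching end tag for every start tag) are plain XML well-formedness rules, so any XML parser will flag violations. Here is a minimal Java sketch using the JAXP parser that ships with the JDK; the class and method names are hypothetical. A well-formedness check cannot catch uppercase element names, which are legal XML but not valid XHTML; catching those takes validation against the XHTML DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the markup parses as well-formed XML with the JDK's JAXP parser.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors but rethrows fatal
            // (well-formedness) errors, and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>Line one<br/>line two</p>",             // empty element closed: OK
            "<p>Line one<br>line two</p>",              // <br> never closed
            "<td align=center>x</td>",                  // unquoted attribute value
            "<option selected>x</option>",              // minimized (standalone) attribute
            "<option selected=\"selected\">x</option>", // XHTML form of the same attribute
            "<P>Uppercase</P>"                          // well-formed XML, but not valid XHTML
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```

Running main prints true for the first, fifth, and sixth samples and false for the other three.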
Here are the <!DOCTYPE> declarations for the three types of XHTML 1.0, strict, transitional, and frameset, in that order (see Inside XML, mentioned in the "More on XHTML" box, for how these versions differ):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
And here's the <!DOCTYPE> declaration for XHTML 1.1:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
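Each of these declarations has three parts: the root element name (html), a public identifier that names the XHTML flavor, and a system identifier that locates the DTD. The Java sketch below reads them back from a document with the JDK's DOM parser; the class name and the xhtmlFlavor helper are hypothetical. It installs an EntityResolver that returns an empty stream, because otherwise the parser would try to download the DTD named by the system identifier from www.w3.org.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeInspector {
    // Parses an XML document and returns its DOCTYPE node, or null if it has none.
    static DocumentType readDoctype(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve every external entity, including the DTD named by the system
            // identifier, to an empty stream so nothing is fetched from www.w3.org.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: names the XHTML flavor announced by a public identifier.
    static String xhtmlFlavor(String publicId) {
        if (publicId == null) {
            return "none";
        }
        switch (publicId) {
            case "-//W3C//DTD XHTML 1.0 Strict//EN":       return "XHTML 1.0 strict";
            case "-//W3C//DTD XHTML 1.0 Transitional//EN": return "XHTML 1.0 transitional";
            case "-//W3C//DTD XHTML 1.0 Frameset//EN":     return "XHTML 1.0 frameset";
            case "-//W3C//DTD XHTML 1.1//EN":              return "XHTML 1.1";
            default:                                        return "not XHTML";
        }
    }

    public static void main(String[] args) {
        String page = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\""
            + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Test</title></head><body></body></html>";
        DocumentType type = readDoctype(page);
        System.out.println("root element: " + type.getName());
        System.out.println("public id:    " + type.getPublicId());
        System.out.println("system id:    " + type.getSystemId());
        System.out.println("flavor:       " + xhtmlFlavor(type.getPublicId()));
    }
}
```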
<!DOCTYPE> Declarations and HTML 4.01
Strictly speaking, even HTML documents are supposed to start with a <!DOCTYPE> declaration. Officially, there are three forms of HTML 4.01: strict, transitional, and frameset. Here are the complete <!DOCTYPE> declarations for those versions:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
If you're producing rigidly correct HTML documents, consider adding the appropriate declaration to your documents. For more information, see http://www.w3.org/TR/html40/struct/global.html.
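If you generate HTML 4.01 with an XSLT processor, the html output method also writes a <!DOCTYPE> declaration when the doctype-public or doctype-system output properties are set. The following Java sketch shows this with the JDK's identity transformer, which copies its input unchanged while applying whatever output properties you set through the JAXP OutputKeys constants. The class name is hypothetical, and the identifiers are the strict HTML 4.01 ones listed above. Unlike the xml method, the html method does not write an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class Html401Doctype {
    static final String STRICT_PUBLIC = "-//W3C//DTD HTML 4.01//EN";
    static final String STRICT_SYSTEM = "http://www.w3.org/TR/html4/strict.dtd";

    // Copies the input unchanged (identity transformation), serializing it with the
    // html output method and an HTML 4.01 strict DOCTYPE before the first element.
    static String serializeAsHtml401(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "html");
            identity.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, STRICT_PUBLIC);
            identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, STRICT_SYSTEM);
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeAsHtml401(
            "<html><head><title>Planets</title></head><body><p>Hi</p></body></html>"));
    }
}
```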
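When the output method is set to xml, you can use the <xsl:output> element's doctype-system and doctype-public attributes to create a <!DOCTYPE> declaration. The xml output method writes the declaration only when doctype-system is specified, so always include it; the name after <!DOCTYPE is taken from the first element in the result, here html. Here's the <xsl:output> element that creates the <!DOCTYPE> declaration for transitional XHTML 1.0: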
<xsl:output method="xml"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
    indent="yes"/>
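Each attribute of this <xsl:output> element corresponds to a JAXP output property, and the OutputKeys constants are the attribute names themselves (OutputKeys.DOCTYPE_PUBLIC is the string "doctype-public"). This Java sketch uses a hypothetical one-template stylesheet standing in for planets.xsl. It compiles the element with the JDK's XSLT processor, reads the properties back from the compiled Templates object, and runs the stylesheet to show the resulting <!DOCTYPE> line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslOutputDemo {
    // Hypothetical one-template stylesheet carrying the <xsl:output> element above.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\""
      + " doctype-system=\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\""
      + " doctype-public=\"-//W3C//DTD XHTML 1.0 Transitional//EN\" indent=\"yes\"/>"
      + "<xsl:template match=\"/\">"
      + "<html><head><title>Demo</title></head><body/></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet once; a Templates object can be reused for many runs.
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the compiled stylesheet against an XML string and returns the serialized result.
    static String run(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile(XSL);
        // Output properties declared by <xsl:output>, keyed by the attribute names.
        Properties props = templates.getOutputProperties();
        System.out.println(props.getProperty(OutputKeys.DOCTYPE_PUBLIC));
        System.out.println(props.getProperty(OutputKeys.DOCTYPE_SYSTEM));
        System.out.println(run(templates, "<PLANETS/>"));
    }
}
```

The same keys can be passed to Transformer.setOutputProperty to override the stylesheet's values from Java without editing the stylesheet.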
Listing 1 shows the full stylesheet, planets.xsl, which uses the <xsl:output> element shown above to convert planets.xml into a valid XHTML document, planets.html.
Listing 1: Transforming planets.xml into XHTML
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

    <!-- The default namespace declared above places every literal result
         element (html, head, table, tr, td, and so on) in the XHTML
         namespace, which XHTML 1.0 requires. -->
    <xsl:output method="xml"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
        doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
        indent="yes"/>

    <xsl:template match="/PLANETS">
        <html>
            <head>
                <title>
                    The Planets Table
                </title>
            </head>
            <body>
                <h1>
                    The Planets Table
                </h1>
                <table>
                    <tr>
                        <td>Name</td>
                        <td>Mass</td>
                        <td>Radius</td>
                        <td>Day</td>
                    </tr>
                    <xsl:apply-templates/>
                </table>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="PLANET">
        <tr>
            <td><xsl:value-of select="NAME"/></td>
            <td><xsl:apply-templates select="MASS"/></td>
            <td><xsl:apply-templates select="RADIUS"/></td>
            <td><xsl:apply-templates select="DAY"/></td>
        </tr>
    </xsl:template>

    <xsl:template match="MASS">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>

    <xsl:template match="RADIUS">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>

    <xsl:template match="DAY">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>

</xsl:stylesheet>
Listing 2 is the resulting XHTML file.
Listing 2: The Transformed XHTML File
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>
            The Planets Table
        </title>
    </head>
    <body>
        <h1>
            The Planets Table
        </h1>
        <table>
            <tr>
                <td>Name</td>
                <td>Mass</td>
                <td>Radius</td>
                <td>Day</td>
            </tr>
            <tr>
                <td>Mercury</td>
                <td>.0553 (Earth = 1)</td>
                <td>1516 miles</td>
                <td>58.65 days</td>
            </tr>
            <tr>
                <td>Venus</td>
                <td>.815 (Earth = 1)</td>
                <td>3716 miles</td>
                <td>116.75 days</td>
            </tr>
            <tr>
                <td>Earth</td>
                <td>1 (Earth = 1)</td>
                <td>2107 miles</td>
                <td>1 days</td>
            </tr>
        </table>
    </body>
</html>
This document, planets.html, is well-formed and validates as transitional XHTML 1.0 according to the W3C HTML and XHTML validation service at http://validator.w3.org/file-upload.html. Because XHTML documents are also well-formed XML documents, you use the XML output method, so this transformation is not too difficult; the main issue that takes a little thought is creating the <!DOCTYPE> declaration.
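To reproduce planets.html yourself, you need an XSLT processor. The JDK includes one behind the JAXP TransformerFactory API, so a small Java program can run the stylesheet and do a quick local check of the result. This is a sketch with hypothetical class and method names. The check confirms only well-formedness, the transitional public identifier, and an html root element that declares the XHTML namespace; it is not a substitute for the W3C validator. The namespace test is stricter than DTD validation: the XHTML DTDs supply xmlns as a fixed default attribute, so a document without it can still validate, but XHTML 1.0 requires a strictly conforming document to declare the namespace on its root element. In XSLT that means declaring xmlns="http://www.w3.org/1999/xhtml" in the stylesheet.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PlanetsRunner {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String TRANSITIONAL_PUBLIC = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // Applies an XSLT 1.0 stylesheet with the JDK's built-in processor and
    // returns the serialized result as a string.
    static String transform(Source stylesheet, Source input) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer(stylesheet)
                .transform(input, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Quick local sanity check, NOT DTD validation: the result must be well-formed,
    // carry the transitional XHTML 1.0 public identifier, and have an html root
    // element that explicitly declares the XHTML namespace.
    static boolean looksLikeTransitionalXhtml(String result) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Skip downloading the DTD: every external entity resolves to an empty stream.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
            Document doc = builder.parse(new InputSource(new StringReader(result)));
            Element root = doc.getDocumentElement();
            return doc.getDoctype() != null
                && TRANSITIONAL_PUBLIC.equals(doc.getDoctype().getPublicId())
                && "html".equals(root.getLocalName())
                && XHTML_NS.equals(root.getNamespaceURI());
        } catch (Exception e) {
            return false; // not well-formed, or unreadable
        }
    }

    // Usage: java PlanetsRunner planets.xsl planets.xml planets.html
    public static void main(String[] args) throws Exception {
        String html = transform(new StreamSource(new File(args[0])),
                                new StreamSource(new File(args[1])));
        // Assumes the processor's default UTF-8 output encoding (no encoding in xsl:output).
        Files.write(Paths.get(args[2]), html.getBytes(StandardCharsets.UTF_8));
        System.out.println(looksLikeTransitionalXhtml(html)
            ? "DOCTYPE and XHTML namespace look right"
            : "DOCTYPE or XHTML namespace missing; check the output");
    }
}
```

Compile it with javac, then run it with the stylesheet, the input document, and the output file name as arguments.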