Creating XSLT Style Sheets
XSLT transformations accept a document tree as input and produce a tree as output. From the XSLT point of view, documents are trees built of nodes, and there are seven types of nodes XSLT recognizes; here are those nodes, and how XSLT processors treat them:
| Node | Description |
| --- | --- |
| Document root | Is the very start of the document |
| Attribute | Holds the value of an attribute after entity references have been expanded and surrounding whitespace has been trimmed |
| Comment | Holds the text of a comment, not including <!-- and --> |
| Element | Consists of all character data in the element, which includes character data in any of the children of the element |
| Namespace | Holds the namespace's URI |
| Processing instruction | Holds the text of the processing instruction, which does not include <? and ?> |
| Text | Holds the text of the node |
To indicate what node or nodes you want to work on, XSLT supports various ways of matching or selecting nodes. For example, the character / stands for the root node. To get us started, I'll create a short example here that will replace the root node (and, therefore, the whole document) with an HTML page.
As you might expect, XSLT style sheets must be well-formed XML documents, so you start a style sheet with the XML declaration. Next, you use an <xsl:stylesheet> element; XSLT style sheets conventionally use the namespace prefix xsl, which, now that XSLT has been standardized, corresponds to http://www.w3.org/1999/XSL/Transform. You must also include the version attribute in the <xsl:stylesheet> element, setting that attribute to the only current version, 1.0:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> . . .
That's how you start an XSLT style sheet (in fact, if you're using a standalone program that requires you to give the name of the style sheet you're using, you can usually omit the <xsl:stylesheet> element). To work with specific nodes in an XML document, XSLT uses templates. When you match or select nodes, a template tells the XSLT processor how to transform the node for output. In this example, I want to replace the root node with a whole new HTML document, so I start by creating a template with the <xsl:template> element, setting the match attribute to the node to match, "/":
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> . . . </xsl:template> </xsl:stylesheet>
When the root node is matched, the template is applied to that node. In this case, I want to replace the root node with an HTML document, so I just include that HTML document directly as the content of the <xsl:template> element:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <HTML> <HEAD> <TITLE> A trivial transformation </TITLE> </HEAD> <BODY> This transformation has replaced the entire document. </BODY> </HTML> </xsl:template> </xsl:stylesheet>
And that's all it takes; by using the <xsl:template> element, I've set up a rule in the style sheet. When the XSLT processor reads the document, the first node that it sees is the root node. This rule matches that root node, so the XSLT processor replaces it with the HTML document, producing this result:
<HTML> <HEAD> <TITLE> A trivial transformation </TITLE> </HEAD> <BODY> This transformation has replaced the entire document. </BODY> </HTML>
That's our first, rudimentary transformation. All we've done is replace the entire document with another one. But, of course, that's just the beginning.
The xsl:apply-templates Element
The template I used in the previous section applied to only one node, the root node, and performed a trivial action, replacing the entire XML document with an HTML document. However, you can also apply templates to the children of a node that you've matched, and you do that with the <xsl:apply-templates> element.
For example, say that I want to convert planets.xml to HTML. The document element in that document is <PLANETS>, so I can match that element with a template, setting the match attribute to the name of the element I want to match. Then I replace the <PLANETS> element with an <HTML> element, like this:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> . . . </HTML> </xsl:template> . . . </xsl:stylesheet>
But what about the children of the <PLANETS> element? To make sure that they are transformed correctly, you use the <xsl:apply-templates> element this way:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template> . . . </xsl:stylesheet>
Now you can provide templates for the child nodes. In this case, I'll just replace each of the three <PLANET> elements with some text, which I place directly into the template for the <PLANET> element:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template> <xsl:template match="PLANET"> <P> Planet data will go here.... </P> </xsl:template> </xsl:stylesheet>
And that's it; now the <PLANETS> element is replaced by an <HTML> element, and the <PLANET> elements are also replaced:
<HTML> <P> Planet data will go here.... </P> <P> Planet data will go here.... </P> <P> Planet data will go here.... </P> </HTML>
You can see that this transformation works, but it's still less than useful; all we've done is replace the <PLANET> elements with some text. What if we wanted to access some of the data in the <PLANET> element? For example, say that we wanted to place the text from the <NAME> element in each <PLANET> element in the output document:
<PLANET> <NAME>Mercury</NAME> <MASS UNITS="(Earth = 1)">.0553</MASS> <DAY UNITS="days">58.65</DAY> <RADIUS UNITS="miles">1516</RADIUS> <DENSITY UNITS="(Earth = 1)">.983</DENSITY> <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion--> </PLANET>
To gain access to this kind of data, you can use the select attribute of the <xsl:value-of> element.
Getting the Value of Nodes with xsl:value-of
In this example, I'll extract the name of each planet and insert that name into the output document. To get the name of each planet, I'll use the <xsl:value-of> element in a template targeted at the <PLANET> element, and I'll select the <NAME> element with the select attribute like this:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template> <xsl:template match="PLANET"> <xsl:value-of select="NAME"/> </xsl:template> </xsl:stylesheet>
Using select like this, you can select nodes. The select attribute is much like the match attribute of the <xsl:template> element, except that the select attribute is more powerful. With it, you can specify the node or nodes to select using the full XPath specification, as we'll see later in this chapter. The select attribute is an attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, and <xsl:sort> elements, all of which we'll also see in this chapter.
Applying the previous style sheet, the <xsl:value-of select="NAME"/> element directs the XSLT processor to insert the name of each planet into the output document, so that document looks like this:
<HTML> Mercury Venus Earth </HTML>
Handling Multiple Selections with xsl:for-each
When its select attribute matches several nodes, the <xsl:value-of> element uses only the first of them (in document order). What if you have multiple nodes that could match? For example, say that you can have multiple <NAME> elements for each planet:
<PLANET> <NAME>Mercury</NAME> <NAME>Closest planet to the sun</NAME> <MASS UNITS="(Earth = 1)">.0553</MASS> <DAY UNITS="days">58.65</DAY> <RADIUS UNITS="miles">1516</RADIUS> <DENSITY UNITS="(Earth = 1)">.983</DENSITY> <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion--> </PLANET>
The <xsl:value-of> element's select attribute by itself will pick up only the first <NAME> element; to loop over all possible matches, you can use the <xsl:for-each> element like this:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template> <xsl:template match="PLANET"> <xsl:for-each select="NAME"> <P> <xsl:value-of select="."/> </P> </xsl:for-each> </xsl:template> </xsl:stylesheet>
This style sheet will catch all <NAME> elements, place their values in a <P> element, and add them to the output document, like this:
<HTML> <P>Mercury</P> <P>Closest planet to the sun</P> <P>Venus</P> <P>Earth</P> </HTML>
We've seen now that you can use the match and select attributes to indicate what nodes you want to work with. The actual syntax that you can use with these attributes is fairly complex but worth knowing. I'll take a look at the match attribute in more detail first, and I'll examine the select attribute later in this chapter.
Specifying Patterns for the match Attribute
You can use an involved syntax with the <xsl:template> element's match attribute, and an even more involved syntax with the select attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, <xsl:copy-of>, and <xsl:sort> elements. We'll see them both in this chapter, starting with the syntax you can use with the match attribute.
Matching the Root Node
As we've already seen, you can match the root node with /, like this:
<xsl:template match="/"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template>
Matching Elements
You can match specific XML elements simply by giving their name, as we've also seen:
<xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template>
Matching Children
You can use the / operator to separate element names when you want to refer to a child of a particular node. For example, say that you wanted to create a rule that applies only to <NAME> elements that are children of <PLANET> elements. In that case, you can match to the expression "PLANET/NAME". Here's a rule that will surround the text of such elements in an <H3> element:
<xsl:template match="PLANET/NAME"> <H3><xsl:value-of select="."/></H3> </xsl:template>
Notice the expression "." here. You use "." with the select attribute to specify the current node, as we'll see when discussing the select attribute.
You can also use the * character as a wildcard, standing for any element (* can match only elements). For example, this rule applies to all <NAME> elements that are grandchildren of <PLANET> elements:
<xsl:template match="PLANET/*/NAME"> <H3><xsl:value-of select="."/></H3> </xsl:template>
Matching Element Descendants
In the previous section, I used the expression "PLANET/NAME" to match all <NAME> elements that are direct children of <PLANET> elements, and I used the expression "PLANET/*/NAME" to match all <NAME> elements that are grandchildren of <PLANET> elements. However, there's an easier way to perform both matches: Just use the expression "PLANET//NAME", which matches all <NAME> elements that are inside <PLANET> elements, no matter how many levels deep. (The matched elements are called descendants of the <PLANET> element.) In other words, "PLANET//NAME" matches "PLANET/NAME", "PLANET/*/NAME", "PLANET/*/*/NAME", and so on:
<xsl:template match="PLANET//NAME"> <H3><xsl:value-of select="."/></H3> </xsl:template>
Matching Attributes
You can match attributes if you preface their name with @. Here's an example; in this case, I'll display the data in planets.xml in an HTML table. You might note, however, that the units for the various measurements are stored in attributes, like this:
<PLANET> <NAME>Earth</NAME> <MASS UNITS="(Earth = 1)">1</MASS> <DAY UNITS="days">1</DAY> <RADIUS UNITS="miles">2107</RADIUS> <DENSITY UNITS="(Earth = 1)">1</DENSITY> <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion--> </PLANET>
To recover the units and display them as well as the values for the mass and so on, I'll match the UNITS attribute with @UNITS. Here's how that looks; note that I'm using the <xsl:text> element to insert a space into the output document (more on <xsl:text> later):
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/PLANETS"> <HTML> <HEAD> <TITLE> The Planets Table </TITLE> </HEAD> <BODY> <H1> The Planets Table </H1> <TABLE> <TD>Name</TD> <TD>Mass</TD> <TD>Radius</TD> <TD>Day</TD> <xsl:apply-templates/> </TABLE> </BODY> </HTML> </xsl:template> <xsl:template match="PLANET"> <TR> <TD><xsl:value-of select="NAME"/></TD> <TD><xsl:apply-templates select="MASS"/></TD> <TD><xsl:apply-templates select="RADIUS"/></TD> </TR> </xsl:template> <xsl:template match="MASS"> <xsl:value-of select="."/> <xsl:text> </xsl:text> <xsl:value-of select="@UNITS"/> </xsl:template> <xsl:template match="RADIUS"> <xsl:value-of select="."/> <xsl:text> </xsl:text> <xsl:value-of select="@UNITS"/> </xsl:template> <xsl:template match="DAY"> <xsl:value-of select="."/> <xsl:text> </xsl:text> <xsl:value-of select="@UNITS"/> </xsl:template> </xsl:stylesheet>
Now the resulting HTML table includes not only values, but also their units of measurement. (The spacing leaves a little to be desired, but HTML browsers will have no problem with it; we'll take a look at ways of handling whitespace later in this chapter.)
<HTML> <HEAD> <TITLE> The Planets Table </TITLE> </HEAD> <BODY> <H1> The Planets Table </H1> <TABLE> <TD>Name</TD><TD>Mass</TD><TD>Radius</TD><TD>Day</TD> <TR> <TD>Mercury</TD><TD>.0553 (Earth = 1)</TD><TD>1516 miles</TD> </TR> <TR> <TD>Venus</TD><TD>.815 (Earth = 1)</TD><TD>3716 miles</TD> </TR> <TR> <TD>Earth</TD><TD>1 (Earth = 1)</TD><TD>2107 miles</TD> </TR> </TABLE> </BODY> </HTML>
You can also use the @* wildcard to select all attributes of an element. For example, "PLANET/@*" selects all attributes of <PLANET> elements.
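As a rough sketch of the @* wildcard in use, the following style sheet lists every attribute of each <MASS> element by name and value. It applies the wildcard to <MASS> rather than <PLANET> because, in planets.xml as shown so far, the attributes sit on the child elements; the bracketed output layout is simply my choice for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PLANETS">
    <HTML>
      <xsl:apply-templates select="PLANET"/>
    </HTML>
  </xsl:template>

  <!-- For each planet, output its name and then process only MASS -->
  <xsl:template match="PLANET">
    <P>
      <xsl:value-of select="NAME"/>
      <xsl:apply-templates select="MASS"/>
    </P>
  </xsl:template>

  <xsl:template match="MASS">
    <xsl:text>: </xsl:text>
    <xsl:value-of select="."/>
    <!-- Attributes are not visited by default, so select them explicitly -->
    <xsl:apply-templates select="@*"/>
  </xsl:template>

  <!-- Matches any attribute of a MASS element, whatever its name -->
  <xsl:template match="MASS/@*">
    <xsl:text> [</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>]</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Note that a template matching attributes is applied only when something selects those attribute nodes; a plain <xsl:apply-templates/> processes child nodes, and attributes are not children.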
Matching by ID
You can also match elements that have a specific ID value using the pattern id(). To use this selector, you must give elements an ID attribute, and you must declare that attribute of type ID, as you can do in a DTD. Here's an example rule that outputs the text of the element that has the ID Christine (note that the ID value is a string literal, so it must be quoted):
<xsl:template match="id('Christine')"> <H3><xsl:value-of select="."/></H3> </xsl:template>
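For id() to find anything, the source document needs an attribute declared with type ID. The following is a minimal sketch of such a document; the attribute name KEY, the value Alice, and the trimmed-down content model are assumptions made for illustration, and a processor that does not read the DTD may not treat the attribute as an ID at all:

```xml
<?xml version="1.0"?>
<!DOCTYPE PLANETS [
  <!ELEMENT PLANETS (PLANET+)>
  <!ELEMENT PLANET (NAME)>
  <!ELEMENT NAME (#PCDATA)>
  <!-- KEY is a hypothetical attribute name; what matters is the type ID -->
  <!ATTLIST PLANET KEY ID #REQUIRED>
]>
<PLANETS>
  <PLANET KEY="Alice">
    <NAME>Mercury</NAME>
  </PLANET>
  <!-- A pattern such as id('Christine') would match this element -->
  <PLANET KEY="Christine">
    <NAME>Venus</NAME>
  </PLANET>
</PLANETS>
```

Because ID values must be unique within a document, a pattern naming a single ID matches at most one element.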
Matching Comments
You can match the text of comments with the pattern comment(). You should not store data that should go into the output document in comments in the input document, of course. However, you might want to convert comments from the <!--comment--> form into something another markup language might use, such as a <COMMENT> element.
Here's an example; planets.xml was designed to include comments so that we could see how to extract them:
<PLANET> <NAME>Venus</NAME> <MASS UNITS="(Earth = 1)">.815</MASS> <DAY UNITS="days">116.75</DAY> <RADIUS UNITS="miles">3716</RADIUS> <DENSITY UNITS="(Earth = 1)">.943</DENSITY> <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion--> </PLANET>
To extract comments and put them into <COMMENT> elements, I'll include a rule just for comments:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template> <xsl:template match="comment()"> <COMMENT> <xsl:value-of select="."/> </COMMENT> </xsl:template> </xsl:stylesheet>
Here's what the result is for Venus, where I've transformed the comment into a <COMMENT> element:
Venus .815 116.75 3716 .943 66.8<COMMENT>At perihelion</COMMENT>
Note that the text for the other elements in the <PLANET> element is also inserted into the output document. The reason is that the default rule for an element is to process its children, and the default rule for a text node is to copy its text to the output document. Because I haven't provided a rule for those elements, their text is simply included in the output document. I'll take a closer look at default rules later in the chapter.
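For reference, the built-in template rules described in the XSLT 1.0 recommendation behave as if the following templates were present in every style sheet. This is a sketch of the default behavior, not something you need to add yourself (the recommendation also defines equivalent rules for each mode, which are left out here):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root node and elements: keep going by processing the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the string value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: produce nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Any template you write for the same nodes takes precedence over these built-in rules.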
Matching Text Nodes with text()
You can match the text in a node with the pattern text(). There's really not much reason to ever use text(), however, because XSLT includes a default rule: If there are no other rules for a text node, the text in that node is inserted into the output document. If you were to make that default rule explicit, it might look like this:
<xsl:template match="text()"> <xsl:value-of select="."/> </xsl:template>
You can override this rule by not sending the text in text nodes to the output document, like this:
<xsl:template match="text()"> </xsl:template>
In the previous example, you can see that a great deal of text made it from the input document to the output document because there was no explicit rule besides the default one for text nodes; the only output rule that I used was for comments. If you turn off the default rule for text nodes by adding the previous two lines to the version of planets.xsl used in the previous example, the text of those text nodes does not go into the output document. This is the result:
<HTML> <COMMENT>At perihelion</COMMENT> <COMMENT>At perihelion</COMMENT> <COMMENT>At perihelion</COMMENT> </HTML>
Matching Processing Instructions
You can use the pattern processing-instruction() to match processing instructions.
<xsl:template match="/processing-instruction()"> <I> Found a processing instruction. </I> </xsl:template>
You can also specify what processing instruction you want to match by giving the name of the processing instruction (excluding <? and ?>) as a quoted string, as in this case, where I'm matching the processing instruction <?xml-include?>:
<xsl:template match="/processing-instruction('xml-include')"> <I> Found an xml-include processing instruction. </I> </xsl:template>
One of the major reasons that XPath makes a distinction between the root node (at the very beginning of the document) and the document element is so that you have access to the processing instructions and other nodes in the document's prolog.
Using the Or Operator
You can match to a number of possible patterns, which is very useful when your documents get a little more involved than the ones we've been using so far in this chapter. Here's an example; in this case, I want to display <NAME> and <MASS> elements in bold, which I'll do with the HTML <B> tag. To match either <NAME> or <MASS> elements, I'll use the Or operator, which is a vertical bar (|), in a new rule, like this:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template> <xsl:template match="PLANET"> <P> <xsl:apply-templates/> </P> </xsl:template> <xsl:template match="NAME | MASS"> <B> <xsl:apply-templates/> </B> </xsl:template> </xsl:stylesheet>
Here are the results; note that the name and mass values are both enclosed in <B> elements. (Also note that, because of the XSLT default rules, the text from the other child elements of the <PLANET> element is also displayed.)
<HTML> <P> <B>Mercury</B> <B>.0553</B> 58.65 1516 .983 43.4 </P> <P> <B>Venus</B> <B>.815</B> 116.75 3716 .943 66.8 </P> <P> <B>Earth</B> <B>1</B> 1 2107 1 128.4 </P> </HTML>
You can use any valid pattern with the | operator, such as expressions like PLANET | PLANET//NAME, and you can use multiple | operators, such as NAME | MASS | DAY, and so on.
Testing with []
You can use the [] operator to test whether a certain condition is true. For example, you can test the following:
- Whether an attribute has a given string value
- The value of an element
- Whether an element encloses a particular child, attribute, or other element
- The position of a node in the node tree
Here are some examples:
- This expression matches <PLANET> elements that have child <NAME> elements:
<xsl:template match = "PLANET[NAME]">
- This expression matches any element that has a <NAME> child element:
<xsl:template match = "*[NAME]">
- This expression matches any <PLANET> element that has either a <NAME> or a <MASS> child element:
<xsl:template match="PLANET[NAME | MASS]">
Say that we gave the <PLANET> elements in planets.xml a new attribute, COLOR, which holds the planet's color:
<?xml version="1.0"?> <?xml-stylesheet type="text/xml" href="planets.xsl"?> <PLANETS> <PLANET COLOR="RED">
<NAME>Mercury</NAME> <MASS UNITS="(Earth = 1)">.0553</MASS> <DAY UNITS="days">58.65</DAY> <RADIUS UNITS="miles">1516</RADIUS> <DENSITY UNITS="(Earth = 1)">.983</DENSITY> <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion--> </PLANET> <PLANET COLOR="WHITE"> <NAME>Venus</NAME> <MASS UNITS="(Earth = 1)">.815</MASS> <DAY UNITS="days">116.75</DAY> <RADIUS UNITS="miles">3716</RADIUS> <DENSITY UNITS="(Earth = 1)">.943</DENSITY> <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion--> </PLANET> <PLANET COLOR="BLUE"> <NAME>Earth</NAME> <MASS UNITS="(Earth = 1)">1</MASS> <DAY UNITS="days">1</DAY> <RADIUS UNITS="miles">2107</RADIUS> <DENSITY UNITS="(Earth = 1)">1</DENSITY> <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion--> </PLANET> </PLANETS>
This expression matches <PLANET> elements that have COLOR attributes:
<xsl:template match="PLANET[@COLOR]">
What if you wanted to match planets whose COLOR attribute was BLUE? You can do that with the = operator, like this:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <xsl:apply-templates/> </HTML> </xsl:template> <xsl:template match="PLANET[@COLOR = 'BLUE']"> The <xsl:value-of select="NAME"/> is blue. </xsl:template> <xsl:template match="text()"> </xsl:template> </xsl:stylesheet>
This style sheet picks out the planets whose color is blue and omits the others by turning off the default rule for text nodes. Note that the value BLUE is quoted; without the quotes, BLUE would be read as the name of a child element. Here's the result:
<HTML> The Earth is blue. </HTML>
In fact, the expressions you can use in the [] operators are W3C XPath expressions. XPath expressions give you ways of specifying nodes in an XML document using a fairly involved syntax. And because the select attribute, which we're about to cover, uses XPath, I'll take a look at XPath as well.
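Two of the conditions listed at the start of this section, the value of an element and the position of a node, can be tested in much the same way. Here is a minimal sketch, assuming the three-planet planets.xml used in this chapter; the output sentences are placeholders of my own:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PLANETS">
    <HTML>
      <xsl:apply-templates/>
    </HTML>
  </xsl:template>

  <!-- Test the value of a child element -->
  <xsl:template match="PLANET[NAME = 'Venus']">
    <P><xsl:value-of select="NAME"/> was matched by the value of its NAME child.</P>
  </xsl:template>

  <!-- Test position: the first PLANET child of its parent -->
  <xsl:template match="PLANET[1]">
    <P><xsl:value-of select="NAME"/> was matched by its position.</P>
  </xsl:template>

  <!-- Turn off the default rule for text nodes -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

With that input, the position test should apply to Mercury and the value test to Venus; Earth matches neither rule, and because the default text rule is turned off, it contributes nothing to the output.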
Specifying Patterns for the select Attribute
I've taken a look at the kinds of expressions that you can use with the <xsl:template> element's match attribute. You can use an even more involved syntax with the select attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, <xsl:copy-of>, and <xsl:sort> elements.
The select attribute uses XPath expressions; XPath has been a W3C recommendation since November 16, 1999. You can find the XPath specification at www.w3.org/TR/xpath.
We've seen that you can use the match attribute to find nodes by name, child element(s), attributes, or even descendant. We've also seen that you can make some tests to see whether elements or attributes have certain values. You can do all that and more with the XPath specification supported by the select attribute, including finding nodes by parent or sibling elements, as well as much more involved tests. XPath is much more of a true language than the expressions you can use with the match attribute; for example, XPath expressions can return not only node sets, but also Boolean, string, and numeric values.
The XML for Java package has a handy example program, ApplyXPath.java, that enables you to apply an XPath expression to a document and see what the results would be. This is great for testing. For example, if I applied the XPath expression "PLANET/NAME" to planets.xml, here is what the result would look like, displaying the values of all <NAME> elements that are children of <PLANET> elements (the <output> tags are added by ApplyXPath):
%java ApplyXPath planets.xml PLANET/NAME <output> <NAME>Mercury</NAME><NAME>Venus</NAME><NAME>Earth</NAME></output>
XPath expressions are more powerful than the match expressions we've seen; for one thing, they're not restricted to working with the current node or child nodes because you can work with parent nodes, ancestor nodes, and more. Specifying what node you want to work in relation to is called specifying an axis in XPath. I'll take a look at XPath syntax in detail next.
Understanding XPath
To specify a node or set of nodes in XPath, you use a location path. A location path, in turn, consists of one or more location steps, separated by / or //. If you start the location path with /, the location path is called an absolute location path because you're specifying the path from the root node; otherwise, the location path is relative, starting with the current node, which is called the context node. Got all that? Good, because there's more.
A location step is made up of an axis, a node test, and zero or more predicates. For example, in the expression child::PLANET[position() = 5], child is the name of the axis, PLANET is the node test, and [position() = 5] is a predicate. You can create location paths with one or more location steps, such as /descendant::PLANET/child::NAME, which selects all the <NAME> elements that have a <PLANET> parent. The best way to understand all this is by example, and we'll see plenty of them in a few pages. In the meantime, I'll take a look at what kind of axes, node tests, and predicates XPath supports.
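To make the anatomy of a location path concrete, here is a small sketch that uses one absolute path with two steps and one absolute path whose middle step carries a predicate. It assumes the three-planet planets.xml used in this chapter, so position 2 is used instead of 5; the surrounding HTML is illustrative only:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <HTML>
      <!-- Absolute path: axis descendant, node test PLANET,
           then axis child, node test NAME -->
      <xsl:for-each select="/descendant::PLANET/child::NAME">
        <!-- Relative path: a single step along the self axis -->
        <P><xsl:value-of select="self::node()"/></P>
      </xsl:for-each>

      <!-- The second step has a predicate that keeps only the second PLANET -->
      <P>Second planet: <xsl:value-of
          select="/child::PLANETS/child::PLANET[position() = 2]/child::NAME"/></P>
    </HTML>
  </xsl:template>

</xsl:stylesheet>
```

With that input, the loop should produce one paragraph per planet name, and the last paragraph should name Venus.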
XPath Axes
In the location path child::NAME, which refers to a <NAME> element that is a child of the current node, the child is called the axis. XPath supports many different axes, and it's important to know what they are. Here's the list:
| Axis | Description |
| --- | --- |
| ancestor | Holds the ancestors of the context node. The ancestors of the context node are the parent of the context node and the parent's parent and so forth, back to and including the root node. |
| ancestor-or-self | Holds the context node and the ancestors of the context node. |
| attribute | Holds the attributes of the context node. |
| child | Holds the children of the context node. |
| descendant | Holds the descendants of the context node. A descendant is a child or a child of a child, and so on. |
| descendant-or-self | Contains the context node and the descendants of the context node. |
| following | Holds all nodes in the same document as the context node that come after the context node, not counting its descendants or any attribute or namespace nodes. |
| following-sibling | Holds all the following siblings of the context node. A sibling is a node on the same level as the context node. |
| namespace | Holds the namespace nodes of the context node. |
| parent | Holds the parent of the context node. |
| preceding | Contains all nodes that come before the context node, not counting its ancestors or any attribute or namespace nodes. |
| preceding-sibling | Contains all the preceding siblings of the context node. A sibling is a node on the same level as the context node. |
| self | Contains the context node. |
You can use axes to specify a location step or path, as in this example, where I'm using the child axis to indicate that I want to match to child nodes of the context node, which is a <PLANET> element. (We'll see later that an abbreviated version lets you omit the child:: part.)
<xsl:template match="PLANET"> <HTML> <CENTER> <xsl:value-of select="child::NAME"/> </CENTER> <CENTER> <xsl:value-of select="child::MASS"/> </CENTER> <CENTER> <xsl:value-of select="child::DAY"/> </CENTER> </HTML> </xsl:template>
In these expressions, child is the axis, and the element names NAME, MASS, and DAY are node tests.
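The other axes work the same way. As a sketch, the following style sheet starts from each <NAME> element and reaches sideways and upward with the following-sibling, attribute, parent, and preceding-sibling axes. It assumes the planets.xml layout shown in this chapter, where <MASS> follows <NAME> inside each <PLANET>; the wording of the output is my own:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PLANETS">
    <HTML>
      <xsl:apply-templates select="child::PLANET/child::NAME"/>
    </HTML>
  </xsl:template>

  <!-- The context node here is a NAME element -->
  <xsl:template match="NAME">
    <P>
      <xsl:value-of select="self::node()"/>
      <xsl:text>: mass </xsl:text>
      <!-- A sibling that comes after NAME in the same PLANET -->
      <xsl:value-of select="following-sibling::MASS"/>
      <xsl:text> </xsl:text>
      <!-- An attribute of that sibling -->
      <xsl:value-of select="following-sibling::MASS/attribute::UNITS"/>
      <xsl:text>; planets listed before this one: </xsl:text>
      <!-- Up to the parent PLANET, then back through its earlier siblings -->
      <xsl:value-of select="count(parent::PLANET/preceding-sibling::PLANET)"/>
    </P>
  </xsl:template>

</xsl:stylesheet>
```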
XPath Node Tests
You can use names of nodes as node tests, or you can use the wildcard * to select element nodes. For example, the expression child::*/child::NAME selects all <NAME> elements that are grandchildren of the context node. Besides node names and the wildcard character, you can also use these node tests:
| Node Test | Description |
| --- | --- |
| comment() | Selects comment nodes. |
| node() | Selects any type of node. |
| processing-instruction() | Selects a processing instruction node. You can specify the name of the processing instruction to select in the parentheses. |
| text() | Selects a text node. |
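Here is a short sketch that puts several of these node tests side by side in select expressions. It assumes the planets.xml layout shown in this chapter, where each <PLANET> element ends with a comment; the counts depend on how your processor handles whitespace-only text nodes, so none are promised here:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PLANETS">
    <HTML>
      <xsl:apply-templates select="child::PLANET"/>
    </HTML>
  </xsl:template>

  <xsl:template match="PLANET">
    <P>
      <!-- text(): the text node inside NAME -->
      <xsl:value-of select="child::NAME/child::text()"/>
      <xsl:text>: </xsl:text>
      <!-- *: element children only -->
      <xsl:value-of select="count(child::*)"/>
      <xsl:text> element children, </xsl:text>
      <!-- node(): elements, text, comments, and processing instructions -->
      <xsl:value-of select="count(child::node())"/>
      <xsl:text> child nodes of any type; comment: </xsl:text>
      <!-- comment(): the comment that is a child of PLANET -->
      <xsl:value-of select="child::comment()"/>
    </P>
  </xsl:template>

</xsl:stylesheet>
```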
XPath Predicates
The predicate part of an XPath step is perhaps its most intriguing part because it gives you the most power. You can work with all kinds of expressions in predicates; here are the possible types:
- Node sets
- Booleans
- Numbers
- Strings
- Result tree fragments
I'll take a look at these various types in turn.
XPath Node Sets
As its name implies, a node set is simply a set of nodes. An expression such as child::PLANET returns a node set of all <PLANET> elements. The expression child::PLANET/child::NAME returns a node set of all <NAME> elements that are children of <PLANET> elements. To select a node or nodes from a node set, you can use various functions that work on node sets in predicates.
| Function | Description |
| --- | --- |
| last() | Returns the number of nodes in the context node set. |
| position() | Returns the position of the context node in the context node set (starting with 1). |
| count(node-set) | Returns the number of nodes in node-set. The argument is required. |
| id(string ID) | Returns a node set containing the element whose ID matches the string passed to the function, or returns an empty node set if no element has the specified ID. You can list multiple IDs separated by whitespace, and this function will return a node set of the elements with those IDs. |
| local-name(node-set) | Returns the local name of the first node in the node set. Omitting node-set makes this function use the context node. |
| namespace-uri(node-set) | Returns the URI of the namespace of the first node in the node set. Omitting node-set makes this function use the context node. |
| name(node-set) | Returns the full, qualified name of the first node in the node set. Omitting node-set makes this function use the context node. |
Here's an example; in this case, I'll number the elements in the output document using the position() function:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="PLANETS"> <HTML> <HEAD> <TITLE> The Planets </TITLE> </HEAD> <BODY> <xsl:apply-templates select="PLANET"/> </BODY> </HTML> </xsl:template> <xsl:template match="PLANET"> <P> <xsl:value-of select="position()"/>. <xsl:value-of select="NAME"/> </P> </xsl:template> </xsl:stylesheet>
Here's the result, where you can see that the planets are numbered:
<HTML> <HEAD> <TITLE> The Planets </TITLE> </HEAD> <BODY> <P>1. Mercury</P> <P>2. Venus</P> <P>3. Earth</P> </BODY> </HTML>
You can use functions that operate on node sets in predicates, as in child::PLANET[position() = last()], which selects the last <PLANET> child of the context node.
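As a sketch of how these functions fit together, the following style sheet uses count(), last(), position(), and name(). It assumes the planets.xml used in this chapter, and the wording of the output is only illustrative:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PLANETS">
    <HTML>
      <BODY>
        <!-- count() needs a node-set argument -->
        <P>Planets listed: <xsl:value-of select="count(child::PLANET)"/></P>
        <!-- last() inside a predicate picks the final PLANET child -->
        <P>Last planet: <xsl:value-of
            select="child::PLANET[position() = last()]/child::NAME"/></P>
        <xsl:apply-templates select="child::PLANET"/>
      </BODY>
    </HTML>
  </xsl:template>

  <xsl:template match="PLANET">
    <P>
      <!-- name() with no argument reports on the context node -->
      <xsl:value-of select="name()"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="position()"/> of <xsl:value-of select="last()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="child::NAME"/>
    </P>
  </xsl:template>

</xsl:stylesheet>
```

Inside the PLANET template, position() and last() refer to the node set selected by the <xsl:apply-templates> element, which here holds only the <PLANET> elements.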
XPath Booleans
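You can also use Boolean values in XPath expressions. Numbers are considered false if they're zero (or the special not-a-number value) and are considered true otherwise. An empty string ("") is also considered false, and all other strings are considered true. A node set is considered true if it holds at least one node and false if it is empty.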
You can use XPath logical operators to produce Boolean true/false results; here are the logical operators:
| Operator | Description |
| --- | --- |
| != | Is not equal to. |
| < | Is less than. (Use &lt; in XML documents.) |
| <= | Is less than or equal to. (Use &lt;= in XML documents.) |
| = | Is equal to. (C, C++, Java, JavaScript programmers take note: this operator is one = sign, not two.) |
| > | Is greater than. |
| >= | Is greater than or equal to. |
You shouldn't use < directly in XML documents; use the entity reference &lt; instead.
You can also use the keywords and and or to connect Boolean clauses with a logical And or Or operation, as we've seen when working with JavaScript and Java.
Here's an example using the logical operator >. This rule applies to all <PLANET> elements after position 5:
<xsl:template match="PLANET[position() > 5]"> <xsl:value-of select="."/></xsl:template>
There is also a true() function that always returns a value of true, and a false() function that always returns a value of false.
You can also use the not() function to reverse the logical sense of an expression, as in this case, where I'm selecting all but the last <PLANET> element:
<xsl:template match="PLANET[not(position() = last())]"> <xsl:value-of select="."/> </xsl:template>
Finally, the lang() function returns true or false, depending on whether the language of the context node (which is given by xml:lang attributes) is the same as the language you pass to this function.
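The truth-value rules described in this section can be sketched in a few lines of Python. This is only an approximation of XPath 1.0's boolean() conversion: the function name xpath_boolean is hypothetical, and node sets are modeled as plain Python lists.

```python
import math


def xpath_boolean(value):
    """Approximate XPath 1.0 boolean() for Booleans, numbers, strings, node sets."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Zero and NaN are false; every other number is true.
        return not (value == 0 or math.isnan(value))
    # Strings and node sets (modeled here as lists) are true when non-empty.
    return len(value) > 0


print(xpath_boolean(0), xpath_boolean(5))
print(xpath_boolean(""), xpath_boolean("false"))
print(xpath_boolean([]))
```

One consequence worth noticing is that the string "false" converts to true, because it is not empty; only the false() function or a false comparison yields a Boolean false.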
XPath Numbers
In XPath, numbers are actually stored in double-precision floating-point format. (See Chapter 10, "Understanding Java," for more details on doubles; technically speaking, all XPath numbers are stored in 64-bit IEEE 754 floating-point double-precision format.) All numbers are stored as doubles, even integers such as 5, as in the example we just saw:
<xsl:template match="PLANET[position() > 5]"> <xsl:value-of select="."/> </xsl:template>
You can use several operators on numbers:
Operator |
Action |
+ |
Adds. |
- |
Subtracts. |
* |
Multiplies. |
div |
Divides. (The / character, which stands for division in other languages, is already heavily used in XML and XPath.) |
mod |
Returns the modulus of two numbers (the remainder after dividing the first by the second). |
For example, the element <xsl:value-of select="180 + 420"/> inserts the string "600" into the output document. This example selects all planets whose day (measured in earth days) divided by its mass (where the mass of Earth = 1) is greater than 100:
<xsl:template match="PLANETS">
  <HTML>
    <BODY>
      <xsl:apply-templates select="PLANET[DAY div MASS > 100]"/>
    </BODY>
  </HTML>
</xsl:template>
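Because XPath numbers are IEEE 754 doubles, div and mod behave slightly differently from the corresponding Python operators: dividing by zero yields an infinity or NaN instead of raising an error, and the result of mod takes the sign of the first operand. The following Python sketch emulates both. The helper names xpath_div and xpath_mod are hypothetical, and the sample values are assumed to be the mass and day figures for the first three planets.

```python
import math


def xpath_div(a, b):
    """Emulate XPath div: IEEE 754 division that never raises."""
    a, b = float(a), float(b)
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        # The sign of the infinity depends on the signs of both operands.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def xpath_mod(a, b):
    """Emulate XPath mod: remainder of a truncating division."""
    a, b = float(a), float(b)
    if b == 0.0 or math.isinf(a) or math.isnan(a) or math.isnan(b):
        return math.nan
    # math.fmod keeps the sign of the first operand, unlike Python's %.
    return math.fmod(a, b)


# Assumed sample values (mass relative to Earth, day in Earth days).
planets = [
    {"NAME": "Mercury", "MASS": 0.0553, "DAY": 58.65},
    {"NAME": "Venus", "MASS": 0.815, "DAY": 116.75},
    {"NAME": "Earth", "MASS": 1.0, "DAY": 1.0},
]

# Equivalent in spirit to PLANET[DAY div MASS > 100].
selected = [p["NAME"] for p in planets if xpath_div(p["DAY"], p["MASS"]) > 100]

print(selected)
print(xpath_mod(-5, 2), -5 % 2)
```

With these values, only the first two planets pass the test, and the last line shows the sign difference between the emulated mod and Python's own % operator.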
XPath also supports these functions that operate on numbers:
Function |
Description |
ceiling() |
Returns the smallest integer that is not less than the number that you pass it |
floor() |
Returns the largest integer that is not greater than the number that you pass it |
round() |
Rounds the number that you pass it to the nearest integer |
sum() |
Returns the sum of the numeric values of the nodes in the node set that you pass it |
For example, here's how you can find the average mass of the planets in planets.xml (the <MASS> elements are children of the <PLANET> elements, not of <PLANETS>, so both functions use the descendant axis):
<xsl:template match="PLANETS">
  <HTML>
    <BODY>
      The average planetary mass is:
      <xsl:value-of select="sum(descendant::MASS) div count(descendant::MASS)"/>
    </BODY>
  </HTML>
</xsl:template>
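To sanity-check an average like this outside an XSLT processor, a short Python sketch with xml.etree.ElementTree is enough. The inline document is an assumed, pared-down version of planets.xml that keeps only the names and masses, and the path .//MASS selects the same nodes here as descendant::MASS does from the <PLANETS> context node.

```python
import xml.etree.ElementTree as ET

# Assumed, pared-down stand-in for planets.xml (names and masses only).
PLANETS_XML = """
<PLANETS>
  <PLANET><NAME>Mercury</NAME><MASS>.0553</MASS></PLANET>
  <PLANET><NAME>Venus</NAME><MASS>.815</MASS></PLANET>
  <PLANET><NAME>Earth</NAME><MASS>1</MASS></PLANET>
</PLANETS>
"""

root = ET.fromstring(PLANETS_XML)  # context node: <PLANETS>

# <MASS> elements are grandchildren of <PLANETS>, so a child-only path finds none.
direct_children = root.findall("MASS")

# Searching all descendants finds every <MASS> element.
masses = [float(mass.text) for mass in root.findall(".//MASS")]

# sum(...) div count(...)
average_mass = sum(masses) / len(masses)

print(len(direct_children), len(masses))
print(average_mass)
```

The empty direct_children list shows why the axis matters: summing over the child axis from <PLANETS> would add up nothing at all.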
XPath Strings
In XPath, strings are made up of Unicode characters. A number of functions are specially designed to work on strings, as shown in this table.
Function |
Description |
starts-with(string string1, string string2) |
Returns true if the first string starts with the second string |
contains(string string1, string string2) |
Returns true if the first string contains the second one |
substring(string string1, number offset, number length) |
Returns length characters from string1, starting at position offset (the first character is at position 1) |
substring-before(string string1, string string2) |
Returns the part of string1 before the first occurrence of string2 |
substring-after(string string1, string string2) |
Returns the part of string1 after the first occurrence of string2 |
string-length(string string1) |
Returns the number of characters in string1 |
normalize-space(string string1) |
Returns string1 after leading and trailing whitespace is stripped and multiple consecutive whitespace is replaced with a single space |
translate(string string1, string string2, string string3) |
Returns string1 with all occurrences of the characters in string2 replaced by the matching characters in string3 |
concat(string string1, string string2, ...) |
Returns all strings concatenated (that is, joined) together |
format-number(number number1, string string2, string string3) |
Returns a string holding the formatted version of number1, using string2 as a formatting pattern (create patterns as you would for Java's java.text.DecimalFormat class) and string3 as the optional name of a decimal format declared with <xsl:decimal-format>. Strictly speaking, this function is defined by XSLT rather than by XPath. |
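Several of these string functions have no direct one-call equivalent in other languages, so it can help to see them spelled out. The Python sketch below emulates a few of them under simplifying assumptions: the function names are hypothetical, substring() handles only integer arguments (real XPath also rounds fractional ones), and none of this is how an XSLT processor is actually implemented.

```python
import re


def substring(s, start, length=None):
    """substring(): positions count from 1 (integer arguments only)."""
    begin = max(start - 1, 0)
    if length is None:
        return s[begin:]
    end = max(start - 1 + length, 0)
    return s[begin:end]


def substring_before(s, sep):
    """substring-before(): empty string when sep does not occur."""
    if sep == "":
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """substring-after(): empty string when sep does not occur."""
    if sep == "":
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""


def translate(s, from_chars, to_chars):
    """translate(): map characters; those without a counterpart are removed."""
    table = {}
    for index, char in enumerate(from_chars):
        if ord(char) in table:
            continue  # the first occurrence of a character wins
        table[ord(char)] = to_chars[index] if index < len(to_chars) else None
    return s.translate(table)


def normalize_space(s):
    """normalize-space(): trim the ends and collapse runs of whitespace."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(" \t\r\n")))


print(substring("12345", 2, 3))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
print(translate("--aaa--", "abc-", "ABC"))
print(normalize_space("  The   Planets \n"))
```

Note in particular that translate() deletes any character of its second argument that has no partner in the third, which makes it a handy way to strip unwanted characters.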
XPath Result Tree Fragments
A result tree fragment is a piece of a result tree that the processor treats as a single unit rather than as an ordinary node set. In XSLT 1.0, you create one by giving a variable (an <xsl:variable> or <xsl:param> element) content instead of a select attribute.
You really can't do much with result tree fragments in XPath. Actually, you can do only two things: use the string() or boolean() functions to turn them into strings or Booleans.
XPath Examples
We've seen a lot of XPath in theory; how about some examples? Here are a number of location path examples. Note that XPath enables you to use and or or in predicates to apply logical tests using multiple patterns.
Example |
Action |
child::PLANET |
Returns the <PLANET> element children of the context node. |
child::* |
Returns all element children (* only matches elements) of the context node. |
child::text() |
Returns all text node children of the context node. |
child::node() |
Returns all the children of the context node, no matter what their node type is. |
attribute::UNIT |
Returns the UNIT attribute of the context node. |
descendant::PLANET |
Returns the <PLANET> element descendants of the context node. |
ancestor::PLANET |
Returns all <PLANET> ancestors of the context node. |
ancestor-or-self::PLANET |
Returns the <PLANET> ancestors of the context node. If the context node is a <PLANET> as well, also returns the context node. |
descendant-or-self::PLANET |
Returns the <PLANET> element descendants of the context node. If the context node is a <PLANET> as well, also returns the context node. |
self::PLANET |
Returns the context node if it is a <PLANET> element. |
child::NAME/descendant::PLANET |
Returns the <PLANET> element descendants of the child <NAME> elements of the context node. |
child::*/child::PLANET |
Returns all <PLANET> grandchildren of the context node. |
/ |
Returns the document root (that is, the parent of the document element). |
/descendant::PLANET |
Returns all the <PLANET> elements in the document. |
/descendant::PLANET/child::NAME |
Returns all the <NAME> elements that have a <PLANET> parent. |
child::PLANET[position() = 3] |
Returns the third <PLANET> child of the context node. |
child::PLANET[position() = last()] |
Returns the last <PLANET> child of the context node. |
/descendant::PLANET[position() = 3] |
Returns the third <PLANET> element in the document. |
child::PLANETS/child::PLANET[position() = 4 ]/child::NAME[position() = 3] |
Returns the third <NAME> element of the fourth <PLANET> element of the <PLANETS> element. |
child::PLANET[position() > 3] |
Returns all the <PLANET> children of the context node after the first three. |
preceding-sibling::NAME[position() = 2] |
Returns the second previous <NAME> sibling element of the context node. |
child::PLANET[attribute::COLOR = "RED"] |
Returns all <PLANET> children of the context node that have a COLOR attribute with value of RED. |
child::PLANET[attribute::COLOR = "RED"][position() = 3] |
Returns the third <PLANET> child of the context node that has a COLOR attribute with value of RED. |
child::PLANET[position() = 3][attribute::COLOR="RED"] |
Returns the third <PLANET> child of the context node, only if that child has a COLOR attribute with value of RED. |
child::MASS[child::NAME = "VENUS" ] |
Returns the <MASS> children of the context node that have <NAME> children whose text is VENUS. |
child::PLANET[child::NAME] |
Returns the <PLANET> children of the context node that have <NAME> children. |
child::*[self::NAME or self::MASS ] |
Returns both the <NAME> and <MASS> children of the context node. |
child::*[self::NAME or self::MASS][position() = 1] |
Returns the first <NAME> or <MASS> child of the context node. |
As you can see, some of this syntax is pretty involved and a little lengthy to type. However, there is an abbreviated form of XPath syntax.
XPath Abbreviated Syntax
You can take advantage of a number of abbreviations in XPath syntax. Here are the rules:
Expression |
Abbreviation |
self::node() |
. |
parent::node() |
.. |
child::childname |
childname |
attribute::childname |
@childname |
/descendant-or-self::node()/ |
// |
You can also abbreviate predicate expressions such as [position() = 3] as [3], [position() = last()] as [last()], and so on. Using the abbreviated syntax makes XPath expressions a lot easier to use. Here are some examples of location paths using abbreviated syntax. Note how well these fit the syntax we saw with the match attribute earlier in the chapter:
Path |
Description |
PLANET |
Returns the <PLANET> element children of the context node. |
* |
Returns all element children of the context node. |
text() |
Returns all text node children of the context node. |
@UNITS |
Returns the UNITS attribute of the context node. |
@* |
Returns all the attributes of the context node. |
PLANET[3] |
Returns the third <PLANET> child of the context node. |
PLANET[1] |
Returns the first <PLANET> child of the context node. |
*/PLANET |
Returns all <PLANET> grandchildren of the context node. |
/PLANETS/PLANET[3]/NAME[2] |
Returns the second <NAME> element of the third <PLANET> element of the <PLANETS> element. |
//PLANET |
Returns all the <PLANET> descendants of the document root. |
PLANETS//PLANET |
Returns the <PLANET> element descendants of the <PLANETS> element children of the context node. |
//PLANET/NAME |
Returns all the <NAME> elements that have a <PLANET> parent. |
. |
Returns the context node itself. |
.//PLANET |
Returns the <PLANET> element descendants of the context node. |
.. |
Returns the parent of the context node. |
../@UNITS |
Returns the UNITS attribute of the parent of the context node. |
PLANET[NAME] |
Returns the <PLANET> children of the context node that have <NAME> children. |
PLANET[NAME="Venus"] |
Returns the <PLANET> children of the context node that have <NAME> children with text equal to Venus. |
PLANET[@UNITS = "days"] |
Returns all <PLANET> children of the context node that have a UNITS attribute with value days. |
PLANET[6][@UNITS = "days"] |
Returns the sixth <PLANET> child of the context node, only if that child has a UNITS attribute with value days. Note that PLANET[@UNITS = "days"][6] is different: it returns the sixth of the <PLANET> children that have a UNITS attribute with value days. |
PLANET[@COLOR and @UNITS] |
Returns all the <PLANET> children of the context node that have both a COLOR attribute and a UNITS attribute. |
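A few of these abbreviated paths can be tried directly in Python, because xml.etree.ElementTree accepts a small subset of the abbreviated syntax. The sketch below uses an assumed sample document (the COLOR values are invented for illustration). ElementTree documents position predicates only when they directly follow a tag name, so the effect of predicate order is emulated with ordinary lists rather than with chained predicates.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; the COLOR values are invented for illustration.
PLANETS_XML = """
<PLANETS>
  <PLANET COLOR="RED"><NAME>Mercury</NAME></PLANET>
  <PLANET COLOR="WHITE"><NAME>Venus</NAME></PLANET>
  <PLANET COLOR="RED"><NAME>Earth</NAME></PLANET>
</PLANETS>
"""

root = ET.fromstring(PLANETS_XML)  # context node: <PLANETS>


def names(planets):
    """Return the <NAME> text of each planet element."""
    return [planet.findtext("NAME") for planet in planets]


second = names(root.findall("PLANET[2]"))
venus = names(root.findall("PLANET[NAME='Venus']"))
red = names(root.findall("PLANET[@COLOR='RED']"))

planets = root.findall("PLANET")

# PLANET[@COLOR = "RED"][2]: filter first, then take the second survivor.
filter_then_index = names(
    [p for p in planets if p.get("COLOR") == "RED"][1:2]
)

# PLANET[2][@COLOR = "RED"]: take the second planet, then test its color.
index_then_filter = names(
    [p for p in planets[1:2] if p.get("COLOR") == "RED"]
)

print(second, venus, red)
print(filter_then_index, index_then_filter)
```

The last two results differ: filtering first selects the second red planet, while indexing first selects the second planet and then discards it because it is not red.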
Here's an example in which I put the abbreviated syntax to work, moving up and down inside a <PLANET> element:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="PLANETS">
    <HTML>
      <xsl:apply-templates select="PLANET"/>
    </HTML>
  </xsl:template>
  <xsl:template match="PLANET">
    <xsl:apply-templates select="MASS"/>
  </xsl:template>
  <xsl:template match="MASS">
    <xsl:value-of select="../NAME"/>
    <xsl:value-of select="../DAY"/>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
Default XSLT Rules
XSLT has some built-in, default rules that we've already seen in action. For example, the default rule for text nodes is to add the text in that node to the output document.
The most important default rule applies to elements and can be expressed like this:
<xsl:template match="/ | *"> <xsl:apply-templates/> </xsl:template>
This rule is simply there to make sure that every element, from the root on down, is processed with <xsl:apply-templates/> if you don't supply some other rule. If you do supply another rule, it overrides the corresponding default rule.
The default rule for text can be expressed like this, where, by default, the text of a text node is added to the output document:
<xsl:template match="text()"> <xsl:value-of select="."/> </xsl:template>
The same kind of default rule applies to attributes, which are added to the output document with a default rule like this:
<xsl:template match="@*"> <xsl:value-of select="."/> </xsl:template>
By default, processing instructions are not inserted in the output document, so their default rule can be expressed simply like this:
<xsl:template match="processing-instruction()"/>
The same goes for comments, whose default rule can be expressed this way:
<xsl:template match="comment()"/>
The upshot of the default rules is that if you don't supply any rules at all, all the parsed character data in the input document is inserted in the output document. Here's what an XSLT style sheet with no explicit rules looks like:
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> </xsl:stylesheet>
Here are the results of applying this style sheet to planets.xml:
<?xml version="1.0" encoding="UTF-8"?> Mercury .0553 58.65 1516 .983 43.4 Venus .815 116.75 3716 .943 66.8 Earth 1 1 2107 1 128.4
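The combined effect of the default rules can be imitated with a short recursive function. The Python sketch below is an emulation rather than an XSLT processor: the function name apply_default_rules and the inline document are assumptions made for illustration. It descends into every element and copies text through, and it relies on the fact that ElementTree's default parser discards comments.

```python
import xml.etree.ElementTree as ET

# Assumed sample input; whitespace is omitted so the output is easy to read.
PLANETS_XML = (
    '<PLANETS>'
    '<!-- comments produce no output -->'
    '<PLANET><NAME>Mercury</NAME><MASS UNITS="(Earth = 1)">.0553</MASS></PLANET>'
    '<PLANET><NAME>Venus</NAME><MASS UNITS="(Earth = 1)">.815</MASS></PLANET>'
    '</PLANETS>'
)


def apply_default_rules(element):
    """Emulate the built-in rules: recurse into elements, copy text through."""
    parts = [element.text or ""]        # text before the first child
    for child in element:
        parts.append(apply_default_rules(child))
        parts.append(child.tail or "")  # text that follows the child
    return "".join(parts)


output = apply_default_rules(ET.fromstring(PLANETS_XML))
print(output)
```

Attribute values such as the UNITS text never appear in the output, because the element rule descends only to child nodes and attributes are not children of their element.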
XSLT Rules and Internet Explorer
One of the problems of working with XSLT in Internet Explorer is that the browser doesn't supply any default rules. You have to supply all the rules yourself.