Handling Documents and Irregular Data

11/18/2015


One of the benefits of using XML is the ability to model irregular data hierarchies, including data with the following characteristics:

Collections of heterogeneous elements
Structures with many optional elements
Structures where the order is important
Recursive structures
Structures with complex containment requirements

This may sound complex, but most of these characteristics are present in XML that represents documents. The following Pole example exhibits many of them.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="pole.xsl"?>
<document>
  <title>To the Pole and Back</title>
  <section>
    <title>The First Day</title>
    <p>It was the <emph>best</emph> of days, it was the
      <emph>worst</emph> of days.</p>
    <list>
      <item><emph>best</emph> in that the sun was out.</item>
      <item><emph>worst</emph> in that it was 39 degrees below zero.</item>
    </list>
    <section>
      <title>Lunch Menu</title>
      <list>
        <item>ice cream</item>
        <item>popsicles</item>
      </list>
    </section>
  </section>
  <section>
    <title>The Second Day</title>
    <p>Ditto the first day.</p>
  </section>
</document>

Comparing this sample with the characteristics of irregular data, you can see that it contains heterogeneous collections of elements: a section can contain an arbitrary mix of <title>, <p>, <list>, and other elements. Many elements are optional: a section need not contain <p> or <list> elements, or other <section> elements. The order of most elements must be preserved in the output; for example, the first section comes before the second. The structure is recursive because a <section> element can contain another <section>. The <emph> element is probably allowed almost anywhere, which implies a complex set of containment requirements.
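To make these characteristics concrete, the following sketch uses Python's standard-library xml.etree.ElementTree (a choice made here for illustration; it is not part of XSLT) to list the child elements of every section in the Pole sample. The helper name section_shapes is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Pole sample from above, minus the XML declaration and the
# xml-stylesheet processing instruction.
POLE = """<document>
  <title>To the Pole and Back</title>
  <section>
    <title>The First Day</title>
    <p>It was the <emph>best</emph> of days, it was the
      <emph>worst</emph> of days.</p>
    <list>
      <item><emph>best</emph> in that the sun was out.</item>
      <item><emph>worst</emph> in that it was 39 degrees below zero.</item>
    </list>
    <section>
      <title>Lunch Menu</title>
      <list>
        <item>ice cream</item>
        <item>popsicles</item>
      </list>
    </section>
  </section>
  <section>
    <title>The Second Day</title>
    <p>Ditto the first day.</p>
  </section>
</document>"""


def section_shapes(element, depth=0):
    """List (depth, child tags) for each <section>, in document order."""
    shapes = []
    for child in element:
        if child.tag == "section":
            shapes.append((depth, [c.tag for c in child]))
            shapes.extend(section_shapes(child, depth + 1))  # recursive structure
    return shapes


for depth, tags in section_shapes(ET.fromstring(POLE)):
    print("  " * depth + "section:", tags)
```

The first section has four kinds of children, the nested section has two, and the second top-level section has yet another combination: heterogeneous, optional, ordered, and recursive structure in a single small document.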

The ability to handle such irregular and recursive data makes XSLT useful for transforming documents into a display language such as HTML.

The mechanism for handling data-driven transformations is similar to subroutines in programming languages: template fragments, which play the role of subroutines, can be defined and invoked. Instead of being called by name, however, each template is chosen by the XSLT processor as the most appropriate one for the type of node being processed.
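As a rough analogy only, this selection mechanism can be modeled in Python as a lookup table from element names to functions: instead of calling a function by name, apply_templates looks up the function for each child's tag. The names TEMPLATES, apply_templates, and the handler functions are hypothetical, and real XSLT match patterns are far more expressive than a tag-name lookup.

```python
import xml.etree.ElementTree as ET


# Each "template" is an ordinary function; none is called by name from the
# traversal. The traversal picks one based on the element's tag.
def section_template(el):
    return "<DIV>" + apply_templates(el) + "</DIV>"


def title_template(el):
    return "<H2>" + (el.text or "") + "</H2>"


def p_template(el):
    return "<P>" + (el.text or "") + "</P>"


# Hypothetical lookup table standing in for XSLT's match patterns.
TEMPLATES = {"section": section_template, "title": title_template, "p": p_template}


def apply_templates(el):
    """Visit el's element children in order and dispatch on each child's tag."""
    parts = []
    for child in el:
        template = TEMPLATES.get(child.tag)
        # Unmatched elements are simply skipped in this sketch; XSLT's
        # built-in template rule would process their children instead.
        if template is not None:
            parts.append(template(child))
    return "".join(parts)


doc = ET.fromstring("<document><section><title>Day One</title><p>Cold.</p></section></document>")
print(apply_templates(doc))
# prints <DIV><H2>Day One</H2><P>Cold.</P></DIV>
```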

To process this document, start by writing an output template for the HTML wrapper, which inserts the document title into the output in two places and then asks the XSLT processor to find the appropriate template for <section> elements. The following code example shows the HTML wrapper.

<HTML>
  <HEAD>
    <TITLE><xsl:value-of select="document/title"/></TITLE>
  </HEAD>
  <BODY>
    <H1><xsl:value-of select="document/title"/></H1>
    <xsl:apply-templates select="document/section"/>
  </BODY>
</HTML>

The <xsl:apply-templates> element selects the <section> children of the <document> element (only the top-level sections, not the nested ones) and asks the XSLT processor to find and apply an appropriate template to each. Next, write a template that is appropriate for <section> elements.

<xsl:template match="section">
  <DIV>
    <xsl:apply-templates />
  </DIV>
</xsl:template>

The XSLT processor instantiates this template once for each <section> element selected by the <xsl:apply-templates> element. The value of the match attribute indicates the kinds of nodes for which the template is appropriate; here, it matches <section> elements. Each node selected by <xsl:apply-templates> is paired with the template whose match pattern fits it.
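In the HTML wrapper, select="document/section" is evaluated from the document root and picks only the direct <section> children of <document>. A rough Python analogue, again assuming the standard library's ElementTree (which supports only a limited XPath subset and is not an XSLT processor), contrasts this with selecting every section at any depth:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<document>
  <title>To the Pole and Back</title>
  <section><title>The First Day</title>
    <section><title>Lunch Menu</title></section>
  </section>
  <section><title>The Second Day</title></section>
</document>"""

# ET.fromstring returns the <document> element itself, so the XPath step
# "document/" has already been taken; "section" then means direct children.
document = ET.fromstring(SAMPLE)

top_level = document.findall("section")         # like select="document/section"
every_section = document.findall(".//section")  # sections at any depth, for contrast

print([s.findtext("title") for s in top_level])
print([s.findtext("title") for s in every_section])
```

The nested "Lunch Menu" section is not selected by the wrapper; it is reached later, when the section template processes its own children.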

The section template itself contains an <xsl:apply-templates> element. Without a select attribute, it selects all the children of the current node, and the XSLT processor takes each one in order (title, p, list, section) and looks for an appropriate template. A template for sections already exists (this one), so the XSLT processor applies it recursively, producing a nested structure of <DIV> elements that mirrors the nested structure of <section> elements in the source document.
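The recursion can be sketched in Python as well, assuming ElementTree and a hypothetical section_template function that handles only sections and their titles (every other child is ignored for brevity). Because the function calls itself for a nested section, the DIV nesting in its output follows the section nesting in the input:

```python
import xml.etree.ElementTree as ET


def section_template(section, depth=0):
    """Return output lines for one <section>: a DIV holding its title as H2.

    When a child is itself a <section>, the same function handles it, so the
    DIVs nest exactly as the sections do in the source.
    """
    pad = "  " * depth
    lines = [pad + "<DIV>", pad + "  <H2>" + section.findtext("title", "") + "</H2>"]
    for child in section:              # children in document order
        if child.tag == "section":     # this template matches again: recursion
            lines.extend(section_template(child, depth + 1))
    lines.append(pad + "</DIV>")
    return lines


document = ET.fromstring(
    "<document><section><title>The First Day</title>"
    "<section><title>Lunch Menu</title></section></section></document>"
)
print("\n".join(section_template(document.find("section"))))
```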

Now define some more templates to handle other element types.

<xsl:template match="title">
  <H2><xsl:apply-templates /></H2>
</xsl:template>
<xsl:template match="p">
  <P><xsl:apply-templates /></P>
</xsl:template>
<xsl:template match="list">
  <UL>
    <xsl:for-each select="item">
      <LI><xsl:apply-templates /></LI>
    </xsl:for-each>
  </UL>
</xsl:template>
<xsl:template match="emph">
  <I><xsl:apply-templates /></I>
</xsl:template>

Each of these templates uses <xsl:apply-templates> to continue selecting the children, whatever they may be, and to find the appropriate template for each one.

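The <xsl:apply-templates> element is not limited to selecting element children; it can select other child nodes as well, including text nodes. You can add a template that copies text children to the output. XSLT's built-in template rule for text nodes already copies their value, so this explicit template is optional.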

<xsl:template match="text()"><xsl:value-of select="."/></xsl:template>
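ElementTree, used again here for a Python sketch, does not represent text as separate child nodes: the text before an element's first child is stored in .text, and the text after each child in that child's .tail. The sketch below, with a hypothetical HTML_TAG mapping and apply_templates function, walks both so that mixed content such as a <p> containing <emph> keeps its text in document order, which is what the text() template accomplishes in XSLT.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical mapping standing in for the p and emph templates above.
HTML_TAG = {"p": "P", "emph": "I"}


def apply_templates(element):
    """Process all content of element, text included, in document order."""
    parts = [escape(element.text or "")]  # text before the first child
    for child in element:
        inner = apply_templates(child)
        tag = HTML_TAG.get(child.tag)
        # Unmapped elements contribute only their content, much like
        # XSLT's built-in template rule for elements.
        parts.append(f"<{tag}>{inner}</{tag}>" if tag else inner)
        parts.append(escape(child.tail or ""))  # text after this child
    return "".join(parts)


p = ET.fromstring("<p>It was the <emph>best</emph> of days.</p>")
print("<P>" + apply_templates(p) + "</P>")
# prints <P>It was the <I>best</I> of days.</P>
```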

When run against the Pole sample document, the preceding templates produce the following output.

<HTML>
  <HEAD>
    <TITLE>To the Pole and Back</TITLE>
  </HEAD>
  <BODY>
    <H1>To the Pole and Back</H1>
    <DIV>
      <H2>The First Day</H2>
      <P>It was the <I>best</I> of days, it was the
        <I>worst</I> of days.</P>
      <UL>
        <LI><I>best</I> in that the sun was out.</LI>
        <LI><I>worst</I> in that it was 39 degrees below zero.</LI>
      </UL>
      <DIV>
        <H2>Lunch Menu</H2>
        <UL>
          <LI>ice cream</LI>
          <LI>popsicles</LI>
        </UL>
      </DIV>
    </DIV>
    <DIV>
      <H2>The Second Day</H2>
      <P>Ditto the first day.</P>
    </DIV>
  </BODY>
</HTML>

By recursively processing the source document with <xsl:apply-templates>, this style sheet essentially converts the element types in the source XML to HTML element types. Even in this simple example, you can already see some additional structural modifications occurring, notably the creation of the <HEAD> element and the duplication of the document title in both the <H1> element and the <TITLE> element.

This collection of templates can be packaged into a style sheet file by placing them within an <xsl:stylesheet> element. The XSLT namespace, http://www.w3.org/1999/XSL/Transform, must be declared on this element.

The HTML wrapper becomes the top-level template, or root template. Identify it as such by placing it inside an <xsl:template> element whose match attribute is the special pattern forward slash (/), which matches the document root. Here is the final complete style sheet.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE><xsl:value-of select="document/title"/></TITLE>
      </HEAD>
      <BODY>
        <H1><xsl:value-of select="document/title"/></H1>
        <xsl:apply-templates select="document/section"/>
      </BODY>
    </HTML>
  </xsl:template>
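  <!-- Output nothing for title elements: the section template
       below writes each section's title itself. -->
  <xsl:template match="title">
  </xsl:template>
  <xsl:template match="section">
    <DIV>
      <!-- Write the title as H2, then process all children; the empty
           title template keeps the title from being output twice. -->
      <H2><xsl:value-of select="title"/></H2>
      <xsl:apply-templates />
    </DIV>
  </xsl:template>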
  <xsl:template match="p">
    <P><xsl:apply-templates /></P>
  </xsl:template>
  <xsl:template match="list">
    <UL>
      <xsl:for-each select="item">
        <LI><xsl:apply-templates /></LI>
      </xsl:for-each>
    </UL>
  </xsl:template>
  <xsl:template match="emph">
    <I><xsl:apply-templates /></I>
  </xsl:template>  

</xsl:stylesheet>

This example illustrates the data-driven model of XSLT processing, in which you can create isolated templates for the types of nodes you expect to see in the source document without too much consideration of their structure. In places where the structure is locally known, you can use <xsl:for-each> and <xsl:value-of> to populate the template. For example, <list> and <item> elements appear in a regular and predictable structure, so the list template uses <xsl:for-each select="item"> to process the items directly. The ability to switch smoothly between data-driven and template-driven transformation is an important feature of XSLT.
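Putting the pieces together, the following Python sketch mirrors the final style sheet under the same assumptions as before: ElementTree, hypothetical function names, and a compact input without indentation whitespace, since this sketch copies all text through. Most elements are handled by dispatch on the tag, while list_ loops over its <item> children directly, the counterpart of <xsl:for-each select="item">.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_templates(element):
    """Data-driven part: dispatch each child on its tag, copying text through."""
    parts = [escape(element.text or "")]
    for child in element:
        handler = TEMPLATES.get(child.tag, apply_templates)  # default: process content
        parts.append(handler(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def section(el):
    # As in the final style sheet, the section template writes its own title...
    title_text = escape(el.findtext("title", ""))
    return "<DIV><H2>" + title_text + "</H2>" + apply_templates(el) + "</DIV>"


def title(el):
    return ""  # ...so the title template outputs nothing.


def p(el):
    return "<P>" + apply_templates(el) + "</P>"


def emph(el):
    return "<I>" + apply_templates(el) + "</I>"


def list_(el):
    # Template-driven part: the list structure is known, so loop over the
    # <item> children directly, as <xsl:for-each select="item"> does.
    items = "".join("<LI>" + apply_templates(i) + "</LI>" for i in el.findall("item"))
    return "<UL>" + items + "</UL>"


TEMPLATES = {"section": section, "title": title, "p": p, "emph": emph, "list": list_}


def transform(document):
    """Root template: the HTML wrapper, then the top-level sections only."""
    t = escape(document.findtext("title", ""))
    body = "".join(section(s) for s in document.findall("section"))
    return f"<HTML><HEAD><TITLE>{t}</TITLE></HEAD><BODY><H1>{t}</H1>{body}</BODY></HTML>"


doc = ET.fromstring(
    "<document><title>Pole</title>"
    "<section><title>Day 1</title><p>It was the <emph>best</emph>.</p>"
    "<list><item>ice cream</item><item>popsicles</item></list></section>"
    "</document>"
)
print(transform(doc))
```

A real XSLT processor also handles whitespace, output serialization, and template priorities, none of which this sketch attempts.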
