XSLT

From Wikipedia, the free encyclopedia
Revision as of 05:50, 27 November 2006

XSL Transformations
Filename extension: .xsl, .xslt
Internet media type: application/xslt+xml (Waiting for approval), text/xsl (unregistered)
Developed by: World Wide Web Consortium
Type of format: Stylesheet language
Extended from: XML
Standard: 1.0 (Recommendation), 2.0 (Proposed Recommendation)
Figure: Diagram of the basic elements and process flow of Extensible Stylesheet Language Transformations.

Extensible Stylesheet Language Transformations (XSLT) is a Turing-complete[1][2] XML-based language used for the transformation of XML documents.

XSLT is a specific kind of template processor primarily designed to "transform" XML documents into other XML documents. The original document is not changed; rather, a new document is created based on the content of an existing one.[3] The new document may be serialized (output) by the processor in standard XML syntax or in another format, such as HTML or plain text.[4] XSLT is most often used to convert data between different XML schemas or to convert XML data into HTML or XHTML documents for web pages, or into an intermediate XML format that can be converted to PDF documents.

As a language, XSLT's origins lie in functional languages,[5] and in text-based pattern matching languages in the tradition of SNOBOL and awk. Its most direct predecessor was DSSSL, a language that performed the same function for SGML as XSLT performs for XML.

XSLT was produced as a result of the Extensible Stylesheet Language (XSL) development effort within W3C during 1998–1999, which also produced XSL Formatting Objects (XSL-FO) and the XML Path Language, XPath. The editor of the first version (and in effect the chief designer of the language) was James Clark. The version most widely used today is XSLT 1.0, which was published as a Recommendation by the W3C on 16 November 1999. A greatly expanded version 2.0, under the editorship of Michael Kay, reached the status of Proposed Recommendation from the W3C on 21 November 2006.

Overview

The XSLT processing model traditionally involves:

  • one or more XML source document files (data model);
  • one or more XSLT stylesheet files (processing templates);
  • the XSLT template processing engine (the processor); and
  • one or more result output documents.

In some deployments, there may be slight variations on this traditional framework. For example, when deployed on top of a scripting language or command shell, the XML and XSLT input may not consist of files. In some instances, an XML input stream may not be specified at all.

The XSLT processor ordinarily takes two input files[6] – an XML source document, and an XSLT stylesheet – and produces an output document. The XSLT stylesheet contains the XSLT program text (or ‘source code’ in other languages) and is itself an XML document that describes a collection of template rules: instructions and other hints that guide the processor toward the production of the output document.
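The two-inputs, one-output flow described above can be sketched with the Java API for XML Processing (JAXP), which is discussed later in this article. This is only a minimal illustration: the class name `BasicTransformSketch` and the sample document and stylesheet are hypothetical, and both inputs are supplied as strings rather than files, in line with the broad sense of "file" used here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BasicTransformSketch {
    // Hypothetical sample inputs, not taken from the article.
    static final String SOURCE_XML = "<greeting>Hello</greeting>";
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<message><xsl:value-of select='greeting'/></message>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Input 1: the source document; input 2: the stylesheet; output: the serialized result.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SOURCE_XML, STYLESHEET));
    }
}
```

The source document is left untouched; the processor builds a new `message` element from its content.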

Template rule processing

The XSLT language is declarative: rather than listing an imperative sequence of actions to perform in a stateful environment, template rules only define how to handle a node matching a particular XPath-like pattern, should the processor encounter one. The contents of the templates effectively comprise functional expressions that directly represent their evaluated form, the result tree, which is the basis of the processor's output.

The processor follows a fixed algorithm: Assuming a stylesheet has already been read and prepared, the processor builds a source tree from the input XML document. It then starts by processing the source tree's root node, finding in the stylesheet the best-matching template for that node, and evaluating the template's contents. Instructions in each template generally direct the processor to either create nodes in the result tree, or process more nodes in the source tree in the same way as the root node. Output is derived from the result tree.
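One consequence of this model is that the order in which template rules appear in a stylesheet does not determine the order of the output; the source tree does. The following Java (JAXP) sketch, using hypothetical names and sample data, builds the same two rules in both orders and compares the results. The `list` element has no rule of its own, so the processor falls back on its built-in rule, which simply continues with the element's children.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateOrderSketch {
    // Hypothetical sample data and rules, chosen only for illustration.
    static final String SOURCE_XML = "<list><item>a</item><item>b</item></list>";
    static final String HEADER =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>";
    static final String ROOT_RULE =
        "<xsl:template match='/'><out><xsl:apply-templates/></out></xsl:template>";
    static final String ITEM_RULE =
        "<xsl:template match='item'><i><xsl:value-of select='.'/></i></xsl:template>";
    static final String FOOTER = "</xsl:stylesheet>";

    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Same rules, listed in opposite orders.
        String rootFirst = transform(SOURCE_XML, HEADER + ROOT_RULE + ITEM_RULE + FOOTER);
        String itemFirst = transform(SOURCE_XML, HEADER + ITEM_RULE + ROOT_RULE + FOOTER);
        System.out.println(rootFirst);
        System.out.println(rootFirst.equals(itemFirst));
    }
}
```

Processing starts at the root node in both cases, so the `item` rule is only evaluated once `xsl:apply-templates` reaches a matching node.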

Processor implementations

XSLT processors may be delivered as standalone applications, or as software components or libraries intended for use by applications. Many web browsers and web server software have XSLT processor components built into them.

Most current operating systems have an XSLT processor installed. For example, Windows XP comes with the MSXML3 library, which includes an XSLT processor. Earlier versions may be upgraded, and there are many alternatives; see the External Links section.

Early XSLT processors had very few optimizations; stylesheet documents were read into Document Object Models and the processor would act on them directly. XPath engines were also not optimized.

By 2000, however, implementors saw optimization opportunities in both XPath evaluation and template rule processing. For example, the Java programming language's Transformation API for XML (TrAX), later subsumed into the Java API for XML Processing (JAXP), acknowledged one such optimization: before processing, the XSLT processor could condense the template rules and other stylesheet tree information into a single, compact Templates object, free from the constraints and bloat of standard DOMs, in an implementation-specific manner. This intermediate representation of the stylesheet tree allows for more efficient processing by potentially reducing preparation time and memory overhead. Additionally, the formal API allows for the object to be cached and reused for multiple transformations, potentially providing higher performance if several input documents are to be processed with the same XSLT stylesheet. Parallels are often drawn between this optimization and the compilation of programming language source code to bytecode: the stylesheets are said to be "compiled", even though they don't usually produce native programming language bytecode; rather, they produce intermediate structures and routines that are stored and processed internally.[7]
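The reuse pattern described above can be sketched with JAXP's `Templates` interface. The class name, stylesheet and input documents below are hypothetical; what the processor stores inside the `Templates` object is implementation-specific and is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheetSketch {
    // Hypothetical stylesheet, used only to have something to compile.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<name><xsl:value-of select='person/name'/></name>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Read and prepare the stylesheet once.
    static Templates compile(String xslt) throws TransformerException {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    // Each transformation gets its own lightweight Transformer from the shared Templates.
    static String apply(Templates compiled, String xml) throws TransformerException {
        Transformer transformer = compiled.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates compiled = compile(STYLESHEET);
        System.out.println(apply(compiled, "<person><name>John</name></person>"));
        System.out.println(apply(compiled, "<person><name>Morka</name></person>"));
    }
}
```

Whether this is measurably faster than re-reading the stylesheet for every document depends on the processor and the workload.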

XPath evaluation also has room for significant optimizations, and most processor vendors have implemented at least some of them, for speed. For example, in <xsl:if test="/some/nodes"> the test will evaluate to true if /some/nodes identifies any nodes, so evaluation can stop as soon as the first matching node is found; continuing to look for the entire set of matching nodes would not change the result. Similar optimizations can be undertaken when processing xsl:when and xsl:value-of, as well as expressions relying on, either implicitly or explicitly, string(), boolean(), or number(), and those that use numeric and position()/last()-based predicates.
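The boolean semantics that make this shortcut safe can be observed with the standard `javax.xml.xpath` API. The sketch below uses hypothetical sample documents and only shows the result of the test; whether a given engine actually stops at the first match is an internal detail that this code cannot reveal.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathBooleanSketch {
    // Evaluates an XPath expression as a boolean, as xsl:if does with its test attribute.
    static boolean test(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws XPathExpressionException {
        // One match or many: the answer is the same, so the rest need not be visited.
        System.out.println(test("<some><nodes/></some>", "/some/nodes"));
        System.out.println(test("<some><nodes/><nodes/><nodes/></some>", "/some/nodes"));
        // No match at all: the node-set is empty and converts to false.
        System.out.println(test("<some><other/></some>", "/some/nodes"));
    }
}
```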

Specification

The XSLT specification defines a transformation in terms of source and result trees to avoid locking implementations into system-specific APIs and memory, network and file I/O issues. For example, the specification does not mandate that a source tree always be derived from an XML file, since it may be more efficient for the processor to read from an in-memory DOM object or some other implementation-specific representation. Output may be in a format not envisioned by the XSLT language's designers. However, XSLT processing often begins by reading a serialized XML input document into the source tree and ends by writing the result tree to an output document. The output document may be XML, but can be HTML, RTF, TeX, delimited files, plain text or other formats.
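As an illustration of a transformation that never touches serialized XML for its data, the following JAXP sketch builds the source tree in memory and receives the result as another in-memory tree. The class name and sample content are hypothetical, and the DOM is used here only because it is the tree API that ships with Java; the specification does not require it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TreeToTreeSketch {
    // Hypothetical stylesheet: wraps the greeting text in a message element.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template match='/'>"
      + "<message><xsl:value-of select='greeting'/></message>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Builds the source tree directly, without parsing any XML text.
    static Document buildSourceTree() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(null, "greeting");
        root.setTextContent("Hello");
        doc.appendChild(root);
        return doc;
    }

    // Transforms one in-memory tree into another; nothing is serialized.
    static Document transformTree(Document source, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        DOMResult result = new DOMResult();
        transformer.transform(new DOMSource(source), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document result = transformTree(buildSourceTree(), STYLESHEET);
        Element root = result.getDocumentElement();
        System.out.println(root.getNodeName() + ": " + root.getTextContent());
    }
}
```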

XSLT relies upon the W3C's XPath language for identifying subsets of the source document tree, as well as for performing calculations. XPath also provides a range of functions, which XSLT itself further augments. This reliance upon XPath adds a great deal of power and flexibility to XSLT.

The W3C finalized the XSLT 1.0 specification in 1999. The XSLT 2.0 specification is currently a Proposed Recommendation.

Examples

Example 1 (transforming XML to XML)

Transforming the XML document

<?xml version="1.0"?>
<persons>
  <person username="MP123456">
    <name>John</name>
    <family_name>Smith</family_name>
  </person>
  <person username="PK123456">
    <name>Morka</name>
    <family_name>Ismincius</family_name>
  </person>
</persons>

by applying the XSLT transform:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/> 
 <xsl:template match="/">
    <transform>
       <xsl:apply-templates/>
    </transform>
 </xsl:template>
 <xsl:template match="person">
     <record>
        <username>
           <xsl:value-of select="@username" />
        </username>
        <name>
           <xsl:value-of select="name" />
        </name>
     </record> 
  </xsl:template>
</xsl:stylesheet>

results in a new XML document, having another structure:

<?xml version="1.0" encoding="UTF-8"?>
<transform>
   <record>
      <username>MP123456</username>
      <name>John</name>
   </record>
   <record>
      <username>PK123456</username>
      <name>Morka</name>
   </record>  
</transform>

Example 2 (transforming XML to XHTML)

Example of incoming XML document:

<?xml version="1.0" encoding="UTF-8"?>

<domains>
    <sun.com ownedBy="Sun Microsystems Inc.">
        <host>
            www
            <use>World Wide Web site</use>
        </host>
        <host>
            java
            <use>Java info</use>
        </host>
    </sun.com>
    
    <w3.org ownedBy="The World Wide Web Consortium">
        <host>
            www
            <use>World Wide Web site</use>
        </host>
        <host>
            validator
            <use>web developers who want to get it right</use>
        </host>
    </w3.org>
</domains>

Example XSLT Stylesheet:

<?xml version="1.0" encoding="UTF-8" ?>

<xsl:stylesheet version="1.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
        xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes"
        doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" 
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
    
    <!--XHTML document outline--> 
    <xsl:template match="/">
        <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
            <head>
                <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
                <title>test1</title>
                <style type="text/css">
                    h1          { padding: 10px; padding-width: 100%; background-color: silver }
                    td, th      { width: 40%; border: 1px solid silver; padding: 10px }
                    td:first-child, th:first-child  { width: 20% } 
                    table       { width: 650px }
                </style>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    
    <!--Table headers and outline-->
    <xsl:template match="domains/*">
        <h1><xsl:value-of select="@ownedBy"/></h1>
        <p>The following host names are currently in use at
          <strong><xsl:value-of select="local-name(.)"/></strong>
        </p>
        <table>
            <tr><th>Host name</th><th>URL</th><th>Used by</th></tr>
            <xsl:apply-templates/>
        </table>
    </xsl:template>
    
    <!--Table row and first two columns-->
    <xsl:template match="host">
        <!--Create variable for 'url', as it's used twice-->
        <xsl:variable name="url" select=
            "normalize-space(concat('http://', normalize-space(node()), '.', local-name(..)))"/>
        <tr>
            <td><xsl:value-of select="node()"/></td>
            <td><a href="{$url}"><xsl:value-of select="$url"/></a></td>
            <xsl:apply-templates select="use"/>
        </tr>
    </xsl:template>

    <!--'Used by' column-->
    <xsl:template match="use">
        <td><xsl:value-of select="."/></td>
    </xsl:template>
        
</xsl:stylesheet> 

XHTML output that this would produce (whitespace has been adjusted here for clarity):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type" />
    <title>test1</title>
    <style type="text/css">
      h1          { padding: 10px; padding-width: 100%; background-color: silver }
      td, th      { width: 40%; border: 1px solid silver; padding: 10px }
      td:first-child, th:first-child  { width: 20% } 
      table       { width: 650px }
    </style>
  </head>
  <body>
    <h1>Sun Microsystems Inc.</h1>
    <p>The following host names are currently in use at <strong>sun.com</strong></p>
    <table>
      <tr>
        <th>Host name</th>
        <th>URL</th>
        <th>Used by</th>
      </tr>
      <tr>
        <td>www</td>
        <td><a href="http://www.sun.com">http://www.sun.com</a></td>
        <td>World Wide Web site</td>
      </tr>
      <tr>
        <td>java</td>
        <td><a href="http://java.sun.com">http://java.sun.com</a></td>
        <td>Java info</td>
      </tr>
    </table>
    
    <h1>The World Wide Web Consortium</h1>
    <p>The following host names are currently in use at <strong>w3.org</strong></p>
    <table>
      <tr>
        <th>Host name</th>
        <th>URL</th>
        <th>Used by</th>
      </tr>
      <tr>
        <td>www</td>
        <td><a href="http://www.w3.org">http://www.w3.org</a></td>
        <td>World Wide Web site</td>
      </tr>
      <tr>
        <td>validator</td>
        <td><a href="http://validator.w3.org">http://validator.w3.org</a></td>
        <td>web developers who want to get it right</td>
      </tr>
    </table>
  </body>
</html>

In a web browser, this XHTML appears as:

File:XSL example.gif
Rendered XHTML

Note: In this particular example, empty elements, such as the meta element in the head, include a space before the final />, as was specified in the XSLT stylesheet. This behaviour is not required by the W3C specification for XSLT 1.0 processors, and so these spaces in Literal Result Elements[8] in XSLT may or may not be reproduced in the output, depending on the XSLT processor implementation. The presence of this space in empty elements was specified in the HTML Compatibility Guidelines of the XHTML 1.0 specification[9] for serving XHTML 1.0 to non-XHTML-capable browsers with a 'text/html' Internet media type. It is not applicable to later versions of XHTML or to XHTML served as 'application/xhtml+xml'.[10] It is not relevant to the most recent browsers (IE 7, Mozilla Firefox, etc.). XSLT 2.0 plans to address the problem by providing an 'xhtml' output method.[11]

Example 3 (XSLT without a separate XML file)

Although the traditional XSLT processing flow involves at least two external source documents[6] (an XML data source plus XSLT template), it is worth noting that a processed XSLT template can generate output without requiring externally-supplied XML.

The following example demonstrates a simplified XSLT template that exhibits this property.[12]

<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0" >
    <xsl:output method="html"/>

    <xsl:variable name="celebrity_name" select="'Lucien Bouchard'" />
    <xsl:template name="header"><h2>Standard Website Header</h2></xsl:template>

    <xsl:template match="/">
        <xsl:call-template name="header"/>
        <b>Hello to all the fans of <xsl:value-of select="$celebrity_name" /></b>
    </xsl:template>

</xsl:stylesheet>
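How such a stylesheet is invoked depends on the processor. As one hedged illustration, JAXP's `transform` method always takes a source argument, so the sketch below passes a placeholder document (`<placeholder/>`, a hypothetical name) whose content the stylesheet never reads; the stylesheet string is a condensed copy of the example above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NoSourceDataSketch {
    // Condensed copy of the Example 3 stylesheet.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='html'/>"
      + "<xsl:variable name='celebrity_name' select=\"'Lucien Bouchard'\"/>"
      + "<xsl:template name='header'><h2>Standard Website Header</h2></xsl:template>"
      + "<xsl:template match='/'>"
      + "<xsl:call-template name='header'/>"
      + "<b>Hello to all the fans of <xsl:value-of select='$celebrity_name'/></b>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run() throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        // Placeholder input: only its root node is matched, none of its content is used.
        transformer.transform(
            new StreamSource(new StringReader("<placeholder/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run());
    }
}
```

All of the visible output comes from the stylesheet itself: the named `header` template and the `celebrity_name` variable.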

References

  1. ^ http://www.unidex.com/turing/utm.htm
  2. ^ http://www.refal.net/~korlukov/tm/
  3. ^ http://www.w3.org/TR/xslt#section-Introduction
  4. ^ See e.g., http://www.w3.org/TR/xslt#output, specifying alternate output methods.
  5. ^ Dimitre Novatchev. "The Functional Programming Language XSLT - A proof through examples". TopXML. Retrieved May 27.
  6. ^ a b In this context, "document" and "file" refer to a file, a string, or any other recognizable input stream, not just a fixed file.
  7. ^ Saxon: Anatomy of an XSLT processor - An article describing the implementation and optimization details of a popular Java-based XSLT processor.
  8. ^ http://www.w3.org/TR/xslt#literal-result-element
  9. ^ http://www.w3.org/TR/2002/REC-xhtml1-20020801/#guidelines
  10. ^ http://www.w3.org/TR/xhtml-media-types/#media-types
  11. ^ http://www.w3.org/TR/xslt-xquery-serialization/#xhtml-output
  12. ^ The example works because template match="/" succeeds in matching the "root node," which is separate from the 'document element' (or 'root element') and 'processing instructions' ordinarily included in the "source data" XML. (see e.g. http://www.dpawson.co.uk/xsl/sect2/root.html)

See also

For implementations see XSLT processor.
Documentation
XSLT 1.0 W3C Recommendation
XSLT 2.0 W3C Proposed Recommendation
Zvon XSLT 1.0 Reference
XSL Concepts and Practical Use by Norman Walsh
Tutorial from developerWorks (1 hour)
Zvon XSLT Tutorial
XSLT Tutorial
Quick tutorial
What kind of language is XSLT?
XSLT and Scripting Languages
Mailing lists
The XSLT mailing list hosted by Mulberrytech
Blogs
A commentary, news, and evangelism weblog devoted to XSLT
Books
XSLT by Doug Tidwell, published by O’Reilly (ISBN 0-596-00053-7)
XSLT Cookbook by Sal Mangano, published by O’Reilly (ISBN 0-596-00974-7)
XSLT Programmer's Reference by Michael Kay (ISBN 1-86100-312-9)
XSLT 2.0 Web Development by Dmitry Kirsanov (ISBN 0-13-140635-3)
XSL Companion, 2nd Edition by Neil Bradley, published by Addison-Wesley (ISBN 0-201-77083-0)
XSLT and XPath on the Edge (Unlimited Edition) by Jeni Tennison, published by Hungry Minds Inc, U.S. (ISBN 0-7645-4776-3)
XSLT & XPath, A Guide to XML Transformations by John Robert Gardner and Zarella Rendon, published by Prentice-Hall (ISBN 0-13-040446-2)
XSLT code libraries
EXSLT is a widespread community initiative to provide extensions to XSLT.
FXSL is a library implementing support for Higher-order functions in XSLT. FXSL is written in XSLT itself.