<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="http://www.oasis-open.org/docbook/xml/5.0/rng/docbookxi.rng" type="xml"?>
<book xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
    <info>
        <title>Meaningful Markup</title>
        <subtitle>XML and Web Information</subtitle>
        <author>
            <personname>
                <firstname>Gary</firstname>
                <surname>Stringer</surname>
            </personname>
            <affiliation>
                <orgname>University of Exeter</orgname>
                <orgdiv>Creative Media and Information Technology (CMIT)</orgdiv>
                <address>Exeter, UK</address>
            </affiliation>
        </author>
        <publisher>
            <publishername>Creative Media and Information Technology</publishername>
            <address>University of <city>Exeter</city>, <country>UK</country></address>
        </publisher>
        <mediaobject>
            <imageobject>
                <imagedata fileref="../images/cmit.png"/>
            </imageobject>
        </mediaobject>
        <copyright>
            <year>2006-2008</year>
            <holder>Gary Stringer / University of Exeter </holder>
        </copyright>
        <xi:include href="../mit0000/legalnotice-by-nc-sa.xml"
            xmlns:xi="http://www.w3.org/2001/XInclude">
            <xi:fallback>
                <para>All rights reserved.</para>
            </xi:fallback>
        </xi:include>
    </info>
    <preface>
        <title>About this Module</title>
        <para>XML is the future of the web; it's a more flexible language than HTML, and can carry
            much more information. XML separates content and presentation, so that the same document
            can be viewed on high-resolution displays or mobile phones, and still appear designed
            for the medium. XML also allows the document author to create their own tags with
            meaningful names, incorporating semantics into the document structure. </para>
        <para> This module is designed to build on the skills you've acquired in The Internet
            (MIT2114/2214) or Aspects of Web Design (MIT2100/2200), and looks at modern website
            creation through the use of XML and its related technologies. You'll have the
            opportunity to create a website using scalable document management techniques,
            developing for a variety of browsing formats and creating 'house styles' using
            stylesheets and transformations. The module is practically based, looking at each
            technology from a 'how-to' perspective, whilst analysing each technology's usefulness
            and role in document management at each stage.</para>
        <sect1>
            <title>Prerequisites</title>
            <para>We'll begin our look at XML by examining the differences between XML and HTML, so
                a basic knowledge of how HTML tags work, and a little experience in hand-coding HTML
                pages, will be useful if not essential. </para>
            <para>A knowledge of file management using Microsoft Windows (or another operating
                system) will be essential.</para>
            <para>The module is taught from a non-technical viewpoint, so skills in computer
                programming and other advanced subjects are <emphasis>not</emphasis> required or
                expected!</para>
        </sect1>
        <sect1>
            <title>Essential Reading</title>
            <para>The recommended book for the practical side of the module is Elizabeth Castro's
                    <quote>XML for the World Wide Web: Visual Quickstart Guide</quote>
                <xref linkend="Castro2001"/>. This was chosen as it focusses on creating XML
                documents rather than more advanced applications. It covers the basics well, and as
                you progress through the module, will provide a handy reference for many of the
                tasks you'll be required to complete. It's available from <link
                    xlink:href="http://www.amazon.co.uk/exec/obidos/ASIN/0201710986/qid=1138789323"
                    >amazon.co.uk</link> for around £10 - you may want to order collectively to save
                postage. </para>
        </sect1>
    </preface>
    <chapter xml:id="ch-introduction">
        <title>Introducing XML</title>
        <para>Throughout this module, we will be constantly referring to standards and documentation
            produced by the <link xlink:href="http://www.w3.org/">World Wide Web Consortium</link>,
            the <abbrev>W3C</abbrev>, who organise and shape the future of many web-based
            technologies, including <abbrev>XML</abbrev> and its friends. So, to begin, take a look
            at the W3C's summary of what XML is, their <link
                xlink:href="http://www.w3.org/XML/1999/XML-in-10-points">XML in 10
            points</link>.</para>
        <para>The remainder of this chapter will work through these points, looking at some real
            examples of XML along the way.</para>
        <sect1>
            <title>What is XML?</title>
            <para>Well, let's start by looking at a simple example.</para>
            <example>
                <title>An example of XML looking suspiciously like HTML</title>
                <programlisting>
    &lt;?xml version=&quot;1.0&quot;?&gt;
    &lt;html&gt;
      &lt;head&gt;
        &lt;title&gt;My Title&lt;/title&gt;
      &lt;/head&gt;
      &lt;body&gt;
        &lt;h1&gt;My Title&lt;/h1&gt;
        &lt;p&gt;The text of my document&lt;/p&gt;
        &lt;hr/&gt;
        &lt;p&gt;Some more text&lt;/p&gt;
      &lt;/body&gt;
    &lt;/html&gt;
			</programlisting>
            </example>
            <para>Hmmmm, looks familiar. Apart from the first line, it looks just like HTML, which
                we all know and love. That's because HTML can be written as an <glossterm
                    linkend="gl-application">application</glossterm> of XML - this has come to be
                known as XHTML. </para>
            <para>Here's another example:</para>
            <example xml:id="ex-home-made-markup">
                <title>An example of a home-made markup language in XML</title>
                <programlisting><![CDATA[
  <?xml version="1.0" encoding="UTF-8"?>
  <jokebook>
    <bookinfo>
      <title>My Best Jokes</title>
      <editor>Fred Bloggs</editor>
    </bookinfo>
    <joke>
      <simplejoke author="Traditional" rating="U">
        <question>Why did the chicken cross the road?</question>
        <punchline>To get to the other side!</punchline>
      </simplejoke>
    </joke>
  </jokebook>				
				]]></programlisting>
            </example>
            <para>You'll notice here that, though the structure is similar, the tags (elements) have
                changed, and are more <quote>meaningful</quote>, in that they tell you about the
                specific role of each part of the document. So, <code>&lt;punchline&gt;</code>
                clearly indicates the (not very funny) response to the joke. </para>
            <para>Also notice that there's very little about how the document is presented. The
                separation of <emphasis>content</emphasis> from <emphasis>presentation</emphasis> is
                an important theme in XML, and one we'll return to again and again.</para>
        </sect1>
        <sect1>
            <title>Strictly XML</title>
            <subtitle>Or, why we need more rigorous markup</subtitle>
            <para>One of the problems with HTML is that it's too laid back and relaxed - you can get away
                with almost anything. This is in part due to the browsers we have, which will always
                make the best of whatever is thrown their way, displaying on screen as much as they
                can decipher from the messy HTML they have grabbed from the webserver.</para>
            <para>We, also, have played our part. It's difficult for humans to write in a
                mechanical, accurate manner. Think of how differently we all use English, how we
                flout the laws of spelling and grammar or use idioms or figures of speech - all very
                messy for the pedantic and rigidly-minded to understand. We get an impression of
                what is meant, but it may not be exactly what the speaker intended. This is a little
                like the use of HTML - because it is sloppy and loosely defined, we don't always get
                what we intended when the web browser displays it. </para>
            <para>But XML is coming to the rescue! XML is well-defined and precise, and encodes
                accurately the structure and content of a document. It is easily read by machines,
                and fairly easily read by humans too. And it's easy to translate into more useful
                forms, such as HTML itself, printed pages, summaries, lists, etc.</para>
            <note>
                <title>Practical task</title>
                <para>Create a new DocBook article using the &lt;oXygen/&gt; software, and use it to
                    document any notes or ideas you have on the topics we discussed in today's
                    class. Remember to create the document by choosing <userinput>File &gt; New from
                        Templates...</userinput>, and add some more sections, and some more
                    paragraphs.</para>
                <para>Experiment with the element drop-down list, which you can access by typing the
                    first character of a new element (&lt;) or by pressing <keycombo>
                        <keycap>Ctrl</keycap>
                        <keycap>Space</keycap>
                    </keycombo>. Read the descriptions of the elements - remember that what is
                    important is not presentation but meaning. Choose the element that is closest in
                    meaning to the intended text.</para>
                <para>Add some of the following elements, working out how to structure them by
                    following clues provided by the drop-down element list: <itemizedlist
                        spacing="compact">
                        <listitem>
                            <para>
                                <code><![CDATA[<emphasis>]]></code>
                            </para>
                        </listitem>
                        <listitem>
                            <para>
                                <code><![CDATA[<quote>]]></code>
                            </para>
                        </listitem>
                        <listitem>
                            <para>
                                <code><![CDATA[<itemizedlist>]]></code>
                            </para>
                        </listitem>
                        <listitem>
                            <para>
                                <code><![CDATA[<indexterm>]]></code>
                            </para>
                        </listitem>
                        <listitem>
                            <para>
                                <code><![CDATA[<code>]]></code>
                            </para>
                        </listitem>
                        <listitem>
                            <para>
                                <code><![CDATA[<computeroutput>]]></code>
                            </para>
                        </listitem>
                        <listitem>
                            <para>
                                <code><![CDATA[<programlisting>]]></code>
                            </para>
                        </listitem>
                        <listitem>
                            <para>
                                <code><![CDATA[<example>]]></code>
                            </para>
                        </listitem>
                    </itemizedlist></para>
            </note>
            <sect2>
                <title>Writing well-formed XML</title>
                <para>As an example, let's look at a few of the differences between the HTML you're
                    familiar with, and the XML version of HTML called XHTML. XHTML follows very
                    similar rules of syntax to conventional HTML, though the syntax is more strictly
                    controlled. <indexterm class="startofrange" xml:id="ix-diff-html-xhtml">
                        <primary>XHTML</primary>
                        <secondary>differences between HTML and</secondary>
                    </indexterm> The main differences are: <itemizedlist spacing="compact">
                        <listitem>
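                            <para>the document should begin with an XML declaration stating the XML
                                version, e.g.
                                    <code>&lt;?xml version=&quot;1.0&quot;?&gt;</code> (strictly
                                speaking this is optional in XML 1.0, but it is strongly
                                recommended)</para>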
                        </listitem>
                        <listitem>
                            <para>a <firstterm>root element</firstterm> is required</para>
                            <itemizedlist spacing="compact">
                                <listitem>
                                    <para>this must enclose all other elements in the
                                        document;</para>
                                </listitem>
                                <listitem>
                                    <para>only the XML declaration, a document type declaration,
                                        comments and processing instructions may appear outside the
                                        root element;</para>
                                </listitem>
                            </itemizedlist>
                        </listitem>
                        <listitem>
                            <para>elements must always be properly closed </para>
                            <itemizedlist spacing="compact">
                                <listitem>
                                    <para>e.g. <code>&lt;element&gt;&lt;/element&gt;</code> or
                                            <code>&lt;element/&gt;</code> are both
                                        well-formed</para>
                                </listitem>
                                <listitem>
                                    <para> but
                                            <code>&lt;element&gt;&lt;element2&gt;&lt;/element&gt;</code>
                                        is not well-formed, because
                                            <code>&lt;element2&gt;</code> is never closed</para>
                                </listitem>
                            </itemizedlist>
                        </listitem>
                        <listitem>
                            <para>elements must be properly <firstterm>nested</firstterm></para>
                        </listitem>
                        <listitem>
                            <para>tags are always case-sensitive</para>
                            <itemizedlist spacing="compact">
                                <listitem>
                                    <para>and are conventionally lower-case throughout</para>
                                </listitem>
                            </itemizedlist>
                        </listitem>
                        <listitem>
                            <para>all attribute values must be quoted</para>
                            <itemizedlist spacing="compact">
                                <listitem>
                                    <para><code>&lt;joke type=&quot;pun&quot;&gt;</code> is
                                        well-formed XML </para>
                                </listitem>
                                <listitem>
                                    <para>but <code>&lt;joke type=pun&gt;</code> is not</para>
                                </listitem>
                            </itemizedlist>
                        </listitem>
                        <listitem>
                            <para>all named character entities (used for special characters) must
                                be explicitly declared, except the five that are predefined:</para>
                            <itemizedlist spacing="compact">
                                <listitem>
                                    <para>ampersand: &amp;amp;</para>
                                </listitem>
                                <listitem>
                                    <para>greater-than: &amp;gt;</para>
                                </listitem>
                                <listitem>
                                    <para>less-than: &amp;lt;</para>
                                </listitem>
                                <listitem>
                                    <para>quotes: &amp;quot;</para>
                                </listitem>
                                <listitem>
                                    <para>apostrophe: &amp;apos;</para>
                                </listitem>
                            </itemizedlist>
                        </listitem>
                        <listitem>
                            <para>literal text containing markup characters must either have those
                                characters escaped, or be wrapped in a CDATA section, e.g.
                                    <code>&lt;![CDATA[This is a piece of &lt;i&gt;tagged&lt;/i&gt;
                                    text.]]&gt;</code></para>
                        </listitem>
                    </itemizedlist>
                    <indexterm class="endofrange" startref="ix-diff-html-xhtml"/>
                </para>
                <para>This rigid adherence to the strict syntax rules means that, though it seems
                    pedantic to us humans, the text requires much less work to parse in software,
                    because the computer is not having to cope with mistakes and unexpected
                    tagging.</para>
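                <para>To make these rules concrete, here is a small sketch. The
                        <code>jokes</code> and <code>joke</code> element names are invented purely
                    for illustration. The first listing breaks several of the rules listed above;
                    the second shows one way it might be repaired.</para>
                <example>
                    <title>A fragment that is not well-formed, and one possible repair</title>
                    <programlisting><![CDATA[<!-- NOT well-formed: this listing breaks five of the rules -->
<Jokes>                                <!-- opened as "Jokes" ...           -->
  <joke type=pun>                      <!-- attribute value is not quoted   -->
    <p><b><i>Knock, knock</b></i></p>  <!-- b and i overlap: badly nested   -->
    <br>                               <!-- empty element is never closed   -->
    <p>Fish & chips</p>                <!-- bare ampersand is not escaped   -->
  </joke>
</jokes>                               <!-- ... but closed as "jokes"       -->]]></programlisting>
                    <programlisting><![CDATA[<?xml version="1.0"?>
<!-- Well-formed: each of the problems above has been corrected -->
<jokes>
  <joke type="pun">
    <p><b><i>Knock, knock</i></b></p>
    <br/>
    <p>Fish &amp; chips</p>
  </joke>
</jokes>]]></programlisting>
                </example>
                <para>A conforming XML parser will report an error for the first listing rather
                    than guess what was meant, which is exactly the strictness described
                    above.</para>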
            </sect2>
        </sect1>
        <sect1>
            <title>The X Word: XML-related technologies and acronyms </title>
            <para>One of the hardest parts of learning XML is remembering all the abbreviations for
                the bewildering array of tools and additions that associate themselves with XML.
                Here's a quick guide to the most important ones that we'll look at in this
                module.</para>
            <sect2>
                <title>XML, DTDs and Schemas</title>
                <para>XML is the core technology here, and almost every related technology is
                    written in XML, and either extends its usefulness, or allows us to manipulate it
                    in some way.</para>
                <para>When we write new document types in XML, we say we are creating a new
                        <firstterm linkend="gl-application">application</firstterm> of XML. We can
                    be informal about how we create this application, making up new elements (tags)
                    as we go, or we can specify which elements we want to use in detail
                    first.</para>
                <para>If we use the latter method, then we'll need to write a definition of our new
                    application. This definition will not only create new elements for us to use,
                    but will specify the structure that they will fit into, and the order we must
                    use them in. This rigidity seems very restrictive at first, but we'll see later
                    why this is necessary.</para>
                <para>To write our definition we'll use either a <firstterm linkend="gl-dtd"
                        >Document Type Definition</firstterm> (a <abbrev>DTD</abbrev>) or an
                        <firstterm linkend="gl-schema">XML Schema</firstterm>. DTDs have been around
                    for a very long time, and are rather clunky and inflexible - the specification
                    for HTML is written in a series of DTDs. The format of a DTD can be very messy
                    and unstructured, and is not in XML format.</para>
                <para>XML Schemas are a more modern alternative to the DTD. Written in XML, they are
                    highly structured, and allow you to specify new applications very accurately.
                    We'll look at both of these formats as we progress. </para>
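                <para>As a rough sketch of what such a definition might look like, here is a
                    possible DTD for the joke book shown in <xref linkend="ex-home-made-markup"/>.
                    It is inferred from that single example, so the choices it makes (such as
                    allowing any number of jokes, and making both attributes optional) are
                    assumptions rather than part of the original markup.</para>
                <example>
                    <title>A possible DTD for the joke book (a sketch only)</title>
                    <programlisting><![CDATA[<!-- A jokebook holds one bookinfo, then one or more jokes (assumed) -->
<!ELEMENT jokebook   (bookinfo, joke+)>
<!ELEMENT bookinfo   (title, editor)>
<!ELEMENT title      (#PCDATA)>
<!ELEMENT editor     (#PCDATA)>

<!-- Only the simplejoke type appears in the example document -->
<!ELEMENT joke       (simplejoke)>
<!ELEMENT simplejoke (question, punchline)>
<!ELEMENT question   (#PCDATA)>
<!ELEMENT punchline  (#PCDATA)>

<!-- Both attributes are treated as optional free text (assumed) -->
<!ATTLIST simplejoke
          author CDATA #IMPLIED
          rating CDATA #IMPLIED>]]></programlisting>
                </example>
                <para>For comparison, the <code>simplejoke</code> element alone might be described
                    in an XML Schema along the following lines. Again, this is an illustrative
                    fragment under the same assumptions, not a complete schema for the joke
                    book.</para>
                <example>
                    <title>The same simplejoke element described in XML Schema (a sketch
                        only)</title>
                    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="simplejoke">
    <xs:complexType>
      <!-- question must come first, followed by punchline -->
      <xs:sequence>
        <xs:element name="question"  type="xs:string"/>
        <xs:element name="punchline" type="xs:string"/>
      </xs:sequence>
      <!-- attributes are optional unless stated otherwise -->
      <xs:attribute name="author" type="xs:string"/>
      <xs:attribute name="rating" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>]]></programlisting>
                </example>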
            </sect2>
            <sect2>
                <title>XPath</title>
                <para>The first major addition to XML that we'll need to examine is <firstterm
                        linkend="gl-xpath">XPath</firstterm>, which gives us a way to refer to and
                    examine specific parts of an XML document. Using XPath, we can give a precise
                    identifier to an individual element within a document, or even a pattern or
                    group of elements, useful when we want to examine only certain details recorded
                    in an XML file.</para>
                <para>Specifying a location in XPath is a little like writing down someone's postal
                    address. We can write a full address, with everything from street number and
                    name through to county and country, or in some contexts, we can just say
                        <quote>35 Juniper Avenue</quote>, where it's clear that we mean a particular
                    town or city.</para>
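                <para>To illustrate, here are a few XPath expressions that could be used with the
                    joke book in <xref linkend="ex-home-made-markup"/>. They are offered as a
                    sketch: the descriptions assume the document has exactly the structure shown
                    in that example.</para>
                <example>
                    <title>Some XPath expressions for the joke book (a sketch only)</title>
                    <itemizedlist>
                        <listitem>
                            <para><code>/jokebook/joke/simplejoke/punchline</code> is the
                                    <quote>full address</quote>: starting from the top of the
                                document, it selects every punchline of every simple joke.</para>
                        </listitem>
                        <listitem>
                            <para><code>//punchline</code> selects every
                                    <code>punchline</code> element, wherever it appears in the
                                document.</para>
                        </listitem>
                        <listitem>
                            <para><code>/jokebook/joke/simplejoke[@rating='U']/question</code>
                                selects the question of each simple joke whose
                                    <code>rating</code> attribute is <code>U</code>.</para>
                        </listitem>
                        <listitem>
                            <para><code>punchline</code> on its own is the <quote>35 Juniper
                                    Avenue</quote> form: it only makes sense relative to where we
                                already are, for example while we are looking at a particular
                                    <code>simplejoke</code>.</para>
                        </listitem>
                        <listitem>
                            <para><code>@author</code> selects the <code>author</code> attribute
                                of the element we are currently looking at.</para>
                        </listitem>
                    </itemizedlist>
                </example>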
            </sect2>
            <sect2>
                <title>XSL Transformations and Formatting Objects</title>
                <para>When we want to convert our XML document into some other format, we have a
                    couple of tools that fit the purpose, grouped under the banner of <firstterm>XML
                        Stylesheet Languages</firstterm> (<abbrev>XSL</abbrev>). </para>
                <para><firstterm linkend="gl-xslt">XSL Transformations</firstterm> or
                        <abbrev>XSLT</abbrev> are used very widely, and are especially useful for
                    converting your XML documents into HTML (or rather, XHTML). The main focus of
                    this module is on using XSLT effectively to create multipurpose documents, that
                    can be viewed in a variety of formats and styles. XSLT is often built into web
                    browsers, which makes using XSLT to create HTML a simple process.</para>
                <para>Whereas <abbrev>XSLT</abbrev> is used to convert XML into other markup
                    languages, its sister technology <firstterm linkend="gl-xsl-fo">XSL Formatting
                        Objects</firstterm> or <abbrev>XSL-FO</abbrev> is used to convert documents
                    into more presentational formats, such as print documents.</para>
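                <para>As a taste of what is to come, here is a minimal sketch of an XSLT
                    stylesheet that could turn the joke book in <xref
                        linkend="ex-home-made-markup"/> into a simple HTML page. The choice of
                    HTML elements for the output is an assumption made for this illustration;
                    many other designs would work equally well.</para>
                <example>
                    <title>A minimal XSLT stylesheet for the joke book (a sketch only)</title>
                    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the processor to write its result as HTML -->
  <xsl:output method="html"/>

  <!-- Build the page skeleton around the whole joke book -->
  <xsl:template match="/jokebook">
    <html>
      <head>
        <title><xsl:value-of select="bookinfo/title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="bookinfo/title"/></h1>
        <!-- Hand each simple joke to the template below -->
        <xsl:apply-templates select="joke/simplejoke"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each joke becomes a bold question followed by its punchline -->
  <xsl:template match="simplejoke">
    <p><strong><xsl:value-of select="question"/></strong></p>
    <p><xsl:value-of select="punchline"/></p>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>
                </example>
                <para>Notice that the <code>select</code> and <code>match</code> attributes
                    contain XPath expressions. If the stylesheet were saved under a name such as
                        <filename>jokebook.xsl</filename> (a hypothetical filename), the joke book
                    could point to it with a processing instruction such as <code>&lt;?xml-stylesheet
                        type=&quot;text/xsl&quot; href=&quot;jokebook.xsl&quot;?&gt;</code> placed
                    after the XML declaration.</para>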
            </sect2>
            <sect2>
                <title>XLink and XPointer</title>
                <para>These are the linking mechanisms that allow us to insert the equivalent of the
                        <code>&lt;a href=&quot;url&quot;&gt;text&lt;/a&gt;</code> in HTML. </para>
                <para>Generally, <glossterm linkend="gl-xlink">XLink</glossterm> gives us the
                    ability to link to other resources on the web, and <glossterm
                        linkend="gl-xpointer">XPointer</glossterm> is used to point to a specific
                    part within an XML document. </para>
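                <para>The following sketch shows how the two might be combined. The
                        <code>seealso</code> element and the file <filename>jokes.xml</filename>
                    are invented for this illustration, and support for XLink and XPointer varies
                    considerably between pieces of software, so treat this as an outline of the
                    idea rather than something guaranteed to work in a browser.</para>
                <example>
                    <title>XLink and XPointer in a home-made markup language (a sketch
                        only)</title>
                    <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<jokebook xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- XLink: a simple link to another resource on the web -->
  <seealso xlink:type="simple"
           xlink:href="http://www.w3.org/XML/1999/XML-in-10-points">XML in 10 points</seealso>

  <!-- XLink plus XPointer: the part after the # picks out one element
       inside jokes.xml. Here element(/1/2) means "the second child
       element of the root element". -->
  <seealso xlink:type="simple"
           xlink:href="jokes.xml#element(/1/2)">The first joke</seealso>

</jokebook>]]></programlisting>
                </example>
                <para>If <filename>jokes.xml</filename> had the same structure as <xref
                        linkend="ex-home-made-markup"/>, the second link would point at the first
                        <code>joke</code> element, since <code>bookinfo</code> is the first child
                    of the root and <code>joke</code> is the second.</para>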
            </sect2>
        </sect1>
        <sect1>
            <title>SGML to XML, HTML to XHTML</title>
            <para>XML has evolved from a previous generation of technologies, based around
                    <firstterm linkend="gl-sgml">Standard Generalised Markup Language</firstterm> or
                    <abbrev>SGML</abbrev>. XML can be considered the <quote>little brother</quote>
                of SGML, in that it's more compact, easier to learn, and simpler to write
                applications and software for. We're not going to go into the technical details of
                SGML or the differences between SGML and XML, as they are mostly only of historical
                interest.</para>
        </sect1>
    </chapter>
    <chapter xml:id="ch-doctypes">
        <title>Different Types of Document</title>
        <para>This chapter will examine some of the applications of XML that are widely used in
            various communities today. Most have been designed to fulfil a single purpose: to
            encode a certain type of information, usually very rigidly constrained and predictable
            in nature. However, we'll start by looking at the most widely used, and least
            constrained, of these applications: XHTML.</para>
        <sect1>
            <title>Hypertext Markup Language</title>
            <sidebar role="highlights">
                <itemizedlist>
                    <listitem>
                        <para>General purpose markup language</para>
                    </listitem>
                    <listitem>
                        <para>Originated as text-only, but quickly gained media extensions</para>
                    </listitem>
                    <listitem>
                        <para>Often produced automatically by web authoring software</para>
                    </listitem>
                </itemizedlist>
            </sidebar>
            <para/>
            <example>
                <title>A basic XHTML page with minimal presentational markup</title>
                <programlisting><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Layout example</title>
    <link rel="stylesheet" type="text/css" href="default.css" />
  </head>
  <body>
    <div id="header">
      <h1>The Title of the Site </h1>
    </div>
    <div id="menu">
      <ul>
        <li><a href="overview.html">Overview</a></li>
        <li><a href="background.html">Background</a></li>
        <li><a href="filename.html">Category</a>
          <ul>
            <li><a href="filename.html">Subcategory</a></li>
            <li><a href="filename.html">Subcategory</a></li>
          </ul>
        </li>
        <li><a href="filename.html">Category</a>
          <ul>
            <li><a href="filename.html">Subcategory</a></li>
            <li><a href="filename.html">Subcategory</a></li>
          </ul>
        </li>
        <li><a href="filename.html">Category</a></li>
        <li><a href="ratings.html">Ratings</a></li>
      </ul>
    </div>
    <div id="content">
      <h1>Section Title</h1>
      <img src="web.jpg" alt="A Spider's Web with its creator" />
      <p>A paragraph of text. A paragraph of text. A paragraph 
         of text. A paragraph of text. A paragraph of text. A 
         paragraph of text. A paragraph of text. </p>
      <h2>Subsection Title</h2>
      <p>A paragraph of text. A paragraph of text. A paragraph 
         of text. A paragraph of text. A paragraph of text. A 
         paragraph of text. A paragraph of text. </p>
    </div>
    <div id="footer">
      <p>Student ID: 987654321</p>
    </div>
  </body>
</html>]]></programlisting>
            </example>
        </sect1>
        <sect1>
            <title>Mathematical Markup Language (MathML)</title>
            <sidebar role="highlights">
                <itemizedlist spacing="compact">
                    <listitem>
                        <para>Special purpose: encoding mathematical equations and notation</para>
                    </listitem>
                    <listitem>
                        <para>Designed to be embedded in other markup languages</para>
                    </listitem>
                    <listitem>
                        <para>Rendered automatically by latest browsers</para>
                    </listitem>
                </itemizedlist>
            </sidebar>
            <para>Specifications: <link xlink:href="http://www.w3.org/TR/2003/REC-MathML2-20031021/"
                /></para>
            <example>
                <title>A very simple example of MathML</title>
                <programlisting><![CDATA[<math xmlns="http://www.w3.org/1998/Math/MathML">
  <msup>
    <msqrt>
      <mrow>
        <mi>a</mi>
        <mo>+</mo>
        <mi>b</mi>
      </mrow>
    </msqrt>
    <mn>27</mn>
  </msup>
</math>]]></programlisting>
                <para>Source: <link xlink:href="http://www.w3.org/Math/XSL/pmathml2.xml"/> [accessed
                    2007-02-13] (which can be used to check if your browser can render MathML
                    embedded into XHTML).</para>
            </example>
        </sect1>
        <sect1>
            <title>RSS - Really Simple Syndication</title>
            <sidebar role="highlights">
                <itemizedlist spacing="compact">
                    <listitem>
                        <para>Special purpose: distribution of news items</para>
                    </listitem>
                    <listitem>
                        <para>Simple syntax which handles text only</para>
                    </listitem>
                    <listitem>
                        <para>Extended in several ways to encompass different media</para>
                    </listitem>
                </itemizedlist>
            </sidebar>
            <para/>
            <example>
                <title>An instance of RSS 2.0 with iTunes and Yahoo media extensions</title>
                <programlisting>
<![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" 
    href="http://downloads.bbc.co.uk/rmhttp/downloadtrial/common/rss_rm.xsl"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss">
	<channel>
		<title>Digital Planet</title>
		<link>http://news.bbc.co.uk/1/hi/technology/1478157.stm</link>
		<description>Find out how the digital revolution is changing our lives.... </description>
		<itunes:author>BBC World Service</itunes:author>
		<language>en-gb</language>
		<ttl>720</ttl>
		<itunes:image
			href="http://www.bbc.co.uk/radio/downloadtrial/images/programmes/300x300/worldservice_planet.jpg" />
		<copyright>(C) BBC 2006</copyright>
		<pubDate>Tue, 13 Feb 2007 03:00:15 +0000</pubDate>
		<itunes:category text="Technology" />
		<itunes:keywords>technology digital science world</itunes:keywords>
		<itunes:explicit>No</itunes:explicit>
		<item>
			<title>BBC Digital Planet February 13 2007</title>
			<description>In this week's Digital Planet, ...</description>
			<itunes:subtitle>BBC Digital Planet February 13 2007</itunes:subtitle>
			<itunes:summary>In this week's Digital Planet, ...</itunes:summary>

			<pubDate>Mon, 12 Feb 2007 16:00:00 +0000</pubDate>
			<itunes:duration>00:26:49</itunes:duration>
			<guid isPermaLink="false"
				>http://downloads.bbc.co.uk/rmhttp/downloadtrial/worldservice/digitalplanet/digitalplanet_20070212-1600_40_pc.mp3</guid>
			<enclosure length="11339863" type="audio/mpeg"
				url="http://downloads.bbc.co.uk/rmhttp/downloadtrial/worldservice/digitalplanet/digitalplanet_20070212-1600_40_pc.mp3" />
			<media:content bitrate="40" duration="1609" expression="full"
				fileSize="11339863" type="audio/mpeg"
				url="http://downloads.bbc.co.uk/rmhttp/downloadtrial/worldservice/digitalplanet/digitalplanet_20070212-1600_40_pc.mp3" />

		</item>
	</channel>

</rss>]]>

				</programlisting>
                <para>Source: <link
                        xlink:href="http://downloads.bbc.co.uk/rmhttp/downloadtrial/worldservice/digitalplanet/rss.xml"
                    /> [accessed 2007-02-13].</para>
            </example>
            <para>Specifications: <link xlink:href="http://www.rssboard.org/rss-specification"
                /></para>
        </sect1>
        <sect1>
            <title>DocBook</title>
            <sidebar role="highlights">
                <itemizedlist spacing="compact">
                    <listitem>
                        <para>Special purpose: computer documentation</para>
                    </listitem>
                    <listitem>
                        <para>Very strictly specified and not usually extended </para>
                    </listitem>
                    <listitem>
                        <para>Widely used in the publishing industry</para>
                    </listitem>
                    <listitem>
                        <para>Highly customisable stylesheets to output to different media</para>
                    </listitem>
                </itemizedlist>
            </sidebar>
            <para>For an example of DocBook, see the <link
                    xlink:href="http://www.services.ex.ac.uk/cmit/modules/meaningful_markup/mit3112-notes.xml"
                    >source XML</link> for these notes.</para>
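            <para>For a sense of the overall shape without opening the full source, here is a
                minimal sketch of a DocBook 5 article. The title and paragraph text are
                placeholders invented for illustration; only the namespace and
                <code>version</code> attribute match those used by these notes.</para>
            <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>A Very Short Article</title>
  <para>This is the only paragraph.</para>
</article>
]]></programlisting>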
            <para>Specifications: <itemizedlist spacing="compact">
                    <listitem>
                        <para>Version 5.0 (schema): <link
                                xlink:href="http://www.docbook.org/specs/docbook-5.0b6-spec-wd-01.html"
                            /> [accessed 2007-02-12].</para>
                    </listitem>
                    <listitem>
                        <para>Version 4.5 (DTD): <link
                                xlink:href="http://www.docbook.org/specs/docbook-4.5-spec.html"/>
                            [accessed 2007-02-12].</para>
                    </listitem>
                </itemizedlist></para>
        </sect1>
        <sect1>
            <title>The Text Encoding Initiative</title>
            <sidebar role="highlights">
                <itemizedlist>
                    <listitem>
                        <para>Special purpose: encoding of humanities texts</para>
                    </listitem>
                    <listitem>
                        <para>Also used for creating websites and general documentation</para>
                    </listitem>
                    <listitem>
                        <para>Very rich set of modular features to encode <quote>messy</quote>
                            data</para>
                    </listitem>
                    <listitem>
                        <para>Widely used as an archival format</para>
                    </listitem>
                </itemizedlist>
            </sidebar>
            <para>Website: <link xlink:href="http://www.tei-c.org/"/></para>
            <para>The TEI tagset is extremely feature-rich, and it is usual to customise the
                features available so as to limit the types of tags according to the nature of the
                document being encoded, e.g. poetry requires a different set of tags to prose or to
                performance texts. There are also features that allow the encoding of variations
                between manuscripts, uncertainty and omissions, unclear passages in audio
                transcriptions, etc. </para>
            <example>
                <title>A very simple TEI P5 document </title>
                <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<?oxygen 
    RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_allPlus.rng" 
    type="xml"?>
<TEI
	xmlns:xi="http://www.w3.org/2001/XInclude"
	xmlns:svg="http://www.w3.org/2000/svg"
	xmlns:math="http://www.w3.org/1998/Math/MathML"
	xmlns="http://www.tei-c.org/ns/1.0">	
	<teiHeader>
		<fileDesc>
			<titleStmt>
				<title>A Limerick</title>
			</titleStmt>
			<publicationStmt>
				<p>Privately published</p>
			</publicationStmt>
			<sourceDesc>
				<p>Uncertain source</p>
			</sourceDesc>
		</fileDesc>
	</teiHeader>
	<text>
		<body>
			<div type="limerick">
				<lg rhyme="aabba">
					<l>I went with the Duchess to <rhyme label="a">tea</rhyme>,</l>
					<l>Her manners were shocking to <rhyme label="a">see</rhyme>;</l>
					<l>Her rumblings abd<rhyme label="b">ominal</rhyme></l>
					<l>Were simply phen<rhyme label="b">omenal</rhyme></l>
					<l>And everyone thought it was <rhyme label="a">me</rhyme>.</l>
				</lg>
			</div>
		</body>
	</text>
</TEI>
]]></programlisting>
            </example>
        </sect1>
        <sect1>
            <title>Scalable Vector Graphics (SVG)</title>
            <sidebar role="highlights">
                <itemizedlist spacing="compact">
                    <listitem>
                        <para>Special purpose: vector graphics language</para>
                    </listitem>
                </itemizedlist>
            </sidebar>
            <example>
                <title>The simplest of SVG examples</title>
                <programlisting><![CDATA[<?xml version="1.0" standalone="no"?>

<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" 
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

<svg width="100%" height="100%" version="1.1"
xmlns="http://www.w3.org/2000/svg">

<circle cx="100" cy="50" r="40" stroke="black"
stroke-width="2" fill="red"/>

</svg>]]>
</programlisting>
                <para>Source: <link xlink:href="http://www.w3schools.com/svg/svg_example.asp"/>
                    [accessed 2007-02-13].</para>
            </example>
        </sect1>
        <sect1>
            <title>Declaring a DTD</title>
            <para>You'll have noticed that in many of these formats, there are a few lines at the
                top of the document that contain URLs or other references to standards sites and
                documents. These lines declare the standard being adhered to, and determine what
                'validity' means for that document. </para>
            <para>Like CSS stylesheets, DTDs can be defined within the XML document itself, or
                within a separate DTD file. Obviously, the latter is more flexible, since the DTD
                can be used easily for multiple files and is separated from the content of the
                document.</para>
            <para>To declare an internal DTD, use something like this: </para>
            <example>
                <title>Declaring a DTD internally</title>
                <programlisting>
                    &lt;?xml version=&quot;1.0&quot; ?&gt;
                    &lt;!DOCTYPE jokebook [
                    ... DTD goes here ...
                    ]&gt;
                    ... markup goes here ...
                </programlisting>
            </example>
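            <para>To make the placeholders concrete, here is a small sketch with the DTD and the
                markup filled in. It assumes a cut-down jokebook that holds nothing but plain-text
                jokes; the declaration syntax itself is explained in the next chapter.</para>
            <programlisting><![CDATA[<?xml version="1.0" ?>
<!DOCTYPE jokebook [
  <!ELEMENT jokebook (joke+)>
  <!ELEMENT joke (#PCDATA)>
]>
<jokebook>
  <joke>... text of the first joke ...</joke>
  <joke>... text of the second joke ...</joke>
</jokebook>
]]></programlisting>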
            <para>To create an external DTD, the definitions are simply placed in a separate file,
                usually with a <computeroutput>.dtd</computeroutput> extension, and the DTD is
                declared within the document:</para>
            <example>
                <title>Declaring an external DTD</title>
                <programlisting>
                    &lt;?xml version=&quot;1.0&quot; standalone=&quot;no&quot;?&gt;
                    &lt;!DOCTYPE jokebook SYSTEM &quot;jokebook.dtd&quot;&gt;
                    ... markup goes here ...
                </programlisting>
            </example>
            <para>For most document types, especially those created for your personal use, this will
                be sufficient. If, however, your DTD is distributed to other authors and used
                widely, then you should consider &quot;going public&quot; and creating a
                    <firstterm>Formal Public Identifier</firstterm>
                <indexterm>
                    <primary>Formal Public Identifier</primary>
                </indexterm>for your document type. This allows other authors to be certain that
                they're using the same DTD as everyone else, since FPIs are intended to uniquely
                identify a DTD.</para>
            <para>An example of an FPI is that used by DocBook; here's the declaration for a DocBook
                4.1.2 document:</para>
            <programlisting>
                &lt;!DOCTYPE book PUBLIC &quot;-//OASIS//DTD DocBook XML V4.1.2//EN&quot; 
                &quot;http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd&quot;&gt;
            </programlisting>
            <para>As you can see, the FPI (the first quoted string) contains information about the
                creating organisation (or author), the version of the DTD, and the language (EN);
                it is followed by the URL of the DTD itself. In most authoring systems, you'll
                need to download the DTD and install it before you can use it, even though you've
                declared where it can be found on the web.</para>
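            <para>As a sketch of how the same pattern might look for a DTD of your own, the
                declaration below uses a hypothetical organisation name, version number and URL;
                none of these refer to a real published DTD. The leading <code>-</code> marks the
                owner as unregistered, as it does in the DocBook and XHTML identifiers.</para>
            <programlisting><![CDATA[<!DOCTYPE jokebook PUBLIC "-//Example Organisation//DTD Jokebook V1.0//EN"
  "http://www.example.org/dtds/jokebook.dtd">
]]></programlisting>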
            <para>XHTML 1.0 has three different DTDs defined for it. A &quot;strict&quot; DTD forces
                very rigid rules on the document, which makes it more compatible with XML
                applications and processors. A &quot;transitional&quot; DTD creates more HTML-like
                relaxed documents, and is often used as an intermediate version when moving
                documents from HTML to XML. The &quot;frameset&quot; DTD defines documents which
                create frame-based layouts. The FPIs for these DTDs are: </para>
            <programlisting>
  &lt;!DOCTYPE html 
    PUBLIC &quot;-//W3C//DTD XHTML 1.0 Strict//EN&quot;
    &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;&gt;
                
  &lt;!DOCTYPE html 
    PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot;
    &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt;
                
  &lt;!DOCTYPE html 
    PUBLIC &quot;-//W3C//DTD XHTML 1.0 Frameset//EN&quot;
    &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd&quot;&gt;
            </programlisting>
            <para>With XHTML, the declaration of the standard is important, as if it is not
                recognised, the browser will switch to a compatibility mode (known as
                    <quote>Quirks</quote> mode), which will render the document much less
                effectively. For a full discussion on this, see Eric Meyer's <link
                    xlink:href="http://www.ericmeyeroncss.com/bonus/render-mode.html">page on
                    rendering modes</link>, and the material at <link
                    xlink:href="http://www.quirksmode.org/css/quirksmode.html"
                >quirksmode.org</link>. For any new websites, it's advisable to work with the Strict
                XHTML standard, as this will provide the best support for the latest CSS standards
                in most modern browsers.</para>
        </sect1>
        <sect1 xml:id="sec-namespaces-intro">
            <title>Namespaces</title>
            <para> An alternative method of specifying the standard in use is with a namespace.
                We've seen a couple of examples already: </para>
            <programlisting><![CDATA[  <math xmlns="http://www.w3.org/1998/Math/MathML">
            ]]></programlisting>
            <para>Here, the MathML standard is declared with a simple URL. These URLs don't have to
                point to any actual resource, as they are really only unique identifying strings,
                but it's usual for them to point to the standards definition or some related
                documentation. </para>
            <programlisting><![CDATA[  <TEI
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:svg="http://www.w3.org/2000/svg"
      xmlns:math="http://www.w3.org/1998/Math/MathML"
      xmlns="http://www.tei-c.org/ns/1.0">	
      ]]></programlisting>
            <para>This is a more complex example, where several different standards have been used
                in a document. The main body of the document is written to the TEI standard, which
                is declared as the default, with the <code>xmlns</code> attribute. Other standards
                are defined with prefixes, so that elements from those standards are distinguished
                from the defaults. As an example of this mixing of standards, take a look at this
                markup fragment:</para>
            <programlisting><![CDATA[  <para>With XHTML, the declaration of the standard is important, as if it is not
    recognised, the browser will switch to a compatibility mode (known as
    <quote>Quirks</quote> mode), which will render the document much less effectively.
    For a full discussion on this, see Eric Meyer's <link
      xlink:href="http://www.ericmeyeroncss.com/bonus/render-mode.html">page on
    rendering modes</link>, and the material at <link
      xlink:href="http://www.quirksmode.org/css/quirksmode.html"
    >quirksmode.org</link>. For any new websites, it's advisable to work with the Strict
    XHTML standard, as this will provide the best support for the latest CSS standards
    in most modern browsers.</para>
    ]]> </programlisting>
            <para>This illustrates a mixture of DocBook and XLink standards, where the attributes of
                the <code>link</code> element are defined by the XLink standard, not the DocBook
                standard. The namespace declaration for this document looks like this: </para>
            <programlisting><![CDATA[  <book xmlns="http://docbook.org/ns/docbook" 
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xmlns:xi="http://www.w3.org/2001/XInclude" 
        version="5.0">
                ]]></programlisting>
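            <para>Prefixes apply to elements as well as attributes. The following sketch assumes an
                XHTML page that embeds a small SVG drawing: unprefixed elements belong to the
                default (XHTML) namespace, while every element carrying the <code>svg:</code>
                prefix belongs to SVG. The prefix name and the page content are illustrative
                choices; only the two namespace URLs are fixed by the standards.</para>
            <programlisting><![CDATA[  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:svg="http://www.w3.org/2000/svg">
    <head>
      <title>Mixed namespaces</title>
    </head>
    <body>
      <p>A red circle:</p>
      <svg:svg width="200" height="100" version="1.1">
        <svg:circle cx="100" cy="50" r="40" stroke="black"
                    stroke-width="2" fill="red"/>
      </svg:svg>
    </body>
  </html>
]]></programlisting>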
        </sect1>
    </chapter>
    <chapter xml:id="ch-dtds">
        <title>Defining Documents</title>
        <subtitle>Document Type Definitions</subtitle>
        <para>So far, we've looked primarily at existing document standards, such as XHTML and
            DocBook. These are defined through extensive discussion by groups of experts, and are
            thoroughly documented and described in various human-readable ways. However, their
                <quote>canonical</quote> definition lies within their Document Type
            Definition (DTD), a document (or more likely, a set of documents) that can be read by
            editors, validators and other software and used to determine whether any given document
            conforms to that standard. </para>
        <para>So, the <glossterm linkend="gl-dtd">DTD</glossterm> is a formal specification of a
            particular document type; it defines what is <emphasis>valid</emphasis> for each
                <glossterm linkend="gl-instance">instance</glossterm> of that document type. It
            defines the sequence of allowable elements and controls which entities and attributes
            may be used within each element. Most XML authoring software can read DTDs and can
            validate your documents by comparing them to your DTD. Some software will also use the
            DTD to restrict you to writing only valid markup as you author your documents.</para>
        <para>For some uses of XML, there's no available standard that covers the specific markup
            we need to store our data. Perhaps we need to store very detailed data for a specific
            subject domain. Or perhaps we want to create documents that have a specific structure
            tied to their purpose. For these types of applications, we need to create a new
            definition, and write our own DTD. </para>
        <para>DTDs are quite simple to write, but only allow very basic definitions of the elements
            and attributes that make up the application. For more detailed control over the
            structure of our instance documents, we can use more modern definition languages such as
            W3C Schemas or RelaxNG, both of which we'll cover in a later section. But as DTDs are
            easier to read and write, we'll start with them for our new XML application.</para>
        <sect1>
            <title>What is a DTD?</title>
            <para>To begin with, let's have a look at the example DTD shown in <xref
                    linkend="ex-simplest-jokebook-dtd"/>. This gives us a very simple document
                structure with some metadata (the <markup>bookinfo</markup> element) and a series of
                one or more <markup>joke</markup>s, all contained within the root node, called
                    <markup>jokebook</markup>. The simplest document that can be created with this
                DTD is shown in <xref linkend="ex-simplest-jokebook-doc"/>. <example
                    xml:id="ex-simplest-jokebook-dtd">
                    <title>A simple DTD for a jokebook</title>
                    <programlisting><![CDATA[
  <!ELEMENT jokebook (bookinfo, joke+)>
  <!ELEMENT bookinfo (title, editor+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT editor (#PCDATA)>
  <!ELEMENT joke (simplejoke+)>
  <!ELEMENT simplejoke (question, punchline)>
  <!ELEMENT question (#PCDATA)>
  <!ELEMENT punchline (#PCDATA)>
					]]></programlisting>
                </example></para>
            <para>
                <example xml:id="ex-simplest-jokebook-doc">
                    <title>A minimal jokebook document</title>
                    <programlisting><![CDATA[
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE jokebook SYSTEM "jokebook.dtd">
  <jokebook>
    <bookinfo>
      <title/>
      <editor/>
    </bookinfo>
    <joke>
      <simplejoke>
        <question/>
        <punchline/>
      </simplejoke>
    </joke>
  </jokebook>
					]]></programlisting>
                </example>
            </para>
            <para>You should be able to roughly match the element names in the XML document with the
                element definitions in the DTD - in the next few sections we'll look at the exact
                syntax that allows us to define this document structure.</para>
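            <para>A slightly fuller instance may make the matching easier to see. This sketch is
                valid against the same DTD, with placeholder text (invented for illustration) in
                each of the <markup>#PCDATA</markup> elements:</para>
            <programlisting><![CDATA[
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE jokebook SYSTEM "jokebook.dtd">
  <jokebook>
    <bookinfo>
      <title>A Small Book of Jokes</title>
      <editor>A. N. Editor</editor>
    </bookinfo>
    <joke>
      <simplejoke>
        <question>Why did the chicken cross the road?</question>
        <punchline>To get to the other side.</punchline>
      </simplejoke>
    </joke>
  </jokebook>
					]]></programlisting>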
        </sect1>
        <sect1>
            <title>Defining the Document</title>
            <para>The basic structure of a DTD comprises a sequence of definitions, of four types: </para>
            <itemizedlist>
                <listitem>
                    <para>Elements (or tags):
                            <computeroutput><![CDATA[<!ELEMENT ....>]]></computeroutput>
                    </para>
                </listitem>
                <listitem>
                    <para>Attributes: <computeroutput><![CDATA[<!ATTLIST ....>]]></computeroutput>
                    </para>
                </listitem>
                <listitem>
                    <para>Entities: <computeroutput><![CDATA[<!ENTITY ....>]]></computeroutput>
                    </para>
                </listitem>
                <listitem>
                    <para>Notations: <computeroutput><![CDATA[<!NOTATION ....>]]></computeroutput>
                    </para>
                </listitem>
            </itemizedlist>
            <para>The order of the definitions is generally not important, unless you're importing
                definitions from elsewhere (more on this another time). </para>
            <para>The DTD doesn't follow the usual XML rules of well-formedness, since it is not an
                XML document itself. However, the syntax is strict and case-sensitive.</para>
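            <para>As a rough sketch, one declaration of each type might look like this. The names
                and values are assumptions chosen for illustration, and entities and notations are
                not covered further in this section.</para>
            <programlisting><![CDATA[
  <!ELEMENT joke (#PCDATA)>
  <!ATTLIST joke author CDATA #IMPLIED>
  <!ENTITY editorname "A. N. Editor">
  <!NOTATION png SYSTEM "image/png">
					]]></programlisting>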
        </sect1>
        <sect1>
            <title>Defining Elements</title>
            <para>As you've probably guessed, the element definitions provide the basic tagset to be
                used in the document. They also define the strict order and repetition of elements
                within the document.</para>
            <para>A typical element definition may look like this:</para>
            <programlisting>
  &lt;!ELEMENT simplejoke (question, punchline)&gt;			
            </programlisting>
            <para>This gives us the following markup:</para>
            <programlisting><![CDATA[  <simplejoke>
    <question></question>
    <punchline></punchline>
  </simplejoke>
                ]]>
            </programlisting>
            <para>This says that every <markup>&lt;simplejoke&gt;</markup> must contain
                    <emphasis>one and only one</emphasis>
                <markup>&lt;question&gt;</markup>, followed by <emphasis>one and only one</emphasis>
                <markup>&lt;punchline&gt;</markup>. The <markup>simplejoke</markup> is the
                    <glossterm linkend="gl-parent">parent</glossterm> element, and
                    <markup>question</markup> and <markup>punchline</markup> are its <glossterm
                    linkend="gl-child">children</glossterm>.</para>
            <sect2>
                <title>Repetition+</title>
                <para>Obviously, you are able to allow more than one of each element:</para>
                <programlisting>
  &lt;!ELEMENT jokebook (joke+)&gt;
			</programlisting>
                <para>which means that a jokebook may contain <emphasis>one or more</emphasis>
                    jokes:<programlisting>  &lt;jokebook>
    &lt;joke>&lt;/joke>
    ... more &lt;joke> elements ...
  &lt;/jokebook></programlisting></para>
                <para>Similarly, elements may be marked as <emphasis>optional</emphasis>
                        (<computeroutput>?</computeroutput>) or as appearing <emphasis>zero or
                        more</emphasis> times (<computeroutput>*</computeroutput>). You can also use
                    the keyword <computeroutput>ANY</computeroutput> to allow any defined tag or
                    text at that point (though this is rarely used).</para>
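                <para>The three markers can be combined in one definition. In this sketch,
                    <markup>subtitle</markup> and <markup>note</markup> are hypothetical elements
                    added only to show <computeroutput>?</computeroutput> and
                    <computeroutput>*</computeroutput> in use; each would need its own definition
                    elsewhere in the DTD.</para>
                <programlisting><![CDATA[
  <!ELEMENT bookinfo (title, subtitle?, editor+, note*)>
					]]></programlisting>
                <para>Under this definition, a <markup>bookinfo</markup> with no subtitle, two
                    editors and no notes would be valid:</para>
                <programlisting><![CDATA[  <bookinfo>
    <title></title>
    <editor></editor>
    <editor></editor>
  </bookinfo>]]></programlisting>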
            </sect2>
            <sect2>
                <title>Choices | choices | choices</title>
                <para>Alternatives can also be marked:</para>
                <programlisting>
  &lt;!ELEMENT joke (simplejoke | knockknockjoke | 
      doctordoctorjoke | limerick)&gt;
			</programlisting>
                <para>so that a joke may contain one (and only one) of several types. A valid
                    pattern might be:
                    <programlisting>  &lt;joke>&lt;limerick>&lt;/limerick>&lt;/joke></programlisting>We
                    can also allow multiple choices from a list <emphasis>in any order</emphasis>,
                    by saying:</para>
                <programlisting>
  &lt;!ELEMENT myjokes (simplejoke | knockknockjoke | 
      doctordoctorjoke | limerick)+ &gt;
			</programlisting>
                <para>which creates an <emphasis>unordered list</emphasis> of one or more jokes of
                    the listed types, perhaps like
                    this:<programlisting>  &lt;myjokes>
    &lt;knockknockjoke>&lt;/knockknockjoke>
    &lt;limerick>&lt;/limerick>
    &lt;limerick>&lt;/limerick>
    &lt;simplejoke>&lt;/simplejoke>
    &lt;limerick>&lt;/limerick>
  &lt;/myjokes></programlisting></para>
                <para>Remember that, for these to validate, each of these individual element types
                    will also need to be defined, even if only as <markup>(#PCDATA)</markup>.</para>
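                <para>A minimal sketch of a complete set of definitions for the
                    <markup>myjokes</markup> example, assuming for now that every joke type holds
                    nothing but plain text, might be:</para>
                <programlisting><![CDATA[
  <!ELEMENT myjokes (simplejoke | knockknockjoke |
      doctordoctorjoke | limerick)+ >
  <!ELEMENT simplejoke (#PCDATA)>
  <!ELEMENT knockknockjoke (#PCDATA)>
  <!ELEMENT doctordoctorjoke (#PCDATA)>
  <!ELEMENT limerick (#PCDATA)>
					]]></programlisting>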
            </sect2>
            <sect2>
                <title>Tricks and tips</title>
                <para>Controlling and limiting multiple items is not always easy with DTDs,
                    especially where there's a need for a fixed minimum or maximum other than
                    zero or one. Various tricks are needed to work within these limits, such
                    as:</para>
                <programlisting>
  &lt;!ELEMENT limerick (line, line, line, line, line)&gt;
				</programlisting>
                <para>to define a strictly five-line verse, or</para>
                <programlisting>
  &lt;!ELEMENT double-entendre (phrase, meaning, meaning+)&gt;
				</programlisting>
                <para>to define it as a phrase followed by <emphasis>two or more</emphasis>
                    meanings:<programlisting>  &lt;double-entendre>
    &lt;phrase>&lt;/phrase>
    &lt;meaning>&lt;/meaning>
    &lt;meaning>&lt;/meaning>
    ... more &lt;meaning>s if required ...
  &lt;/double-entendre></programlisting></para>
                <para>This method becomes quite a limitation if you want to define an element with,
                    perhaps, twenty child elements. Both W3C Schemas and RelaxNG provide more
                    effective ways of defining multiple child elements.</para>
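                <para>The same trick stretches to a bounded range. As a sketch, assume a
                    hypothetical <markup>verse</markup> element that must hold between two and four
                    lines. The optional lines are nested rather than listed side by side, because
                    XML expects content models to be unambiguous, and
                    <computeroutput>(line, line, line?, line?)</computeroutput> would leave a
                    validator unable to tell which optional slot a third line fills.</para>
                <programlisting><![CDATA[
  <!ELEMENT verse (line, line, (line, line?)?)>
					]]></programlisting>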
            </sect2>
            <sect2>
                <title>Adding text</title>
                <para>So far, we've only defined tags which contain other tags, so we need a way of
                    allowing tags to contain arbitrary text. We can do this using a #PCDATA
                    clause:</para>
                <programlisting>
  &lt;!ELEMENT phrase (#PCDATA)&gt;
				</programlisting>
                <para>which allows any text <emphasis>except tags</emphasis> to be included as part
                    of the phrase. Allowing a mixture of tags and text is trickier, and often
                    confuses parsers and validators. You can mix <markup>#PCDATA</markup> with other
                    elements in a choice list (where it must come first, and the whole group must be
                    marked with <computeroutput>*</computeroutput>), but not in a sequence list,
                    so:</para>
                <programlisting>
  &lt;!ELEMENT line (#PCDATA | emphasis | rhyme)*&gt;
				</programlisting>
                <para>is valid, and could give the following markup:
                    <programlisting>  &lt;line>There &lt;emphasis>was&lt;/emphasis> a 
    young lady from &lt;rhyme>Crewe&lt;/rhyme>
  &lt;/line>
</programlisting>but
                    the next mixes <markup>#PCDATA</markup> within a sequence, so isn't a valid DTD
                    definition:</para>
                <programlisting>
  &lt;!ELEMENT double-entendre (phrase, meaning, meaning+, #PCDATA)&gt;
				</programlisting>
                <para>In this case, you'll need to define another element which then contains the
                    arbitrary text itself. Generally, though, mixing text and elements is frowned
                    upon as poor document design, unless the document calls for
                        <quote>inline</quote> markup (as, for example, the inline elements in XHTML
                    such as &lt;strong&gt;, &lt;a&gt;, etc.).</para>
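                <para>One way to sketch that workaround, assuming a hypothetical
                    <markup>comment</markup> element (not otherwise part of the jokebook DTD) to
                    hold the free text:</para>
                <programlisting><![CDATA[
  <!ELEMENT double-entendre (phrase, meaning, meaning+, comment)>
  <!ELEMENT comment (#PCDATA)>
					]]></programlisting>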
            </sect2>
        </sect1>
        <sect1>
            <title>Adding Attributes</title>
            <para>Often it's preferable to define attributes to specify more detailed information
                about an element. In HTML, attributes were often used to modify the
                    <emphasis>presentation</emphasis> of the element on the screen; in XML, the
                attributes should only be used to describe <emphasis>content</emphasis>.</para>
            <para>Attributes are often used to provide <glossterm linkend="gl-metadata"
                    >metadata</glossterm>, information about the data contained in the element, such
                as its source, language or accuracy.</para>
            <para>So we may want to write something like this in our XML document:</para>
            <programlisting>
  &lt;joke author=&quot;Lee Evans&quot; cert=&quot;18&quot;&gt; ...... &lt;/joke&gt;
            </programlisting>
            <para>where the <markup>author</markup> attribute can be defined as:</para>
            <programlisting>
  &lt;!ATTLIST joke author CDATA #IMPLIED&gt;
            </programlisting>
            <para>The <computeroutput>ATTLIST</computeroutput> is followed by the element it applies
                to, then the name of the attribute it is defining. In this case it is an arbitrary
                text field (<computeroutput>CDATA</computeroutput> -- note no
                    <computeroutput>#P</computeroutput>!), and is optional
                    (<computeroutput>#IMPLIED</computeroutput>). </para>
            <para>Explicit values can also be specified here:</para>
            <programlisting>
  &lt;!ATTLIST joke cert (U | 12 | 15 | 18) #REQUIRED&gt;
            </programlisting>
            <para>so that the author must choose one of the certificate values.</para>
            <para>And a default value may be given:</para>
            <programlisting>
  &lt;!ATTLIST joke author CDATA &quot;anonymous&quot;&gt;
            </programlisting>
            <para>where, if otherwise unspecified, the value will default to
                    <computeroutput>&quot;anonymous&quot;</computeroutput>.</para>
            <para>You can also specify a fixed value:</para>
            <programlisting>
  &lt;!ATTLIST joke language CDATA #FIXED &quot;English&quot;&gt;
            </programlisting>
            <para>where the attribute will be defined as a specific constant whether or not it's
                encoded in the document.</para>
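            <para>Several attributes for the same element can also be declared together in a single
                <computeroutput>ATTLIST</computeroutput>. The sketch below simply gathers the
                default, required and fixed declarations from above into one; the layout in
                columns is a readability choice, not a requirement.</para>
            <programlisting><![CDATA[
  <!ATTLIST joke
      author   CDATA              "anonymous"
      cert     (U | 12 | 15 | 18) #REQUIRED
      language CDATA              #FIXED "English">
            ]]></programlisting>
            <para>With this declaration, a validating parser would treat
                <computeroutput><![CDATA[<joke cert="15">]]></computeroutput> as though
                <computeroutput>author="anonymous"</computeroutput> and
                <computeroutput>language="English"</computeroutput> had also been written.</para>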
            <sect2>
                <title>Unique Identifiers</title>
                <para>It's common to want to refer to a certain section of a document using some
                    kind of identifier, so provision has been made for this using the special
                    attribute type <computeroutput>ID</computeroutput>. This is reserved for use as
                    a unique marker within each document (HTML has this feature, though it is
                    little used). In particular, many database applications of XML will generate or
                    read this attribute as a part of their key for the data.</para>
                <programlisting>
  &lt;!ATTLIST joke code ID #REQUIRED&gt;
                </programlisting>
                <para>You can also define an attribute to be a list of such IDs, as
                    cross-references:</para>
                <programlisting>
  &lt;!ATTLIST joke seealso IDREFS #IMPLIED&gt;
                </programlisting>
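                <para>In a document, the two attributes might be used as in the sketch below. The
                    code values are invented for illustration. Note that an
                    <computeroutput>ID</computeroutput> value must be a valid XML name, so it
                    begins with a letter or underscore rather than a digit; it must be unique
                    within the document, and every value listed in an
                    <computeroutput>IDREFS</computeroutput> attribute must match an
                    <computeroutput>ID</computeroutput> somewhere in the same document.</para>
                <programlisting><![CDATA[
  <joke code="j001" cert="U"> ...... </joke>
  <joke code="j002" cert="12" seealso="j001"> ...... </joke>
                ]]></programlisting>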
                <note>
                    <title>Practical task</title>
                    <para>Develop a document type of your own. To begin, think of a type of document
                        that has a simple structure, and that records the same information regularly
                        within that structure. A simple example might be an address book - each
                        entry follows a regular pattern and records the same type of information for
                        each person. </para>
                    <para>Begin by marking up a <quote>typical</quote> document, creating the tags
                        as you go. Then deconstruct the marked-up document into a series of element
                        definitions in a separate file. As you go, try creating a new document using
                        your DTD in <![CDATA[<oXygen/>]]>, using the validation facility to test
                        your DTD.</para>
                    <para>This works best with regular, well-structured documents, such as meeting
                        minutes, restaurant menus, recipes, product specifications (e.g. cars,
                        computers,...) and so on. </para>
                    <para>As an example, I've created a <link
                            xlink:href="http://pallas.ex.ac.uk/pallas/teaching/mit3112/outlines/outlines.dtd"
                            >DTD</link> for the module outlines that we produce to define each
                        module (e.g. <link
                            xlink:href="http://pallas.ex.ac.uk/pallas/teaching/mit3112/outlines/mit3112.xml"
                            >this one</link>) that you take (see the <link
                            xlink:href="http://pallas.ex.ac.uk/pallas/teaching/mit3112/outlines/"
                            >outlines</link> folder for other related files). </para>
                </note>
            </sect2>
        </sect1>
        <sect1>
            <title>Good Document Design</title>
            <para>Over the coming weeks, we'll build up a picture of how to analyse data and build a
                document design that models the data accurately and flexibly. For now, we need to
                bear in mind a few steps towards this process: <itemizedlist>
                    <listitem>
                        <para>identify your basic data items</para>
                    </listitem>
                    <listitem>
                        <para>group related data items together</para>
                    </listitem>
                    <listitem>
                        <para>organise these groups into a hierarchical (tree-like) structure</para>
                    </listitem>
                    <listitem>
                        <para>examine the level of detail recorded for each data item - can it be
                            broken down into further elements?</para>
                    </listitem>
                    <listitem>
                        <para>determine whether inline markup is needed within text elements</para>
                    </listitem>
                    <listitem>
                        <para>look at the metadata (attributes) needed for each data item</para>
                    </listitem>
                </itemizedlist></para>
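            <para>As a rough sketch of where these steps might lead, here is how a recipe could be
                marked up. All of the element and attribute names are invented for illustration and
                are not part of any standard. Basic items (<computeroutput>quantity</computeroutput>,
                <computeroutput>item</computeroutput>) are grouped into
                <computeroutput>ingredient</computeroutput> elements, the groups form a tree under
                <computeroutput>recipe</computeroutput>, the quantity is split out from the item
                name, one step uses inline markup, and attributes carry the
                metadata:<programlisting><![CDATA[
  <recipe serves="4">
    <title>Leek and Potato Soup</title>
    <ingredients>
      <ingredient>
        <quantity unit="g">500</quantity>
        <item>potatoes</item>
      </ingredient>
      <ingredient>
        <quantity unit="count">2</quantity>
        <item>leeks</item>
      </ingredient>
    </ingredients>
    <method>
      <step>Simmer the vegetables <emphasis>gently</emphasis> for 20 minutes.</step>
    </method>
  </recipe>
]]></programlisting></para>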
            <para>Try to design for reusability; don't be restricted by presentation issues or by a
                specific output format for the document. Most XML documents have a life after their
                initial purpose, often unexpected, and planning for this should be part of your
                design. For example, when recording personal names, splitting the first name and
                surname into separate elements allows more flexible processing, such as producing
                lists sorted by surname <emphasis>or</emphasis> first name:<programlisting><![CDATA[
  <personname>
    <firstname>Gary</firstname>
    <surname>Stringer</surname>
  </personname>
                        
                        ]]></programlisting></para>
            <para>Above all, your document design should be a <emphasis>semantic</emphasis>
                representation of the structured data contained within your documents.</para>
            <para>We'll revisit this topic in more depth in the next session.</para>
        </sect1>
        <sect1>
            <title>Entities and Other Useful(?) Stuff</title>
            <important>
                <title>Warning!</title>
                <para>Most of the techniques described in the rest of this chapter are gradually
                    being replaced by newer XML-related standards. </para>
                <para>For example, the use of entities for accented characters is made redundant by
                    the adoption of Unicode; notations are more commonly dealt with by embedding
                    markup (via namespaces), etc.</para>
            </important>
            <sect2>
                <title>What are Entities?</title>
                <para>Entities are used for several reasons: <itemizedlist>
                        <listitem>
                            <para>to create a shorthand for entering often-used text;</para>
                        </listitem>
                        <listitem>
                            <para>to define user-friendly names for special characters;</para>
                        </listitem>
                        <listitem>
                            <para>to include material from an external file;</para>
                        </listitem>
                        <listitem>
                            <para>to define material that should not be parsed.</para>
                        </listitem>
                    </itemizedlist></para>
                <para>Defining a text entity (an <firstterm>internal general entity</firstterm>) in
                    the DTD:
                    <programlisting>
  &lt;!ENTITY uoe &quot;University of Exeter&quot;&gt;
                    </programlisting>
                    then using the entity in an XML document:
                    <programlisting>
  &lt;p&gt;The &amp;uoe; is a very rainy place&lt;/p&gt;
                    </programlisting></para>
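                <para>To see the declaration and the reference working together, the two pieces can
                    be combined in a single file. This is only a sketch: placing the declarations in
                    an internal DTD subset (the part between the square brackets) is an assumption
                    made here to keep the example self-contained, and the
                    <computeroutput>p</computeroutput> element is borrowed from the fragment
                    above.<programlisting><![CDATA[
  <!DOCTYPE p [
    <!ELEMENT p (#PCDATA)>
    <!ENTITY uoe "University of Exeter">
  ]>
  <p>The &uoe; is a very rainy place</p>
]]></programlisting>A
                    parser replaces the reference with the entity's value, so an application reading
                    this document sees the text <quote>The University of Exeter is a very rainy
                    place</quote>.</para>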
                <sect3 xml:id="sec-accents-and-symbols">
                    <title>Inserting Accents and Symbols</title>
                    <para>Remember the encoding parameter in the XML declaration, which we mentioned
                        way back in the introduction in <xref linkend="ex-home-made-markup"/>?
                        <programlisting>  &lt;?xml version="1.0" encoding="UTF-8"?>
                        </programlisting></para>
                    <para>An XML document can be written as a Unicode document, which allows the use
                        of a vast array of multinational characters. Unicode should be used wherever
                        possible, and should cope with most situations you come across. However,
                        when dealing with legacy data, it's sometimes necessary to deal with other
                        encodings.</para>
                    <para>In the past, it was more usual to write XML as plain ASCII, a standard
                        format that almost all text editors use. Since ASCII doesn't have provision
                        for multinational and symbol characters, we need a way of defining them, and
                        inserting them easily into our text.</para>
                    <para>Special characters are already part of the XML standard; any Unicode
                        character can be inserted using a character reference, which looks similar
                        to an entity and uses the character's Unicode reference number, e.g. an
                        e-acute is <computeroutput>&amp;#233;</computeroutput>
                    </para>
                    <para>As you can see, this isn't exactly an easy-to-remember way of inserting
                        special characters, so we usually define entities as more memorable
                        references. In XHTML, for example, we can use the entity
                            <computeroutput>&amp;eacute;</computeroutput> which is defined
                        as:<programlisting>
  &lt;!ENTITY eacute &quot;&amp;#233;&quot;&gt; &lt;!-- small e, acute accent --&gt;
                        </programlisting>There's
                        a list of character references for commonly used characters in Appendix C of
                        Castro (2001), and there are numerous more complete lists on the web.</para>
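                    <para>As a sketch of how these options compare, the short document below writes
                        the same word three ways: with a decimal character reference, with the
                        equivalent hexadecimal character reference, and with a named entity declared
                        in an internal DTD subset. The <computeroutput>menu</computeroutput> and
                        <computeroutput>item</computeroutput> element names are invented for
                        illustration.<programlisting><![CDATA[
  <!DOCTYPE menu [
    <!ELEMENT menu (item+)>
    <!ELEMENT item (#PCDATA)>
    <!ENTITY eacute "&#233;">
  ]>
  <menu>
    <item>caf&#233;</item>   <!-- decimal character reference -->
    <item>caf&#xE9;</item>   <!-- the same character in hexadecimal -->
    <item>caf&eacute;</item> <!-- named entity declared above -->
  </menu>
]]></programlisting>All
                        three items contain the same text, <quote>café</quote>, once the document
                        has been parsed.</para>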
                    <note>
                        <title>Exercises</title>
                        <para>Using the DTD and XML data documents you created for the previous
                            exercise, try the following: <orderedlist>
                                <listitem>
                                    <para>Add a few standard accented characters (e.g. umlauts or
                                        acute/grave accents) as entities to your DTD.</para>
                                </listitem>
                                <listitem>
                                    <para>Add a
                                            <computeroutput>lang=&quot;____&quot;</computeroutput>
                                        attribute to one of the elements in your DTD.</para>
                                </listitem>
                                <listitem>
                                    <para>Add some multilingual text to your documents, with the
                                        relevant language attribute set.</para>
                                    <para>
                                        <emphasis>Hint: you'll also need to allow multiple instances
                                            of the tag containing the multilingual text.</emphasis>
                                    </para>
                                </listitem>
                            </orderedlist></para>
                    </note>
                </sect3>
                <sect3>
                    <title>External Entities</title>
                    <para>An <firstterm>external entity</firstterm> refers to content held in a
                        separate file. When that content is not XML (an image, for example), the
                        entity is declared as unparsed using the
                        <computeroutput>NDATA</computeroutput> keyword, which names the format of
                        the data. It's commonly employed for regularly used data such as graphical
                        logos or icons that appear in all documents of the type being defined; it's
                        not normally used for one-off graphics such as diagrams, illustrations or
                        other <quote>content-related</quote> items. Here's how such entities are
                        declared (note that the file name must be quoted):<programlisting><![CDATA[
  <!ENTITY unilogo SYSTEM "logo-large.jpg" NDATA jpg>
  <!ENTITY cmitlogo SYSTEM "cmit-logo.gif" NDATA gif>
]]></programlisting></para>
                    <para>Creating an attribute that refers to that
                        data:<programlisting>
  &lt;!ELEMENT logo (alternatetext?)&gt;
  &lt;!ATTLIST logo image ENTITY #REQUIRED&gt;
  &lt;!ELEMENT alternatetext (#PCDATA)&gt;
                        </programlisting>then
                        referring to that picture in the XML
                        file:<programlisting>
  &lt;logo image=&quot;cmitlogo&quot;&gt;
    &lt;alternatetext&gt;Creative Media and 
      Information Technology.&lt;/alternatetext&gt;
  &lt;/logo&gt;
                        </programlisting></para>
                    <important>
                        <title>Including images as entities</title>
                        <para>In practice, this method of including an image is clumsy and very
                            restrictive, since the location of the image file must be defined in the
                            DTD. It's much more usual to merely indicate a filename as a standard
                            attribute and use a stylesheet or script to insert the image.</para>
                        <para>The entity method can, however, be useful to include very commonly
                            used items such as a corporate logo, or regularly used icons, as part of
                            the text of the document.</para>
                    </important>
                </sect3>
                <sect3>
                    <title>Notations</title>
                    <para>When including unparsed content, the applications processing the data need
                        to know something about the format of the data included, in order to process
                        it correctly. For this we need to create a
                            <computeroutput>&lt;!NOTATION&gt;</computeroutput> entry. So for
                        example, we might have:
                        <programlisting>
  &lt;!NOTATION jpg SYSTEM &quot;image/jpeg&quot;&gt;
  &lt;!NOTATION svg SYSTEM &quot;image/svg+xml&quot;&gt;
                        </programlisting>
                        to allow us to use two different formats for graphical data. Note that the
                        second, SVG, is also an XML document, though we don't want to parse it - it
                        should be passed directly to the application. </para>
                    <para>The value given for each type of file is called a
                            <firstterm>MIME-type</firstterm>, and is a standard code that most web
                        browsers and data processing systems can use. There are numerous lists of
                        MIME-types on the web; the most authoritative is at <link
                            xlink:href="http://www.iana.org/assignments/media-types/">IANA</link>,
                        and a slightly friendlier list can be seen at <link
                            xlink:href="http://www.w3schools.com/media/media_mimeref.asp"
                            >W3Schools</link>.</para>
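                    <para>Notations, unparsed entities and entity-valued attributes only make sense
                        together, so it may help to see them in one place. The following is a
                        sketch rather than a definitive design: the
                        <computeroutput>page</computeroutput> root element is invented for
                        illustration, the declarations are gathered into an internal DTD subset to
                        keep the example in a single file, and a
                        <computeroutput>gif</computeroutput> notation is added because every format
                        named after <computeroutput>NDATA</computeroutput> needs a matching
                        notation declaration for the document to be valid.<programlisting><![CDATA[
  <!DOCTYPE page [
    <!-- formats of the unparsed data -->
    <!NOTATION jpg SYSTEM "image/jpeg">
    <!NOTATION gif SYSTEM "image/gif">

    <!-- unparsed external entities, each tied to a notation -->
    <!ENTITY unilogo  SYSTEM "logo-large.jpg" NDATA jpg>
    <!ENTITY cmitlogo SYSTEM "cmit-logo.gif"  NDATA gif>

    <!-- an attribute whose value must name one of those entities -->
    <!ELEMENT page (logo+)>
    <!ELEMENT logo (alternatetext?)>
    <!ATTLIST logo image ENTITY #REQUIRED>
    <!ELEMENT alternatetext (#PCDATA)>
  ]>
  <page>
    <logo image="cmitlogo">
      <alternatetext>Creative Media and Information Technology.</alternatetext>
    </logo>
  </page>
]]></programlisting>Note
                        that the attribute value is the bare entity name
                        (<computeroutput>cmitlogo</computeroutput>), not an entity reference with an
                        ampersand and semicolon.</para>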
                </sect3>
            </sect2>
        </sect1>
    </chapter>
    <chapter xml:id="ch-css">
        <title>Cascading Stylesheets</title>
        <para>When read by a browser, an XML document bears little relation to the HTML that it
            knows about by default, so we have to give it clues as to how it should display the
            elements we've created. We can do this in one of two ways, either giving it information
            on displaying existing tags, or translating the document into tags it knows about. To do
            the former, we use CSS, which should be a familiar technique from standard HTML.</para>
        <para>A typical CSS stylesheet for an XML document might look like this:</para>
        <example>
            <title>Simple CSS stylesheet for jokebook (excerpt)</title>
            <programlisting>
  jokebook   {display:block}
 
  joke       {display:block}
 
  simplejoke {display:block;
              border:thin inset blue}
 
  question   {display:inline; 
              font-size:12pt; 
              font-weight:bold}
 
  punchline  {display:inline; 
              font-size:12pt; 
              font-style:italic}
  ...
			</programlisting>
        </example>
        <para>Each entry in the stylesheet consists of a selector (e.g. <markup>jokebook</markup>)
            with one or more properties (e.g. <markup>display:block</markup>) attached to it.</para>
        <para>Each element must be defined in the stylesheet with its presentational properties. For
            each element in your DTD, you must define at least the <markup>display</markup>
            property, which governs how the element behaves on the page. If you set
                <markup>display:block</markup> then the element will be shown as an independent
            block of content, on its own line (behaving in a similar way to
                <markup>&lt;p&gt;</markup>, <markup>&lt;h1&gt;</markup>, etc.). Setting it to
                <markup>display:inline</markup> will make it behave more like characters on the
            page. </para>
        <para>Other properties should be vaguely familiar to you if you've used CSS to format HTML
            before -- the actual formatting properties are the same for HTML and XML, and if you're
            used to using CSS with HTML then defining stylesheets for XML is much the same.</para>
        <para>You'll also note that class selectors (the <markup>.myclass</markup> notation in CSS,
            which selects a <markup>class="myclass"</markup> attribute in XHTML) are not
            commonly used, as there may be no class attribute defined in the DTD for the XML
            document. However, the <markup>xml:id</markup> attribute is often used for more specific
            cases, and because the markup is more semantic, with better-defined elements serving
            more specific purposes, the absence of the class notation is rarely a problem. </para>
        <para>Of course, you can also use any attribute as a selector, so for the following markup: <programlisting><![CDATA[  <joke cert="18"> .... </joke>
                                ]]></programlisting> the following rule will prevent the joke being
            displayed: <programlisting><![CDATA[  joke[cert="18"] {
      display: none;
  }             
                ]]></programlisting></para>
        <sect1>
            <title>Associating a CSS file with an XML document</title>
            <para>In HTML, you may be used to linking to a CSS stylesheet using something like this: <programlisting><![CDATA[  <link rel="stylesheet" href="default.css" type="text/css">
					]]></programlisting> or through the standard CSS <markup><![CDATA[@import url()]]></markup>
                directive: <programlisting><![CDATA[  <style type="text/css">
    @import URL(http://mysite.org/default.css); 
  </style>
					]]></programlisting></para>
            <para>In XML, these are replaced by <glossterm>processing instructions</glossterm>,
                which link the stylesheet in a similar way: <programlisting><![CDATA[  <?xml-stylesheet href="default.css" type="text/css"?>
					]]></programlisting></para>
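            <para>The position of the processing instruction matters: it belongs in the document
                prolog, after the XML declaration (if there is one) and before the root element. The
                sketch below shows this for a small jokebook document. The element names are taken
                from the stylesheet excerpt earlier in this chapter, but the way they are nested and
                the joke itself are assumptions made for illustration.<programlisting><![CDATA[
  <?xml-stylesheet href="default.css" type="text/css"?>
  <jokebook>
    <joke>
      <simplejoke>
        <question>Why did the chicken cross the road?</question>
        <punchline>To get to the other side.</punchline>
      </simplejoke>
    </joke>
  </jokebook>
]]></programlisting></para>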
            <para>In XML, as in HTML, you can also define alternate stylesheets: <programlisting><![CDATA[  <?xml-stylesheet alternate="yes" title="Large Print" 
	href="largeprint.css" type="text/css"?>
					
  <?xml-stylesheet alternate="yes" title="High Contrast" 
	href="hicontrast.css" type="text/css"?>
					
  <?xml-stylesheet alternate="yes" title="Print Only" 
	href="print.css" type="text/css" media="print"?>
	]]></programlisting> If viewing the XML document through the Firefox browser, you can then choose
                between these stylesheets using the <menuchoice>
                    <guisubmenu>View</guisubmenu>
                    <guisubmenu>Page Style</guisubmenu>
                    <guimenuitem>Stylesheet-name</guimenuitem>
                </menuchoice> commands. The <markup>media="print"</markup> parameter means that the
                "Print Only" stylesheet will be used for printing by default. </para>
            <para>Note that the last of the alternate stylesheets above adds a
                    <markup>media</markup> attribute, which restricts the application of the
                stylesheet to a particular display type, such as <markup>"print"</markup>,
                    <markup>"screen"</markup>, <markup>"aural"</markup>, etc.<footnote>
                    <para>See <link xlink:href="http://www.w3.org/TR/CSS21/media.html"/> for a full
                        list of media types.</para>
                </footnote> CSS itself has a method for restricting rules to certain display types,
                with the <code>@media</code> rule: <programlisting><![CDATA[  @media print {
    body { font-size: 10pt; }
  }
  @media screen {
    body { font-size: 14px; }
  }
  @media print,screen {
    body { line-height: 1.2; }
  }
  ]]></programlisting></para>
        </sect1>
        <sect1>
            <title>Using the Cascade</title>
            <para>The C in CSS stands for <quote>Cascade</quote>, which describes a property of
                stylesheets that allows properties defined in one part of a stylesheet to override
                those defined elsewhere, or to override default properties. We can use the cascade
                to create useful effects within our stylesheets. </para>
            <sect2>
                <title>How the Cascade Works</title>
                <para/>
                <sidebar>
                    <itemizedlist spacing="compact">
                        <listitem>
                            <para>The rules for the CSS cascade can be summarised as: </para>
                            <itemizedlist spacing="compact">
                                <listitem>
                                    <para>Designer <emphasis>overrides</emphasis> User
                                            <emphasis>overrides</emphasis> Browser</para>
                                </listitem>
                                <listitem>
                                    <para>More specific <emphasis>overrides</emphasis> less
                                        specific</para>
                                </listitem>
                                <listitem>
                                    <para>ID selectors <emphasis>override</emphasis> classes
                                            <emphasis>override</emphasis> tags</para>
                                </listitem>
                            </itemizedlist>
                        </listitem>
                        <listitem>
                            <para>Order is also important - later rules override earlier</para>
                        </listitem>
                    </itemizedlist>
                </sidebar>
                <para>The rules of the Cascade mean that it is possible to write very general CSS
                    rules which cover most elements, then redefine or override those rules for more
                    specific cases. So, where there are generic text elements, you can set up a rule
                    for the general case, then redefine properties for specific cases.</para>
                <para>For more details on the Cascade, see the <link
                        xlink:href="http://www.w3.org/TR/REC-CSS2/cascade.html#cascade">explanation
                        in the CSS 2 specification</link>.</para>
                <para>Firstly, the cascade prioritises the rules created by the website designer
                    over and above any that are defined in the browser, either as defaults or those
                    overridden by the user themselves (it's possible to customise the default
                    behaviour of most browsers).</para>
                <para>Secondly, more specific selectors will allow a rule to take precedence. So, a
                    rule which selects only <markup>p</markup> elements will be overridden by a rule
                    which selects all <markup>p</markup> elements which are children of a
                        <markup>div</markup>, so if the CSS says: <programlisting><![CDATA[  div>p { color:blue; }
  p { color:red; }
         ]]></programlisting> then the paragraphs directly inside a <markup>div</markup> will be
                    coloured blue, and those not inside a <markup>div</markup> will be coloured red,
                    even though the order might suggest otherwise.</para>
                <para>Thirdly, there is a precedence of ID-based selectors over other types, so
                    using a <markup>#name</markup> selector will usually take priority over any
                    general redefinitions of elements. </para>
                <para>However, the feature of the cascade that you will rely on most often is that
                    later rules override earlier ones, provided they come from the same origin and
                    have the same specificity.</para>
                <para>It's quite common to define a set of defaults in a common stylesheet, then to
                    override some of those rules with more specific stylesheets which are included
                    later in the document. And sometimes, web pages override the site's CSS with
                    inline CSS rules specific to individual pages. This occurs with the CMIT web
                    pages, where inline CSS colours specific subdivisions of the menus, to give a
                    contextual highlighting of the menu for the 'current' module.</para>
            </sect2>
        </sect1>
        <sect1>
            <title>Advanced CSS Selectors</title>
            <para>Some of the more advanced <firstterm>pseudo-classes</firstterm> and
                    <firstterm>pseudo-elements</firstterm> available in recent versions of CSS
                    (<link xlink:href="http://www.w3.org/TR/CSS21/">CSS 2.1</link> and the proposed
                    <link xlink:href="http://www.w3.org/Style/CSS/current-work/">CSS 3</link>) are
                extremely useful when styling XML directly with CSS. Though not available in all
                browsers, their use is gradually becoming more widespread. Here's a selection of the
                most useful:</para>
            <formalpara>
                <title>Styling the first of a group of elements - <code>:first-child</code></title>
                <para>Applies a style to an element if it is the first child of its parent. This can
                    be used to emphasise the first line of a poem, for example, or to provide
                    different formatting for the first item of a list.
                    <programlisting>  chapter { display: block }

  chapter:first-child { color: blue }

</programlisting>
                    Here, the first chapter in the list will be coloured blue [Example: <link
                        xlink:href="http://www.ex.ac.uk/cmit/modules/meaningful_markup/examples/css-first-child/book.xml"
                        >XML</link> | <link
                        xlink:href="http://www.ex.ac.uk/cmit/modules/meaningful_markup/examples/css-first-child/css-first-child.css"
                        >CSS</link>].</para>
            </formalpara>
            <!--
			<formalpara>
				<title>Styling links - <code>:link</code> and <code>:visited</code></title>
				<para />
			</formalpara>
			<formalpara>
				<title>Styling links dynamically - <code>:hover</code>,
					<code>:active</code> and <code>:focus</code></title>
				<para />
			</formalpara>
			-->
            <formalpara>
                <title>Styling the first of a group of lines or letters - <code>:first-line</code>
                    and <code>:first-letter</code></title>
                <para>
                    <programlisting>
  p              { font-size: 12pt; line-height: 1.2 }
  p:first-letter { font-size: 200%; float:left }
					</programlisting>
                </para>
            </formalpara>
            <formalpara>
                <title>Adding text before/after an element - <code>:before</code> and
                        <code>:after</code></title>
                <para>The following will add the word 'Warning' to any paragraphs with
                        <markup>class="warning"</markup>:<programlisting>
  p.warning:before { content: "Warning: " }
					</programlisting>
                    and this rule will add a number before each h1 element, counting up with roman
                    numerals:
                    <programlisting>
  h1:before { content: counter(chapno, upper-roman) ". " }
                    </programlisting>
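                    Note here that <markup>chapno</markup> is an identifier for the counter; any
                    other numbering within the document would need to be identified by a different
                    name. For the numbers to advance, the counter must also be incremented, for
                    example by adding <markup>counter-increment: chapno</markup> to a rule for
                    <markup>h1</markup>. You can also reset counters within rules (using
                    <markup>counter-reset</markup>), so that if numbering sections within a chapter,
                    you could reset the section numbers at each new chapter.</para>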
            </formalpara>
            <para>There are many other new rule types in the newer CSS (2.1) standards, which allow
                many useful effects to be created. However, there are still limitations to using CSS
                to style XML, most notably that it's often useful to re-order or filter the document
                before display. Whilst the newer CSS layout techniques can help with this, there is
                a better tool for the job in XSLT, which is purpose-designed for this role. We'll
                begin examining XSLT next week.</para>
            <warning>
                <title>Browser compatibility</title>
                <para>Note that many of the CSS 2.1 functions are only available in a few browsers -
                    Firefox seems the most reliable for this at the moment - though eventually, they
                    should be supported in all browsers.</para>
                <para>For the purposes of the assignment, I'm quite happy for you to develop CSS
                    which uses the newer features, rather than trying to produce effects that work
                    across all browsers - but don't forget to mention in your report which browsers
                    you used to test the CSS in!</para>
            </warning>
            <note>
                <title>Further reading...</title>
                <para>The W3C have produced a guide to <link
                        xlink:href="http://www.w3.org/Style/styling-XML">adding style to
                    XML</link>.</para>
            </note>
        </sect1>
    </chapter>
    <chapter xml:id="ch-xslt-intro">
        <title>Introducing XSL Transformations</title>
        <para>CSS is a good way of laying out your pages in a well-designed manner directly from the
            XML document. And it provides a way for you to allow most browsers to view your XML
            without falling over or drawing a blank. But CSS is very limited in what it can do to
            the data before it is shown on screen. And it certainly can't manipulate the data in any
            real way, or rewrite it into other markup languages. That's where XSL steps in.</para>
        <para>The XSL standard comprises two basic parts: <variablelist>
                <varlistentry>
                    <term>XSL Transformations (XSLT)</term>
                    <listitem>
                        <para>These are widely used, well-defined and stable as a standard, and are
                            used primarily to rewrite XML into other XML applications (such as XHTML
                            or WML).</para>
                    </listitem>
                </varlistentry>
                <varlistentry>
                    <term>XSL Formatting Objects (XSL-FO)</term>
                    <listitem>
                        <para>This standard is used to generate formatted output such as PDF files;
                            they provide exact control of layout and presentation that isn't
                            possible with either XSLT or CSS.</para>
                    </listitem>
                </varlistentry>
            </variablelist></para>
        <para>These two languages share many features, but are designed to achieve different tasks.
            XSLT is a general-purpose language for transforming one XML-based markup language into
            another, whereas XSL-FO is designed specifically for creating views of XML data in
            paged-media formats. Over the next couple of weeks, we're going to look at XSLT in
            depth, and maybe have a quick glance at XSL-FO if there's time.</para>
        <sect1>
            <title>A Simple Stylesheet</title>
            <para>Let's begin by looking at our jokebook example to see why XSLT could be useful.
                One task that we might want to achieve is to produce an XHTML version of our jokes,
                for display on the web. XSLT is designed to do this job, as it can transform one
                XML-based markup language into any other. </para>
            <figure>
                <info>
                    <title>The XSLT Processing Model</title>
                </info>
                <mediaobject>
                    <imageobject>
                        <imagedata fileref="figures/xslt-processing-model.png" format="PNG"
                            scalefit="1"/>
                    </imageobject>
                </mediaobject>
                <caption>
                    <para>
                        <emphasis>The XSLT stylesheet is applied using an XSLT processor (such as
                            the ones built into the Oxygen editor), and converts documents from one
                            XML application into another. CSS can still be applied to the resulting
                            XML document.</emphasis>
                    </para>
                </caption>
            </figure>
            <para>To perform this transformation, we need to write stylesheets (though they are
                rather different from the CSS stylesheets we've already examined). The stylesheet
                defines a set of templates that can be applied to the XML data document, and which
                contain rules to manipulate that data into the target format.</para>
            <para>XSLT stylesheets are, of course, written as well-formed valid XML, so the basic
                syntax rules apply. They are usually a mixture of markup languages, including
                elements from the target language(s), together with statements from the XSLT
                language itself.</para>
            <para>Here's a subset of a simple example: </para>
            <example xml:id="ex-simple-xslt-stylesheet">
                <title>A simple XSLT stylesheet for jokebook (excerpt)</title>
                <programlisting>
&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;
 
  &lt;xsl:template match=&quot;jokebook&quot;&gt;
	&lt;html&gt;
	  &lt;head&gt;
		&lt;title&gt;My Jokes&lt;/title&gt;
	  &lt;/head&gt;
	  &lt;body&gt;
		&lt;h1&gt;My Jokes&lt;/h1&gt;
		&lt;xsl:apply-templates /&gt;
	  &lt;/body&gt;
	&lt;/html&gt;
  &lt;/xsl:template&gt;

  ....

  &lt;xsl:template match=&quot;double-entendre&quot;&gt;
	&lt;h3&gt;Double-entendre:&lt;/h3&gt;
	&lt;p&gt;The phrase 
	  &lt;xsl:value-of select=&quot;phrase&quot; /&gt;
	  can mean:&lt;/p&gt;
	&lt;ol&gt;
	  &lt;xsl:for-each select=&quot;meaning&quot;&gt;
		&lt;li&gt;
		  &lt;xsl:value-of select=&quot;.&quot; /&gt;
		&lt;/li&gt;
	  &lt;/xsl:for-each&gt;
	&lt;/ol&gt;

  &lt;/xsl:template&gt;
 
  ....

  &lt;xsl:template match=&quot;/&quot;&gt;
	&lt;xsl:apply-templates /&gt;
  &lt;/xsl:template&gt;
   
&lt;/xsl:stylesheet&gt;
			
			</programlisting>
            </example>
            <para>As you can see, it's written as an XML document, which means it must be
                well-formed according to the XML rules. The repercussion of this is that any
                embedded markup in the stylesheet must also be well-formed XML, which in turn means
                that our results will always be well-formed XML too. This can be difficult to work
                with at first, but will yield benefits in the quality of markup that is
                produced.</para>
        </sect1>
        <sect1>
            <title>A More Detailed Look</title>
            <para>The first thing to notice in the example is the mixture of XSL and XHTML markup.
                We're generating an XHTML page from an XML document, so the XHTML markup is inserted
                at the relevant output stage.</para>
            <para>We begin with the declarations:</para>
            <programlisting>
  &lt;?xml version=&quot;1.0&quot;?&gt; 
  &lt;xsl:stylesheet version=&quot;1.0&quot; 
      xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt; 
			</programlisting>
            <para>which define the verion of XML and XSL we're using. The <markup>xmlns:xsl</markup>
                clause is a 'namespace' declaration, and defines the standard we're using for the
                    <markup><![CDATA[<xsl:_____>]]></markup> tags. We're using the convention of an
                    <markup>xsl:</markup> prefix, but you could use another prefix such as
                    <markup>x:</markup> or <markup>xslt:</markup>. </para>
            <para>Various namespace URIs have been used in this declaration over the years. The one
                shown above is the current recommendation, though for some older processors you may
                have to change it; for example, the built-in XSLT processor in some versions of IE6
                requires:</para>
            <programlisting>
  &lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/TR/WD-xsl&quot;&gt;
			</programlisting>
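            <para> The usual declaration for an XSLT 1.0 stylesheet outputting XHTML is: <programlisting><![CDATA[
  <xsl:stylesheet version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns="http://www.w3.org/1999/xhtml">]]>
				</programlisting> Note that the second (<markup>xmlns</markup>) declaration sets the
                default namespace, so any markup in the stylesheet which has no prefix will be
                XHTML. The namespace is the same for every variant of XHTML 1.0; whether the output
                is Strict or Transitional is decided by its DOCTYPE, which can be set using
                    <markup><![CDATA[<xsl:output>]]></markup>. You may still see the older URI
                    <markup>http://www.w3.org/TR/xhtml1/strict</markup> in some examples, including
                those in the XSLT 1.0 Recommendation, but it is not the XHTML namespace.</para>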
            <para>The body of the stylesheet consists mostly of
                    <markup><![CDATA[<xsl:template>]]></markup> elements, which are designed to
                    <markup>match</markup> the elements in our source document. So if we have a
                sequence of <markup><![CDATA[<joke>]]></markup> elements in our source document, we
                can use a <markup><![CDATA[<xsl:template match="joke">]]></markup> element to
                process each of these with XSLT. </para>
            <para> The last section, common to many XSLT 1.0 stylesheets, is the root template
                declaration: <programlisting><![CDATA[
  <xsl:template match="/"> 
    <xsl:apply-templates /> 
  </xsl:template>]]>
				</programlisting> This is really just saying that the templates defined in
                this stylesheet apply to the whole of the XML document tree. If you don't include
                it, the processor falls back on a built-in template rule which does the same thing,
                but it's useful to be able to override this default. Most version 1 stylesheets
                include the root template definition for completeness.</para>
            <para> If we examine this more carefully, we can see that we're defining a new template
                within the <markup><![CDATA[<xsl:template>]]></markup> which matches a particular
                node or set of nodes in the document tree, in this case the topmost or root node
                ("/"), which is the container for the entire XML document. We're then defining this
                template to merely look for and apply other templates to the child nodes of the
                document.</para>
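            <para>As a rough sketch of what the default behaviour amounts to, the built-in rules
                that the XSLT 1.0 Recommendation describes for the root, element, text and attribute
                nodes can be written out as ordinary templates. You never need to type these in;
                they are shown here only to make the fallback behaviour visible: <programlisting><![CDATA[
  <!-- The root node and any element: carry on processing the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Text and attribute nodes: copy their value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="." />
  </xsl:template>]]>
				</programlisting> Any template you write yourself for the same nodes takes
                priority over these.</para>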
            <warning>
                <title>XSLT Version 1.0 or 2.0?</title>
                <para> On recent versions of the Oxygen editor, you'll be asked to choose between
                    XSLT 1.0 and 2.0 when creating a new stylesheet. At this stage, I'd
                        <emphasis>strongly</emphasis> recommend creating XSLT 1.0, as it's simpler
                    and has fewer <quote>gotchas</quote> than 2.0, which will cause much
                    head-scratching! </para>
                <para>Once you're confident with 1.0, then you can take the step towards the newer
                    standard. We'll look next week at some of the new features and complications
                    introduced with XSLT 2.0.</para>
            </warning>
        </sect1>
        <sect1>
            <title>Outputting Markup</title>
            <para>In the stylesheet shown in <xref linkend="ex-simple-xslt-stylesheet"/>, each
                template (i.e. within the <markup><![CDATA[<xsl:template>]]></markup> elements)
                could also include XHTML markup, since that's what we're outputting to our result
                document in this case.</para>
            <para>The template for 'jokebook', which is the root element of our XML file in each
                case, is defined first.
                <programlisting>  &lt;xsl:template match=&quot;jokebook&quot;&gt;
	&lt;html&gt;
	  &lt;head&gt;
		&lt;title&gt;My Jokes&lt;/title&gt;
	  &lt;/head&gt;
	  &lt;body&gt;
		&lt;h1&gt;My Jokes&lt;/h1&gt;
		&lt;xsl:apply-templates /&gt;
	  &lt;/body&gt;
	&lt;/html&gt;
  &lt;/xsl:template&gt;
   </programlisting>
                Here, we're adding some structural XHTML markup which will form the basis of the
                XHTML result document, and in the midst of this asking the processor to apply any
                relevant templates for the contents of the <markup><![CDATA[<jokebook>]]></markup>
                tag, which is achieved with the <markup><![CDATA[<xsl:apply-templates />]]></markup>
                element.</para>
            <para>Skip down now to the <markup><![CDATA[<double-entendre>]]></markup> template:
                <programlisting>  &lt;xsl:template match=&quot;double-entendre&quot;&gt;
	&lt;h3&gt;Double-entendre:&lt;/h3&gt;
	&lt;p&gt;The phrase 
	  &lt;xsl:value-of select=&quot;phrase&quot; /&gt;
	  can mean:&lt;/p&gt;
	&lt;ol&gt;
	  &lt;xsl:for-each select=&quot;meaning&quot;&gt;
		&lt;li&gt;&lt;xsl:value-of select=&quot;.&quot; /&gt;&lt;/li&gt;
	  &lt;/xsl:for-each&gt;
	&lt;/ol&gt;
  &lt;/xsl:template&gt;
 </programlisting>
                This again outputs some HTML, but then outputs the entire contents of the
                    <markup><![CDATA[<phrase>]]></markup> tag using a
                    <markup><![CDATA[<xsl:value-of>]]></markup> statement. Note that this will
                output the whole of the text content of the element specified, so if it contains
                sub-elements, you'll get all of their text (#PCDATA) run together in the output,
                with none of the surrounding tags. </para>
            <para>You can also see an <markup><![CDATA[<xsl:for-each>]]></markup> statement which
                processes each of the <markup><![CDATA[<meaning>]]></markup> tags in the structure.
                We could, in fact, create a new template for <markup><![CDATA[<meaning>]]></markup>
                and do an <markup><![CDATA[<xsl:apply-templates>]]></markup>, which would work just
                as well here. However, sometimes we need to keep count of how many meanings we've
                processed (say we wanted to print the total number of meanings at the end), and
                using the <markup><![CDATA[<xsl:for-each>]]></markup> will allow us to do
                this.</para>
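            <para>The <markup><![CDATA[<xsl:value-of select="." />]]></markup> statement uses the
                XPath expression <markup>.</markup>, which just means 'the current node', so it
                outputs the value of the current node: in this case, the whole of the currently
                processed <markup><![CDATA[<meaning>]]></markup> tag. The <markup>select</markup>
                attribute is required in XSLT 1.0 and cannot be left out.</para>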
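            <para>For comparison, here is a sketch of the template-based alternative mentioned
                above. It assumes the same <markup><![CDATA[<phrase>]]></markup> and
                    <markup><![CDATA[<meaning>]]></markup> children as the example, and should
                produce the same numbered list: <programlisting><![CDATA[
  <xsl:template match="double-entendre">
    <h3>Double-entendre:</h3>
    <p>The phrase
      <xsl:value-of select="phrase" />
      can mean:</p>
    <ol>
      <!-- Hand each <meaning> child over to its own template -->
      <xsl:apply-templates select="meaning" />
    </ol>
  </xsl:template>

  <!-- Called once for every <meaning> selected above -->
  <xsl:template match="meaning">
    <li><xsl:value-of select="." /></li>
  </xsl:template>]]>
				</programlisting> Which form you prefer is largely a matter of style; the
                separate template is easier to reuse if
                    <markup><![CDATA[<meaning>]]></markup> turns up elsewhere in the
                document.</para>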
            <para>There are a number of other processing elements which can be used, together with a
                way of defining variables for temporarily storing values, which we'll look at in
                detail next time.</para>
            <important>
                <title>Hints and Gotchas</title>
                <para>As the stylesheet must be well-formed XML, we have to use XHTML-style closing
                    tags for all our HTML markup (so <markup><![CDATA[<br />]]></markup> rather
                    than <markup><![CDATA[<br>]]></markup>).</para>
                <para>Though it's common to do so, it's not necessary to create a template for
                        <emphasis>every</emphasis> element you've defined in your DTD. </para>
                <para> If a template can't be found for a node in the document tree, then the
                    default action is to apply-templates to its contents, or to output the contents
                    if it's a text (#PCDATA) node. </para>
                <para>Order of definitions isn't important, except for your own sanity!</para>
            </important>
        </sect1>
        <sect1>
            <title>Tracking the current node</title>
            <para>This illustrates an important concept in XSLT processing. The 'current node' is
                the element that we're working on at the moment, the context of the statement. The
                expressions used in referring to nodes (XPath expressions), like URLs in an
                    <markup>&lt;a&gt;</markup> tag in HTML, can be relative or absolute, though the
                syntax differs slightly for XSLT. </para>
            <para>As you might expect, you can begin an XPath expression with a <markup>/</markup>
                to start at the top of the tree, e.g. <markup>/jokebook/joke/simplejoke</markup>. </para>
            <para>A '//' usually means 'all of...' in some way, so: <itemizedlist>
                    <listitem>
                        <para> "//" means "all descendants of the root node"</para>
                    </listitem>
                    <listitem>
                        <para> ".//" means "all descendants of the current node"</para>
                    </listitem>
                    <listitem>
                        <para>"//name" means "any descendants called 'name' in the whole
                            document"</para>
                    </listitem>
                    <listitem>
                        <para>"path//name" means "any descendants of 'path' called 'name'"</para>
                    </listitem>
                </itemizedlist></para>
            <para>The @ symbol can be used to refer to an attribute of the current node, so @author
                is the value of the author attribute of the current node.</para>
            <para>These expressions used in referring to parts of the document are part of a
                standard called 'XPath', which we'll cover in more detail later.</para>
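            <para>To make the difference between relative and absolute expressions concrete, here
                is a small sketch. It assumes, purely for illustration, that each
                    <markup><![CDATA[<joke>]]></markup> carries an <markup>author</markup>
                attribute and may contain <markup><![CDATA[<meaning>]]></markup> elements somewhere
                inside it; your own jokebook may be structured differently: <programlisting><![CDATA[
  <xsl:template match="joke">
    <!-- Relative: the author attribute of the current <joke> -->
    <p>By <xsl:value-of select="@author" /></p>

    <!-- Relative: <meaning> elements anywhere below the current <joke> -->
    <p>Meanings here: <xsl:value-of select="count(.//meaning)" /></p>

    <!-- Absolute: every <joke> in the whole document -->
    <p>Jokes in total: <xsl:value-of select="count(//joke)" /></p>
  </xsl:template>]]>
				</programlisting> The first two results change with each joke that is
                processed, whereas the third is the same every time, because it ignores the current
                node and starts from the top of the tree.</para>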
        </sect1>
    </chapter>
    <chapter xml:id="ch-xpath">
        <title>Further XSLT and XPath</title>
        <para>This lecture will look at some of the ways of generating loops, repetitions and
            choices within your stylesheets. Remember that, though these look similar to the control
            structures present in procedural programming languages, they are rather more limited in
            the way they behave. We'll also look at other XSLT elements that control such things as
            sorting, importing and the characteristics of the output.</para>
        <para>Finally, we'll revisit XPath, looking at some of the tests, expressions and functions
            that can be used in conjunction with the above.</para>
        <sect1>
            <title>Control structures</title>
            <para>There are a number of simple ways to make choices and perform repetitions within
                the XSLT language. </para>
            <sect2>
                <title>The &lt;xsl:if&gt; element</title>
                <para>
                    <computeroutput>&lt;xsl:if test="<parameter>condition</parameter>"&gt;
                            <code>...</code>&lt;/xsl:if&gt;</computeroutput>
                </para>
                <para>This element provides a simple <code>if ... then</code> control, which applies
                    the elements it contains <emphasis>if and only if</emphasis> the condition is
                    true. <footnote>
                        <para>Programmers should note that there's no <code>else</code> currently
                            defined, so we'd need to use an <code><![CDATA[<xsl:choose>]]></code>
                            element instead.</para>
                    </footnote><programlisting><![CDATA[
  <xsl:if test="@cert='18'">
	<xsl:text>Warning: 
          the following joke may be offensive</xsl:text>
  </xsl:if>
]]>
</programlisting></para>
            </sect2>
            <sect2>
                <title>
                    <code><![CDATA[<xsl:choose>]]></code>
                </title>
                <para>The choose element makes a choice between one of several alternatives. A test
                    is given for each defined choice in an <code><![CDATA[<xsl:when>]]></code>
                    element, and there is a <quote>catch-all</quote>
                    <code><![CDATA[<xsl:otherwise>]]></code> element which is used if no other
                    choice is selected. It is equivalent to a <code>switch</code> or
                    <code>case</code> structure in procedural languages. <programlisting><![CDATA[
  <xsl:choose>
    <xsl:when test="@type='limerick'">
      <xsl:text>A Limerick:</xsl:text>
    </xsl:when>
        ...
    <xsl:otherwise>
      <xsl:text>A generic joke:</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
]]>
</programlisting></para>
            </sect2>
            <sect2>
                <title>
                    <code><![CDATA[<xsl:for-each>]]></code>
                </title>
                <para>Processes each of the nodes selected by its <code>select</code> attribute in turn. <programlisting><![CDATA[
  <xsl:for-each select="jokes">
    <xsl:value-of select="position()"/>
    <xsl:text>. </xsl:text>
    <xsl:apply-templates />
  </xsl:for-each>
]]>
</programlisting></para>
            </sect2>
            <sect2>
                <title>
                    <code><![CDATA[<xsl:sort>]]></code>
                </title>
                <para>Sort is placed inside a <code><![CDATA[<xsl:for-each>]]></code> or a
                        <code><![CDATA[<xsl:apply-templates>]]></code> element, before any other
                    content, to sort the nodes that are selected by that element. <programlisting><![CDATA[
  <xsl:for-each select="/jokebook/jokes">
    <xsl:sort select="joke/@type"  data-type="text"/>
      ...
  </xsl:for-each>
]]>
</programlisting></para>
            </sect2>
        </sect1>
        <sect1>
            <title>Creating new elements - <code><![CDATA[<xsl:element>]]></code> and
                    <code><![CDATA[<xsl:attribute>]]></code></title>
            <para>The Element and Attribute commands are used to output specific elements and
                attributes to the result document, and can be used to construct HTML or other tags
                using parts of the XML source document. So, for example, where a cartoon has a
                filename encoded for it and you wish to display this with an
                    <code><![CDATA[<img>]]></code> tag in HTML, you could use: <programlisting><![CDATA[
  <img>
    <xsl:attribute name="src">
      <xsl:value-of select="cartoon/filename" />
    </xsl:attribute> 
  </img>
]]>
</programlisting></para>
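            <para>Two other ways of writing the same thing are sketched below, using the same
                    <code>cartoon/filename</code> path as the example above. The first uses an
                attribute value template, where anything inside curly braces in an attribute of a
                literal result element is evaluated as an XPath expression. The second uses
                    <code><![CDATA[<xsl:element>]]></code>, which is mainly worth the extra typing
                when the element name itself has to be computed: <programlisting><![CDATA[
  <!-- Attribute value template: the braces are evaluated as XPath -->
  <img src="{cartoon/filename}" />

  <!-- xsl:element builds the tag explicitly -->
  <xsl:element name="img">
    <xsl:attribute name="src">
      <xsl:value-of select="cartoon/filename" />
    </xsl:attribute>
  </xsl:element>
]]>
</programlisting></para>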
        </sect1>
        <sect1>
            <title>Storing values - <code><![CDATA[<xsl:variable>]]></code> and
                    <code><![CDATA[<xsl:param>]]></code></title>
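            <para>The <code><![CDATA[<xsl:variable>]]></code> element creates a variable. Note that
                variables in XSLT differ fundamentally from variables in procedural languages; they
                have more in common with constants or finals, because once a value has been
                assigned it cannot be changed. Scope matters too: the variable with the most local
                scope overrides any others with the same name. So a variable created as a child of
                the <code><![CDATA[<xsl:stylesheet>]]></code> element, and thus having 'global'
                scope, is overridden by any variable with the same name defined inside a template.
                <programlisting><![CDATA[
  <xsl:variable name="myFavourite" select="'limerick'"/>
      ...
  <xsl:if test="$myFavourite='limerick'">
    <xsl:apply-templates/>
  </xsl:if>
				]]></programlisting> The inner single quotes in the <code>select</code> attribute
                matter: without them, <code>limerick</code> would be read as an XPath expression
                selecting child elements called <code>limerick</code>, rather than as a
                string.</para>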
            <para>Parameters can be used in a similar way, but are effectively
                    <quote>default</quote> values, which can be overridden when the stylesheet is
                applied. </para>
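            <para>As a minimal sketch of a parameter in use, the fragment below gives the page
                heading a default value that can be replaced from outside the stylesheet. The name
                    <code>pageTitle</code> is invented for this illustration, and the way a new
                value is supplied depends on the processor you are using: <programlisting><![CDATA[
  <!-- Top-level parameter with a default value -->
  <xsl:param name="pageTitle" select="'My Jokes'" />

  <xsl:template match="jokebook">
    <html>
      <head>
        <title><xsl:value-of select="$pageTitle" /></title>
      </head>
      <body>
        <h1><xsl:value-of select="$pageTitle" /></h1>
        <xsl:apply-templates />
      </body>
    </html>
  </xsl:template>
				]]></programlisting> If no value is passed in, the heading stays as the default
                given in the <code>select</code> attribute.</para>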
            <sidebar>
                <note>
                    <title>To see parameters in action...</title>
                    <para>...try editing the DocBook parameters to change the type of output that's
                        produced. You can change parameters of a particular stylesheet in
                            <code><![CDATA[<oXygen/>]]></code> by editing the transformation
                        scenario for the stylesheet. </para>
                </note>
            </sidebar>
            <para>Xml.com has a useful <link
                    xlink:href="http://www.xml.com/pub/a/2001/02/07/trxml9.html">overview</link> of
                the differences between variables and parameters, and examples of how to use them.
                There are also excellent articles on this site which cover most areas of XML
                usage.</para>
        </sect1>
        <sect1>
            <title>More XPath: Tests and operators</title>
            <para>The basic test operators should be fairly familiar to you, as they are similar to
                those in most programming languages. So: <programlisting><![CDATA[
  =    !=    >    >=    <    <=    ]]>
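			</programlisting> all take the usual (and hopefully
                obvious) meanings. Because the stylesheet is itself XML, a <code>&lt;</code> inside
                an attribute such as <code>test</code> must be written as
                <code>&amp;lt;</code>.</para>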
            <para>There's a slight difference in the basic arithmetic operators:
                <programlisting>
<![CDATA[  +    -    *    div    mod]]>
			</programlisting> in
                that the divided-by operator must be spelt out - the / symbol has far too much
                significance in other contexts to be reused. </para>
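            <para>A short sketch of these operators inside <code>test</code> attributes follows. It
                assumes it is placed somewhere that <code>position()</code> is meaningful, such as
                inside an <code><![CDATA[<xsl:for-each>]]></code>: <programlisting><![CDATA[
  <!-- &lt;= is the escaped form of <= ; > may be written literally -->
  <xsl:if test="position() &lt;= 3">
    <xsl:text>One of the first three</xsl:text>
  </xsl:if>

  <!-- mod gives the remainder, so this picks out every second item -->
  <xsl:if test="position() mod 2 = 0">
    <xsl:text> (even-numbered)</xsl:text>
  </xsl:if>
]]>
</programlisting></para>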
        </sect1>
        <sect1>
            <title>Some Useful XPath Functions</title>
            <para>There are also some useful pre-defined functions, the best of which are discussed
                below.</para>
            <formalpara>
                <title>
                    <code>position()</code>
                </title>
                <para>Returns the current node's position within the set of nodes currently being
                    processed (usually its siblings), so the fourth
                        <code><![CDATA[<joke>]]></code> in the <code><![CDATA[<jokebook>]]></code>
                    will have <code>position()=4</code>.</para>
            </formalpara>
            <formalpara>
                <title>
                    <code>last()</code>
                </title>
                <para>Returns the number of nodes in the current context (i.e. the position() of the
                    last sibling). <programlisting><![CDATA[
  <xsl:choose>
    <xsl:when test="position()=last()">
      <li>And finally, 
        <xsl:value-of select="."/></li>
    </xsl:when>
    <xsl:otherwise>
      <li><xsl:value-of select="."/></li>
    </xsl:otherwise>
  </xsl:choose>
]]>
</programlisting></para>
            </formalpara>
            <formalpara>
                <title>
                    <code>sum(expr), count(expr), ceiling(expr), floor(expr), round(expr)</code>
                </title>
                <para>Various mathematical functions - sum adds up the values of the selected nodes
                    and count returns how many nodes were selected; floor, ceiling and round are
                    used to convert floating-point decimal numbers to integers (rounding down, up
                    and to the nearest integer respectively).</para>
            </formalpara>
            <formalpara>
                <title>
                    <code>substring(expr,start,len)</code>
                </title>
                <para>Returns a substring of the string <emphasis>expr</emphasis>, starting at
                    character number <emphasis>start</emphasis> (counting from 1, not 0), with
                    length of <emphasis>len</emphasis> characters.</para>
            </formalpara>
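            <para>A few of these functions are sketched in use below. The <code>type</code>
                attribute is borrowed from the earlier <code><![CDATA[<xsl:choose>]]></code>
                example, and the fragment assumes the current node is an element which has one: <programlisting><![CDATA[
  <!-- First three characters of the type: 'lim' when type is 'limerick' -->
  <xsl:value-of select="substring(@type, 1, 3)" />

  <!-- Half the number of jokes in the document, rounded down and then up -->
  <xsl:value-of select="floor(count(//joke) div 2)" />
  <xsl:value-of select="ceiling(count(//joke) div 2)" />
]]>
</programlisting></para>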
            <para>This has (I hope) covered the most useful XSLT and XPath features, and should
                certainly be sufficient to perform most basic tasks. I'll be adding a few more
                features I feel would be useful to you over the next few weeks, so keep an eye on
                this lecture's notes. And don't forget, there are many more functions defined, which
                you can glean from the XML/XSLT/XPath standards documents.</para>
        </sect1>
    </chapter>
    <chapter xml:id="ch-xslt-problems">
        <title>Using XSLT 2.0</title>
        <para>XSLT 2.0 introduces a number of new and more powerful features to your stylesheets.
            The specification also adds an updated XPath 2.0, which introduces more complex queries
            and some procedural functions to the language.</para>
        <para>The first change that is needed is to switch to an XSLT2-aware processor. In Oxygen
            11, the default processor is Saxon 6.5.5, which will not parse XSLT2 stylesheets, so you
            should change it to SaxonPE or SaxonB. There are other options available, and often the
            choice of processor is dependent on the server technologies you're using - if you're
            developing an application for an MSWindows server, then a .NET-based processor may be
            needed.</para>
        <sect1>
            <info>
                <title>Namespaces</title>
            </info>
            <para>XSLT2 and XPath2 are both namespace aware, and expect namespaces to be properly
                declared and prefixed throughout. Although it's possible to leave the default
                namespace undefined, it's usual to add it to make the output format clear.</para>
            <para>As an example, here's the header for a stylesheet that converts TEI markup into
                XHTML:</para>
            <example xml:id="ex-xslt2-header">
                <info>
                    <title>A namespace-aware XSLT2 stylesheet header</title>
                </info>
                <programlisting>
                    <![CDATA[                    
<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:cite="http://citations.ex.ac.uk/ns/" 
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs cite tei" version="2.0">
                    ]]>
                </programlisting>
            </example>
            <para>In this example, we're effectively declaring our <emphasis>input
                    language</emphasis> as the TEI namespace - the
                    <command>xpath-default-namespace</command> implies that most of our XPath
                expressions will be referring to TEI elements. If this isn't included, you'll need
                to prefix all TEI elements within XPath expressions to identify them.</para>
            <para>Our output language is declared by the default namespace (<command>xmlns</command>
                without a prefix).</para>
            <para>We're also excluding some of the intermediate prefixes from output, including one
                that we've defined locally (<command>xmlns:cite</command>). This prevents spurious
                    <command>xmlns</command> attributes from cluttering up the root node of our
                resultant XHTML.</para>
        </sect1>
        <sect1>
            <info>
                <title>Sequences</title>
            </info>
            <para>In XSLT 1.0, most operations or templates resulted in the production of either
                single items of data, or <glossterm linkend="gl-result-tree">result tree
                    fragments</glossterm>, which had to be well formed XML (in particular, with a
                single root element).</para>
            <para>In XSLT 2.0, there is a third possible result, a <glossterm linkend="gl-sequence"
                    >sequence</glossterm>, which is simply an ordered list of items (which may be
                single atomic values, or result tree fragments). This allows simpler sequential
                processing of arbitrary items of data, without having to construct a result tree
                that wraps them in a spurious root node element. </para>
        </sect1>
        <sect1>
            <info>
                <title>XPath enhancements</title>
            </info>
            <para>In general, XPath 2.0 introduces new pre-defined operators and functions that
                simply make it easier to perform tasks that were already possible in the older
                version. So the new <command>tokenize()</command> function performs a task
                (splitting a string into constituent parts) that needed a custom-written stylesheet
                and significant levels of recursion to perform in XSLT1.</para>
            <para>However, some new features take XPath into a significantly more powerful language,
                adding basic flow control into the XPath itself. So we now have XPath expressions
                such as:</para>
            <programlisting>  for $thing in SEQ return (EXPR) </programlisting>
            <para> which will perform the expression EXPR on each of the items in the sequence SEQ,
                and more powerful conditional expressions such as: </para>
            <programlisting>  some $thing in SEQ satisfies (EXPR)</programlisting>
            <para>which will resolve as <command>true</command> if the expression EXPR is true for
                any of the items in the sequence SEQ. </para>
            <para>As you can see, this moves some of the processing overhead into the XPath
                expressions, which allows for more efficient and compact stylesheets.</para>
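            <para>To give these a more concrete shape, here is a brief sketch for an XSLT 2.0
                processor. The sequences and strings are made up for illustration, and the second
                fragment assumes the current node has <code><![CDATA[<meaning>]]></code> children: <programlisting><![CDATA[
  <!-- for: one result per item; value-of prints them separated by spaces: 2 4 6 -->
  <xsl:value-of select="for $n in (1, 2, 3) return $n * 2" />

  <!-- some: true if at least one <meaning> is longer than 20 characters -->
  <xsl:if test="some $m in meaning satisfies string-length($m) > 20">
    <xsl:text>Contains a long meaning</xsl:text>
  </xsl:if>

  <!-- tokenize: split a string on commas and process each piece -->
  <xsl:for-each select="tokenize('pun,limerick,riddle', ',')">
    <li><xsl:value-of select="." /></li>
  </xsl:for-each>
]]>
</programlisting></para>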
        </sect1>
        <sect1>
            <info>
                <title>Functions</title>
            </info>
            <para>XSLT2 also introduces the named function, which can be defined to perform tasks
                which are not suited to the template structure. </para>
            <example>
                <info>
                    <title>A basic function to lowercase text and remove (some) accents</title>
                </info>
                <programlisting><![CDATA[
  <xsl:function name="cite:lower-remove-accents">
      <xsl:param name="input"/>
      <xsl:variable 
        name="ac">àáâãçèéêëìíîïùúûü`´̒̕΄</xsl:variable>
      <xsl:variable 
        name="un">aaaaceeeeiiiiuuuu'''''</xsl:variable>
      <xsl:value-of 
        select="translate(lower-case($input),$ac,$un)"/>
  </xsl:function>
  ]]></programlisting>
            </example>
            <para>Here, we can see the parameter passed to the function being processed by a simple
                character substitution. The function is called like this: </para>
            <programlisting><![CDATA[
  <xsl:value-of select="cite:lower-remove-accents($string)"/>
                ]]></programlisting>
            <para>Note also that we're using a defined namespace (<command>cite:</command>) taken
                from the header in <xref linkend="ex-xslt2-header"/>. All functions must be tied to
                a namespace, even if it's a fictitious one such as this!</para>
        </sect1>
        <sect1>
            <info>
                <title>Frequently Asked Questions</title>
            </info>
            <para>The intention of this session is really to clear up any outstanding problems you
                may be having, especially with XSLT. Below are a few questions that I've been asked
                so far; if you have more then email me or ask me in the practical.</para>
            <sect2>
                <title>How do I process multiple XML documents?</title>
                <para>If you have a number of source documents that need to be summarised or
                    collated, you can read the documents into the XSL process tree using the
                        <code>document()</code> function. The most common way of doing this is by
                    creating a master document, which details the files to be included in some way,
                    e.g. </para>
                <programlisting>    
  &lt;master&gt;
    &lt;doc filename=&quot;file1.xml&quot; /&gt;
    &lt;doc filename=&quot;file2.xml&quot; /&gt;
  &lt;/master&gt;
		</programlisting>
                <para>Each of these included files contains consistent data to be processed. The XSL
                    transformation can then refer to the data within these like this:</para>
                <programlisting>
  &lt;xsl:template match=&quot;/&quot;&gt;
    &lt;xsl:for-each select=&quot;/master/doc&quot;&gt;
      &lt;xsl:apply-templates select=&quot;document(@filename)&quot; /&gt;
    &lt;/xsl:for-each&gt;
  &lt;/xsl:template&gt;
		</programlisting>
                <para>As you can probably guess, this reads each file in turn, and applies its
                    templates to the elements within the individual documents. Note that the
                    argument to the <code>document()</code> function can be a URL in most
                    implementations of XSLT. This method will also work in XSLT 1.0.</para>
                <para>A more satisfactory solution is often to use XInclude. This is a basic
                    linking standard that allows inclusion of trees of XML markup without recourse
                    to special handling within the stylesheet. Most XSLT2-aware toolchains can
                    handle XIncludes transparently, because the inclusions are resolved by the XML
                    parser before the stylesheet ever sees the document (though XInclude processing
                    may need to be switched on). There's a good example included at the start of
                    these notes, where a legal notice is included from a separate file: <example>
                        <info>
                            <title>XInclude example</title>
                        </info>
                        <programlisting><![CDATA[
  <xi:include href="../mit0000/legalnotice-by-nc-sa.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:fallback>
          <para>All rights reserved.</para>
       </xi:fallback>
  </xi:include>]]>
                        </programlisting>
                    </example>
                </para>
            </sect2>
        </sect1>
    </chapter>
    <chapter xml:id="ch-schemas">
        <title>Schemas of various types</title>
        <para/>
        <sect1>
            <title>Why use a schema?</title>
            <para>As XML developed, the one standard that didn't evolve easily into an XML form was
                the DTD. This is perhaps because the DTD is at the heart of any XML application -
                everything else is built around it - and changing the way the document is defined
                would require major rewriting of almost all XML applications. </para>
            <para>However, the bullet has been bitten, and the DTD has evolved (or continues to
                evolve) into the Schema. Schemas are XML-formatted, so are easy to machine-read and
                process, and do everything that the DTD can in specifying the structure and ordering
                of XML documents. But Schemas can go a lot further in the detailed specification of
                documents, right down to the individual data elements.</para>
        </sect1>
        <sect1>
            <title>So which language should I use?</title>
            <para>Confusingly, there are several different schema languages which can be used to
                describe XML document types. The two main contenders are the <quote>official</quote>
                XML Schema Definition Language (XSD) as defined by the W3C, and a language called
                    <firstterm>RELAX NG</firstterm>, which has both an XML-based format and a
                briefer, easier to learn and read <quote>compact</quote> format.</para>
            <para>The course textbook goes into great detail on using XSD to define document types,
                so that's the language we'll look at here. However, there is strong support in parts
                of the publishing industry for RELAX NG, and it's too close to call which will win
                in the long term.</para>
            <para>The main differences between the two lie in the philosophy of how to describe
                elements. XSD goes for the <quote>describe everything in as much detail as
                    possible</quote> approach, whereas RELAX NG tries to simplify and create more
                readable definitions. Personally, I find RELAX NG easier to sketch out a document
                in, but for complex documents with very specific data to encode, XSD provides a
                better specification.</para>
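            <para>To give a flavour of the difference in style, here is a rough sketch of how a
                much simplified jokebook might be described in the XML-based format of RELAX NG.
                The structure is an assumption made for illustration (a
                <markup>bookinfo</markup> holding only a <markup>title</markup>, followed by one
                or more plain-text <markup>joke</markup> elements), not the full definition used
                elsewhere in the module.</para>
            <example>
                <title>A simplified jokebook in RELAX NG (sketch)</title>
                <programlisting><![CDATA[
  <element name="jokebook" xmlns="http://relaxng.org/ns/structure/1.0">
    <element name="bookinfo">
      <element name="title"><text/></element>
    </element>
    <!-- oneOrMore plays the role of maxOccurs="unbounded" in XSD -->
    <oneOrMore>
      <element name="joke"><text/></element>
    </oneOrMore>
  </element>
]]></programlisting>
            </example>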
            <para>As XSD is the more complex of the two standards, we'll look briefly at it here,
                but the overall principles of both are similar. You can find more documentation on
                either language in the O'Reilly series (van der Vlist <link linkend="vanderVlist2002"
                    >2002</link>, <link linkend="vanderVlist2004">2004</link>), available to borrow
                from the CMIT office. </para>
        </sect1>
        <sect1>
            <title>Using the W3C's Schema Definition Language</title>
            <para/>
            <sect2>
                <title>XSD: Getting Started</title>
                <para/>
                <example>
                    <title>Starting a new schema</title>
                    <programlisting><![CDATA[
  <?xml version="1.0" ?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    ....
  </xsd:schema>

]]></programlisting>
                </example>
                <example>
                    <title>Using a schema to validate XML</title>
                    <programlisting><![CDATA[
  <?xml version="1.0" ?>
  <jokebook 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="jokebook.xsd">
      ....
  </jokebook>
	
]]></programlisting>
                </example>
            </sect2>
            <sect2>
                <title>XSD: Documentation</title>
                <para>Where a schema is to be used widely, it's useful to add annotations to the
                    definitions, to provide clues to the user regarding the structure of the markup
                    and the data it defines. This is particularly helpful when used with a
                    schema-aware editor such as oXygen, which will show these annotations in context
                    as the user creates a document.</para>
                <example>
                    <title>Adding documentation to a schema</title>
                    <programlisting><![CDATA[
  <xsd:annotation>
    <xsd:documentation>
      Documentation text goes here.
    </xsd:documentation>
  </xsd:annotation>

]]></programlisting>
                </example>
                <para>Annotations placed at the top of the schema before any type definitions apply
                    to the schema as a whole; annotations placed inside an element definition, as
                    its first child, apply to that element only.</para>
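                <para>The two placements can be sketched side by side as follows. The
                    documentation text and the choice of the <markup>author</markup> element are
                    illustrative assumptions only.</para>
                <example>
                    <title>Schema-level and element-level annotations (sketch)</title>
                    <programlisting><![CDATA[
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

    <!-- Applies to the schema as a whole -->
    <xsd:annotation>
      <xsd:documentation>
        A schema for collections of jokes.
      </xsd:documentation>
    </xsd:annotation>

    <xsd:element name="author" type="xsd:string">
      <!-- Applies to the author element only -->
      <xsd:annotation>
        <xsd:documentation>
          The name of the person who wrote the joke.
        </xsd:documentation>
      </xsd:annotation>
    </xsd:element>

  </xsd:schema>
]]></programlisting>
                </example>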
                <para>Once annotations are added, <code><![CDATA[<oXygen/>]]></code> should pick
                    them up and display them as tool tips. For further examples, see the XSD version
                    of the Jokebook listed on the examples page on the module website.</para>
            </sect2>
            <sect2>
                <title>XSD: Simple types</title>
                <para>Whereas a DTD would use <code>#PCDATA</code> to define a generic
                        <quote>leaf node</quote> to contain data, a schema can specify the type of
                    that data very precisely.</para>
                <para>The simplest use of this is with the standard built-in types, as defined in
                    the <link xlink:href="http://www.w3.org/TR/xmlschema-2/">datatypes</link>
                    section of the W3C specification.</para>
                <example>
                    <title>Some simple element definitions in XSD</title>
                    <programlisting><![CDATA[
  <xsd:element name="author" type="xsd:string" />
  <xsd:element name="pubdate" type="xsd:date" /> 
  <xsd:element name="isPublished" type="xsd:boolean" />
]]></programlisting>
                </example>
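                <para>As a rough illustration, instance markup that satisfies these three
                    definitions might look like the fragment below. The values are invented for
                    the example. Note that the built-in date type expects the form
                        <code>YYYY-MM-DD</code>, and the boolean type accepts
                    <code>true</code>, <code>false</code>, <code>1</code> or
                    <code>0</code>.</para>
                <example>
                    <title>Instance data matching the simple types (sketch)</title>
                    <programlisting><![CDATA[
  <author>Gary Stringer</author>
  <pubdate>2008-01-31</pubdate>
  <isPublished>true</isPublished>
]]></programlisting>
                </example>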
            </sect2>
            <sect2>
                <title>XSD: Defining element structure</title>
                <para>If we want to specify nodes or elements which are more complex than just a
                    simple string, number, date or true/false value, the Schema language allows us
                    to do this. For example, to define the actual structure of the document, the
                    hierarchy of nested elements, we need to use the
                        <code><![CDATA[<complexType>]]></code> construction.</para>
                <example>
                    <title>XSD: A simple element container</title>
                    <programlisting><![CDATA[
    <xs:element name="jokebook">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="bookinfo" type="bookinfoType" />
                <xs:element name="joke" maxOccurs="unbounded" type="jokeType"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

]]></programlisting>
                </example>
                <para>This defines a complex element called <markup>jokebook</markup>, which
                    contains a sequence of one <markup>bookinfo</markup> element, followed by one or
                    more <markup>joke</markup> elements. Note that we define the repeated element
                    using <markup>maxOccurs="unbounded"</markup>, which allows an unlimited number
                    of repetitions. There is also a <markup>minOccurs</markup> attribute; like
                        <markup>maxOccurs</markup>, it defaults to a value of
                    <markup>"1"</markup>, so we don't need to define it here.</para>
                <para>We've also chosen to create these <markup>joke</markup> and
                        <markup>bookinfo</markup> elements with user-defined types
                        (<markup>bookinfoType</markup> and <markup>jokeType</markup>), which isn't
                    necessary, but does make the definition more readable (otherwise the element
                    structure would be a very deep XML hierarchy of definitions). Here's one of
                    those types further specified: <example>
                        <info>
                            <title>XSD: A user-defined complex type</title>
                        </info>
                        <programlisting><![CDATA[  <xs:complexType name="bookinfoType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element minOccurs="0" name="editor" type="xs:string" />
      <xs:element minOccurs="0" name="pubdate" type="xs:date" />
    </xs:sequence>
  </xs:complexType>
  
]]></programlisting>
                    </example></para>
                <para>This user-defined type uses only simple types in its element definitions, but
                    does make two of these elements optional (using
                    <markup>minOccurs="0"</markup>).</para>
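                <para>Putting the two definitions together, a document along the following lines
                    should validate. This is only a sketch: the title and date are invented, and
                    the content of each <markup>joke</markup> is elided because
                        <markup>jokeType</markup> is not shown here.</para>
                <example>
                    <title>An instance document for the jokebook schema (sketch)</title>
                    <programlisting><![CDATA[
  <?xml version="1.0" ?>
  <jokebook
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="jokebook.xsd">
    <bookinfo>
      <title>A Small Book of Jokes</title>
      <!-- editor is omitted: minOccurs="0" makes it optional -->
      <pubdate>2008-01-31</pubdate>
    </bookinfo>
    <!-- one or more jokes; content depends on jokeType -->
    <joke>....</joke>
    <joke>....</joke>
  </jokebook>
]]></programlisting>
                </example>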
            </sect2>
            <sect2>
                <title>XSD: More powerful specification</title>
                <para>One of the key advantages of a schema over the older DTD specification of an
                    XML application is the ability to define the data items more accurately. This
                    can be as simple as restricting the length of a data item, or constraining it to
                    numeric data only. If the data has a more complex structure, it's possible to
                    define a pattern (or regular expression) that describes the data accurately.
                    Here's an example:</para>
                <para>
                    <example>
                        <info>
                            <title>XSD: Constraining data with a pattern - postcodes</title>
                        </info>
                        <programlisting>  &lt;xs:simpleType name="uk-postcode">
    &lt;xs:restriction base="xs:string">
      &lt;xs:pattern 
        value="[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z-[CIKMOV]]{2}" />
    &lt;/xs:restriction>
  &lt;/xs:simpleType>

</programlisting>
                    </example>
                </para>
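                <para>In the postcode pattern, <code>[A-Z-[CIKMOV]]</code> is a character class
                    subtraction, meaning any capital letter except C, I, K, M, O or V; the pattern
                    must match the whole value, not just part of it. The simpler restrictions
                    mentioned above (length and numeric range) can be sketched in the same way.
                    The type names <markup>shortTitle</markup> and <markup>rating</markup>, and
                    the limits chosen, are assumptions made for illustration.</para>
                <example>
                    <title>XSD: Constraining length and numeric range (sketch)</title>
                    <programlisting><![CDATA[
  <!-- A string of between 1 and 40 characters -->
  <xs:simpleType name="shortTitle">
    <xs:restriction base="xs:string">
      <xs:minLength value="1" />
      <xs:maxLength value="40" />
    </xs:restriction>
  </xs:simpleType>

  <!-- A whole number from 1 to 5 inclusive -->
  <xs:simpleType name="rating">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1" />
      <xs:maxInclusive value="5" />
    </xs:restriction>
  </xs:simpleType>
]]></programlisting>
                </example>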
            </sect2>
        </sect1>
    </chapter>
    <chapter xml:id="ch-xpointer">
        <title>Making Connections</title>
        <subtitle>XPointer and XLink</subtitle>
        <para>As we have so often seen with HTML, much of the power of online documents comes from
            the ability to connect them with hyperlinks. In HTML, we have a simple mechanism, the
                <code>&lt;a&gt;</code> element, which, coupled with a URI in an <code>href</code>
            attribute, gives us a simple one-way link to another document or resource. In XML, we
            have two technologies that together replicate this function. XLink acts as the
                <code>&lt;a&gt;</code> element, defining where the link appears and how it is used,
            and the XPointer standard gives us a method of accurately pinpointing the link
            destination, whether a whole document or a point within a document. </para>
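        <para>As a rough sketch of how the two fit together, the XPointer part is simply the
            fragment identifier on the end of the URI in the link. The filename
                <markup>notes.xml</markup> is hypothetical, and whether a pointer is honoured
            depends on the software processing the link.</para>
        <example>
            <title>XPointers as fragment identifiers (sketch)</title>
            <programlisting><![CDATA[
  <!-- Shorthand pointer: the element whose ID is "ch-schemas" -->
  <link xlink:href="notes.xml#ch-schemas">Schemas</link>

  <!-- element() scheme: the second child element of that element -->
  <link xlink:href="notes.xml#element(ch-schemas/2)">Schemas, second child</link>
]]></programlisting>
        </example>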
        <sect1>
            <title>Adding simple links</title>
            <para>The simplest type of XLink is very similar to the familiar hyperlink used in
                XHTML, though it's a little more verbose. In DocBook, a link might look like
                this:<programlisting>  &lt;uri xmlns:xlink="http://www.w3.org/1999/xlink"
      xlink:type="simple"
      xlink:href="http://www.distributed.net/"
      >http://www.distributed.net/&lt;/uri>

</programlisting></para>
            <para>Note that we're using a new namespace prefix, <markup>xlink</markup>, which must
                be declared before it is used; in this case, it is declared on the
                <markup>uri</markup> element itself. It's more common (especially for documents
                with many links) to declare the namespace on the root element of the document:
                <programlisting>  &lt;book xmlns="http://docbook.org/ns/docbook" version="5.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">

</programlisting></para>
            <para>which allows us then to omit the <markup>xmlns:xlink</markup> declaration in each
                link element:
                <programlisting>  &lt;uri xlink:href="http://www.exeter.ac.uk/">University&lt;/uri>

</programlisting></para>
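            <para>Here, we've also omitted the <markup>xlink:type="simple"</markup> attribute.
                XLink 1.1 assumes a simple link when an element has an
                <markup>xlink:href</markup> but no <markup>xlink:type</markup>; the original
                XLink 1.0 required the attribute to be present.</para>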
            <sect2>
                <info>
                    <title>Show and Actuate</title>
                </info>
                <para>In HTML, we had an additional link attribute, <markup>target</markup>, which
                    allowed us to specify how the link should be displayed, so
                        <markup>target="_blank"</markup> would give us a new window, or
                        <markup>target="_parent"</markup> would open in the parent frame, replacing
                    whatever was loaded there. </para>
                <para>In XLink, there is a similar attribute, <markup>show</markup>. This has the following values: <informaltable>
                        <tbody>
                            <tr>
                                <th><markup>embed</markup></th>
                                <td>Load the resource into the context of the linking element. In
                                    the examples above, an <markup>xlink:show="embed"</markup> would
                                    replace the <markup>&lt;uri&gt;</markup> element.</td>
                            </tr>
                            <tr>
                                <th><markup>new</markup></th>
                                <td>Load the resource into a new window, frame or pane.</td>
                            </tr>
                            <tr>
                                <th><markup>replace</markup></th>
                                <td>Load the resource into the current window, replacing the entire
                                    contents (default).</td>
                            </tr>
                            <tr>
                                <th><markup>other</markup></th>
                                <td>Behaviour is defined in other markup or scripts.</td>
                            </tr>
                            <tr>
                                <th><markup>none</markup></th>
                                <td>Behaviour is left for the browser to decide upon.</td>
                            </tr>
                        </tbody>
                    </informaltable></para>
                <para>We can also decide when this action happens, with an <markup>actuate</markup>
                    attribute. This is rather different to the standard HTML link, where a user
                    action (clicking on the link) is the only way to actuate the link. Possible
                    values include: <informaltable>
                        <tbody>
                            <tr>
                                <th><markup>onLoad</markup></th>
                                <td>Link should be actuated when the document is loaded; no user
                                    action is required to initiate the link in this case.</td>
                            </tr>
                            <tr>
                                <th><markup>onRequest</markup></th>
                                <td>Link should be actuated only when a user-initiated event occurs.
                                    This is normally when the user clicks on the link text, but may
                                    be triggered by other events.</td>
                            </tr>
                            <tr>
                                <th><markup>other</markup></th>
                                <td>Behaviour is defined in other markup or scripts.</td>
                            </tr>
                            <tr>
                                <th><markup>none</markup></th>
                                <td>Behaviour is left for the browser to decide upon.</td>
                            </tr>
                        </tbody>
                    </informaltable></para>
                <example>
                    <info>
                        <title>Some XLink/XHTML Equivalences</title>
                    </info>
                    <programlisting><![CDATA[  <a href="http://www.ex.ac.uk/">Exeter</a>

  <a xlink:type="simple" 
     xlink:href="http://www.ex.ac.uk/" 
     xlink:show="replace"
     xlink:actuate="onRequest">Exeter</a>
                        
  <img src="uoe-logo.png" alt="The University Logo" />
  
  <img xlink:type="simple" 
       xlink:href="uoe-logo.png"
       xlink:title="The University Logo"
       xlink:show="embed"
       xlink:actuate="onLoad" />
                        
                        ]]></programlisting>
                </example>
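                <para>Following the same pattern, the <markup>target="_blank"</markup> behaviour
                    described earlier could be sketched in XLink like this, assuming as before that
                    the <markup>xlink</markup> prefix has been declared on an enclosing
                    element.</para>
                <example>
                    <info>
                        <title>Opening a link in a new window (sketch)</title>
                    </info>
                    <programlisting><![CDATA[  <a href="http://www.ex.ac.uk/" target="_blank">Exeter</a>

  <a xlink:type="simple"
     xlink:href="http://www.ex.ac.uk/"
     xlink:show="new"
     xlink:actuate="onRequest">Exeter</a>
]]></programlisting>
                </example>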
            </sect2>
        </sect1>
        <sect1>
            <title>Extended Links</title>
            <para>The XLink specification provides much greater flexibility in creating links
                between documents than the simple XHTML-style model, though. Its concept of extended
                links gives the ability to link multiple documents and to provide links that are
                externally defined, rather than embedded within a document. In mathematical terms,
                XLink can define a directed graph, which can be internally or externally referenced.
                Here's an example:<example>
                    <info>
                        <title>An image gallery as an extended XLink</title>
                    </info>
                    <programlisting>  &lt;gallery xlink:type="extended" xlink:title="Image Gallery">
    &lt;photo xlink:type="locator" xlink:href="sunset.png" 
           xlink:label="sunset" 
           xlink:title="A sunset over the bay" />
    &lt;photo xlink:type="locator" xlink:href="beach.png" 
           xlink:label="beach-01" 
           xlink:title="A view south over the beach" />
    &lt;photo xlink:type="locator" xlink:href="beachnw.png" 
           xlink:label="beach-02" 
           xlink:title="Looking northwest across the beach" />
  &lt;/gallery>

</programlisting>
                </example></para>
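            <para>The gallery above only names its resources; the edges of the directed graph are
                supplied by arc elements, which connect locators by their labels. The following is
                a minimal sketch, in which the <markup>next</markup> element name and the choice
                of traversal are assumptions made for illustration.</para>
            <example>
                <info>
                    <title>Adding an arc to the image gallery (sketch)</title>
                </info>
                <programlisting><![CDATA[  <gallery xlink:type="extended" xlink:title="Image Gallery">
    <photo xlink:type="locator" xlink:href="sunset.png"
           xlink:label="sunset" />
    <photo xlink:type="locator" xlink:href="beach.png"
           xlink:label="beach-01" />
    <!-- Hypothetical arc: from the sunset photo to the first beach photo -->
    <next xlink:type="arc"
          xlink:from="sunset" xlink:to="beach-01"
          xlink:show="replace" xlink:actuate="onRequest"
          xlink:title="Next photo" />
  </gallery>
]]></programlisting>
            </example>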
        </sect1>
    </chapter>
    <bibliography>
        <bibliomixed xml:id="Castro2001"/>
        <bibliomixed xml:id="W3C-XML-1.0"/>
        <bibliomixed xml:id="W3C-XLink-1.0"/>
        <bibliomixed xml:id="W3C-XPath-1.0"/>
        <bibliomixed xml:id="W3C-XSLT-1.0"/>
        <bibliomixed xml:id="Tennison2002"/>
        <bibliomixed xml:id="vanderVlist2002"/>
        <bibliomixed xml:id="vanderVlist2004"/>
    </bibliography>
    <glossary>
        <glossentry xml:id="gl-application">
            <glossterm>Application</glossterm>
            <glossdef>
                <para>Specifically, a language defined using XML. More generally, this language and
                    its associated stylesheets, related documentation and any server programs that
                    it uses.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-child">
            <glossterm>Child element</glossterm>
            <glossdef>
                <para>An element enclosed within another element. For example, in an XHTML document,
                    the <markup><![CDATA[<html>]]></markup> element will always have exactly two
                    children: <markup><![CDATA[<head>]]></markup> and
                        <markup><![CDATA[<body>]]></markup>.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-dtd">
            <glossterm>Document Type Definition </glossterm>
            <abbrev>DTD</abbrev>
            <glossdef>
                <para>A language used to describe document structures in <glossterm
                        linkend="gl-sgml">SGML</glossterm>. Before <link linkend="gl-schema">XML
                        Schema</link> languages were standardised, DTDs were also widely used to
                    specify the structure of XML documents.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-element">
            <glossterm>Element</glossterm>
            <glossdef>
                <para>The XML term for the basic unit of markup, often called a
                        <emphasis>tag</emphasis> in other contexts.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-xml">
            <glossterm>Extensible Markup Language</glossterm>
            <abbrev>XML</abbrev>
            <glossdef>
                <para>A rigorous and exacting syntax for markup languages.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-fpi">
            <glossterm>Formal Public Identifier</glossterm>
            <abbrev>FPI</abbrev>
            <glossdef>
                <para>A unique, location-independent reference to some kind of XML entity, usually
                    a <glossterm linkend="gl-dtd">DTD</glossterm>, which is cited in the DOCTYPE
                    declaration, e.g. <markup>-//W3C//DTD XHTML 1.0 Strict//EN</markup>. The FPI
                    is normally followed by a system identifier, a URI which may point to the
                    location of the <glossterm linkend="gl-dtd">DTD</glossterm> on the web, but
                    does not need to.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-instance">
            <glossterm>Instance</glossterm>
            <glossdef>
                <para>For any XML <glossterm linkend="gl-application">application</glossterm>, an
                    instance is any document that conforms to that application's DTD or Schema. So,
                    if you write a valid DocBook document, then that is an instance of the DocBook
                    application.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-metadata">
            <glossterm>Metadata</glossterm>
            <glossdef>
                <para>Data about data. Metadata in XHTML is included as <markup>&lt;meta /></markup>
                    elements, which describe internally such information as the author, copyright,
                    desired search engine behaviour, etc. In defining an application in XML,
                    decisions must be made as to what information is real data, and what is
                    metadata; it's usual to include metadata as attributes rather than individual
                    elements. A common example is where an amount of money must be recorded, where
                    the actual figure is the data, and the currency unit is metadata, e.g.
                        <markup>&lt;value units="GBP">200&lt;/value></markup>.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-parent">
            <glossterm>Parent Element</glossterm>
            <glossdef>
                <para>The element which encloses another element within a markup scheme. For
                    example, the parent of the <markup><![CDATA[<body>]]></markup> element in an
                    XHTML document is always the <markup><![CDATA[<html>]]></markup> element.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-result-tree">
            <glossterm>Result Tree</glossterm>
            <glossdef>
                <para>The resulting tree of XML markup that's produced from a stylesheet in XSLT. In
                    XSLT2, multiple result trees can be produced as a <emphasis role="italic"
                        >sequence</emphasis>, which must be <emphasis>serialised</emphasis> as they
                    are written to the output.  A <emphasis role="italic">result tree
                        fragment</emphasis> can be created by a template, expression or function,
                    which is an intermediate result not usually serialised.</para>
            </glossdef>
        </glossentry>
        <glossentry>
            <glossterm xml:id="gl-sequence">Sequence</glossterm>
            <glossdef>
                <para>An ordered list of result tree fragments or atomic values. Sequences were
                    introduced in XSLT2 to handle and process multiple results from expressions and
                    templates.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-sgml">
            <glossterm>Standard Generalised Markup Language</glossterm>
            <abbrev>SGML</abbrev>
            <glossdef>
                <para>The predecessor to <glossterm linkend="gl-xml">XML</glossterm>; bigger,
                    bulkier and much harder to understand and manage. SGML evolved into XML through
                    the pruning of many under-used features, and by enforcing more rigorous syntax
                    on its applications. Versions of HTML up to HTML4 were defined using
                    SGML.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-xlink">
            <glossterm>XLink</glossterm>
            <glossdef>
                <para>An extension to XML which allows links between XML-encoded documents to be
                    easily defined.</para>
            </glossdef>
        </glossentry>
        <glossentry>
            <glossterm>XML</glossterm>
            <glosssee otherterm="gl-xml"/>
        </glossentry>
        <glossentry xml:id="gl-xpath">
            <glossterm>XPath</glossterm>
            <glossdef>
                <para>A method of specifying locations or patterns of locations within an XML
                    document.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-xpointer">
            <glossterm>XPointer</glossterm>
            <glossdef>
                <para>A standard for addressing specific parts of an XML document, so that a link
                    can point to a location within a document rather than to the document as a
                    whole.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-schema">
            <glossterm>XML Schema</glossterm>
            <glossdef>
                <para>Languages used to specify the structure of XML documents. </para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-xsl-fo">
            <glossterm>Extensible Stylesheet Language Formatting Objects</glossterm>
            <abbrev>XSL-FO</abbrev>
            <glossdef>
                <para>A language used to write templates to convert XML-compliant documents to
                    presentational (usually print-based) media.</para>
            </glossdef>
        </glossentry>
        <glossentry xml:id="gl-xslt">
            <glossterm>Extensible Stylesheet Language Transformations</glossterm>
            <abbrev>XSLT</abbrev>
            <glossdef>
                <para>A language used to write transformation templates which convert between
                    XML-compliant markup languages.</para>
            </glossdef>
        </glossentry>
    </glossary>
    <index/>
</book>
