XML Applications
We've seen a lot of theory in this chapter, so I'm going to spend the rest of it looking at how XML is used in the real world today. The world of XML is huge these days; in fact, XML is now used internally even in Netscape and Microsoft products, as well as installations of programming languages such as Perl. You can find a good list of organizations that produce their own XML-based languages at http://www.xml.org/xml/marketplace_company.jsp.
It's useful and encouraging to see how XML is being used today in these XML-based languages. Here's a new piece of terminology: As you know, XML is a metamarkup language, so it's actually used to create languages. The languages so created are applications of XML; as a result, they're called XML applications.
Note that the term XML application means an application of XML to a specific domain, such as MathML, the mathematics markup language; it does not refer to a program that uses XML (a fact that causes a lot of confusion among people who know nothing about XML).
Thousands of XML applications are around today, and we'll see some of them here. You can see the advantage to various groups of defining their own markup languages: physicists or chemists, for example, can use the symbols and graphics of their discipline in customized browsers. I'll start with Chemical Markup Language (CML).
XML at Work: Chemical Markup Language
Peter Murray-Rust developed CML as a very early XML application, so it has been around quite a while. Many people think of CML as a sort of HTML+Molecules, and that's not a bad characterization. Using CML, you can display the structure of complex molecules.
With CML, chemists can create and publish molecule specifications for easy interchange. Note that the real value of this is not so much in looking at individual chemicals as it is in being able to search CML repositories for molecules matching specific characteristics.
I've already mentioned a famous CML browser: Jumbo, which you can download for free from http://www.xml-cml.org/jumbo.html. Jumbo is not only for handling CML; you can also use it to display the structure of XML documents in general. However, there's no question that Jumbo's novelty lies in using CML to create graphical representations of molecules.
We've already seen Jumbo at work in Figure 1-10, where it displays the molecule thiophenol. Here is the file thiophenol.xml that it reads to display that molecule:
<?jumbo:namespace ns="http://www.xml-cml.org" prefix="C" java="jumbo.cmlxml.*Node" ?>
<C:molecule id="thiophenol">
  <C:atomArray builtin="elsym">
    C C C C C C C S C C O O
  </C:atomArray>
  <C:atomArray builtin="x2" type="float">
    0 0.866 0.866 0 -0.866 -0.866 0.0 0.0 1.732 -1.732 1.732 -1.732
  </C:atomArray>
  <C:atomArray builtin="y2" type="float">
    1 0.5 -0.5 -1.0 -0.5 0.5 -2.0 2.0 1.0 1.0 2.0 2.0
  </C:atomArray>
  <C:bondArray builtin="atid1">
    1 2 3 4 5 6 1 4 2 9 6 10
  </C:bondArray>
  <C:bondArray builtin="atid2">
    2 3 4 5 6 1 8 7 9 11 10 12
  </C:bondArray>
  <C:bondArray builtin="order" type="integer">
    4 4 4 4 4 4 1 1 1 2 1 2
  </C:bondArray>
</C:molecule>
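Because a CML file is plain XML, searching a repository for molecules with particular characteristics comes down to ordinary tree processing. Here is a minimal Python sketch of that idea; it is not part of Jumbo. It assumes the C prefix is bound with a standard xmlns:C declaration, because a namespace-aware parser such as xml.etree.ElementTree does not understand Jumbo's jumbo:namespace processing instruction, and the helper names array_values, element_counts, and bonds are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org"

# The thiophenol.xml atom and bond data, with the C prefix declared via
# xmlns:C (an assumption; the original file uses a Jumbo processing instruction).
MOLECULE = f"""
<C:molecule xmlns:C="{CML_NS}" id="thiophenol">
  <C:atomArray builtin="elsym">C C C C C C C S C C O O</C:atomArray>
  <C:bondArray builtin="atid1">1 2 3 4 5 6 1 4 2 9 6 10</C:bondArray>
  <C:bondArray builtin="atid2">2 3 4 5 6 1 8 7 9 11 10 12</C:bondArray>
  <C:bondArray builtin="order" type="integer">4 4 4 4 4 4 1 1 1 2 1 2</C:bondArray>
</C:molecule>
"""

def array_values(root, tag, builtin):
    """Return the whitespace-separated values of the array with this builtin."""
    for elem in root.iter(f"{{{CML_NS}}}{tag}"):
        if elem.get("builtin") == builtin:
            return elem.text.split()
    return []

def element_counts(root):
    """Count atoms by element symbol (for example, to find molecules containing S)."""
    return Counter(array_values(root, "atomArray", "elsym"))

def bonds(root):
    """Pair the atid1, atid2, and order arrays into (atom1, atom2, order) tuples."""
    first = array_values(root, "bondArray", "atid1")
    second = array_values(root, "bondArray", "atid2")
    order = array_values(root, "bondArray", "order")
    return [(int(a), int(b), int(o)) for a, b, o in zip(first, second, order)]

root = ET.fromstring(MOLECULE)
print(element_counts(root))   # Counter({'C': 9, 'O': 2, 'S': 1})
print(len(bonds(root)))       # 12
```

A repository search would simply run a test such as `element_counts(root)["S"] > 0` over every file.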
XML at Work: Mathematical Markup Language
Mathematical Markup Language was designed to fill a significant gap in Web documents: equations. In fact, Tim Berners-Lee first developed the World Wide Web at CERN so that high-energy physicists could exchange papers and documents. Even so, for nearly a decade there has been no way to display true equations in Web browsers.
Mathematical Markup Language (MathML) fixes that. MathML is itself a W3C specification, and you can find it at http://www.w3.org/Math/. Using MathML, you can display equations and all kinds of mathematical terms. (It's not powerful enough for many specialized areas of the sciences or mathematics yet, but it's growing all the time.)
Because of the limited audience for this kind of presentation, no major browser supports MathML yet. However, Amaya, W3C's own testbed browser for new HTML and XHTML elements (unfortunately, it's not an XML browser), has some limited support. You can download Amaya for free from http://www.w3.org/Amaya/.
Here's a MathML document that displays the equation $3Z^2 - 6Z + 12 = 0$ (this document uses an XML namespace, which we'll see more about in the next chapter):
<?xml version="1.0"?>
<html xmlns:m="http://www.w3.org/TR/REC-MathML/">
  <math>
    <m:mrow>
      <m:mrow>
        <m:mn>3</m:mn>
        <m:mo>&#x2062;</m:mo>
        <m:msup>
          <m:mi>Z</m:mi>
          <m:mn>2</m:mn>
        </m:msup>
        <m:mo>-</m:mo>
        <m:mrow>
          <m:mn>6</m:mn>
          <m:mo>&#x2062;</m:mo>
          <m:mi>Z</m:mi>
        </m:mrow>
        <m:mo>+</m:mo>
        <m:mn>12</m:mn>
      </m:mrow>
      <m:mo>=</m:mo>
      <m:mn>0</m:mn>
    </m:mrow>
  </math>
</html>
You can see the results of this document in the Amaya browser in Figure 1-13.
Figure 1-13 Displaying MathML in the Amaya browser.
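To see how the presentation markup nests, here is a rough Python sketch, not part of any MathML tool, that walks the equation's outer m:mrow and rebuilds a plain-text form of the equation. It handles only the handful of elements this example uses (mn, mi, mo, msup, and mrow), drops the invisible-times operator (U+2062), and the function name linearize is hypothetical.

```python
import xml.etree.ElementTree as ET

M = "{http://www.w3.org/TR/REC-MathML/}"   # namespace URI used in the example
INVISIBLE_TIMES = "\u2062"

# The outer <m:mrow> of the example, with the namespace declared on it.
EQUATION = """
<m:mrow xmlns:m="http://www.w3.org/TR/REC-MathML/">
  <m:mrow>
    <m:mn>3</m:mn> <m:mo>&#x2062;</m:mo>
    <m:msup> <m:mi>Z</m:mi> <m:mn>2</m:mn> </m:msup>
    <m:mo>-</m:mo>
    <m:mrow> <m:mn>6</m:mn> <m:mo>&#x2062;</m:mo> <m:mi>Z</m:mi> </m:mrow>
    <m:mo>+</m:mo>
    <m:mn>12</m:mn>
  </m:mrow>
  <m:mo>=</m:mo>
  <m:mn>0</m:mn>
</m:mrow>
"""

def linearize(elem):
    """Turn a small presentation-MathML tree into a linear text string."""
    tag = elem.tag.replace(M, "")
    if tag in ("mn", "mi", "mo"):
        text = (elem.text or "").strip()
        # Invisible times marks implied multiplication; print nothing for it.
        return "" if text == INVISIBLE_TIMES else text
    if tag == "msup":
        base, exponent = list(elem)
        return f"{linearize(base)}^{linearize(exponent)}"
    # mrow: concatenate the children in document order.
    return "".join(linearize(child) for child in elem)

print(linearize(ET.fromstring(EQUATION)))  # 3Z^2-6Z+12=0
```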
XML at Work: Synchronized Multimedia Integration Language
Synchronized Multimedia Integration Language (SMIL, pronounced "smile") has been around for quite some time. It's a W3C standard that you can find more about at http://www.w3.org/AudioVideo/#SMIL.
SMIL attempts to fix a problem with modern "multimedia" browsers. Usually, such browsers can handle only one aspect of multimedia at a time (video, audio, or images) and never more than that. SMIL lets you create television-like fast cuts and true multimedia presentations by letting you specify when various multimedia files are played.
The idea is that SMIL lets you specify which multimedia files are played when; SMIL does not describe or encapsulate any multimedia itself.
Microsoft, Macromedia, and Compaq have a semicompeting specification, HTML+TIME, which I'll take a look at next; for this reason, Microsoft hasn't implemented much SMIL in Internet Explorer yet. You can find a SMIL applet written in Java at http://www.empirenet.com/~joseram, along with some stunning examples of symphonies coordinated with images.
Here's an example SMIL document that creates a multimedia sequence. Because the media elements sit inside a <seq> element, they are presented one after another: it plays mozart1.wav, then amadeus1.mov, then displays mozart1.htm, and then does the same with mozart2.wav, amadeus2.mov, and mozart2.htm:
<?xml version="1.0"?>
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 1.0//EN"
  "http://www.w3.org/TR/REC-smil/SMIL10.dtd">
<smil>
  <body>
    <seq id="mozart">
      <audio src="mozart1.wav"/>
      <video src="amadeus1.mov"/>
      <text src="mozart1.htm"/>
      <audio src="mozart2.wav"/>
      <video src="amadeus2.mov"/>
      <text src="mozart2.htm"/>
    </seq>
  </body>
</smil>
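The <seq> semantics are simple enough to model in a few lines. This Python sketch (the function name playback_order is hypothetical, and it is not a SMIL player) lists the media in the order a flat <seq> presents them. It assumes the DOCTYPE is dropped for brevity and that the sequence contains no nested <par> or <seq> elements.

```python
import xml.etree.ElementTree as ET

# The SMIL body from the example, without its DOCTYPE (an assumption for brevity).
SMIL = """
<smil>
  <body>
    <seq id="mozart">
      <audio src="mozart1.wav"/>
      <video src="amadeus1.mov"/>
      <text src="mozart1.htm"/>
      <audio src="mozart2.wav"/>
      <video src="amadeus2.mov"/>
      <text src="mozart2.htm"/>
    </seq>
  </body>
</smil>
"""

def playback_order(smil_text):
    """List (media type, src) pairs in the order a flat <seq> presents them."""
    root = ET.fromstring(smil_text)
    order = []
    for seq in root.iter("seq"):
        # Children of a <seq> are presented one after another, in document order.
        for child in seq:
            order.append((child.tag, child.get("src")))
    return order

for step, (kind, src) in enumerate(playback_order(SMIL), start=1):
    print(step, kind, src)
```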
XML at Work: HTML+TIME
Microsoft, Macromedia, and Compaq have a multimedia alternative to SMIL called Timed Interactive Multimedia Extension (referred to as HTML+TIME), which is an XML application. Whereas SMIL documents let you manipulate other files, HTML+TIME lets you handle both HTML and multimedia presentations in the same page.
HTML+TIME is not nearly as powerful as SMIL, but Microsoft has shown relatively little interest in SMIL. You can find out about HTML+TIME at msdn.microsoft.com/workshop/Author/behaviors/time.asp. HTML+TIME is implemented in Internet Explorer as a behavior, a construct in Internet Explorer that lets you separate code from data. You can find more information about Internet Explorer behaviors at msdn.microsoft.com/workshop/c-frame.htm#/workshop/author/default.asp.
Here's an example HTML+TIME document that displays the words Hello, there, from, and HTML+TIME, spacing their appearance 2 seconds apart and then repeating the whole 10-second sequence five times:
Listing ch01_10.html
<HTML>
  <HEAD>
    <TITLE>
      Using HTML+TIME
    </TITLE>
    <STYLE>
      .time {behavior: url(#default#time);}
    </STYLE>
  </HEAD>
  <BODY>
    <DIV CLASS="time" t:REPEAT="5" t:DUR="10" t:TIMELINE="par">
      <DIV CLASS="time" t:BEGIN="0" t:DUR="10">Hello</DIV>
      <DIV CLASS="time" t:BEGIN="2" t:DUR="10">there</DIV>
      <DIV CLASS="time" t:BEGIN="4" t:DUR="10">from</DIV>
      <DIV CLASS="time" t:BEGIN="6" t:DUR="10">HTML+TIME.</DIV>
    </DIV>
  </BODY>
</HTML>
You can see the results of this HTML+TIME document in Figure 1-14.
Figure 1-14 An HTML+TIME document at work.
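The timing in this listing is easier to follow as arithmetic. The Python sketch below is a simplified model, not how Internet Explorer implements behaviors: it assumes each inner DIV is clipped to the outer DIV's 10-second duration and that t:REPEAT restarts the whole parallel group, so the page runs for 50 seconds in all. The names WORDS and visible_at are hypothetical.

```python
# Timing attributes copied from the listing: (word, t:BEGIN in seconds).
WORDS = [("Hello", 0), ("there", 2), ("from", 4), ("HTML+TIME.", 6)]
PARENT_DUR = 10   # t:DUR of the outer DIV
REPEATS = 5       # t:REPEAT of the outer DIV
CHILD_DUR = 10    # t:DUR of each inner DIV

def visible_at(t):
    """Return the words on screen t seconds after the page starts.

    Assumes SMIL-style clipping: a child cannot outlast its parent's
    10-second interval, and the whole group restarts on each repeat.
    """
    if t < 0 or t >= PARENT_DUR * REPEATS:
        return []
    local = t % PARENT_DUR  # time within the current repetition
    return [word for word, begin in WORDS
            if begin <= local < min(begin + CHILD_DUR, PARENT_DUR)]

print(visible_at(3))    # ['Hello', 'there']
print(visible_at(12))   # ['Hello', 'there'] (second repetition)
```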
HTML+TIME actually builds on SMIL to a great extent. The example from the previous topic on SMIL would look this way in HTML+TIME:
<t:seq id="mozart">
  <t:audio src="mozart1.wav"/>
  <t:video src="amadeus1.mov"/>
  <t:textstream src="mozart1.htm"/>
  <t:audio src="mozart2.wav"/>
  <t:video src="amadeus2.mov"/>
  <t:textstream src="mozart2.htm"/>
</t:seq>
XML at Work: XHTML
One of the biggest XML applications around today is XHTML, W3C's translation of HTML 4.0 into XML, and it's attracting a lot of attention.
W3C introduced XHTML to bridge the gap between HTML and XML and to introduce more people to XML. XHTML is simply an application that mimics HTML 4.0 in such a way that you can display the results, which are true XML documents, in current Web browsers. XHTML is an exciting development in the XML world, and I'll dig into it in some depth later in this book.
Here are some XHTML resources online:
- http://www.w3.org/MarkUp/Activity.html: The W3C Hypertext Markup activity page, which has an overview of XHTML
- http://www.w3.org/TR/xhtml1/: The XHTML 1.0 specification (in more common use than XHTML 1.1 today)
- http://www.w3.org/TR/xhtml11/: The XHTML 1.1 module-based specification
XHTML 1.0 comes in three versions: transitional, frameset, and strict. The transitional version is the most popular because it supports HTML more or less as it's used today. The frameset version supports XHTML documents that display frames (it differs from the transitional version because transitional documents are based on the <body> element, whereas documents that use frames are based on the <frameset> element). The strict version omits all the HTML elements deprecated in HTML 4.0 (of which there were quite a few).
XHTML 1.1 is a form of the XHTML 1.0 strict version made a little stricter by removing a few attributes (such as lang, in favor of xml:lang) and adding support for a few more elements (such as <ruby> for annotated text). You can find a list of the differences between XHTML 1.0 and XHTML 1.1 at http://www.w3.org/TR/xhtml11/changes.html#a_changes.
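In practice, the three versions differ mainly in the DOCTYPE declaration at the top of the document. As a quick reference, this Python table holds the public and system identifiers of the three XHTML 1.0 DTDs along with the element each version's documents are built around; the names XHTML10_FLAVORS and doctype are hypothetical helpers, and the transitional entry reproduces the DOCTYPE used in Listing ch01_11.html.

```python
# Public identifier, system identifier, and main element for each XHTML 1.0 version.
XHTML10_FLAVORS = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
               "body"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
                     "body"),
    "frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
                 "frameset"),
}

def doctype(flavor):
    """Build the <!DOCTYPE> line for one XHTML 1.0 version."""
    public_id, system_id, _ = XHTML10_FLAVORS[flavor]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'

for flavor, (_, _, main) in XHTML10_FLAVORS.items():
    print(f"{flavor:12} <{main}>  {doctype(flavor)}")
```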
Here's an example XHTML document using the XHTML 1.0 transitional DTD. You can display this document in any standard HTML browser (note that tag names are all in lowercase in XHTML):
Listing ch01_11.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>
      Web page number one!
    </title>
  </head>
  <body>
    <h1>
      Welcome to XHTML!
    </h1>
    <center>
      This is simple text that appears in this page.
      <p>
        Here's a new paragraph!
      </p>
    </center>
  </body>
</html>
You can see the results of this XHTML in Figure 1-15. Writing XHTML is a lot like writing HTML, except that you have to adhere to XML syntax (such as making sure that every element has a closing tag).
Figure 1-15 Displaying XHTML.
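Because XHTML documents are XML, any XML parser can enforce the "every element is closed" rule. This Python sketch uses the standard library's xml.etree.ElementTree; it checks well-formedness only, not validity against the XHTML DTD, and the function name is_well_formed and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if markup parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

xhtml = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
         "<p>Here's a new paragraph!</p><br /></body></html>")
html = "<html><body><p>Here's a new paragraph!<br></body></html>"

print(is_well_formed(xhtml))  # True
print(is_well_formed(html))   # False: <p> and <br> are never closed
```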
XML at Work: Microsoft's .NET
Microsoft's .NET initiative is based substantially on XML, which it uses to send data back and forth between .NET components. In .NET, you don't usually see the XML (it's handled behind the scenes automatically), but it's there.
Here's an example in VB .NET that exposes the behind-the-scenes XML. The data in .NET datasets is transported using XML, and this example explicitly writes the authors table of the pubs sample database to an XML file when the user clicks a button. When the user clicks another button, the code reads that file back into a second .NET dataset. You can see this example at work in Figure 1-16 (which also displays the data in the authors table).
Figure 1-16 Writing data in XML in VB .NET.
Here's the VB .NET code. When the user clicks the "Write existing dataset to XML file" button you see in Figure 1-16, the authors table in the dataset is written to the file dataset.xml; when the user clicks the "Create new dataset from XML file" button, a new dataset is created and reads its data from dataset.xml:
' Refill the existing dataset from the database and save it as XML.
Private Sub Button1_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles Button1.Click
    DataSet11.Clear()
    OleDbDataAdapter1.Fill(DataSet11)
    DataSet11.WriteXml("dataset.xml")
End Sub

' Create a new dataset, load it from the XML file, and bind it to the grid.
Private Sub Button2_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles Button2.Click
    Dim ds As New DataSet()
    ds.ReadXml("dataset.xml")
    DataGrid1.SetDataBinding(ds, "authors")
End Sub
You can see the dataset's data in the dataset.xml file, which looks like this; it's pure XML (and matches the data you see in Figure 1-16):
<?xml version="1.0" standalone="yes"?>
<DataSet1 xmlns="http://www.tempuri.org/DataSet1.xsd">
  <authors>
    <au_id>172-32-1176</au_id>
    <au_lname>White</au_lname>
    <au_fname>Johnson</au_fname>
    <phone>408 496-7223</phone>
    <address>10932 Bigge Rd.</address>
    <city>Menlo Park</city>
    <state>CA</state>
    <zip>94025</zip>
    <contract>true</contract>
  </authors>
  <authors>
    <au_id>213-46-8915</au_id>
    <au_lname>Green</au_lname>
    <au_fname>Marjorie</au_fname>
    <phone>415 986-7020</phone>
    .
    .
    .
And that provides us with a glimpse of the actual XML used behind the scenes to transport data in .NET, something that's usually handled automatically.
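For readers outside .NET, here is a rough Python analogue of the WriteXml/ReadXml round trip. It is not the ADO.NET API, and write_rows and read_rows are hypothetical names. It assumes the layout shown in dataset.xml: a DataSet1 root in the tempuri.org namespace, one authors element per row, one child element per column, and every value kept as a string.

```python
import xml.etree.ElementTree as ET

NS = "http://www.tempuri.org/DataSet1.xsd"   # namespace used in dataset.xml
COLUMNS = ["au_id", "au_lname", "au_fname", "phone", "address",
           "city", "state", "zip", "contract"]

def write_rows(rows, table="authors"):
    """Serialize rows like dataset.xml: one element per row, one child per column."""
    root = ET.Element(f"{{{NS}}}DataSet1")
    for row in rows:
        record = ET.SubElement(root, f"{{{NS}}}{table}")
        for column in COLUMNS:
            ET.SubElement(record, f"{{{NS}}}{column}").text = row[column]
    # default_namespace (Python 3.8+) emits xmlns="..." instead of an ns0: prefix.
    return ET.tostring(root, encoding="unicode", default_namespace=NS)

def read_rows(xml_text, table="authors"):
    """Read each row element back into a dict keyed by column name."""
    root = ET.fromstring(xml_text)
    return [{child.tag.split("}", 1)[1]: child.text for child in record}
            for record in root.findall(f"{{{NS}}}{table}")]

# The first authors row from dataset.xml.
rows = [{"au_id": "172-32-1176", "au_lname": "White", "au_fname": "Johnson",
         "phone": "408 496-7223", "address": "10932 Bigge Rd.",
         "city": "Menlo Park", "state": "CA", "zip": "94025",
         "contract": "true"}]

text = write_rows(rows)
print(read_rows(text) == rows)  # True
```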
XML at Work: Open Software Description
Open Software Description (OSD) was developed by Marimba and Microsoft, and you can find more about this XML application at http://www.w3.org/TR/NOTE-OSD.html. OSD allows you to specify how and when software is updated via the Internet.
Not everyone thinks OSD is a great idea; after all, many users want control over when their software is updated. New versions might have incompatibilities with old versions, for example.
Here's an example .osd file that handles updates for a word processor named SuperDuperTextPro from SuperDuperSoft:
<?xml version="1.0"?>
<CHANNEL HREF="http://www.superdupersoft.com/updates.html">
  <TITLE>
    SuperDuperTextPro Updates
  </TITLE>
  <USAGE VALUE="SoftwareUpdate"/>
  <SOFTPKG HREF="http://updates.superdupersoft.com/updates.html"
    NAME="{34567A7E-8BE7-99C0-8746-0034829873A3}"
    VERSION="2,4,6">
    <TITLE>
      SuperDuperTextPro
    </TITLE>
    <ABSTRACT>
      SuperDuperTextPro version 206 with sideburns!!!
    </ABSTRACT>
    <IMPLEMENTATION>
      <CODEBASE HREF="http://www.superdupersoft.com/new.exe"/>
    </IMPLEMENTATION>
  </SOFTPKG>
</CHANNEL>
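To make the "how and when" concrete, here is a Python sketch of the decision an update client might make from this file. It is an assumption-laden illustration, not how any OSD-aware product actually behaved: the installed version is passed in by hand, the comparison treats VERSION="2,4,6" as a tuple of integers, and the names parse_version and pending_update are hypothetical.

```python
import xml.etree.ElementTree as ET

# The channel from the example, trimmed to the parts this sketch reads.
OSD = """<?xml version="1.0"?>
<CHANNEL HREF="http://www.superdupersoft.com/updates.html">
  <TITLE>SuperDuperTextPro Updates</TITLE>
  <USAGE VALUE="SoftwareUpdate"/>
  <SOFTPKG HREF="http://updates.superdupersoft.com/updates.html"
           NAME="{34567A7E-8BE7-99C0-8746-0034829873A3}" VERSION="2,4,6">
    <TITLE>SuperDuperTextPro</TITLE>
    <IMPLEMENTATION>
      <CODEBASE HREF="http://www.superdupersoft.com/new.exe"/>
    </IMPLEMENTATION>
  </SOFTPKG>
</CHANNEL>"""

def parse_version(text):
    """Turn an OSD VERSION string such as "2,4,6" into a comparable tuple."""
    return tuple(int(part) for part in text.split(","))

def pending_update(osd_text, installed):
    """Return the CODEBASE URL if the channel offers a newer version, else None."""
    package = ET.fromstring(osd_text).find("SOFTPKG")
    if parse_version(package.get("VERSION")) > installed:
        return package.find("IMPLEMENTATION/CODEBASE").get("HREF")
    return None

print(pending_update(OSD, installed=(2, 4, 5)))  # http://www.superdupersoft.com/new.exe
print(pending_update(OSD, installed=(2, 4, 6)))  # None
```

A client that respects users' wishes would ask before downloading the returned URL rather than installing it silently.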
XML at Work: Scalable Vector Graphics
Scalable Vector Graphics (SVG) is another W3C XML application that is a good idea but has found only limited implementation so far (notably, in such programs as CorelDraw and various Adobe products such as Adobe Illustrator). Using SVG, you can draw two-dimensional graphics using markup. You can find the SVG specification at http://www.w3.org/TR/SVG/ and an overview at http://www.w3.org/Graphics/SVG/Overview.htm8.
Note that because SVG describes graphics, not text, it's harder for current browsers to implement, and no browser today has a full SVG implementation. Other graphics standards have been proposed, such as the Precision Graphics Markup Language (PGML), which IBM, Adobe, Netscape, and Sun proposed to the W3C (http://www.w3.org/TR/1998/NOTE-PGML).
Here's an example PGML document that draws a blue box:
<?xml version="1.0"?>
<!DOCTYPE pgml SYSTEM "/DTDs/pgml.dtd">
<pgml>
  <group fillcolor="blue">
    <path>
      <moveto x="0" y="0"/>
      <lineto x="0" y="1000"/>
      <lineto x="1000" y="1000"/>
      <lineto x="1000" y="0"/>
      <closepath/>
    </path>
  </group>
</pgml>
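PGML and SVG both describe shapes as paths made of move, line, and close operations, so the blue box translates almost mechanically. The Python sketch below assumes a simple one-to-one mapping from PGML's moveto, lineto, and closepath elements to SVG's M, L, and Z path commands; it copies coordinates unchanged without reconciling the two formats' coordinate systems, and the names COMMANDS, svg_path_data, and to_svg are hypothetical.

```python
import xml.etree.ElementTree as ET

# The PGML box from the example, without its DOCTYPE (an assumption for brevity).
PGML = """
<pgml>
  <group fillcolor="blue">
    <path>
      <moveto x="0" y="0"/>
      <lineto x="0" y="1000"/>
      <lineto x="1000" y="1000"/>
      <lineto x="1000" y="0"/>
      <closepath/>
    </path>
  </group>
</pgml>
"""

# Assumed mapping from PGML path operators to SVG path commands.
COMMANDS = {"moveto": "M", "lineto": "L", "closepath": "Z"}

def svg_path_data(path_elem):
    """Build an SVG d attribute from a PGML <path> element."""
    parts = []
    for op in path_elem:
        command = COMMANDS[op.tag]
        if op.tag == "closepath":
            parts.append(command)
        else:
            parts.append(f"{command} {op.get('x')} {op.get('y')}")
    return " ".join(parts)

def to_svg(pgml_text):
    """Wrap the converted path in a minimal SVG document, keeping the fill color."""
    group = ET.fromstring(pgml_text).find("group")
    d = svg_path_data(group.find("path"))
    return f'<svg xmlns="http://www.w3.org/2000/svg"><path d="{d}" fill="{group.get("fillcolor")}"/></svg>'

print(to_svg(PGML))
```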
XML at Work: Vector Markup Language
Vector Markup Language (VML) is an alternative to SVG that is implemented in Microsoft Internet Explorer. You can find out more about VML at http://www.w3.org/TR/NOTE-VML. Using VML, you can draw many vector-based graphics figures; here's an example that draws a yellow oval, a blue box, and a red squiggle:
Listing ch01_12.html
<HTML xmlns:v="urn:schemas-microsoft-com:vml">
  <HEAD>
    <TITLE>
      Using Vector Markup Language
    </TITLE>
    <STYLE>
      v\:* {behavior: url(#default#VML);}
    </STYLE>
  </HEAD>
  <BODY>
    <CENTER>
      <H1>
        Using Vector Markup Language
      </H1>
    </CENTER>
    <P>
    <v:oval STYLE='width:100pt; height:75pt' fillcolor="yellow">
    </v:oval>
    <P>
    <v:rect STYLE='width:100pt; height:75pt' fillcolor="blue"
      strokecolor="red" STROKEWEIGHT="2pt"/>
    <P>
    <v:polyline POINTS="20pt,55pt,100pt,-10pt,180pt,65pt,260pt,25pt"
      strokecolor="red" STROKEWEIGHT="2pt"/>
  </BODY>
</HTML>
You can see the results of this VML in Figure 1-17.
Figure 1-17 Vector Markup Language at work.
XML at Work: Extensible Business Reporting Language
Extensible Business Reporting Language (XBRL, formerly named XFRML) is an open specification that uses XML to describe financial statements. You can find more on XBRL at http://www.xbrl.org/. Using XBRL, you can codify business financial statements in a way that makes it easy to search them en masse and review them quickly, extracting the information you want.
Here's a sample XBRL document that gives you an idea of what this application looks like at work:
<?xml version="1.0" encoding="utf-8" ?>
<group xmlns="http://www.xbrl.org/us/aicpa-us-gaap"
  xmlns:gpsi="http://www.xbrl.org/TaxonomyCustom.xsd"
  id="543-AB" entity="NASDAQ:GPSI" period="1999-05-31"
  schemaLocation="http://www.xbrl.org/TaxonomyCustom.xsd"
  scaleFactor="6" precision="9" type="USGAAP:Financial"
  unit="ISO4217:USD" decimalPattern="" formatName="">
  <item id="IS-025" type="operatingExpenses.researchExpense"
    period="P1Y/1999-05-31">20427</item>
  <item id="IS-026" type="operatingExpenses.researchExpense"
    period="P1Y/1998-05-31">12586</item>
</group>
<group type="gpsi:detail.quarterly" period="1998-05-31">
  <item period="1997-06-01/1998-07-31">0.12</item>
  <item period="1997-09-01/1997-11-30">0.16</item>
  <item period="1997-12-01/1998-02-28">0.17</item>
  <item period="1998-03-01/1998-05-31">-0.12</item>
  <item period="1998-06-01/1998-05-31">0.33</item>
</group>
<group type="gpsi:detail.quarterly" period="1999-05-31">
  <item period="1998-06-01/1998-08-31">0.15</item>
  <item period="1998-09-01/1998-11-30">0.20</item>
  <item period="1998-12-01/1999-02-28">0.23</item>
  <item period="1999-03-01/1999-05-31">0.28</item>
  <item period="1998-06-01/1999-05-31">0.86</item>
</group>
<group type="gpsi:detail.quarterly" period="1998-05-31">
  <item period="1997-06-01/1998-07-31">0.11</item>
  <item period="1997-09-01/1997-11-30">0.15</item>
  <item period="1997-12-01/1998-02-28">0.17</item>
  <item period="1998-03-01/1998-05-31">-0.12</item>
  <item period="1998-06-01/1998-05-31">0.32</item>
</group>
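Extracting information en masse is where this markup pays off. As a small Python illustration, the sketch below reads the fiscal-1999 quarterly group from the example and checks that the four quarterly figures add up to the full-year figure. The 120-day threshold used to tell a quarter from a full year is an assumption of this sketch, the names span_days and quarters_and_total are hypothetical, and Decimal avoids floating-point rounding in the comparison.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# The fiscal-1999 quarterly group from the example.
GROUP = """
<group type="gpsi:detail.quarterly" period="1999-05-31">
  <item period="1998-06-01/1998-08-31">0.15</item>
  <item period="1998-09-01/1998-11-30">0.20</item>
  <item period="1998-12-01/1999-02-28">0.23</item>
  <item period="1999-03-01/1999-05-31">0.28</item>
  <item period="1998-06-01/1999-05-31">0.86</item>
</group>
"""

def span_days(period):
    """Length in days of an ISO 'start/end' period attribute."""
    start, end = (date.fromisoformat(d) for d in period.split("/"))
    return (end - start).days

def quarters_and_total(group_text):
    """Split items into quarterly values and the full-year value by period length."""
    quarters, total = [], None
    for item in ET.fromstring(group_text).iter("item"):
        value = Decimal(item.text)
        if span_days(item.get("period")) > 120:   # assumed: longer than a quarter
            total = value
        else:
            quarters.append(value)
    return quarters, total

quarters, total = quarters_and_total(GROUP)
print(sum(quarters), total)   # 0.86 0.86
```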
XML at Work: Resource Description Framework
Resource Description Framework (RDF) is an XML application that specializes in metadata, that is, data about other data. You use RDF to specify information about other resources, such as Web pages, movies, automobiles, or practically anything. You can find more information about RDF at http://www.w3.org/RDF/; I'll be discussing it later in the book as well.
Using RDF, you create vocabularies that describe resources. For example, the Dublin Core is an RDF vocabulary that handles metadata for Web pages; you can find more information about it at http://dublincore.org/. Using the Dublin Core, you can specify a great deal of information about Web pages. The Dublin Core is ultimately intended to replace the unsystematic use of <META> tags in today's pages; once that information is systematized, it will be much easier for Web search engines to work with.
Here's an example RDF page using the Dublin Core that gives information about a Web page:
<RDF:RDF xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:DC="http://purl.org/DC/">
  <RDF:Description about="http://www.starpowder.com/xml">
    <DC:Format>HTML</DC:Format>
    <DC:Language>en</DC:Language>
    <DC:Date>2002-02-02</DC:Date>
    <DC:Type>tutorial</DC:Type>
    <DC:Title>Welcome to XML!</DC:Title>
  </RDF:Description>
</RDF:RDF>
Note that many more XML applications exist than can be covered in one chapter, and plenty of them work behind the scenes. As mentioned earlier, Microsoft's .NET initiative uses XML extensively internally. Microsoft Office 2000 and Office XP can handle HTML as well as other types of documents, but HTML doesn't let Office store everything it needs in a document, so Office also includes some XML behind the scenes (in fact, the vector graphics in Office 2000 and XP are done using VML). Even relatively early versions of Netscape Navigator let you look for sites similar to the one you're viewing; to do that, Navigator connected to a program that uses XML internally. As you can see, XML is everywhere you look on the Internet.
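Here is a small Python sketch of how a search engine might pick up this metadata. The function name describe is hypothetical, the namespace URIs are copied from the example as-is, and, because XML names are case-sensitive, the date element is closed as </DC:Date>. The sketch reads only Dublin Core properties that appear directly inside each RDF:Description.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/DC/"   # namespace URI as used in the example

DOC = f"""
<RDF:RDF xmlns:RDF="{RDF_NS}" xmlns:DC="{DC_NS}">
  <RDF:Description about="http://www.starpowder.com/xml">
    <DC:Format>HTML</DC:Format>
    <DC:Language>en</DC:Language>
    <DC:Date>2002-02-02</DC:Date>
    <DC:Type>tutorial</DC:Type>
    <DC:Title>Welcome to XML!</DC:Title>
  </RDF:Description>
</RDF:RDF>
"""

def describe(rdf_text):
    """Map each described resource URL to a dict of its Dublin Core properties."""
    root = ET.fromstring(rdf_text)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        props = {}
        for child in desc:
            if child.tag.startswith(f"{{{DC_NS}}}"):
                # Strip the "{namespace}" part to keep just the property name.
                props[child.tag[len(DC_NS) + 2:]] = child.text
        result[desc.get("about")] = props
    return result

print(describe(DOC)["http://www.starpowder.com/xml"]["Title"])  # Welcome to XML!
```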
And that's it for our overview chapter. We've gotten a solid foundation in XML here, and it's a good place to begin. The next step is to get systematic and to start getting all the actual ground rules for creating XML documents under our belts. I'll turn to that in Chapter 2.