An Introduction to XML Basics
- Markup Languages
- What Does XML Look Like?
- What Does XML Look Like in a Browser?
- What's So Great About XML?
- Well-Formed XML Documents
- Valid XML Documents
- Parsing XML Yourself
- XML Resources
- XML Editors
- XML Browsers
- XML Parsers
- XML Validators
- CSS and XSL
- XLinks and XPointers
- URLs Versus URIs
- ASCII, Unicode, and the Universal Character System
- XML Applications
Welcome to the world of Extensible Markup Language, XML. This book is your guided tour to that world, so have no worries; you've come to the right place. That world is large and expanding in unpredictable ways every minute, but we're going to become familiar with the lay of the land in detail here. And there's a lot of territory to cover because XML is getting into the most amazing places, and in the most amazing ways, these days.
XML is a language defined by the World Wide Web Consortium (W3C, http://www.w3c.org), the body that sets the standards for the Web, and this first chapter is all about getting a solid overview of that language and how you can use it. For example, you probably already know that you can use XML to create your own elements, thus creating a customized markup language for your own use. In this way, XML supersedes other markup languages such as Hypertext Markup Language (HTML); in HTML, all the elements you use are predefined, and there are not enough of them. In fact, XML is a metamarkup language because it lets you create your own markup languages.
Markup Languages
Markup languages are all about describing the form of the document; that is, the way the content of the document should be interpreted. The markup language that most people are familiar with today is, of course, HTML, which you use to create standard Web pages. Here's an example HTML page:
Listing ch01_01.html
<HTML>
    <HEAD>
        <TITLE>Hello From HTML</TITLE>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
                Hello From HTML
            </H1>
        </CENTER>
        Welcome to the wild and woolly world of HTML.
    </BODY>
</HTML>
You can see the results of this HTML in Figure 1-1 in Netscape Navigator. Note that the HTML markup in this page (that is, tags such as <HEAD>, <CENTER>, <H1>, and so on) is there to give directions to the browser. That's what markup does; it specifies directions on the way the content is to be interpreted.
Figure 1-1 An HTML page in a browser.
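To make the split between markup and content concrete, here is a minimal sketch in Python that uses the standard library's html.parser module to pull the tags and the text of the listing above apart. The class name MarkupSplitter and the function split_markup are hypothetical names chosen for this illustration, and this is only one possible way to look at the page, not how a browser actually works internally.

```python
from html.parser import HTMLParser

# The page from Listing ch01_01.html, condensed onto fewer lines.
PAGE = """<HTML>
<HEAD><TITLE>Hello From HTML</TITLE></HEAD>
<BODY>
<CENTER><H1>Hello From HTML</H1></CENTER>
Welcome to the wild and woolly world of HTML.
</BODY>
</HTML>"""


class MarkupSplitter(HTMLParser):
    """Hypothetical helper: collects opening tags and text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []   # the markup: directions for the browser
        self.text = []   # the content: what the reader actually sees

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip runs that are only whitespace between tags.
        if data.strip():
            self.text.append(data.strip())


def split_markup(page):
    """Return (list of opening tags, list of text runs) for an HTML string."""
    splitter = MarkupSplitter()
    splitter.feed(page)
    splitter.close()
    return splitter.tags, splitter.text


tags, text = split_markup(PAGE)
print("Markup: ", tags)
print("Content:", text)
```

The tags list holds the directions (html, head, title, and so on), while the text list holds only the words a reader would see in the browser window or title bar.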
When you think of markup in terms of specifying how the content of a document is to be handled, it's easy to see that there are many kinds of markup languages all around already. For example, if you use a word processor to save a document in Rich Text Format (RTF), you'll find all kinds of markup codes embedded in the document. Here's an example; in this case, I've just created an RTF file with the letters abc underlined and in bold using Microsoft Word. Try searching for the actual text (hint: it's near the very end):
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033 \deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}}{\colortbl;\red0 \green0\blue0;\red0\green0\blue255;\red0\green255\blue255; \red0\green255\blue0;\red255\green0\blue255;\red255\green0 \blue0;\red255\green255\blue0;\red255\green255\blue255;\red0 \green0\blue128;\red0\green128\blue128;\red0\green128\blue0; \red128\green0\blue128;\red128\green0\blue0;\red128\green128 \blue0;\red128\green128\blue128;\red192\green192\blue192;} {\stylesheet{\widctlpar\adjustright \fs20\cgrid \snext0 Normal;} {\*\cs10 \additive Default Paragraph Font;}}{\info{\title } {\author Steven Holzner}{\operator Steven Holzner}{\creatim \yr2000\mo\dy\hr\min}{\revtim\yr2000\mo4\dy17\hr13\min55} {\version1}{\edmins1}{\nofpages1}{\nofwords0}{\nofchars1} {\*\company SteveCo}{\nofcharsws1}{\vern89}}\widowctrl\ftnbj \aenddoc\formshade\viewkind4\viewscale100\pgbrdrhead\pgbrdrfoot \fet0\sectd \psz1\linex0\endnhere\sectdefaultcl {\*\pnseclvl1 \pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2 \pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3 \pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4 \pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5 \pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}} {\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (} {\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1 \pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm \pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \sl480\slmult1\widctlpar\adjustright \fs20\cgrid {\b\fs24\ul abc }{\b\ul \par }}
The markup language that most people are familiar with these days is HTML, but it's easy to see how that language doesn't provide enough power for anything beyond creating standard Web pages.
HTML 1.0 consisted of only a dozen or so tags, but the most recent version, HTML 4.01, consists of almost 100, and if you include the other tags added by the major browsers, that number is closer to 120. But as handling data on the Web and other networks intensifies, it's clear that 120 tags aren't enough and, in fact, that you can never have enough.
For example, what if your hobby was building model ships and you wanted to exchange specifications with others on the topic? HTML doesn't include tags such as <BEAMWIDTH>, <MIZZENHEIGHT>, <DRAFT>, <SHIPCLASS>, and the others you might want. What if you were a major bank that wanted to exchange financial data with other institutions: would you prefer tags such as <B>, <UL>, and <FONT>, or tags such as <FISCALYEAR>, <ACCOUNTNUMBER>, <TRANSFERACCOUNT>, and others? (In fact, such markup languages as Extensible Business Reporting Language exist now, and they're built on XML.)
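As a rough illustration of the model ship idea, the sketch below uses Python's standard xml.etree.ElementTree module to read a small document built from the tags just mentioned. The <SHIP> root element, the units attribute, and all of the values are assumptions invented for this example; no real ship-modeling markup language is implied.

```python
import xml.etree.ElementTree as ET

# A hypothetical model ship specification. The element names come from
# the text; the <SHIP> root, the units attribute, and the values are
# made up for illustration.
SHIP_XML = """<?xml version="1.0"?>
<SHIP>
    <SHIPCLASS>Frigate</SHIPCLASS>
    <BEAMWIDTH units="cm">12.5</BEAMWIDTH>
    <MIZZENHEIGHT units="cm">30</MIZZENHEIGHT>
    <DRAFT units="cm">4</DRAFT>
</SHIP>"""


def read_ship(xml_text):
    """Return a dict mapping each child element's tag name to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


ship = read_ship(SHIP_XML)
for tag, value in ship.items():
    print(tag, "=", value)
```

The parser knows nothing about ships. It simply reports whatever element names the document uses, which is what lets two hobbyists (or two banks) agree on their own tags and exchange data with ordinary XML tools.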
What if you were a Web browser manufacturer and wanted to create your own markup language to let people configure your browser, adding scrollbars, toolbars, and other elements? You might create your own markup language to do that; in fact, Netscape has done just that with the XML-based User Interface Language, which we'll see in this chapter.
The upshot is that there are as many reasons to create markup languages as there are ways of handling data, and, of course, that's unlimited. That's where XML comes in: It's a metamarkup specification that lets you create your own markup languages.