What's So Great About XML?
XML is popular for many reasons, and I'll examine some of them here as part of our overview of where XML is today. My personal favorite is that XML allows easy data handling and exchange, so I'm going to start with that.
Easy Data Exchange
I've been involved with computing for a long time, and one of the things I've watched with misgiving is the growth of proprietary data formats. In earlier days, programs could exchange data easily because data was stored as text. Today, however, you need conversion programs or modules to let applications transfer data between themselves. In fact, proprietary data formats have become so complex that frequently one version of a complex application can't even read data from an earlier version of the same application.
In XML, data and markup are stored as text that you yourself can configure. If you like, you can use XML editors, as we'll see, to create XML documents. If something goes wrong, however, you can examine or modify the document directly because it's all just text. The data is also not locked into an encoding that has been patented or copyrighted, as some formats are, so it's more accessible.
You might think that binary formats would be more efficient because they can store data more compactly, but that's not the way things have worked out. Microsoft Corporation, for example, is notorious for turning out huge applications that store even simple data in huge files (the not-so-affectionate name for this is "bloatware"). If you store only the letters abc in a Microsoft Word 2000 document, you might be surprised to find that the document is something like 20,000 bytes. A similar XML file might be 30 or 40 bytes. Even large amounts of data are not necessarily stored efficiently; Microsoft Excel, for example, routinely creates large files that are five times as long as the corresponding text, and Microsoft Access XP creates files that start at 96KB.
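To make the size comparison concrete, here is a minimal Python sketch that measures a tiny XML document holding only the letters abc. The element name DATA is a hypothetical choice made for this illustration; it is not taken from any real application's format.

```python
# A tiny XML document that stores only the letters "abc".
# The element name DATA is an assumption made for this sketch.
xml_text = '<?xml version="1.0"?><DATA>abc</DATA>'

# Encode as UTF-8 to get the number of bytes the file would occupy on disk.
xml_bytes = xml_text.encode("utf-8")
size_in_bytes = len(xml_bytes)

print(size_in_bytes)
```

The count comes to 37 bytes, all of it readable text, which falls in the 30 to 40 byte range mentioned above.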
In addition, when you standardize markup languages, many different people can use them; I'll take a look at that next.
Customizing Markup Languages
As we've already seen, you can create customized markup languages using XML, and that represents its extraordinary power. When you and a number of other people agree on a markup language, you can create customized browsers or applications that handle that language. Hundreds of such languages are already being standardized, including these:
Banking Industry Technology Secretariat (BITS)
Interactive Financial Exchange (IFX)
Bank Internet Payment System (BIPS)
Telecommunications Interchange Markup (TIM)
Schools Interoperability Framework (SIF)
Common Business Library (xCBL)
Electronic Business XML Initiative (ebXML)
Product Data Markup Language (PDML)
Financial Information eXchange protocol (FIX)
The Text Encoding Initiative (TEI)
Some customized markup languages, such as Chemical Markup Language (CML), let you represent complex molecules graphically, as we'll see later in this chapter. You can imagine how useful it would be for architects to have a language that displays graphical building plans when they open a document in a browser.
Not only can you create custom markup languages, but you can extend them using XML as well. So, if someone creates a markup language based on XML, you can add the extensions you want easily. In fact, that's what's happening now with Extensible Hypertext Markup Language (XHTML), which I'll take a look at briefly in this chapter and in detail later in the book. Using XHTML, you can add your own elements to what a browser displays as normal HTML.
Self-Describing Data
The data in XML documents is self-describing. Take a look at this document:
Listing ch01_05.xml
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
Based solely on the names we've given to each XML element here, you can figure out what's going on: this document has a greeting and a message to impart. Even if you came back to this document years later, you could still tell what it contains. This means that XML documents are, to a large extent, self-documenting. (We'll also see in the next chapter that you can add explicit comments to XML files.)
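As a rough illustration of self-describing data, the following Python sketch reads the document from Listing ch01_05.xml and uses nothing but the element names to label what it finds. Python's standard xml.etree.ElementTree module is used here as a convenient stand-in parser; it is an assumption of this sketch, not a tool the chapter relies on.

```python
import xml.etree.ElementTree as ET

# The document from Listing ch01_05.xml.
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>"""

# Parse the bytes so the encoding declaration is honored by the parser.
root = ET.fromstring(xml_text.encode("utf-8"))

# Map each child element's name to its text, with surrounding whitespace trimmed.
contents = {child.tag: child.text.strip() for child in root}

for name, text in contents.items():
    print(f"{name}: {text}")
```

The program needs no outside description of the data; the element names GREETING and MESSAGE supply the labels.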
Structured and Integrated Data
Another powerful aspect of XML is that it lets you specify not only data, but also the structure of that data and how various elements are integrated into other elements. This is important when you're dealing with complex, high-stakes data. For example, you could represent a long bank statement in HTML, but in XML you can actually build in the rules that specify the structure of the document, so that the document can be checked to make sure it's set up correctly.
Take a look at this XML document:
Listing ch01_06.xml
<?xml version="1.0"?>
<SCHOOL>
    <CLASS type="seminar">
        <CLASS_TITLE>XML In The Real World</CLASS_TITLE>
        <CLASS_NUMBER>6.031</CLASS_NUMBER>
        <SUBJECT>XML</SUBJECT>
        <START_DATE>6/1/2002</START_DATE>
        <STUDENTS>
            <STUDENT status="attending">
                <FIRST_NAME>Edward</FIRST_NAME>
                <LAST_NAME>Samson</LAST_NAME>
            </STUDENT>
            <STUDENT status="withdrawn">
                <FIRST_NAME>Ernestine</FIRST_NAME>
                <LAST_NAME>Johnson</LAST_NAME>
            </STUDENT>
        </STUDENTS>
    </CLASS>
</SCHOOL>
Here I've set up an XML seminar and added two students to it. As we'll see in Chapter 2, "Creating Well-Formed XML Documents," and Chapter 3, "Valid Documents: Creating Document Type Definitions," XML lets you specify, for example, that each <STUDENT> element needs to enclose a <FIRST_NAME> and a <LAST_NAME> element, that the <START_DATE> element can't go in the <STUDENTS> element, and more.
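The two example rules just mentioned can be sketched as hand-written checks in Python. This is only an illustration under stated assumptions: the function name check_structure is hypothetical, and in practice such rules are declared in a DTD and enforced by a validating parser rather than coded by hand.

```python
import xml.etree.ElementTree as ET

# The document from Listing ch01_06.xml.
xml_text = """<?xml version="1.0"?>
<SCHOOL>
    <CLASS type="seminar">
        <CLASS_TITLE>XML In The Real World</CLASS_TITLE>
        <CLASS_NUMBER>6.031</CLASS_NUMBER>
        <SUBJECT>XML</SUBJECT>
        <START_DATE>6/1/2002</START_DATE>
        <STUDENTS>
            <STUDENT status="attending">
                <FIRST_NAME>Edward</FIRST_NAME>
                <LAST_NAME>Samson</LAST_NAME>
            </STUDENT>
            <STUDENT status="withdrawn">
                <FIRST_NAME>Ernestine</FIRST_NAME>
                <LAST_NAME>Johnson</LAST_NAME>
            </STUDENT>
        </STUDENTS>
    </CLASS>
</SCHOOL>"""


def check_structure(root):
    """Return a list of problems found; an empty list means both rules hold.

    Hypothetical helper for this sketch, not a substitute for DTD validation.
    """
    problems = []
    # Rule 1: every STUDENT must directly enclose FIRST_NAME and LAST_NAME.
    for student in root.iter("STUDENT"):
        for required in ("FIRST_NAME", "LAST_NAME"):
            if student.find(required) is None:
                problems.append(f"STUDENT is missing {required}")
    # Rule 2: START_DATE must not appear anywhere inside STUDENTS.
    for students in root.iter("STUDENTS"):
        if students.find(".//START_DATE") is not None:
            problems.append("START_DATE must not appear inside STUDENTS")
    return problems


root = ET.fromstring(xml_text)
print(check_structure(root))
```

Because the structure is explicit in the markup, each rule reduces to a short query over element names.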
This emphasis on the correctness of documents is strong in XML. In HTML, a Web author could (and frequently did) write sloppy HTML, knowing that the Web browser would take care of any syntax problems (some Web authors even exploited this intentionally to create special effects in some browsers). Some people estimate that 50% or more of the code in modern browsers is there to take care of sloppy HTML in Web pages. Largely for that reason, the story is different in XML. XML browsers are supposed to check your document; if there's a problem, they are not supposed to proceed any further. They should let you know about the problem, but that's as far as they're supposed to go.
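You can see this stop-on-error behavior in most XML parsers, not just browsers. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for a browser's parser, which is an assumption of this example; the helper name is_well_formed is likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it stops with an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser refuses to continue past the first syntax problem.
        return False
    return True


good = "<DOCUMENT><GREETING>Hello From XML</GREETING></DOCUMENT>"
# GREETING is never closed here; an HTML browser might quietly repair
# this kind of slip, but an XML parser must reject it.
bad = "<DOCUMENT><GREETING>Hello From XML</DOCUMENT>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser makes no attempt to guess what the author meant in the second document; it reports the problem and stops.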
So how does an XML browser check your document? There are two main checks that XML browsers can make: checking that your document is well-formed and checking that it's valid. We'll see what these terms mean in more detail in the next chapter, and I'll look at them in overview here.