The Future of Web Standards
Like the chapter we've just read, the first part of this chapter is for bosses, clients, the head of IT, the new director of marketing, and anyone else who doesn't yet see the connection between web standards and their organization's long-term health. For this is where we connect the dots between web standards, "Web 2.0," search engine optimization (SEO), and other industry-leading ideas that can make even a businessperson who is indifferent to technology sit up and roll over.
This chapter is also for web designers and developers who want to understand what the best web standards have in common, where they came from, and where they (and we) are going. If you're excited (or worried) about HTML5, if you're relieved (or disappointed) that activity on XHTML2 has ceased, or if you're a hard-working professional who has no idea what I'm talking about, this chapter is for you.
Let's start with the stuff your boss or client needs to know.
Findability, Syndication, Blogs, Podcasts, the Long Tail, Ajax (and Other Reasons Standards Are Winning)
Are you bosses, clients, IT folks, and marketers listening? Here's what you need to know: despite misunderstandings that stymie their adoption in some quarters, standards are winning on many fronts and are rapidly changing technology, business, and publishing on and off the web. Indeed, web standards have played a defining role in just about every market-changing, money-making digital innovation of the past five years.
Take podcasts. Or take blogs (please!). Even the assistant to the lower middle manager of marketing has heard of those. What makes them run? RSS—an XML application. What else does this XML application do? It helps traditional newspapers and magazines migrate their content to the one place more and more people are reading it—namely, the web.
Maybe the new director of marketing has been reading about "long-tail" marketing, where small sales add up to big bucks. In October 2004, Wired editor-in-chief Chris Anderson (not to be confused with the Chris Anderson who curates the TED conference) discovered that "more than half of Amazon's book sales come from outside its top 130,000 titles.... [Thus] the market for books that are not even sold in the average bookstore is larger than the market for those that are" (www.wired.com/wired/archive/12.10/tail.html).
The web is where people with obscure tastes can find niche products the local shop can't afford to carry. Who will best ride the long tail? Those whose content is easiest to discover. Peter Morville, co-creator of modern information architecture, calls this success-fueling quality "findability."
To make their products findable on the web, companies spend millions on search engine optimization. Yet some companies that can't afford to spend a dime on SEO nevertheless do brilliantly with search engines and long-tail sales. Their secret? They write lean, keyword-rich, buzzword-free content that's actually relevant to their customers—and let semantic markup as described in this book push their text to the top of the digital data pile. Coupled with appropriately written and edited copy, CSS layout and structural XHTML are the golden keys to findability. Companies that know this are prospering. Those that don't are falling behind. (For the fate of those who worship SEO but ignore the value of semantic markup, see the Twitter screen shot in this book's preface—and feel free to print those words on T-shirts, mugs, and bumper stickers.)
If the web looked moribund in 2000, it and the internet are once again blossoming—and, in spite of turmoil in other economic sectors, sprouting pretty flowers of cash—thanks to new ideas and new technologies powered by web standards. Not least of these technologies is Extensible Markup Language (XML), an all-embracing data format that's been almost universally adopted and adapted to meet complex needs.
The Universal Language (XML)
The Extensible Markup Language standard (www.w3.org/TR/REC-xml) took the software industry by storm when it was introduced in February 1998. For the first time, the world was offered a universal, adaptable format for structuring documents and data, not only on the web, but everywhere. The world took to it as a lad in his Sunday best takes to mud puddles. Although the "XML web" anticipated by futurists has not come to pass (in hindsight, the prediction that XML would replace HTML looks rather like flying cars and time machines), specific XML applications have transformed and revitalized the medium, and XML has supercharged the consumer and professional software businesses.
XML and HTML Compared
Although it's based on the same technology that gave rise to HTML (and though, just like HTML, it uses tags, attributes, and values to format structured documents), XML is quite different from the venerable markup language it was intended to replace.
HTML is a basic language for marking up web pages. It has a fixed number of tags and a small set of somewhat inconsistent rules. In HTML, you must close some tags, mustn't close others, and might or might not want to close still others, depending on your mood. This looseness makes it easy for anyone to create a web page, even if they don't quite know what they're doing—and that, of course, was the idea.
It was a fine idea in the early days, when the web needed basic content and not much else. And at heart, it will always be a fine idea, for the democratizing power of the web consists precisely in its low access barrier. But for today's larger, more sophisticated sites, where pages are assembled via publishing tools and content must flow back and forth from database to web page to mobile device to print, the lack of uniform rules in HTML may impede data repurposing. It's easy to convert text to HTML, but it's difficult to convert data marked up in HTML to any other format.
Likewise, HTML is merely a formatting language, and not a particularly self-aware one. It contains no information about the content it formats, again limiting your ability to reuse that content in other settings. (Microformats, discussed later, represent one fairly successful attempt to enrich the semantics of HTML.) And, of course, HTML is strictly for the web.
XML-based markup, in contrast to HTML, is bound by consistent rules and is capable of traveling far beyond the web. When you mark up a document in XML, you're not merely preparing it to show up on a web page. You're encoding it in tags that can be understood in any XML-aware environment.
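The difference in strictness is easy to see by handing both kinds of markup to an XML parser. What follows is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module; the two snippets and the helper name is_well_formed are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Old-school HTML habits: the <p> elements and the <br> are never closed.
loose_html = "<div><p>First paragraph<p>Second paragraph<br></div>"

# The same content with every element explicitly closed, as XML requires.
strict_xml = "<div><p>First paragraph</p><p>Second paragraph<br/></p></div>"

print(is_well_formed(loose_html))
print(is_well_formed(strict_xml))
```

A browser would happily render the first snippet as HTML. An XML parser refuses it, and that refusal is what makes the second snippet safe to hand to any other XML-aware tool.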
One Parent, Many Children
Specifically, XML is a language for creating other languages. As long as they adhere to its rules, librarians are free to create XML markup whose custom tags facilitate the needs of cataloging. Music companies can create XML markup whose tags include artist, recording, composer, producer, copyright data, royalty data, and so on. Composers can organize their scores in a custom XML markup language called MusicML. (To avoid carpal tunnel syndrome, I'll refer to "creating XML markup" as "writing XML" from here on.)
These custom XML languages are called applications, and because they are all XML, they are compatible with each other. That is, an XML parser can understand all these applications, and the applications are able to easily exchange data with one another. Thus, data from a record company's XML database can end up in a library's catalog of recordings without human labor or error and without bogging down in software incompatibilities.
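To make the record-company-to-library example concrete, here is a rough sketch in Python. Both vocabularies are made up for this illustration (the tag names recordings, recording, catalog, entry, creator, and work are assumptions, not part of any real standard), and the data is placeholder text. The point is only that one generic XML parser reads the first vocabulary and one generic serializer writes the second.

```python
import xml.etree.ElementTree as ET

# A hypothetical record-company vocabulary; every tag name here is invented.
RECORD_COMPANY_XML = """
<recordings>
  <recording>
    <artist>The Example Quartet</artist>
    <title>Sample Suite No. 1</title>
    <composer>A. Composer</composer>
  </recording>
  <recording>
    <artist>Placeholder Trio</artist>
    <title>Untitled Demo</title>
    <composer>B. Composer</composer>
  </recording>
</recordings>
"""


def to_library_catalog(source_xml: str) -> ET.Element:
    """Re-express recording data in a (hypothetical) library catalog vocabulary."""
    catalog = ET.Element("catalog")
    for recording in ET.fromstring(source_xml).iter("recording"):
        entry = ET.SubElement(catalog, "entry", {"medium": "sound-recording"})
        ET.SubElement(entry, "creator").text = recording.findtext("artist")
        ET.SubElement(entry, "work").text = recording.findtext("title")
    return catalog


catalog = to_library_catalog(RECORD_COMPANY_XML)
print(ET.tostring(catalog, encoding="unicode"))
```

Someone still has to decide which tag in one vocabulary corresponds to which tag in the other; what XML removes is the need to write a new parser for each format.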
An Essential Ingredient of Professional and Consumer Software
This power to format, understand, and exchange data has made XML as ubiquitous as Coca-Cola. XML not only stores content housed in online and corporate databases, but it also has become the lingua franca of database programs like FileMaker Pro and of much non-database-oriented software, from high-end design applications to business products like Microsoft Office and OpenOffice, whose native file formats are XML-based.
Print design powerhouses Quark XPress and Adobe InDesign import and export XML and support the creation of XML-based templates. Web editors such as Dreamweaver are likewise XML-savvy, making it easier (or at least possible) to bounce data back and forth between the printed page, the web layout, and the database that runs your online store or global directory.
Not content to merely parse XML, some products are actually made of the stuff. Dreamweaver has long been built with XML files that are available to the end user, making it possible to modify the program by rolling up your shirtsleeves and editing these files. As far back as 2002, a popular A List Apart article by Carrie Bickner (www.alistapart.com/articles/dreamweaver) explained how to make Dreamweaver 4 (yes, Dreamweaver 4) generate valid XHTML by editing the XML files on which the software was built. Selling customized versions of Dreamweaver is something of a cottage industry. Heck, it's more than a cottage. I know a guy who bought a house with the money he made doing it.
Consumer software loves XML, too. The Personal Information Manager on your PC, Mac, or PDA reads and writes XML or can be made to do so via third-party products. When your digital camera time-stamps a snapshot and records its dimensions, file size, and other such information, it most likely records this data in XML. Each time your dad emails you those pipe-clobbering 7MB vacation photo sets, he's likely sending you XML-formatted data along with the beauty shots of lens caps at sunset. Hey, your dad's into web standards.
Image management software like Apple's iPhoto understands XML, too. And when you print a family photo, the print comes out right thanks to presets stored as XML data by the Macintosh OS X operating system. (Indeed, the whole UNIX-based OS X operating system stores its preferences as XML.) Apple's iTunes for Windows and Mac is hip to the jive as well. Export a playlist? XML.
More Popular Than a White Rapper
Why has XML seized the imagination of so many disparate manufacturers and found its way into their products? XML combines standardization with extensibility (the power to customize), transformability (the power to convert data from one format to another), and relatively seamless data exchange between one XML application (or XML-aware software product) and another.
As an open standard unencumbered by patents or royalties, XML blows away outdated, proprietary formats with limited acceptance and built-in costs. The W3C charges no fee when you incorporate XML into your software product or roll your own custom XML-based language. Moreover, acceptance of XML is viral. The more vendors who catch the XML bug, the faster it spreads to other vendors, and the easier it becomes to pass data from one manufacturer's product to another's.
Plus, XML works. Gone are the days when your officemates considered you a guru if you were able to beat plain, tab-delimited text out of one product and import it into another (often with some data loss and much manual reformatting). XML helps vendors build products whose interoperability empowers consumers to work smarter, not harder. Consumers respond with their pocketbooks.
Not a Panacea, But Plays One on TV
I'm not saying that XML is a panacea for all software problems. The data in a JPEG is much better expressed in binary format than as text. Nor do I claim that every software package on the market "gets" XML, although most professional applications and many consumer products do, and their numbers are continually growing. I'm not even saying that all software that claims to support XML does so flawlessly. (Not even on the web. As mentioned elsewhere in this book, the chief gripe against XHTML, the XML version of HTML, is that Internet Explorer treats it as HTML.) But flawlessly implemented or not, XML is the web standard that has most transformed the software industry and the hardware we use in our homes and workplaces.
Even the makers of products that don't support XML seem to believe they should. In April 2002, distressed by lackluster sales and a fragmented middleware market, a group of interactive television and technology providers banded together under the banner of the iTV Production Standards Initiative (www.itvstandards.org). Its mission: to unveil—and shore up support for—an XML-based standard intended to "allow producers to write interactive content once and distribute it to all major set-top box and PC platforms." Sound familiar? It's exactly what The Web Standards Project had to say about W3C standards during the browser wars of the mid- to late 1990s.
Builds Strong Data Five Ways
On the web, XML is increasingly the format of choice for IT professionals, developers, and content specialists who must work with data housed in large corporate or institutional systems. Choosy mothers choose XML for five reasons, many of which will be familiar from the preceding discussion:
- Like ASCII, XML is a single, universal file format that plays well with others.
- Unlike ASCII (or HTML), XML is an intelligent, self-aware format. XML not only holds data; it can also hold data about the data (metadata), facilitating search and other functions.
- XML is an extensible language: it can be customized to suit any business or academic need, or used to create new languages that perform specific tasks, such as data syndication or the delivery of web services.
- XML is based on rules that ensure consistency as data is transferred to other databases, transformed to other formats, or manipulated by other XML applications.
- Via additional XML protocols and XML-based helper languages, XML data can be automatically transformed into a wide variety of formats, from web pages to printed catalogs and annual reports. This transformational power is the stuff developers could only dream about before XML came along. Nor do corporate bean counters fail to appreciate the cost-saving efficiencies that XML facilitates.
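The second reason above, data about the data, is the one that pays off in search. As a small sketch (in Python, with an invented articles vocabulary and made-up attribute names), metadata stored in attributes can be queried directly rather than scraped out of running text:

```python
import xml.etree.ElementTree as ET

# Hypothetical content store: the attributes carry metadata about each article.
ARTICLES_XML = """
<articles>
  <article lang="en" topic="standards" published="2009-06-01">
    <title>Why Markup Matters</title>
  </article>
  <article lang="fr" topic="design" published="2009-07-15">
    <title>La grille</title>
  </article>
  <article lang="en" topic="standards" published="2009-08-20">
    <title>Testing Browsers</title>
  </article>
</articles>
"""

root = ET.fromstring(ARTICLES_XML)

# ElementTree supports a small XPath subset, including attribute predicates.
standards_titles = [
    article.findtext("title")
    for article in root.findall("article[@topic='standards']")
]
print(standards_titles)
```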
A Mother Lode of Inventions
While a complete discussion of XML is outside the scope of this book, the examples that follow will suggest the depth of XML acceptance on and beyond the web and illustrate how the continual emergence of new XML-derived languages and protocols solves problems that once daunted even the brainiest developers.
Resource Description Framework (www.w3.org/RDF)
This XML-based language provides a coherent structure for applications that exchange metadata on the web. In practical terms, RDF integrates library catalogs and directories; collects and syndicates news, software, and all kinds of content; and facilitates communication and sharing between various types of collections (such as personal photo and music collections, to steal an example from the write-up on W3C's site). The power of RDF can also drive software. If you happen to have the Mozilla browser available on your desktop, open its folders and sniff around. You'll find RDF (and CSS) files that help the browser do its job. Specifically, dig around in the profile folders. Each profile has its own set of XML-based files.
RDF can be a frustrating, obtuse language—but in the right hands, it empowers beautiful creations. Jo Walsh (frot.org) has done remarkable work annotating geospatial relationships with RDF (space.frot.org). And in a single essay on RDF-powered taxonomies, writer Paul Ford made the notion of a "semantic" web real to thousands of designers for whom it had previously seemed so much airy piffle (www.ftrain.com/arbs_and_all.html). For more RDF fun facts, see Tim (Mr. XML) Bray's "What Is RDF?" at XML.com (www.xml.com/pub/a/2001/01/24/rdf.html).
RDFa (www.w3.org/TR/xhtml-rdfa-primer)
The W3C intends RDFa to serve as a bridge between the "human and data webs." Like microformats (although disliked by some in the microformats community), RDFa adds semantics to (X)HTML using existing elements and attributes such as a and rel. For a friendly overview, spend a pleasant half hour with Mark Birbeck's "Introduction to RDFa" (www.alistapart.com/articles/introduction-to-rdfa) and "Introduction to RDFa Part II" (www.alistapart.com/articles/introduction-to-rdfa-ii).
Extensible Stylesheet Language Transformations (www.w3.org/TR/xslt)
This XML-based markup language can extract and sort XML data and format it as HTML or XHTML, ready for immediate online viewing. If you prefer, XSLT can transform your data to PDF or plain text or use it to drive a continuously updateable chart or similar business image rendered in the Scalable Vector Graphics (SVG) format. XSLT can even do all these things simultaneously. For a hands-on tutorial, see J. David Eisenberg's "Using XML" (www.alistapart.com/articles/usingxml).
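The extract, sort, and format steps are easier to picture with a concrete case. Python's standard library has no XSLT processor, so the sketch below is a stand-in, not XSLT: it does the same kind of job by hand with ElementTree. The products vocabulary and the function name are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data in an invented vocabulary.
PRODUCTS_XML = """
<products>
  <product><name>Widget</name><price>9.50</price></product>
  <product><name>Gadget</name><price>4.25</price></product>
  <product><name>Doohickey</name><price>12.00</price></product>
</products>
"""


def products_to_html_list(source_xml: str) -> str:
    """Extract products, sort them by price, and emit an XHTML-style list."""
    products = ET.fromstring(source_xml).findall("product")
    products.sort(key=lambda p: float(p.findtext("price")))
    ul = ET.Element("ul")
    for product in products:
        li = ET.SubElement(ul, "li")
        li.text = f"{product.findtext('name')}: ${product.findtext('price')}"
    return ET.tostring(ul, encoding="unicode")


html_list = products_to_html_list(PRODUCTS_XML)
print(html_list)
```

An XSLT stylesheet would express the same transformation declaratively, as templates matched against the source document, and could target PDF or SVG as readily as XHTML.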
Really Simple Syndication 2.0 (blogs.law.harvard.edu/tech/rss)
I see the marketing folks are with us again. Really Simple Syndication (RSS) is a lightweight XML vocabulary for describing websites. I can use it to tell you when I update my site's content. More radically (and more appealingly to a marketer), I can also use it to send you the content. Remember those meetings you slept through, where people yakked about making your site "sticky"? This is way better. Instead of you hoping your readers will stick around your site, with RSS your content sticks to your readers.
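For the curious, here is roughly what a feed reader does with an RSS 2.0 document. The feed below is a hand-written, hypothetical example (the site, titles, and URLs are placeholders), and read_items is an invented helper. The element names rss, channel, item, title, link, and description come from the RSS 2.0 format itself.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 feed with two items.
FEED_XML = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>http://www.example.com/</link>
    <description>Updates from a hypothetical site.</description>
    <item>
      <title>New article posted</title>
      <link>http://www.example.com/articles/1</link>
      <description>The full text or a summary can travel with the feed.</description>
    </item>
    <item>
      <title>Second article posted</title>
      <link>http://www.example.com/articles/2</link>
      <description>Readers see this without visiting the site.</description>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every item in the feed's channel."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


items = read_items(FEED_XML)
for title, link in items:
    print(f"{title} -> {link}")
```

A real aggregator would fetch the feed over HTTP on a schedule and remember which items it has already shown you, but the parsing step is about this simple.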
In ancient times, Dan Libby developed RSS to populate AOL/Netscape's "My Netscape" portal. (No, I don't remember it either.) After AOL lost interest in April 2001, Dave Winer's UserLand Software Company carried the spec forward. Winer later left UserLand for academic pastures, and the RSS spec is now housed under a Creative Commons license at Harvard's Berkman Center (cyber.law.harvard.edu).
Today RSS 2.0 is streamed from millions of personal and corporate sites, blogs, and social media networks, making it possibly the most widely accepted XML format on the web [4.1, 4.2, 4.3]. Its simple, powerful syndication empowers both blogging and podcasting (see sticky note "You Got Your Podcast in My Webcast!"). Nearly all blog-authoring software supports RSS 2.0 along with a competing specification called Atom. There are aggregators (sites or products that "harvest" RSS feeds) and there are services that alert search engines when you update (www.pingomatic.com).
4.1 Standards-compliant blogging platforms such as WordPress support RSS out of the box (www.wordpress.org).
4.2 Same goes for Movable Type (www.movabletype.org).
4.3 Social media networks from Twitter to Flickr (shown here) allow friends to "follow" each other via RSS (www.flickr.com). When I publish a photo, my friends who've subscribed to my feed see the new photo in their RSS reader. OK, I just wanted to get a picture of my kid into this edition. Do you blame me?
Publishers use RSS to stay in contact with existing readers and continually reach new ones. And not just small, forward-thinking independent publishers do this. The USAToday website publishes RSS feeds (content.usatoday.com/marketing/rss/index.aspx). So do the BBC [4.4], Amazon, and Yahoo (developer.yahoo.com/rss/#biglist). Wired News and the New York Times (www.nytimes.com/services/xml/rss/index.html) do it. Even birds in the trees do it. There are RSS feeds for individual sections of newspapers and blogs [4.5] and for the discussions of individual articles (www.alistapart.com/feed/hattrick/rss.xml).
4.4 Like all modern news sites, that of the BBC enables readers to subscribe to a variety of feeds from News Front Page and World to Sci-Tech (a video feed) and Latest Published Stories. The BBC site is salutary in not merely dumping these feeds on a page, but actually explaining them to the uninitiated (news.bbc.co.uk/2/hi/help/rss/default.stm).
4.5 Choose which part of Jason Santa Maria's website you wish to subscribe to (www.jasonsantamaria.com).
It's a publisher's dream, a marketer's joy, and a salesperson's revenue stream. (To the disgust of many and the relief of salespeople and advertisers who see their traditional TV and newspaper outlets declining, more and more RSS feeds include paid advertisements.)
XMLHttpRequest—It's not just for Ajax any more (en.wikipedia.org/wiki/XMLHttpRequest)
Created by Microsoft as part of ActiveX for Internet Explorer/Windows but now also supported as a native object in Apple's Safari and in Mozilla and Opera browsers, the XMLHttpRequest Object works with JavaScript to fetch XML data from servers without forcing a page refresh. Rich user experiences can be fashioned from the uninterrupted interactivity this combination of technologies provides. In a widely read essay, consultant Jesse James Garrett named this approach to application development Ajax (www.adaptivepath.com/publications/essays/archives/000385.php); the acronym helped the method gain traction in the marketplace.
When you hear marketers, investors, and developers discussing "Web 2.0" applications, they most often mean products built using XMLHttpRequest, XML, and JavaScript and displayed in pages designed in CSS and structured in XHTML. As the third edition goes to press, Ajax has pretty much cornered the rich applications market; it is also the power behind social networking sites from Facebook to Flickr.
And XMLHttpRequest is not just for Ajax any more. HTML, JSON, text, and more can be sent asynchronously, providing hours of fun for geeks of all ages (www.hedgerwow.com/360/ajax/rss-json/demo.php). In fact, they always could be—we just didn't bother until Ajax made it sexy by bringing the feeling of desktop applications to the web. Note that Ajax helped JSON take off as an alternative to XML. JSON is now spoken natively by most backend web technologies, including PHP and Ruby on Rails.
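To see why JSON caught on, compare the same record expressed both ways. In a browser the payload would arrive through XMLHttpRequest and be handled in JavaScript; the sketch below uses Python only to show the server-side or payload half of the exchange. The photo record and both function names are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, once as XML and once as JSON.
XML_PAYLOAD = (
    "<photo><id>42</id><title>Sunset</title>"
    "<tags><tag>beach</tag><tag>sky</tag></tags></photo>"
)
JSON_PAYLOAD = '{"id": 42, "title": "Sunset", "tags": ["beach", "sky"]}'


def photo_from_xml(payload: str) -> dict:
    """Walk the XML tree and convert each value by hand."""
    root = ET.fromstring(payload)
    return {
        "id": int(root.findtext("id")),
        "title": root.findtext("title"),
        "tags": [tag.text for tag in root.iter("tag")],
    }


def photo_from_json(payload: str) -> dict:
    """JSON maps straight onto native numbers, strings, and lists."""
    return json.loads(payload)


print(photo_from_xml(XML_PAYLOAD) == photo_from_json(JSON_PAYLOAD))
```

Both routes arrive at the same data. The JSON route simply needs less code to get there, which matters when the consumer is a script running in a web page.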
XML-RPC (www.xmlrpc.com)
Another UserLand Software innovation, XML-RPC is "a spec and a set of implementations that allow software running on disparate operating systems [and]... in different environments to make procedure calls over the internet." Among other things, XML-RPC can be used to automate site-management tasks in web publishing tools like those described next.
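What does a procedure call look like when it travels as XML? Python's standard xmlrpc.client module can show us without touching the network. In this sketch the method name blog.publishPost and its two arguments are hypothetical, standing in for whatever a given publishing tool actually exposes.

```python
import xmlrpc.client

# "blog.publishPost" and its arguments are hypothetical, not a real API.
request_xml = xmlrpc.client.dumps(
    ("my-post-id", True), methodname="blog.publishPost"
)
print(request_xml)

# The receiving end decodes the same XML back into parameters and a method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The printed request is plain XML: a methodCall element containing the method name and a list of typed parameters. Any platform with an XML parser can read it, which is the whole trick.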
Web Publishing Tools for the Rest of Us
As this brief survey shows, that which the XML-aware software products described earlier do at a price, XML-based languages in the hands of clever developers do for free. In turn, these developers often create new products to facilitate the needs of their fellow designers, developers, and authors.
Personal publishing products like WordPress [4.1] and Movable Type [4.2] employ XML-RPC to facilitate site management and XML RSS to automatically syndicate and distribute content to other XML-aware sites. If WordPress and Movable Type grant their users the power to publish, XML gives these products the ability to exist.
As personal publishing (including podcasting—see sticky note "You Got Your Podcast in My Webcast!") spreads, so does XML, not only among sophisticated developers but also among those who've never heard of the XML standard and would be hard pressed to write XML (or sometimes, even HTML) on their own.
At Your Service(s)
The logic of XML drives the web services market, too. The XML-based Simple Object Access Protocol (www.w3.org/TR/soap) facilitates information exchange in a decentralized, platform-independent network environment, accessing services, objects, and servers, and encoding, decoding, and processing messages. The underlying power of XML allows SOAP to cut through the complexity of multiple platforms and products.
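A SOAP message is itself just an XML document: an Envelope containing a Body, which in turn carries an application-specific payload. The following sketch builds one in Python with ElementTree. The envelope namespace is the SOAP 1.2 one; the traffic namespace, the GetTrafficReport element, and the build_envelope helper are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://www.example.com/traffic"  # hypothetical application namespace

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("t", APP_NS)


def build_envelope(route: str) -> str:
    """Wrap a hypothetical traffic request in a SOAP Envelope and Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetTrafficReport")
    ET.SubElement(request, f"{{{APP_NS}}}route").text = route
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("I-90")
print(message)
```

In practice a SOAP toolkit generates messages like this from a service description, so developers rarely write them by hand.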
SOAP is only one protocol in the burgeoning world of web services (www.w3.org/2002/ws). David Rosam (www.dangerous-thinking.com) defines web services thusly:
Web Services are reusable software components based on XML and related protocols that enable near zero-cost interaction throughout the business ecosystem. They can be used internally for fast and low-cost application integration or made available to customers, suppliers, or partners over the Internet.
That's excellent from a business point of view, but what makes web services magical is their inclusion of libraries called APIs (en.wikipedia.org/wiki/API) that let one web service spawn an endless number of derivative works—most often supported by GNU (www.gnu.org/copyleft/gpl.html) or Creative Commons (www.creativecommons.org) licensing to ensure that the "child" products will be free of legal encumbrance.
Because Google Maps (maps.google.com), Flickr [4.3], and Amazon.com sport APIs, independent developers can spin decentralized services using centralized data. In 2005, Chicago-based journalist and web developer Adrian Holovaty, co-creator of the open-source Django Web framework (www.djangoproject.com), created one of the first, pre-API Google Map mashups, chicagocrime.org [4.6]. The site played a small part in influencing Google to open its map API. Holovaty then extended the concept with EveryBlock [4.7], "an experiment in microlocal news." Taking the idea of decentralization one step further, Apple's Dashboard Widgets are consumer-written applications (built with XHTML, XML, CSS, and standard JavaScript) that pull remote data to your desktop (www.apple.com/downloads/dashboard).
4.6 They stand on the APIs of giants. EveryBlock's Chicago Crime section (chicago.everyblock.com/crime), formerly at chicagocrime.org, connects crimes reported by the Chicago police department to their locations, mapped by Google. Invented by Adrian Holovaty, it was one of the first, pre-API Google Map mashups, and helped encourage Google to open its map API.
4.7 EveryBlock (www.everyblock.com), the natural extension of chicagocrime.org, combines the worlds of data and journalism via the power of web standards. Is this the newspaper of the future?
What makes Widgets and sites like EveryBlock so exciting is our knowledge that they are only the beginning of a great creative outpouring. They are like the one-reel silent movies of the early 20th century: interesting in themselves, explosive in their implications for the future.
XML Applications and Your Site
XML is the language on which Scalable Vector Graphics (www.w3.org/TR/SVG) and Extensible Hypertext Markup Language (www.w3.org/TR/2002/REC-xhtml1-20020801) are based. Illustrators who export their client's logo in the SVG format and web authors who compose their pages in XHTML are using XML, whether they know it or not.
The rules that are common to all forms of XML help these formats work together and with other kinds of XML—for instance, with XML stored in a database. An SVG graphic might be automatically altered in response to a visitor-generated search or continuously updated according to data delivered by an XML news feed.
The site of a local TV news channel could use this capability to display live metro traffic in all its congested glory. As one traffic jam cleared and another began, the news feed would relay this information to the server, where it would be formatted as user-readable text content in XHTML and as an updated traffic map in SVG. At the same time, the data might be syndicated in RDF or RSS for sharing with other news organizations or used by SOAP to help city officials pinpoint and respond to the problem.
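Because SVG is XML, the traffic map in this scenario can be generated by the same tools that handle the feed. Here is a bare-bones sketch in Python. The road names, congestion figures, color thresholds, and function names are all invented, and a real map would of course draw roads rather than bars.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace

# Hypothetical feed data: road name and congestion level from 0.0 to 1.0.
traffic = [("Main St", 0.9), ("Lake Shore Dr", 0.4), ("Oak Ave", 0.1)]


def congestion_color(level: float) -> str:
    """Map a congestion level to a fill color (thresholds are arbitrary)."""
    if level >= 0.7:
        return "red"
    if level >= 0.3:
        return "orange"
    return "green"


def traffic_chart(readings) -> str:
    """Draw one labeled bar per road, sized and colored by congestion."""
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": "220", "height": str(30 * len(readings))},
    )
    for row, (road, level) in enumerate(readings):
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": "0",
            "y": str(30 * row),
            "width": str(round(200 * level)),
            "height": "20",
            "fill": congestion_color(level),
        })
        label = ET.SubElement(
            svg, f"{{{SVG_NS}}}text", {"x": "5", "y": str(30 * row + 15)}
        )
        label.text = road
    return ET.tostring(svg, encoding="unicode")


chart = traffic_chart(traffic)
print(chart)
```

Each time the feed reports a change, rerunning the function yields a fresh graphic, with no illustrator required.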
Although based on XML, SVG graphics are easy to create in products like Adobe Illustrator (www.adobe.com/illustrator). Like Flash vector graphics, images created in SVG can fill even the largest monitors while using little bandwidth. And SVG graphics, like other standard web page components, can be manipulated via standard JavaScript and the DOM. Not to mention that SVG textual content is accessible by default, and can even be selected with the cursor no matter how it's been stretched or deformed. Firefox supports SVG natively.
Compatible by Nature
Because they share a common parent and abide by the same house rules, all XML applications are compatible with each other, making it easier for developers to manipulate one set of XML data via another and to develop new XML applications as the need arises, without fear of incompatibility.
Ubiquitous in today's professional and consumer software, widely used in web middleware and backend development, and essential to the web services market, XML has succeeded beyond anyone's wildest dreams because it solves everyone's worst nightmares of incompatibility and technological dead ends.
Software makers, disinclined to risk customer loss by being the odd man out, recognize that supporting XML enables their products to work with others and remain viable in a changing market. Executives and IT professionals, unwilling to let proprietary systems continue to hold their organizations' precious data hostage, can solve their problem lickety-split by converting to XML. Small independent developers can compete against the largest companies by harnessing the power of XML, which rewards brains, not budgets.
In today's data-driven world, proprietary formats no longer cut it—if they ever did. XML levels the playing field and invites everyone to play. XML is a web standard, and it works.
And that is the hallmark of a good standard: that it works, gets a job done, and plays well with other standards. Call it interoperability (the W3C's word for it), or call it cooperation between components. Whatever you call it, XML is a vast improvement over the bad old days of proprietary web technologies. Under the spell of web standards, competitors, too, have learned to cooperate.
The Future of Standards
Thanks to The Web Standards Project, browser makers learned to support the same standards. As an unexpected consequence of their technological cooperation, these once-bitter competitors have also learned to play nicely together in other, often surprising ways.
In July 2002, Microsoft submitted to the W3C's HTML Working Group "a set of HTML tests and testable assertions in support of the W3C HTML 4.01 Test Suite Development" (lists.w3.org/Archives/Public/www-qa-wg/2002Jul/0103.html). The contribution was made on behalf of Microsoft, Openwave Systems, Inc., and America Online, Inc., then-owners of Netscape and Mozilla. Opera Software Corporation (makers of the Opera browser) and The Web Standards Project also reviewed it.
Test Suites and Specifications
W3C test suites enable browser makers to determine if their software complies with a standard or requires more work. No test suite existed for HTML 4.01 (the markup language that is also the basis of XHTML 1.0). In the absence of such a test suite, browser makers who wanted to comply with those standards had to cross their fingers and hope for the best.
Moreover, in the absence of a test suite, the makers of standards found themselves in an odd position. How can you be certain that a technology you're inventing adequately addresses the problems it's supposed to solve when you lack a practical proving ground? It's like designing a car on paper without having a machine shop to build what you've envisioned.
In the interest of standards makers as well as browser builders, a test suite was long overdue.
How Suite It Is
When Microsoft took the initiative to correct the problem created by the absence of a test suite, it chose not to act alone, instead inviting its competitors and an outside group (WaSP) to participate in the standards-based effort. Just as significantly, those competitors and that outside group jumped at the chance. The work was submitted free of patent or royalty encumbrance, with resulting or derivative works to be wholly owned by the W3C. Neither Microsoft nor its competitors attempted to make a dime for their trouble.
In the ordinary scheme of things, Microsoft was not known for considering what was best for Netscape, nor was Netscape overly interested in helping Microsoft—and neither wasted many brain cells figuring out what was good for Opera. And these companies didn't go into business to lose money on selfless ventures. Yet here they were, acting in concert for the good of the web, and focusing not on some fancy new proprietary technology, but on humble HTML4.
Ignored by the trade press, the event signified a sea change. The "set of HTML tests" quietly presented to the W3C by Microsoft and its staunchest business foes signaled a permanent shift in the way the web would now evolve. No longer ignored in deference to proprietary "innovations," web standards now bind browser makers together.
It was only natural that browser makers would next take "joint innovation" further by creating web standards together instead of passively waiting for the W3C. Combine uncertainty about the direction of XHTML 2.0, impatience with the W3C process, and a Web 2.0-driven preference for applications over documents, and what happened next was inevitable.
HTML5: Birth of the Cool
In 2004, under the leadership of Ian Hickson, engineers from the Mozilla Foundation and Opera Software formed the Web Hypertext Application Technology (WHAT) Working Group (www.whatwg.org), "a loose, unofficial, and open collaboration of Web browser manufacturers and interested parties" whose goal is "to address the need for one coherent development environment for Web applications, through the creation of technical specifications that are intended to be implemented in mass-market Web browsers."
Although its parent organizations, including the Mozilla Foundation and the Opera Software company, are among the W3C's greatest contributors, the engineers who formed WHAT were frustrated by the sometimes slow pace of W3C standards development. The group's emphasis on practical, browser-related issues, and its preference for HTML over XML, initially set it apart from the W3C. But WHAT chose to work with the W3C, not against it, quickly submitting the first draft of its proposed HTML5 language to the W3C for approval.
By working across company lines and tackling focused areas (for instance, specifying how all browsers should handle form controls, menus, and toolbars), the WHAT group hopes to fast-track web standards and rationalize browser development so all browsers uniformly support ever-more-advanced standards.
We'll explore the mechanics of HTML5 in Chapter 7, "HTML5: The New Hope." For now, it's sufficient to discuss some of the language's goals and the way they break from the markup of the present.
A New Semantics in Town
Although CSS is a layout language, it is not a semantic one, and nothing about it suggests page structure. HTML and XHTML are document languages that contain outline structure but no hint of page structure. HTML5 (www.whatwg.org/html5) sets out to change that—and to rid the world of "div soup"—by introducing page layout elements such as header, nav, footer, section, and aside. Lachlan Hunt's "A Preview of HTML 5" (www.alistapart.com/articles/previewofhtml5) explains with simple, elegant clarity the intention behind such elements. In the same article, he explains how HTML5's proposed enhancements to form controls, APIs, and multimedia will "give authors more flexibility and greater interoperability."
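To make the contrast concrete, here is a rough sketch of the same page skeleton written twice: first as the "div soup" common today, then with the HTML5 elements named above. The id values, headings, and link targets are placeholders invented for illustration, and the HTML5 half reflects the draft as described here, which may change before the language is final.

```html
<!-- Today: generic div elements whose roles live only in their id values -->
<div id="header">
  <h1>Site Name</h1>
</div>
<div id="nav">
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/articles/">Articles</a></li>
  </ul>
</div>
<div id="content">
  <h2>Article Title</h2>
  <p>Article text.</p>
</div>
<div id="sidebar">
  <p>Related links.</p>
</div>
<div id="footer">
  <p>Copyright notice.</p>
</div>

<!-- HTML5 draft: the same regions, named by the elements themselves -->
<header>
  <h1>Site Name</h1>
</header>
<nav>
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/articles/">Articles</a></li>
  </ul>
</nav>
<section>
  <h2>Article Title</h2>
  <p>Article text.</p>
</section>
<aside>
  <p>Related links.</p>
</aside>
<footer>
  <p>Copyright notice.</p>
</footer>
```

In the first version, only a human reading the id values can guess which div is the navigation; in the second, the element name says so to any browser, search engine, or assistive device that understands HTML5.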
Lachlan Hunt is a fan of HTML5; John Allsopp is on the fence. In "Semantics in HTML 5" (www.alistapart.com/articles/semanticsinhtml5), he explains why:
We need mechanisms in HTML that clearly and unambiguously enable developers to add richer, more meaningful semantics—not pseudo semantics—to their markup. This is perhaps the single most pressing goal for the HTML 5 project.
But it's not as simple as coming up with a mechanism to create richer semantics in HTML content: there are significant constraints on any solution. Perhaps the biggest one is backward compatibility. The solution can't break the hundreds of millions of browsing devices in use today, which will continue to be used for years to come. Any solution that isn't backward compatible won't be widely adopted by developers for fear of excluding readers. It will quickly wither on the vine.
The solution must be forward compatible as well. Not in the sense that it must work in future browsers—that's the responsibility of browser developers—but it must be extensible. We can't expect any single solution we develop right now to solve all imaginable and unimaginable future semantic needs. We can develop a solution that can be extended to help meet future needs as they arise.
These two constraints, in tandem, present a huge challenge. But in the context of a language whose major iterations arrive a decade apart, and whose importance as a global platform for communication is paramount, this is a challenge that must be solved.
Additional concerns about HTML5 include worries that its tolerance of bad HTML will stymie the movement toward the kind of clean, structured, semantic markup which this book advocates and many developers now practice; concerns that it places the future of the web in the hands of a small group with fairly (or unfairly) fixed ideas; and questions about the process. (Presently, two groups are working on HTML5 simultaneously: the WHATWG, chaired by Mr. Hickson, and a W3C working group, also chaired by Mr. Hickson. If Mr. Hickson adds an element to HTML5 in the WHATWG, the W3C group also chaired by Mr. Hickson may take it out again. The classic "Who's On First?" routine is funny when presented by Abbott and Costello, but troubling when the future of markup hangs in the balance.)
The July 2, 2009, announcement by the W3C that it was discontinuing all work on XHTML 2 (www.w3.org/News/2009#item119) was humane in that it put XHTML 2 out of its misery. But it made those who find parts of HTML5 questionable all the more uneasy, and left some standardistas wondering whether XHTML had been a dead end (www.zeldman.com/2009/07/07/in-defense-of-web-developers). (It hasn't been: XHTML 1.0 will still be working long after you and I retire, and the best ideas from XHTML 2.0 are being incorporated into HTML5. Moreover, HTML5 will support HTML and XHTML syntax—although even that bothers some people.)
While this book still cheerfully recommends XHTML 1.0 Transitional or Strict (as it did in the first two editions), it behooves every designer to learn HTML5 and start working with those parts of it that all modern browsers support. Whether you confine your HTML5 exploration to a personal project or use it on a "real" website (as we have on aneventapart.com) is up to you, your client, your browser stats, and the kinds of sites you design. Those who work chiefly on web applications are most likely to desire the power of HTML5.
Internet Explorer and Web Standards
As companies go, only Apple is more secretive than Microsoft. Thus it represented a hopeful break from the past in 2005 when, instead of developing its next browser version in secret, Microsoft worked with The Web Standards Project to ensure that IE7 supported web standards more accurately than any Microsoft browser had before.
Microsoft worked with The Web Standards Project again as it prepared to release its masterpiece of standards compliance, IE8. Comedy ensued as the company vacillated between shipping the browser in standards-compliance mode by default (at the risk of causing scripting and CSS errors in old-school websites optimized for IE only) and using a meta declaration to toggle Standards mode on (thus failing to support standards unless developers explicitly treat IE as a special case and opt in). There were reasonable arguments to be made on both sides, but when it comes to Microsoft and web standards, nobody feels like arguing reasonably. Sacrificing a goat in a roomful of kindergarteners would have gone down with less protest than has attended this on-again, off-again toggle tug-of-war.
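For readers who have not seen it, the meta declaration at the center of the dispute looks roughly like the sketch below. The page title is a placeholder, and the content value shown is one of several the browser recognizes; treat this as an illustration of the opt-in idea rather than a recommendation for any particular site.

```html
<head>
  <title>Example Page</title>
  <!-- Asks Internet Explorer to render this page with its IE8 standards engine -->
  <meta http-equiv="X-UA-Compatible" content="IE=8" />
</head>
```

Under the opt-in approach, a page lacking this line would be rendered as an older version of IE would render it, which is exactly what alarmed standards advocates.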
For details, see Aaron Gustafson's "Beyond DOCTYPE: Web Standards, Forward Compatibility, and IE8" (www.alistapart.com/articles/beyonddoctype), the article that announced that IE8 would provide advanced standards support on an opt-in basis. Follow it, if you wish, with Eric Meyer's "From Switches To Targets: A Standardista's Journey" (www.alistapart.com/articles/fromswitchestotargets), Jeremy Keith's "They Shoot Browsers, Don't They?" (www.alistapart.com/articles/theyshootbrowsers), and Jeffrey Zeldman's (hey, that's me!) "Version Targeting: Threat or Menace?" (www.alistapart.com/articles/minorthreat). On a positive note, all major browsers now beautifully support HTML 4.01, XHTML 1.0, CSS1, CSS2.1, standard JavaScript, and the DOM—the very things The Web Standards Project demanded (but little expected to see come to pass) when we formed the group in 1998.
Where IE is concerned, the trick for standards-based designers and developers is to decide precisely what "supporting" IE6 users means on the one hand, and to determine whether to use CSS 3, which Firefox, Safari, and Opera support (but even IE8 does not), on the other. Your decision and your mileage may vary. We'll have more to say about all this in Part II.
Authoring and Publishing Tools
Developed at the height of the browser wars, a market-leading, professional visual editor like Adobe Dreamweaver initially addressed the problem of browser incompatibility by generating markup and code optimized for 3.0 and 4.0 browsers. When browsers ran on nonstandard, invalid HTML tags, that's what Dreamweaver created. As browsers shored up their standards support, tools like Dreamweaver needed to do likewise. In 2001, with help from The Web Standards Project, it did.
The Web Standards Project's Dreamweaver Task Force, led by Drew McLellan and Rachel Andrew, was created in 2001 to help Dreamweaver's engineers improve the standards compliance and accessibility of sites the tool produces. The Task Force's history can be found at www.webstandards.org/act/campaign/dwtf. Among the group's objectives were these:
- Dreamweaver should produce valid markup "out of the box." (Valid markup uses only standard tags and attributes and contains no errors.)
- Dreamweaver should allow the choice between XHTML and HTML versions, inserting a valid DTD for each choice. (A DTD, or Document Type Definition, tells the browser what kind of markup has been used to author a web page. See Chapter 5.)
- Dreamweaver should respect a document's DTD and produce markup and code in accordance with it.
- Dreamweaver should enable users to easily create web documents accessible to all.
- Dreamweaver should render CSS 2 to a good level of accuracy so that pages formatted with CSS can be worked on within the Dreamweaver visual environment.
- Dreamweaver should not corrupt valid CSS layouts by inserting inline styling without the user's consent.
- Dreamweaver users should feel confident that their Dreamweaver-created pages will validate and have a high level of accessibility.
Released in May 2002, Dreamweaver MX achieved these objectives, and the product's standards support has improved ever since.
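As a rough illustration of what "valid markup out of the box" with a proper DTD means, here is a minimal XHTML 1.0 Strict document of the general kind a compliant editor might generate for a new page. The title and body text are placeholders, and this is a sketch, not a copy of Dreamweaver's actual output.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Sample Page</title>
</head>
<body>
  <h1>Sample Page</h1>
  <p>Lowercase element names, quoted attribute values, and every element closed.</p>
</body>
</html>
```

An author who chose HTML 4.01 Strict instead would expect the tool to open the document with the matching declaration and to follow that language's rules from there on:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
```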
In 2006, Molly Holzschlag of The Web Standards Project worked with Microsoft to ensure the standards compliance of its visual web editor, Expression Web Designer (www.microsoft.com/products/expression). The product supports XHTML out of the box and provides CSS support comparable to Dreamweaver's.
No visual editor can match hand coding for smart CSS and semantic markup. But pros who like working with visual editors have two standards-compliant options to choose from. (WaSP also worked with Microsoft to improve the compliance of Visual Studio.)
The Road to Joy Is Paved with Validation
Today, more and more designers are using web standards to create sites that are beautiful, usable, and accessible. More and more developers are using them to bring new products and new ideas to the digital marketplace. Ajax is the new black, findability trumps animation, and the modern social contract is written in (X)HTML, CSS, and JavaScript. Yet (X)HTML/CSS validation is rarely achieved on large-scale commercial sites, even when the initial templates validate and the client and designers are fully committed to supporting W3C specifications.
Outdated content management systems and compromised databases cause many of these validation errors. Others come from third-party ads served with invalid methods and improper URL handling. But understanding why an otherwise responsible standards-based site fails to achieve perfect validation is not the same as condoning the neglect of web standards by CMS makers and advertising services.
For the web to progress to its full potential, publishing tools must become standards compliant. Site owners and managers must tell CMS vendors and ad-service managers that compliance matters, just as designers and developers once told browser makers. When enough customers do this, the vendors will upgrade their products, and 99.9% of websites will begin to leave obsolescence behind.