Effortless Flex 4 Development: Four Data Formats
Four Data Formats
There are four data formats that I'll discuss and use in this book: plain text, XML, JSON, and AMF. To start things off simply, I want to look at each on its own, out of context, so that you can better distinguish the data representation from the code being used to manipulate the data.
When evaluating the pros and cons of each format, there are four criteria to consider:
- Ease of creation on the server
- Ease of use on the client
- Potential data complexity
- Transmitted data size
For the first and second criteria, I'm thinking of how much code is required to generate data in a given format and to retrieve the data from that format. These considerations are tied together and also reflect the third criterion. (Implied in these is whether extra libraries are required as well.) Some formats, like plain text, are easy to create and use but cannot convey much information; conversely, JSON is harder to create and use but can convey lots of detail.
A final matter is the size, in bytes, of the data itself. This is important, as the amount of data being transferred from the server to the client will have an impact on the application's performance and the server's scalability.
For example, if you want to send the word Flex from one computer to another, you're only sending four characters of data, which is probably also four bytes: not a big deal. But if you wanted, as in an HTML page, to indicate that the word Flex should be in bold, you'd now have to send <strong>Flex</strong>. By using HTML to mark up the text, you've more than quadrupled the amount of data being transferred while only conveying one extra bit of information (bold). As the number of people using that data increases, even minor size differences can have huge implications.
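The arithmetic behind that comparison can be checked with a short sketch. Python is used here only as a neutral stand-in (it is not the PHP or ActionScript used elsewhere in this book), and the UTF-8 encoding is an assumption about how the text would be transmitted.

```python
# Compare the transmitted size of a plain word with the same word wrapped in HTML markup.
plain = "Flex"
marked_up = "<strong>Flex</strong>"

# Assumption: the text travels as UTF-8, so each of these ASCII characters is one byte.
plain_size = len(plain.encode("utf-8"))
marked_up_size = len(marked_up.encode("utf-8"))

# How many times larger the marked-up version is.
ratio = marked_up_size / plain_size

print(plain_size, marked_up_size, ratio)
```

The markup adds 17 bytes to a 4-byte payload, which is where "more than quadrupled" comes from.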
You have to remember, however, that the data size is just one criterion; there is no benefit to data that is transmitted more quickly but isn't all that useful. As the developer, you'll need to select the appropriate data format for the situation, which is why it's necessary to be familiar with the options out there. Secondarily, if you're making use of third-party services, like Yahoo!, Amazon, and others, you likely won't have a choice as to what data format to use, and will just have to handle whatever that service returns.
Plain Text
Plain text is exactly what it sounds like: characters without any formatting or markup. Examples are:
- 112.43
- true
- red,blue,green
- Franz Kafka was a 20th century writer...
It's just text: There's nothing to be evaluated or parsed or interpreted or anything. Plain text is universally readable and easily transmitted.
For the record, I'm making a distinction between plain text as a data format and plain text as a file format. XML, JSON, MXML, ActionScript, and lots of other languages are written in plain text files, whereas Microsoft Word documents, among many others, are binary files. The former are readable by many applications; the latter are stored in proprietary formats, and are only readable by a few programs.
As a data format, plain text contains no markup. For example, the HTML <strong>Flex</strong> contains tags that provide added meaning. Plain text data has none of this. The most elaborate meaning that might be conveyed in plain text would be commas or tabs used to break up individual values, and newlines to indicate individual lines of text. In other words, the most complex bit of plain text data will just be a list of comma-, tab-, or newline-separated values.
Plain text is the simplest data format to create and use. And when plain text is the medium, the server only transmits the minimum amount of data, without any extra information. However, plain text can only be effective for representing the most basic information.
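To make "simplest to use" concrete, here is a minimal sketch of how a client might turn the plain text examples above into usable values. It is written in Python as a neutral stand-in, and the helper names (parse_number, parse_flag, parse_list) are hypothetical, invented for this illustration.

```python
# Hypothetical helpers: each converts one kind of plain text response into a native value.

def parse_number(text):
    """Convert a numeric response such as '112.43' to a float."""
    return float(text)

def parse_flag(text):
    """Treat the literal text 'true' (in any letter case) as True, anything else as False."""
    return text.strip().lower() == "true"

def parse_list(text, separator=","):
    """Split a separated list such as 'red,blue,green' into individual values."""
    return [item.strip() for item in text.split(separator)]

price = parse_number("112.43")
available = parse_flag("true")
colors = parse_list("red,blue,green")

print(price, available, colors)
```

Note that the receiving code has to know in advance which kind of value to expect; nothing in the data itself says whether it is a number, a flag, or a list.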
XML
XML has been around for years and is widely, widely used. In fact, MXML is derived from XML, as are XHTML, Really Simple Syndication (RSS), and many other languages. XML is technically plain text, in terms of how it's created and stored, but it contains markup to provide additional details.
Example XML
XML starts with a root document element, like Application in MXML. XML contains only one root document element, and all of the information must be stored within that. For example, if you want to represent a catalog of artists as XML, you might start with an artists root element:
<artists></artists>
Next, within that root tag, you may have an element for each individual artist:
<artists>
  <artist></artist>
  <artist></artist>
  <artist></artist>
</artists>
Within the artist tag you'll want to represent all the information about a given artist: name, date of birth, works of art, etc. To make the data most usable, you wouldn't want to place this information just within the artist tag, but rather as atomically as possible, with each nugget in its own element:
<artist>
  <name>Georges Seurat</name>
  <birthDate>December 2, 1859</birthDate>
</artist>
To represent individual works of art, where multiple works would be associated with each artist, you would create a new work tag, one for each piece, but all within the artist record:
<artist>
  <name>Georges Seurat</name>
  <birthDate>December 2, 1859</birthDate>
  <work>
    <title>The Laborers</title>
    <year>1883</year>
  </work>
  <work>
    <title>The Models</title>
    <year>1888</year>
  </work>
</artist>
You can also add information using attributes (just like MXML components have attributes or properties). Again, what attributes you create where is really up to you:
<artist id="2490">
or
<work type="painting">
Of course, elements can have multiple attributes:
<work id="583782" type="painting">
In the end, a body of XML data is said to create a tree-like structure. You have the root element that contains one or more child elements (a child is also called a node). Each child can have its own children, and so on, until the data is fully represented.
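The tree structure becomes clearer once you walk it in code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module as a neutral stand-in for whatever the client actually uses. The summarize function is hypothetical, and pairing the id 2490 and the painting type with Seurat's record is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# Sample catalog assembled from the fragments above; the id and type values are illustrative.
CATALOG = """
<artists>
  <artist id="2490">
    <name>Georges Seurat</name>
    <birthDate>December 2, 1859</birthDate>
    <work type="painting">
      <title>The Laborers</title>
      <year>1883</year>
    </work>
    <work type="painting">
      <title>The Models</title>
      <year>1888</year>
    </work>
  </artist>
</artists>
"""

def summarize(xml_text):
    """Walk root -> artist -> work and collect the values found at each level."""
    root = ET.fromstring(xml_text.strip())  # the single root element: artists
    artists = []
    for artist in root.findall("artist"):  # children of the root
        works = []
        for work in artist.findall("work"):  # children of each artist
            works.append((work.findtext("title"), int(work.findtext("year")), work.get("type")))
        artists.append({
            "id": artist.get("id"),  # attributes are read with get()
            "name": artist.findtext("name"),  # element text is read with findtext()
            "works": works,
        })
    return artists

catalog = summarize(CATALOG)
print(catalog)
```

Each level of nesting in the XML maps to one loop in the code, which is a useful rule of thumb when deciding how deeply to nest your own data.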
XML Syntax Rules
You just saw an example of how XML is structured, but what are the exact syntactical rules? Chapter 1, "Building Flex Applications," discusses the basic XML rules with respect to MXML, but here they are again:
- XML is case-sensitive.
- Every attribute value must be quoted (and double-quotes are preferred).
- Every tag needs to be formally closed.
- Every tag needs to be properly nested.
With respect to closing tags, there are two options. The first is to create opening and closing tags, as in the example XML. This is normally done for any element that might contain values, including other elements. You can also close tags by concluding the opening tag with a slash before the closing angle bracket:
<tag />
You'll see this latter syntax in cases where the values are solely reflected in attributes:
<image source="somefile.png" />
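As a quick check that the two closing styles are interchangeable, this sketch parses both forms with Python's standard xml.etree.ElementTree module (a stand-in parser, not the one Flex uses) and reads the attribute from the self-closing image element.

```python
import xml.etree.ElementTree as ET

# The same empty element written with each closing style.
self_closed = ET.fromstring("<tag />")
open_close = ET.fromstring("<tag></tag>")

# Both parse to an element with no text and no children.
same_content = (self_closed.text == open_close.text) and (len(self_closed) == len(open_close))

# A self-closing element can still carry values in its attributes.
image = ET.fromstring('<image source="somefile.png" />')
source = image.get("source")

print(same_content, source)
```

To a parser the two forms are the same element; the choice between them is purely one of readability.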
As for nesting of tags, this only applies if you're using opening and closing tags. What is meant is that if you open element A, then open element B (so that element B is contained within—and is a child of—element A), you must then close B before closing A.
Because XML uses angle brackets and quotation marks to create elements and attributes, you should not use them literally within the values of elements or attributes (a literal < or & there will make the data unparseable), as in
<question answer="false">4 > 8</question>
In such cases, you'll need to use an entity version of any special character. Table 7.1 lists the five entities to watch for in XML. Each begins with an ampersand and ends with a semicolon.
Table 7.1. XML Entities
Symbol | Entity Version
& | &amp;
< | &lt;
> | &gt;
' | &apos;
" | &quot;
If you'd rather not use entities, you can use a CDATA block:
<question answer="false"><![CDATA[ 4 > 8 ]]></question>
Whether you use CDATA or entities will probably be determined by the amount of data being represented and the number of special characters found within that data.
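Both approaches can be sketched with Python's standard library, used here as a neutral stand-in for server-side code. The to_xml_text helper and the QUOTE_ENTITIES mapping are hypothetical names for this illustration; escape() from xml.sax.saxutils handles the ampersand and angle brackets by default, and the mapping adds the two quote entities.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() converts &, <, and > by default; the quote entities are supplied explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def to_xml_text(value):
    """Replace the five special characters with their entity versions."""
    return escape(value, QUOTE_ENTITIES)

# Writing: convert special characters before placing a value inside a tag.
encoded = to_xml_text("4 > 8")
entity_xml = '<question answer="false">' + encoded + "</question>"

# The CDATA alternative, as shown in the text above.
cdata_xml = '<question answer="false"><![CDATA[ 4 > 8 ]]></question>'

# Reading: a parser hands back the same characters either way.
from_entity = ET.fromstring(entity_xml).text
from_cdata = ET.fromstring(cdata_xml).text.strip()

print(encoded, from_entity, from_cdata)
```

The receiving code never sees the entities or the CDATA wrapper, only the original characters, so the choice affects the size and readability of the transmitted XML rather than how the client uses it.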
Finally, in terms of XML syntax, know that XML stored or transmitted often includes the XML declaration, prior to the XML data itself:
<?xml version="1.0" encoding="utf-8"?>
This is the same code as found on the first line of MXML files. It indicates to the program reading the file that the file contains UTF-8-encoded XML.
Pros and Cons
XML is relatively easy to create, is extremely extensible ("extensible" is in the name, after all), and can convey lots of well-organized data. Thanks to something called E4X, XML is a snap to use in ActionScript, as you'll see later in the chapter.
On the other hand, XML syntax is demanding: if it's not 100% right (for example, if you don't properly nest or close a tag), the entire body of XML data will be unusable. This can be a common cause of problems when dynamically generating XML. Also, because of all the added tags, both opening and closing, XML involves transmitting extra data.
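The all-or-nothing nature of XML parsing is easy to demonstrate. In this minimal sketch, written in Python as a stand-in language, the hypothetical is_well_formed function reports whether the standard xml.etree.ElementTree parser accepts a string at all.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if any syntax rule is broken."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

good = is_well_formed("<artist><name>Georges Seurat</name></artist>")
badly_nested = is_well_formed("<artist><name>Georges Seurat</artist></name>")
unclosed = is_well_formed("<artist><name>Georges Seurat</name>")

print(good, badly_nested, unclosed)
```

A check like this is worth running on the server before sending dynamically generated XML, because a single error means the client gets no data at all rather than partial data.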
Still, all in all, XML is an excellent choice for most situations.
JSON
JavaScript Object Notation (JSON) is one of those acronyms that may sound obscure but means exactly what it says: In JSON, data is represented as a JavaScript object. Since JavaScript and ActionScript have the same lineage, JSON data is quite similar in syntax to an ActionScript object, too. A JavaScript (or ActionScript) object, at its root, is represented by curly braces: {}. That is a valid, albeit empty, object.
JSON data objects have property-value pairs (properties just being variables found within an object). To add these to an object, use the property : value syntax, with each pair separated by a comma:
{ name: "Georges Seurat", birthDate: "December 2, 1859" }
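In a JavaScript or ActionScript object literal, the properties need not be quoted, but the values do unless they are numbers (or the literals true, false, and null). Strict JSON is less forgiving: every property name must be wrapped in double quotes. That is why you'll commonly see all of the properties quoted, and sometimes all of the values, too: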
{ "title":"House I", "year": "1998" }
If you have multiple objects to represent, you create an array of them, using the square brackets, separating each object by a comma:
[{ name: "Georges Seurat", birthDate: "December 2, 1859" }, { name: "Roy Lichtenstein", birthDate: "October 27, 1923" }]
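Here is a minimal sketch of reading such an array, using Python's standard json module as a stand-in for the client-side decoder. Note that a strict JSON parser like this one only accepts property names in double quotes, so the sample data is quoted accordingly; the is_valid_json helper is a hypothetical name for this illustration.

```python
import json

# The array of artists from above, with property names double-quoted as strict JSON requires.
ARTISTS_JSON = (
    '[{ "name": "Georges Seurat", "birthDate": "December 2, 1859" },'
    ' { "name": "Roy Lichtenstein", "birthDate": "October 27, 1923" }]'
)

# Decoding yields a native list of dictionaries.
artists = json.loads(ARTISTS_JSON)
names = [artist["name"] for artist in artists]

def is_valid_json(text):
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# The unquoted form is a valid object literal in JavaScript, but not valid JSON.
unquoted_ok = is_valid_json('{ name: "Georges Seurat" }')
quoted_ok = is_valid_json('{ "name": "Georges Seurat" }')

print(names, unquoted_ok, quoted_ok)
```

The square brackets become a list and each pair of curly braces becomes a dictionary, so the decoded data can be looped over directly.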
JSON is equally able to represent structured data easily, and with fewer characters than XML (because you're omitting the closing tags). But while you may think JSON's syntax is simpler than XML's, the syntax can get hairy quickly. For example, this bit of XML,
<artist id="3955">
  <name>Roy Lichtenstein</name>
  <birthDate>October 27, 1923</birthDate>
  <work type="sculpture">
    <title>House I</title>
    <year>1998</year>
    <image source="3955_housei.png" />
  </work>
</artist>
would look like this in JSON:
{
  artist:{
    id:3955,
    name: "Roy Lichtenstein",
    birthDate: "October 27, 1923",
    work:{
      type: "sculpture",
      title: "House I",
      year:1998,
      image:{ source: "3955_housei.png" }
    }
  }
}
Now imagine how the JSON would look when you start representing multiple artists and multiple works of art! And then, to be formal, wrap everything in double-quotes...
As with XML, make a slight mistake, like omitting a comma or curly bracket, and the data is useless. For this reason, you'll really want to use a special library in PHP to create the JSON data, and a few extra steps are required to use the data in ActionScript. You don't have to transfer quite as much JSON data as you would with XML, but the effort required to encode and decode it diminishes that benefit.
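To illustrate both points (letting a library generate the JSON, and the size difference), here is a rough sketch using Python's standard json module as a stand-in for the PHP library mentioned above. The compact separators and the whitespace-free XML string are assumptions chosen so that the two formats are compared on equal terms.

```python
import json

# The Roy Lichtenstein record from above as a native data structure.
record = {
    "artist": {
        "id": 3955,
        "name": "Roy Lichtenstein",
        "birthDate": "October 27, 1923",
        "work": {
            "type": "sculpture",
            "title": "House I",
            "year": 1998,
            "image": {"source": "3955_housei.png"},
        },
    }
}

# The library handles quoting, commas, and braces, so the syntax cannot be mistyped.
json_text = json.dumps(record, separators=(",", ":"))

# The same record as XML, with the optional whitespace between tags removed.
xml_text = (
    '<artist id="3955"><name>Roy Lichtenstein</name>'
    "<birthDate>October 27, 1923</birthDate>"
    '<work type="sculpture"><title>House I</title><year>1998</year>'
    '<image source="3955_housei.png" /></work></artist>'
)

json_size = len(json_text.encode("utf-8"))
xml_size = len(xml_text.encode("utf-8"))

print(json_size, xml_size)
```

For this record the JSON comes out smaller, but only modestly so, which is consistent with treating size as just one factor in the choice.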
While I'm inclined to choose XML over JSON (between the two), you might sometimes be in a situation where JSON is your only option.
AMF
Action Message Format (AMF) was created by Adobe specifically as a way to improve communications between a server and a Flash client. Unlike plain text, XML, and JSON, AMF uses a binary format, which means that it cannot be represented in a book like the others. More than a format for transmitting data, AMF provides a foundation for clients to directly communicate with servers. Instead of just calling a PHP script and seeing the results, AMF lets the client interact with PHP scripts, as appropriate. In this regard, the PHP script on the server is acting as an intelligent service, not as a single, standalone document. Moreover, by using something called a value object, complex data types can be transmitted and used by both the client and server as if they were native to both ActionScript and PHP (this will mean more at the end of Chapter 9, "Using Complex Services").
Using AMF in PHP requires a special library and the use of object-oriented programming (OOP). But the OOP can be trivial and easy to implement, and the AMF response allows for complex data sets to be transmitted quite efficiently (much as a JPEG can represent an image using a fraction of the original bytes, AMF can create a more byte-efficient representation of data). And, when the AMF response returns to the Flash client, the result can be handled in ActionScript easily.