One of the favorite quotes I contributed to Vincent Flanders’ best-selling book (one which I command all my faithful followers to buy and memorize) appears as item #6 on page 279:
“The separation of content and formatting is not in the Constitution, but it should be.”
Mark-up constructs embedded in data, such as <font>, often become tedious obstacles for those of us of the nerd-programmer persuasion who are assigned to safely store that data and later re-render it in a variety of physical and digital formats. In other words, if you save hard-coded HTML in your data, then you’ll find yourself having to find or write programs to strip out things like evil <marquee> tags when you want to publish the same data in some other format, such as PDF or XML.
For those of you who don’t speak pure geek, I’m talking about having the ability to store apples as applesauce, only to later re-constitute it as a pear or a peach; an ability that is hampered when you pollute the sauce with stems and bird-droppings. This is why it is a good practice to separate your website’s data and formatting using CSS; tableless if possible. There are, however, times when you need to render data exactly as it appears in real life.
Sean McGrath made that point almost two years ago in an article entitled: ITworld.com – XML IN PRACTICE – Separating Content from Presentation: Easier Said Than Done. Yes, the article does have some dust on it, but it is one of the better articles I’ve found describing some of the complex issues surrounding the storage of highly specialized and/or sensitive data, especially where ‘guesstimates’ equal an adulteration of the data. As he writes:
“Bitmapped graphics on the other hand, are intimately tied to a particular rendering in terms of pixel area and color depth. Bad things typically happen if you try and resize bitmapped images as the pixels in the image do not encode any semantics about what the image represents. In short, they cannot be repurposed to different shapes, sizes or color schemes without significant loss in quality.”
Is it Live or is it Memorex?
In English, what McGrath is talking about is similar to what happens to the ‘exactness’ of an image when you convert and compress huge images from your digital camera.
Another example of what the author is getting at, or at least the heart of his message, can be demonstrated by listening to the difference between a sermon recorded at 16-bit, 44 kHz stereo and one recorded at 8-bit, 22 kHz stereo, the latter being a tinny knock-off of the former. Granted, while this might be an improvement for some of the ‘musicians’ at your church, such variances can actually kill someone when dealing with medical or military imagery.
So now that I’ve scared the mess out of those of you whose physicians are bragging about their latest venture into Java-based web services, let’s keep in mind this article is a couple of years old … nor do I entirely agree with McGrath’s closing argument where he writes:
Yes, it makes sense, for all sorts of reasons, to separate content from presentation. Yes, XML is a great technology for helping you achieve that.
However, sometimes, the medium is an inextricable part of the message. The next time someone tries to sell you a line like “just separate the content from the presentation with XML” be warned — it is not necessarily that simple.
Three Ways to Skin your XML Cat
While I agree there is some data in which compression and/or denormalization is synonymous with adulteration, I also know from my RDBMS upbringing that quite often the best treatment of binary data is just to leave it as is and instead modify the storage and/or transmission mechanism to make exact duplicates. For example, if I want to save the image of a fingerprint, I save it as a binary large object, or BLOB for short. The same mechanism can be used for a text file stored in some form of a foreign language encoding; though some will argue in favor of a ‘VARCHAR’ field.
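For the code-minded, here is a minimal sketch of that "leave it as is" idea, using Python's built-in sqlite3 module as a stand-in for whatever RDBMS you actually run. The table name `prints`, the helper functions, and the fake fingerprint bytes are all made up for illustration; the only point is that the bytes come back out exactly as they went in.

```python
import sqlite3


def store_blob(conn, name, data):
    """Insert raw bytes untouched and return the new row id."""
    cur = conn.execute(
        "INSERT INTO prints (name, image) VALUES (?, ?)", (name, data)
    )
    conn.commit()
    return cur.lastrowid


def load_blob(conn, row_id):
    """Fetch the stored bytes for one row."""
    row = conn.execute(
        "SELECT image FROM prints WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0]


# Hypothetical schema: an in-memory database with one BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prints (id INTEGER PRIMARY KEY, name TEXT, image BLOB)"
)

# Stand-in for real image bytes; every possible byte value is included.
fingerprint = bytes(range(256)) * 4

row_id = store_blob(conn, "left-thumb", fingerprint)
restored = load_blob(conn, row_id)
```

Nothing is resized, resampled, or re-encoded along the way, so `restored` should compare equal to `fingerprint` byte for byte.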
Regardless of what your religious beliefs are concerning datatypes, a rule of thumb to remember is that there is always more than one way to skin the binary data cat, and XML is no exception to this rule. That point is well made in an article written two years before McGrath’s, entitled “Handling Binary Data in XML Documents,” by Lisa Rein. In it she explains two common methods for transmitting medical imagery:
- external entity and notation;
- MIME data types.
I won’t go into the mechanics of how each of these methods is employed; you can get that from Ms. Rein’s article. You can also see these methods further explained in “XML, SOAP and Binary Data,” an article published back in February 2003 that looks at this same problem in practical terms of web services. So does the chapter on SAX in the Java™ Web Services Tutorial.
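That said, for a rough feel of the MIME-flavored approach, here is a small sketch of my own (not lifted from any of those articles) that base64-encodes some bytes into an XML element and gets the identical bytes back. The element name `image` and the `content-type` and `encoding` attributes are assumptions I picked for the example, not part of any standard vocabulary.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(tag, data, content_type):
    """Build an element whose text is the base64 form of the raw bytes."""
    elem = ET.Element(tag, {"content-type": content_type, "encoding": "base64"})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def extract_binary(elem):
    """Decode the element text back into the original bytes."""
    return base64.b64decode(elem.text)


# Stand-in for a real scan; includes bytes that are illegal in raw XML text.
scan = bytes([0, 255, 16, 32, 128]) * 10

xml_text = ET.tostring(embed_binary("image", scan, "image/png"), encoding="unicode")
recovered = extract_binary(ET.fromstring(xml_text))
```

The trade-off is size: base64 inflates the payload by roughly a third, but the decoded bytes are an exact duplicate rather than a guesstimate.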
But Dean, didn’t you mention “three ways” to get this done!?
Why yes I did, and I’m glad you asked. The third method deals more with text, and it is one I made reference to in my 14-Feb-04 post “the Gospel, according to RSS and/or Atom” when I wrote:
Finally, we either needed to encapsulate the RSS::<description>, Atom::<content> with <![CDATA[ ... ]]> to accommodate the hyperlink to the audio rendering, or figure out some way of listing the audio version as an alternate link.
Just in case you need some help connecting the dots – or bullet points to be more precise – one way you can address the “Table” issues mentioned by McGrath is to:
- encapsulate the data within a <![CDATA[ ... ]]> tag
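If you would like to see that bullet in action, here is a minimal sketch in Python. The `wrap_cdata` helper and the `sermon.mp3` link are hypothetical names of my own; the one real gotcha it handles is that the sequence `]]>` inside your data would close the section early, so it gets split across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in a CDATA section, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical description body containing markup, an ampersand, and ']]>'.
markup = '<a href="sermon.mp3">Listen</a> & enjoy ]]> really'

doc = "<description>" + wrap_cdata(markup) + "</description>"

# A conforming parser hands the section contents back as plain character data.
parsed = ET.fromstring(doc)
```

After parsing, `parsed.text` should hold the original string, and the `<a>` tag inside it is treated as text rather than as a child element.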