For those of you old enough to remember the original Star Trek: The Motion Picture, you might recall a gruesome scene early in the movie where a transporter malfunction turns two incoming shipmates into disfigured piles of short-lived screaming flesh. That’s sorta the image that came to mind yesterday when I looked at the Redland Baptist homepage and noticed that my VerseScrape program had horribly mangled the day’s incoming words of wisdom from the Book of Proverbs. That is, my screenscraper failed because the incoming source file changed; notably, the highly identifiable and easy-to-tokenize parentheses surrounding the scripture reference have been removed.
Unlike the aforementioned unfortunate Enterprise members, VerseScrape is fixable; however, doing so would require bringing in the girth of the Scripturizer module to identify and hyperlink the scripture reference.
Another shortcoming contributing to the demise of VerseScrape is the fact that I hard-coded the output in my examples. This means that whenever I make changes to fix the program, you not only have to download and deploy the fix, but you must also fix the example code to suit your site’s display. So rather than just offer a variant of my article “Using Cron with LWP::Simple and XML::RSS to retrieve news feeds,” where I also hard-code the output, why not instead practice what I preached in Chapter 14 of “Son of Web Pages That Suck” and use XSLT?
Just to bring some of you up to speed, XSLT is short for eXtensible Stylesheet Language Transformations.
Yeah, I know, it sounds scary, but simply put: XSLT is a mechanism in which two files create a third. The first is an XML file, such as an RSS 2.0 syndication file. The second is an XSL file. When smooshed together via a transformation application or module, they produce a file in whatever format and media type you defined in your XSL file. You can read more about it over at w3schools.com.
Assign Once, Iterate Often:
One of the other reasons I created VerseScrape was that incorporating dynamic feeds into your web pages, regardless of format, can and will slow down your page load times. As suggested in “Using Cron with LWP::Simple …”, one solution is to employ a loop that makes several attempts to copy the feed locally, AND THEN process it, so network failures won’t negatively impact your site’s performance.
The first example is a Perl program that, after successfully downloading the ESV Bible Daily Verse RSS 2.0 feed, employs XSLT to create an include file. Beneath that is a PHP program that does the same. Both code examples call an XSL sample I’ve also provided, which you can modify to suit your website’s specific needs, regardless of how many times I’m compelled to fix either the Perl or PHP versions.
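For readers who think better in code, here is a rough Python sketch of that retry-then-cache loop. It is not the Perl or PHP program from this article; the names `http_get` and `cache_feed`, the attempt count, and the delay are all assumptions made up for illustration.

```python
import os
import tempfile
import time
import urllib.request


def http_get(url, timeout=10):
    """Fetch a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def cache_feed(url, dest_path, fetch=http_get, attempts=3, delay=5.0):
    """Try up to `attempts` times to copy the feed to dest_path.

    Returns True once a non-empty copy has been saved, False if every
    attempt failed. An existing dest_path is left untouched on failure,
    so the page can keep serving yesterday's copy.
    """
    for attempt in range(1, attempts + 1):
        try:
            data = fetch(url)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses.
            data = b""
        if data:
            # Write to a temporary file first, then swap it into place,
            # so a reader never sees a half-written feed.
            directory = os.path.dirname(os.path.abspath(dest_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
            os.replace(tmp_path, dest_path)
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Run from cron, something like `cache_feed(feed_url, "dailyverse.xml")` would refresh the local copy, and the page-building step would only ever read the local file. Both the variable `feed_url` and the file name are placeholders.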
I realize some of you may be asking why I’ve switched from the IBS to the ESV. The answer is two-fold, yet simple:
- The IBS is still using the 0.91 RSS specification, whereas the ESV uses 2.0;
- The Scripture the ESV provides is generally under 264 characters, and doesn’t include embedded HTML tags – see my article entitled “the Gospel, according to RSS and/or Atom” for a more in-depth discussion of this.
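If you want to check those two points against a feed yourself, a small Python sketch along these lines would do it. The sample feed below is made up for illustration (it is not the real ESV or IBS output), and `summarize_items` is a hypothetical helper name; only the 264-character figure comes from the list above.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample feed, shaped like RSS 2.0. The second item carries
# escaped HTML in its description, as some feeds do.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample Daily Verse</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed used only for this sketch</description>
    <item>
      <title>Sample 1:1</title>
      <description>Plain verse text with no markup at all.</description>
    </item>
    <item>
      <title>Sample 1:2</title>
      <description>Verse text with &lt;b&gt;embedded&lt;/b&gt; markup.</description>
    </item>
  </channel>
</rss>"""

# Anything that looks like an HTML tag once the XML has been unescaped.
TAG_PATTERN = re.compile(r"<[^>]+>")


def summarize_items(rss_text, max_chars=264):
    """Return the feed's version attribute and a per-item summary."""
    root = ET.fromstring(rss_text)
    version = root.get("version")
    results = []
    for item in root.findall("./channel/item"):
        body = item.findtext("description", default="")
        results.append({
            "title": item.findtext("title", default=""),
            "length": len(body),
            "fits": len(body) < max_chars,
            "has_markup": bool(TAG_PATTERN.search(body)),
        })
    return version, results


if __name__ == "__main__":
    version, items = summarize_items(SAMPLE_RSS)
    print("RSS version:", version)
    for entry in items:
        print(entry)
```

The same function could be pointed at a locally cached copy of a real feed by reading the file into a string first.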
If you would still rather not go the XSLT route, then you might want to pay a visit to an article I wrote last summer entitled “English Standard Version Bible RSS Feed”, where I demonstrate how to slice-n-dice the ESV RSS 2.0 file using either the XML::RSS or the XML::RSSLite module. If you have questions, make improvements or find bugs in the above, don’t be shy; share your findings in the form of a loving comment.