Using Cron with LWP::Simple and XML::RSS to retrieve news feeds

2003 November 23
by MeanDean

Originally published on March 24, 2003, when the war in Iraq was heating up and I found that direct links to popular RSS news feeds were affecting how fast my pages loaded. I’m reposting this article for reasons that will become obvious later this week. Until then, enjoy this “Spidering Hack!-)”

Syndicated news feeds are a nice way of adding compelling content to your site. The problem is that a feed sometimes gets overrun on heavy news days, goes offline, or suffers a host of other connectivity issues. Any of these can make YOUR site load slowly, because the software holds your user hostage while the feed-retrieval part of the application waits to time out. You see this a lot with PHPNuke and PostNuke sites.

A simple way around this problem is to use a program that periodically retrieves the feed and slices-n-dices it into an easy-to-include file on your host. Doing this achieves five goals:

  1. user page loads are not penalized when feeds go down
  2. failures to connect do not harm the existing include file
  3. multiple attempts to read the feed do not penalize the user
  4. feed can be mirrored for local/private use
  5. content can be formatted to taste
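
Goals 1 and 2 mostly come down to how the include file gets replaced. Here is a minimal Perl sketch, assuming the formatted HTML is already in a string. The sub name `update_include` and the `includeXXXXXX` temp-file template are hypothetical, and the write-then-rename technique is a swapped-in approach, not necessarily what getap.pl does:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: replace the include file at $path with $html, but only
# when $html holds something. The new text goes to a temporary file in the
# same directory, which is then renamed over the old file. Page views see
# either the old include or the new one, never a half-written file, and a
# failed fetch (undef or empty) leaves the existing include exactly as it was.
sub update_include {
    my ($path, $html) = @_;
    return 0 unless defined $html && length $html;   # goal 2: keep the old file

    my ($fh, $tmp) = tempfile('includeXXXXXX', DIR => dirname($path));
    print {$fh} $html or die "write to $tmp failed: $!";
    close $fh         or die "close of $tmp failed: $!";

    # File::Temp creates files with mode 0600; open it up so the web server
    # (often a different user than the cron job) can read the include.
    chmod 0644, $tmp  or die "chmod of $tmp failed: $!";

    # On Unix, rename() within one filesystem swaps the file in a single step.
    rename $tmp, $path
        or do { my $err = $!; unlink $tmp; die "rename to $path failed: $err" };
    return 1;
}
```

A call such as `update_include('/home/YOURPATH/news/newsfeed.html', $html)` (path hypothetical) would return 0 and leave the old file in place whenever `$html` is undef or empty.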

Below is a little program I wrote on Thursday to grab news from an AP wire feed I found via Scripting.com, for inclusion on blogs4God. With the following crontab entry, the program runs every 30 minutes:

0,30 * * * * /home/YOURPATH/getap.pl >/dev/null

The nice thing about this approach is that this particular feed does “get busy” from time to time, and at one point on Friday it went offline. My users did not notice, because in most cases I was able to get past the “busy signal” on the 2nd or 3rd of 10 attempts. When the feed site went offline, my users simply saw an older include file, without interruption or delay.
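
The “up to 10 attempts” behaviour can be expressed as a small loop. This sketch assumes the fetch is wrapped in a code ref (in the cron job, a call to LWP::Simple's `get`, which returns `undef` on failure). The name `fetch_with_retries`, the attempt limit, and the pause length are illustrative choices, not the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch up to $max_attempts times, sleeping
# $pause seconds between tries, and return the first defined result.
# Returns undef (an empty list in list context) if every attempt fails.
sub fetch_with_retries {
    my ($fetch, $max_attempts, $pause) = @_;
    for my $attempt (1 .. $max_attempts) {
        my $content = $fetch->();
        return $content if defined $content;
        sleep $pause if $pause && $attempt < $max_attempts;
    }
    return;
}

# Possible use in getap.pl (the URL is a placeholder, not the AP feed):
#   use LWP::Simple qw(get);
#   my $xml = fetch_with_retries(sub { get('http://example.com/feed.rss') }, 10, 30);
#   exit 0 unless defined $xml;   # give up quietly; the old include stays
```

Because the loop runs from cron rather than inside a page request, the pauses between attempts cost the reader nothing.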

Anyway, since I haven’t posted anything worthwhile in the past few days, I figured this was a good penance:
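
For the slice-n-dice step, here is a minimal sketch (not the original listing) of turning parsed feed items into an HTML fragment for a server-side include. It assumes the items come from XML::RSS, which exposes them as `$rss->{items}`, an array of hash refs with `title` and `link` keys. The names `render_items` and `html_escape` and the `newsfeed` CSS class are hypothetical:

```perl
use strict;
use warnings;

# Escape the characters that could break the surrounding HTML page.
# '&' must be handled first so the other entities are not double-escaped.
sub html_escape {
    my ($s) = @_;
    return '' unless defined $s;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Turn feed items (hash refs with title/link keys) into an HTML list,
# skipping items missing either field and keeping at most $max of them.
sub render_items {
    my ($items, $max) = @_;
    my @items = grep { defined $_->{title} && defined $_->{link} } @$items;
    splice @items, $max if @items > $max;
    my $html = qq{<ul class="newsfeed">\n};
    for my $item (@items) {
        $html .= sprintf qq{  <li><a href="%s">%s</a></li>\n},
            html_escape($item->{link}), html_escape($item->{title});
    }
    return $html . "</ul>\n";
}
```

With XML::RSS, this would be fed with something like `my $rss = XML::RSS->new; $rss->parse($xml); my $html = render_items($rss->{items}, 15);`. Escaping keeps a stray `&` or `<` in a headline from breaking the page that includes the file.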


Of course, now if the impatient news media and certain ‘Oprahized’ peaceniks would quit interpreting combat casualties as a total military failure … we’d have something worth reading, but I digress.

5 Comments
2003 March 25

This is off-topic, but would it be possible to write a TrackBack program for Blogspot? Has anybody done that already?

2003 March 25

There is a standalone TrackBack tool (http://www.movabletype.org/docs/tb-standalone.html) for Movable Type; however, it requires a server with CGI access, something that most people have. Maybe you could find a nice person who would host the file for you, though?
Actually, if you read the very last part of the documentation, they write:
“(Possible use) 3. Centralized tool
This TrackBack tool requires that the end user have the ability to run CGI scripts on their server. For many users (eg BlogSpot users), this is not an option. For such users, a centralized system (based on this tool, perhaps) would be ideal. ”

Hopefully someone will go with it, like the people who run remote-commenting websites.

2003 March 26

Interesting. It’d be great if somebody would start a service like that. I’d do it, but I’m not really that techno-literate.

2004 May 8

I’ve taken your newsfeed code above and added some modifications as follows:

I linked to a Moreover RSS feed. There are many available; you can browse them all and pick the one you want at http://w.moreover.com/categories/category_list_rss.html

Added a line to delete the old include file before creating a new one. My server would not overwrite the old include (I don’t really understand why, because the directory is CHMOD 777).

Added a line to include the content description from the feed.

Increased the number of feed items output to 30. Most RSS feeds contain approximately 15 items.

Added target="_blank" to all links, so the newsfeed items open in a separate window.

Revised Code is available at http://www.bvmc.org/pub/getap.txt

Remember to change the first line
#!/usr/local/bin/perl -w
to meet your specific needs. Mine needs the ‘/usr/local’ path; many servers use #!/usr/bin/perl -w instead.

I upload the “getap.pl” file to my cgi directory and CHMOD it to 755.

I create a “news” subdirectory under my cgi directory and CHMOD it to 777.

I place an empty “newsfeed.xml” file in the “news” directory and CHMOD it to 777.

I telnet in and run “perl getap.pl” to get it all going.

I set up my CRON to run getap.pl once per hour (set yours to whatever frequency you want).

I created an ‘.shtml’ file to hold the feed results. You can see it at http://www.bvmc.org/world_news.shtml

Good luck. Any ideas or other feedback are appreciated, especially if you notice anything wrong here (Do not fear the reproach of men or be terrified by their insults. Isaiah 51:7).

Jim Konicki, Webmaster, Blessed Virgin Mary of Czestochowa Church - Latham NY (www.bvmc.org)

2005 February 9
Andre

I’m just asking whether I could run the code on IIS, because there isn’t an XML::RSS module for it. I tried XML::RSS::Parser, but it didn’t work. Any help?
If it works, though, it’s great, especially because it’s in Perl.
