Web Mining with Perl

LWP, short for libwww-perl, is a collection of Perl modules found on many web servers that run Perl. These modules provide a simple, consistent application programming interface (API) to the World Wide Web (WWW). LWP supports redirection, cookies, basic authentication and robots.txt parsing.

When it comes to sucking down content from another page, a capable coder can rely on LWP::Simple. This lovely little CPAN module lets the developer fetch the headers or body of a web page and store the result in a scalar variable or a file.

In other words, screen-scraping. Though XML is the transport of choice, not everyone is savvy with syndication yet, in which case you may find yourself doing a bit of Web Mining with Perl.
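To give a rough taste of what that looks like, here is a minimal sketch using the functions LWP::Simple exports by default (get, head, getstore and is_success). The extract_title helper, the page.html file name and the command-line handling are my own assumptions for illustration, not part of LWP, and the regex is only a quick stand-in for a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # exports get, head, getstore and is_success by default

# Hypothetical helper (not part of LWP): return the text of the first
# <title> element in an HTML string, or nothing if there is none.
# A regex is enough for a sketch; messy real-world pages deserve a parser.
sub extract_title {
    my ($html) = @_;
    return unless defined $html;
    return $1 if $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is;
    return;
}

# Only touch the network when a URL is given on the command line,
# e.g.  perl scrape.pl http://www.example.com/
if (@ARGV) {
    my $url = shift @ARGV;

    # head() returns header fields in list context (empty list on failure).
    my ($type, $length) = head($url);
    print "Content type: $type\n" if defined $type;

    # get() returns the page body as a scalar, or undef on failure.
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;

    my $title = extract_title($html);
    print "Title: ", (defined $title ? $title : '(none found)'), "\n";

    # getstore() saves the body to a file and returns the HTTP status code.
    # 'page.html' is an arbitrary file name chosen for this example.
    my $status = getstore($url, 'page.html');
    print "Saved a copy to page.html\n" if is_success($status);
}
```

Run it with a URL as its only argument. With no argument it does nothing, which keeps the title helper easy to try on its own against a plain string.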

Confused? Just wait for my next article and it’ll become a bit clearer.

