I’m in a situation where I need some custom parsing of a rather large Apache log file, known to some of you as access_log. I’ve got some homespun Perl for the job, but was curious to see what else was out there. One free/open source application that caught my eye was Scratchy – The Apache Log Parser and HTML Report Generator for Python.
According to the ‘About Page,’ Scratchy is a set of Python scripts to parse Apache web server log files and extract useful information. Scratchy can use this extracted data to create HTML reports so website administrators can easily view and digest their audience, trends and possible attacks. Extensibility being a primary goal of the project, the report appearance can be easily modified by tweaking a single config file.
I think it was the extensibility thingie that really got my attention. By modifying the relatively straightforward configuration file, I can automate nightly log processing that fits my specific needs. For those of you new to this site, such automation on a Linux system is accomplished by making an entry in the crontab. Microsoft IIS users should use the scheduler control panel, once they’ve installed Python.
The other thing that interested me about this project is that I’ve been meaning to ‘Dive into Python.’ It seems many disenfranchised PHP users are headed in that direction, and since I already understand access_logs and regular expressions, I figured this would be a good real-world example to examine.
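To give a feel for what access_log parsing with regular expressions looks like in Python, here is a minimal sketch. It is not Scratchy’s code; the pattern, the function names (`parse_line`, `count_statuses`) and the sample lines are my own illustrations, and it assumes the log uses Apache’s common or combined format without escaped quotes inside fields.

```python
import re
from collections import Counter

# Assumed layout: Apache "common" format, optionally followed by the
# referer and user-agent fields of the "combined" format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of named fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_statuses(lines):
    """Tally HTTP status codes across an iterable of log lines, skipping unparseable ones."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            counts[fields["status"]] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    '192.0.2.7 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209',
]

if __name__ == "__main__":
    for status, hits in sorted(count_statuses(SAMPLE).items()):
        print(status, hits)
```

For a real log, the same `count_statuses` function can be handed an open file object instead of the sample list, since it only needs something it can iterate over line by line.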