With more church webmasters taking advantage of the free, one-click installs (e.g. WordPress, Drupal, etc.) offered by inexpensive web hosting providers, I figure it is time for a quick tutorial on how to harvest useful operational, user, and security information from your error logs using a variety of commands already at your disposal – for free.
“I have error logs?” some ask. To which my response is: “Probably – have you asked your hosting provider?”
Once you do find your error log file(s) – and most reputable hosts do provide them, usually through whatever hosting management application they offer (e.g. cPanel, Plesk, etc.) – then it’s time to answer the question that isn’t asked often enough: “What do I do with them?”
Below is my semi-definitive, and most certainly emphatic, response:
Resolve 404 errors
A 404 is the HTTP response your website’s server sends when a browser requests a file that cannot be found. This information is tracked in your access logs, but more often than not it appears in your error logs as well.
Here is why this is important to you – reducing 404 errors:
- reduces user frustration;
- points out bugs in your configuration;
- saves you gobs and gobs of disk space;
- points out potential vulnerabilities; and
- once fixed, improves available user bandwidth.
The first thing you need to do is figure out how your error log works, and what type of verbose messages it may or may not offer.
Short digression: yes folks, for today’s lesson I’m assuming you are hosting on some form of a *nix platform – though one can perform the same functions on a Windows-based machine by applying the command-line UnxUtils against a log file, either on the server or FTP’d to your home computer.
Getting back to today’s lesson, here’s a simple example of what command-line I would enter if I wanted to see the last 50 lines of my error log:
tail -50 error_log
This quickly gives me insight into the type of error messages available. The ubiquitous 404 error – which in my world is recorded in the error log in plain English as “File does not exist” – your mileage will likely vary. With this key phrase in mind, I can now enter the command:
grep -i "File does not exist" error_log
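What a matching line actually looks like depends on your server and host; the bracketed Apache-style layout below is an assumption, with file names and IPs invented purely for illustration. It gives you a safe practice file for the grep before you point it at the real thing:

```shell
# Build a tiny sample log to practice on (the bracketed Apache-style
# layout, paths and IPs here are assumptions -- check your own error_log).
cat > sample_error_log <<'EOF'
[Wed Mar 05 09:14:02 2008] [error] [client 192.0.2.10] File does not exist: /var/www/html/oldpage.html
[Wed Mar 05 09:15:11 2008] [notice] caught SIGTERM, shutting down
EOF

# Case-insensitive match for the key phrase; only the 404 line survives.
grep -i "file does not exist" sample_error_log
```

Swap in your real error_log once the key phrase matches what your server actually writes.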
Parsing logs into human-readable columns
Problem is, I probably get more information than I want. What I’m really after is which IP is getting the error, how often, and on what page request. For that, I “pipe” the output of the “grep” command through Perl – which in turn splits each line on spaces.
grep -i "File does not exist" error_log | perl -l -a -n -e 'print $F[7]," ",$F[12]'
Counting the spaces, the IP address in my logs lands at position 7, the errant file at position 12. You’ll likely have to fiddle with these indices to get the results you’re interested in.
Once you do, my suggestion is to direct these results into a temporary file you can revisit later. For example:
grep -i "File does not exist" error_log | perl -l -a -n -e 'print $F[7]," ",$F[12]' > 404errors.05mar08.txt
Once you see where the errors are occurring, it’s usually just a matter of creating a more comprehensive 404 request handler, and/or restoring a file that was accidentally deleted.
Excluding certain entries
One last trick – let’s say you’ve fixed two of your errant files, and now want to see what remains in your error log.
Try this one on for size:
grep -i "File does not exist" error_log | egrep -i -v "\/(file1\.html|file2\.png)" | perl -l -a -n -e 'print $F[7]," ",$F[12]' > 404errors.05mar08.txt
Note that I used egrep instead of grep – the ‘e’ standing for ‘e’xtended regular expressions – which, when coupled with the “invert match” operator -v, provides us with a list of errant files excluding those you just fixed.
I realize that this may sound like ‘ancient geek’ to some. If that’s the case, then my advice is to ask your hosting provider what type of error stats are available through the pre-packaged applications many hosts provide, such as “awstats” and/or “webalizer.” They don’t provide the ‘gory details’ one gets with the command-line options above, but they’re often good enough.
Yet for those who dare, there are additional benefits to learning how to parse your own error logs. For example, you can schedule the above commands (the ones that pipe into a file) in your cron table, so you can quickly identify broken files – and interesting inquiries from bad boys using a variety of anonymous proxy services and/or browsers in an attempt to set up your blog as their own personal spam relay.
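A crontab entry along these lines would regenerate the 404 report nightly; all paths below are hypothetical placeholders, and you add the entry with “crontab -e”:

```shell
# Hypothetical crontab entry: every day at 2 a.m., rebuild the 404 report.
# Substitute your real log location and a writable output path.
0 2 * * * grep -i "File does not exist" /path/to/error_log | perl -l -a -n -e 'print $F[7]," ",$F[12]' > /home/you/404errors.daily.txt
```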
You can also save money, support calls, and/or bandwidth by identifying missing pages, images and other fixable omissions.
As for those bad boys, I have some .htaccess hacks awaiting them, based on the useful input they provided me via my personally parsed error log.