Heal Your Church WebSite


Teaching, rebuking, correcting & training in righteous web design.

Dealing with comment spam

Yesterday, I came home from church to find someone had left some comment spam on my article entitled “Beyond the Blog and other links on making MovableType a Content Management System.” This bothered me on several levels. Not only because this article is popular with those looking to use MT as a comprehensive CMS for their church or charity web site, but also because I feel online casinos and pr0n are abusive forms of ‘entertainment’ that often feed upon the addiction of those who can least afford it … and/or upon impressionable youths. So I took immediate action: rather than relying on MT’s built-in IP ban, I added the following line to my .htaccess file to block this guy’s IP from my entire site:

deny from 213.81.196.104

There are a few other things you need to do to get the above to work … so if this is new to you, may I recommend reading the “Comprehensive guide to .htaccess/ Deny users by.” It’s short, sweet and not-so-geeky, so most of you should be able to get your arms around it.

Well, the above kept my site from getting multiple comment/spam posts, but it didn’t stop this guy from revisiting. Since he came back via a different IP (80.117.31.181), I decided to check my access logs to see if there was a discernible pattern that would let me block him before he left a comment … at least for now.
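Of course, blocking one address at a time turns into whack-a-mole as soon as a spammer hops IPs. Besides a full address, Apache’s Deny from also accepts a partial address made of whole leading octets (80.117) and a CIDR range (80.117.0.0/16). To make that matching concrete, here’s a small Python sketch of those rules as I understand them; it is not Apache’s own code, the function names are made up for illustration, and hostname-based rules are left out:

```python
import ipaddress
from typing import Iterable


def rule_matches(client_ip: str, rule: str) -> bool:
    """Return True if client_ip is covered by one Deny-from style IPv4 rule.

    Three forms are handled: a full address (213.81.196.104), a partial
    address made of whole leading octets (80.117), and CIDR notation
    (80.117.0.0/16). Hostname rules are deliberately left out.
    """
    addr = ipaddress.IPv4Address(client_ip)
    if "/" in rule:
        # strict=False tolerates host bits being set, e.g. 80.117.31.181/16
        return addr in ipaddress.IPv4Network(rule, strict=False)
    octets = rule.rstrip(".").split(".")
    if len(octets) == 4:
        return addr == ipaddress.IPv4Address(".".join(octets))
    # Partial address: compare whole octets only, so "10.1" covers
    # 10.1.x.x but not 10.10.x.x.
    return str(addr).split(".")[: len(octets)] == octets


def is_denied(client_ip: str, deny_rules: Iterable[str]) -> bool:
    """A visitor is turned away if any single rule covers their address."""
    return any(rule_matches(client_ip, rule) for rule in deny_rules)
```

Be careful with the partial forms: a rule like Deny from 80.117 shuts out every visitor whose address begins with 80.117, not just the spammer.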

Like many of you, I host my blog on a service at a large, generic and inexpensive server farm. The downside is that sometimes you get a real stinker for a web host. The upside is that most of these hosts run Apache servers with CPanel as the administrative interface. So to get my “raw” access logs, I need to enter my username and password, then the URL for today’s access log. Having done this more than once, I now use a FREE little program called WGet to retrieve the file from the command line … and it works the same in Windows/DOS as it does in Linux:

wget --http-user=USERNAME --http-password=PASSWORD "URL"

As you can see, WGet is VERY cool, and it’s just as handy when you want to mirror your site or download a large ISO image, even when your Internet connection drops intermittently. Just please avoid the temptation of using WGet to consume the bandwidth of spammers.
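For the curious, here’s roughly what those two wget options boil down to when the server asks for HTTP Basic authentication: the username and password are joined with a colon, base64-encoded and sent in an Authorization header. This Python sketch assumes your control panel uses Basic auth (yours may not), and fetch_log is a hypothetical helper name, not part of any tool mentioned here:

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def fetch_log(url: str, username: str, password: str, dest: str) -> None:
    """Download url to the file dest, sending Basic credentials up front."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        # Copy in 64 KB chunks so a large log never has to fit in memory.
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
```

Since base64 is an encoding, not encryption, use an https:// URL for your log if your host offers one.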

Now I know what you’re thinking: “but Dean, my raw access log comes in a .tar.gz format …” Yes, so does mine, but don’t panic; all you need to do is decompress and unpack the file using one of the following commands:

gunzip FILENAME.tar.gz
tar -zxvf FILENAME.tar.gz

I prefer the latter, but need to use the former when I’m at the Windows/DOS command line. (Keep in mind that gunzip only strips the compression; if what’s left is a .tar file, you still need tar -xvf to unpack it.) I know what you’re thinking: “but Dean, how is it possible for you to use *nix commands in Windows?” Glad you asked. I use the GNU utilities for Win32. And so should you, if you don’t want to read every single line of your raw access log. What do I mean by this? Aside from WGet, Tar and gUnzip, one of the other tools I get with this package is GREP, a utility that scans one or more input files for lines matching a pattern.

So I searched the log using the following syntax:

grep "myspammedarticle.shtml" myaccesslog-10-6-2003

Now I know, I suppose I should have used regular expressions and scanned the log for the spammer’s IP addresses, but since he was going after the same article, I figured a straight-up text search on the filename would do the trick … and it did. What it revealed is that the dreaded ‘Marcos spam’ was coming from a “Slovensky” using Google to find a URL he had advertised on my site sometime in the past. Previously, we’ve talked about using mod_rewrite to handle everything from making search-engine friendly URLs to blocking spambots, banning spybots, and telling unwanted robots to go to h-e-double-hockey-sticks.
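If you want to go one step past grep, the same text search can also pull out who made each request and where they came from. Here’s a minimal Python sketch that assumes your raw log uses Apache’s “combined” format (client IP, timestamp, request line, status, size, referrer, user agent); check a line or two of your own log before trusting it. The function names are hypothetical:

```python
import re
from collections import Counter

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_for(lines, target):
    """Yield (ip, referer, user_agent) for each request line containing target."""
    for line in lines:
        if target not in line:  # the same cheap text test grep performs
            continue
        match = COMBINED.match(line)
        if match and target in match.group("request"):
            yield match.group("ip"), match.group("referer"), match.group("agent")


def summarize(lines, target):
    """Count hits on target per (ip, referer) pair, most frequent first."""
    counts = Counter((ip, referer) for ip, referer, _agent in hits_for(lines, target))
    return counts.most_common()
```

Feeding it an open log file, for example summarize(open("myaccesslog-10-6-2003"), "myspammedarticle.shtml"), lists each IP and referrer that hit the article, busiest first.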

What I needed, and fast, was an article that discussed redirecting or blocking a visitor based upon the site that sent them here … that is, blocking someone based upon a text match in the referring URL. Enter fellow blog4God techBlogger, Eliot Landrum. Back in May, Eliot had dealt with this problem in an article entitled “More Complete Blockage.” There he offers a very simple RewriteCond based upon %{HTTP_REFERER}.

Without going into as much detail as Eliot, here is a quick example of my own showing how I might block an individual arriving via a Google search for the URL of a casino site they previously spamvertised on my site:

Options +FollowSymlinks
RewriteEngine on
RewriteBase /
# comment spammer
RewriteCond %{HTTP_REFERER} btd-online-casino\.com [OR]
# keyword blockage
RewriteCond %{HTTP_REFERER} (keyword1|keyword2|keyword3) [OR]
# turnitin spybot
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [OR]
# spam harvesting program
RewriteCond %{HTTP_USER_AGENT} anarchie [NC]
RewriteRule .* - [F,L]

As you can see, I added some other conditions as well, just to give the directive some context. All this said, my point here isn’t to impress you with my technical wizardry, because as you can see from all the hyperlinks, much of this is stuff I’ve learned from others. Rather, I hope you’ll keep this article in mind the next time someone leaves spam in the comments section of your blog, both to learn how to block the offender and to leave us a note as to what other signatures such scum are using … so we can all encourage such individuals to take up a more respectable line of work.
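If the [OR] chaining in the .htaccess example looks cryptic, here’s a Python sketch of the logic it expresses: each RewriteCond is a regular-expression search against one piece of the request, the conditions are OR’ed together, and a single match is enough for the RewriteRule to answer with 403 Forbidden ([F]). Python’s re module stands in for Apache’s own regex engine here, which is an approximation, though for simple patterns like these the two agree. The names are illustrative only:

```python
import re
from typing import Dict

# One entry per RewriteCond in the .htaccess example:
# (request field, regex, case-insensitive?)
CONDITIONS = [
    ("referer", r"btd-online-casino\.com", False),        # comment spammer
    ("referer", r"(keyword1|keyword2|keyword3)", False),  # keyword blockage
    ("remote_addr", r"^64\.140\.49\.6([6-9])$", False),    # turnitin spybot
    ("user_agent", r"anarchie", True),                     # [NC]: spam harvester
]


def is_forbidden(request: Dict[str, str]) -> bool:
    """Return True if the rule set would answer this request with 403 Forbidden."""
    for field, pattern, nocase in CONDITIONS:
        flags = re.IGNORECASE if nocase else 0
        # A missing header is treated as an empty string.
        if re.search(pattern, request.get(field, ""), flags):
            return True  # conditions are OR'ed: one match is enough
    return False
```

Notice that only the user-agent test carries [NC], so a referrer written in capitals slips past the first two conditions; adding NC to their flags is cheap insurance.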

Meanwhile, I’m going to spend some time reading an article by the good folks at the Kalsey Group entitled Distributed comment spam prevention to see if there isn’t some way we can use their distribution list to automagically amend my .htaccess file.
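I haven’t worked out what their list actually looks like yet, so as a starting point, here’s a Python sketch that assumes a plain list of spammer domain names and turns it into a referrer-blocking block for .htaccess. The function name and input format are my own assumptions; the generated rules follow the same pattern as the rewrite example, and anything that doesn’t look like a plain domain name is dropped so a bad entry can’t inject Apache syntax:

```python
import re
from typing import Iterable, List

# Conservative check: letters, digits, hyphens and dots only, at least one dot.
DOMAIN_OK = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def build_referer_block(domains: Iterable[str]) -> List[str]:
    """Return .htaccess lines that forbid requests referred by any listed domain."""
    cleaned: List[str] = []
    for domain in domains:
        domain = domain.strip().lower()
        # Skip malformed entries and duplicates.
        if DOMAIN_OK.match(domain) and domain not in cleaned:
            cleaned.append(domain)
    if not cleaned:
        return []

    lines = ["RewriteEngine on"]
    for i, domain in enumerate(cleaned):
        escaped = domain.replace(".", "\\.")  # a bare dot would match any character
        # Every condition except the last needs OR so any single match triggers the rule.
        flags = "[NC]" if i == len(cleaned) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {escaped} {flags}")
    lines.append("RewriteRule .* - [F,L]")
    return lines
```

Since Apache evaluates .htaccess rules on every request, it’s worth keeping the generated list trimmed to spammers you actually see.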

12 Comments

  1. Apparently I need to work on preventing comment spamming too. Been getting hit the past few days with it. Very annoying!

    Thanks for the added tips.

  2. I appreciate your tips but my web host is running IIS on Win2k. It’s all pretty much irrelevant to me, unfortunately.

    I guess I’ll just have to keep removing the spam and banning in MT as it happens, well unless I or somebody else figures out a way to get it done more automatically.

    But, thanks again for the tips.

  3. Jon … actually, them Gnu Tools for Win32 ( http://unxutils.sourceforge.net/ ) could prove handy for a variety of things … they have for me.

  4. You might want to check out Jay Allen’s comment spam macro killer for Movable Type: http://www.jayallen.org/journey/2003/10/comment_spam_macro_updated

  5. I’ll second the nod to Jay Allen’s solution. It’s really clever, easy to administrate once it’s up and running. And it works, I tell ya.

  6. Trackback spam will be the worst if/when trackback really takes off. It’ll be a lot quicker and easier for spammers.

  7. Pingback: Quick Links

  8. Pingback: A Chronicle of the Christian Faith

  9. Pingback: A Chronicle of the Christian Faith

  10. Pingback: A Chronicle of the Christian Faith

  11. Pingback: A Chronicle of the Christian Faith