Heal Your Church WebSite


Teaching, rebuking, correcting & training in righteous web design.

Using .htaccess to deal with a recent flood of trackback ping spam

“Holy smokes, I’ve been hit!

My comment spam ‘secret code’ filter is working like a charm – no spam in weeks, but now they’ve decided to spam through trackback. The other day I had two new trackback pings on older entries, both spam. This morning I had 135, all spam. Yikes. So, later today I’ll be deleting away, but it will take a while…” – Salguod.net, February 01, 2005

Update 2
Had to make some changes: the spammer decided to ‘teach me a lesson’ by adding healyourchurchwebsite to his referrer. I’ve tightened that up – and in the process also snarfed some information from him/them – and am now filing a formal complaint with the Feds. Take note of the sections in yellow where my examples include healyourchurchwebsite…

The Problem
Like many of you, I noticed a spike in Trackback Spam pointing to various card-shark subdomains at terashells.com, chat-nett.com and other domains that are sure to change on a daily basis.

First thing I noticed: the same crap coming in from a variety of anonymous proxies. This meant blocking by IP would quickly become a full-time job. As a stop-gap, I employed a girthy but quick-n-dirty .htaccess solution offered at Aaron Logan’s Loblogomy blog.

I knew I’d have to find a more efficient approach. I also knew that Mark Pilgrim’s “How to block spambots …” was causing some other issues on my server, I suspect because my server is configured slightly differently than his. This happens.

Still, I didn’t want snoopers like the one I saw from BranDimensions.com. Not that I’m hiding anything, but they’re not paying me for my bandwidth even though they profit from it. I needed a solution that would solve my short-term trackback spam issue and take care of my long-term no-pay, no-play policy regarding the commercial abuse of my bandwidth.

The Not-So-Final Solution
With not all that much searching, I found that Parker Morse of Flashes of Panic offered an elegant .htaccess approach that would get me 98% of what I needed. In a post entitled ‘A little meanness,’ Morse employs a Blocking Referer Spam – mod_rewrite technique developed by Ed Costello back in May of 2004.

The (obligatory) Warning
Before we go any further, I need you to understand that while this is an excellent approach, it is not without its dangers. Those dangers are made clear in an absolute must-read related post entitled “Killing referrer spam,” in which Caveat Lector offers this excellent advice:

BE AWARE: YOU CAN BORK YOUR WEBSITE WITH THIS. I’ve done it. (In fact, I did it two minutes ago. Go me.) How will you know your .htaccess file is borking your site? Well, usually, when you browse to your weblog’s URL you’ll get a “500 Internal Server Error” page of some sort instead of your beloved weblog.

Always, always, always keep a last-known-good version of your .htaccess file! If you’re using FTP to place your .htaccess file and you bork your site, you just upload the last-known-good file, and you’re golden.

Or in my case, working from a jailed ssh session I was able to do the following:

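# fetch Parker's file and edit it (see modifications below)
wget http://www.flashesofpanic.com/htaccess.txt -O htaccess_parker.txt
pico htaccess_parker.txt
# save the last-known-good .htaccess before swapping in the new one
cp .htaccess htaccess_02feb05.txt
cp htaccess_parker.txt .htaccess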

The Modifications
After downloading Parker’s text file version of his .htaccess file, I gave it a quick inspection and modified the following line:

from:
SetEnvIfNoCase Referer .*flashesofpanic\.com.* !spam_com

to:
SetEnvIfNoCase Referer ".*(blogs4god|healyourchurchwebsite|redlandbaptist|mission4me) *" !spam_com

The script also needed to be modified because I found some problems when trying to enter a post using my crufty old version of MovableType, so I had to add a line to Parker’s otherwise excellent approach. A similar problem is described in Laurabelle’s Blog article “Die spammers die!”. So after adding a few more drug names to the kill list, I immediately followed with another line of code:

SetEnvIfNoCase Referer “.*(phentermine|diet-pills|p …
SetEnvIfNoCase Referer www\.healyourchurchwebsite\.com\/cgi-bin\/mt/mt\.cgi.* !spam_ref

I suspect this fix was necessary because, the way the .htaccess file is set up, everyone is considered a spammer until we say they’re not. More on how to modify the file, and the mechanics of how this all works, can be found over at Caveat’s column.
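
To make those mechanics concrete, here is a minimal sketch of the flag-then-deny pattern. It is not Parker’s actual file: the word list and the example.com host are placeholders assumed for illustration, and only the spam_ref variable name comes from the snippets above.

```apache
# Sketch only: the word list and host name are illustrative assumptions.

# 1. Flag any request whose Referer contains a blacklisted word.
SetEnvIfNoCase Referer "(poker|casino)" spam_ref=yes

# 2. Clear the flag for requests referred by my own Movable Type admin script.
SetEnvIfNoCase Referer "^https?://www\.example\.com/cgi-bin/mt/mt\.cgi" !spam_ref

# 3. Answer anything still flagged with 403 Forbidden.
RewriteEngine On
RewriteCond %{ENV:spam_ref} ^yes$
RewriteRule .* - [F]
```

Order matters here: mod_setenvif processes these directives in the order they appear, so the line that clears the flag has to come after the lines that set it.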

Finally, you may want to block the user agent CandyGenius has identified in this delicious post which asserts:

The trackback spammer is leaving the same signature as the comment spammer. It’s the same guy. Use the code above to block it all. (psxtreme & freakycheats but that will change tomorrow.)
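
CandyGenius’s actual signature is not reproduced here, so the following is only a sketch of what a user-agent block could look like. ExampleSpamBot is a made-up placeholder; swap in whatever string turns up in your own logs.

```apache
# Sketch only: "ExampleSpamBot" is a hypothetical placeholder, not the real signature.
RewriteEngine On
# Match the User-Agent header anywhere in the string, ignoring case.
RewriteCond %{HTTP_USER_AGENT} ExampleSpamBot [NC]
# Refuse the request with 403 Forbidden.
RewriteRule .* - [F]
```

Like the referrer words, a user-agent string is trivially changed by the sender, so treat this as one more speed bump rather than a cure.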

Testing
A quick-n-dirty test is to Google your domain along with one of the forbidden words, for example “healyourchurchwebsite poker,” and click through to your site. That word will now appear in the referrer header coming from Google, so you should be able to block yourself. Not the most fool-proof test, but close enough for government work.

Now if I could just get rid of those irritating 414 generators trying to hack into an IIS server … which I obviously don’t use … I’m sure there’s an .htaccess solution out there.

Likewise, let me know if you have improvements or patches … I’d be interested in seeing them.

Update 11:54 AM
It is becoming evident that this trackback spamming is less about advertising, and more about denial of service. For about 2 hours this morning, my server was under attack – the information below thwarted all but two trackbacks out of several hundred attempts. In the meantime, I am pondering whether or not I should enforce my terms of service and provide the spammer a bandwidth test using a variation of the following wget command:

while [ true ]; do wget -r -nd –cookies=off –cache=off –proxy=on –delete-after –user-agent=”all your trackback spam is sucky” “http://online-poker.chat-nett.com”; done

However, if this is about denial of service, and since the spammer is abusing several anonymous proxies, it could be that the owners of the URLs are also innocent victims. Your thoughts?

12 Comments

  1. Wow, it’s going to take me a while to get my head around this. It sounds good, but I need a serious geek-to-english dictionary, or maybe a copy of HYCW for Dummies. :-)

    I’ve got homework to do. And more spam to delete.

    So, forgive me, it wasn’t clear to me from your post: was your web site down earlier because of the combination of a serious trackback spam attack and the changes to your .htaccess file?

  2. I agree, Doug. Dean is a better geek than me. But if I can ever figure out what the dickens he’s talking about, I know it will almost certainly help me.

    So when you find the HYCW for Dummies book, share a copy, willya? Heh!

  3. So Dean, are you going to post a link to a TXT version of your htaccess file for the rest of us to learn from?

    Oh, great guru of bloggedness.

  4. I also mass block referrers based on the domain that they are leaving. It’s pretty safe to assume that anyone referring to my domain from a .info, .biz or .ru domain is a spammer, at least for my site.

    RewriteCond %{HTTP_REFERER} \.info [NC,OR]
    RewriteCond %{HTTP_REFERER} \.ru [NC,OR]
    RewriteCond %{HTTP_REFERER} \.biz

  5. Dean –

    Delurking after reading HYCW for a long time – great blog. Question: you’re depending on the User Agent to send you an appropriate Referrer HTTP header. What’s to stop the bad guys from sending trackbacks (which, though I’m not a blogger, I assume are done by standard HTTP requests) with forged headers? Since mod_rewrite doesn’t do any session handling, a bad guy can simply send you a forged trackback ping that includes the spammy link in the transferred data.

    Okay, admittedly, IANAB (I am not a blogger) so I may not understand how trackback works. Feel free to set me straight. It seems like some sort of verification on the part of the trackback-receiving post would be more effective, but I can’t think of anything that isn’t fairly easily falsifiable.

    Ah well, just some random thoughts. Thanks for a very encouraging and interesting blog.

  6. I need that dummies book too.

    I know enough to know I need the information in this entry but not enough to use it. Maybe this weekend.

  7. While I can understand that it may look like a DDOS attack, I seriously doubt that it is. I too received a similar hit a few weeks ago, but my site is so minor I can’t imagine anyone targeting me in this way. I think it is just a spambot that happens to hit hard.

  8. Hi!

    Thanks for linking to me, but I think we didn’t have the same problem. I’m not using Parker’s whole .htaccess file, just his spam_ref=yes concept and the associated Mod_Rewrite rules that deny based on that variable. So my problem was not that I blocked myself from posting, it’s that I didn’t block the requests that I wanted to block.

    Part of my problem turned out to be that my Movable Type installation is in a CGI directory, which is a separate document root. Yours and Parker’s don’t appear to be set up that way, so your root .htaccess rules apply for your whole site. I have to have (at least) two different .htaccess files.

    Good luck with fighting spam. I’m glad that we’re all using different methods; it means that it’s harder for the spammers to find one solution that catches all of us off guard.

  9. Hi!

    Thanks for linking to me, but I think we didn’t have the same problem. I’m not using Parker’s whole .htaccess file, just his spam_ref=yes concept and the associated Mod_Rewrite rules that deny based on that variable. So my problem was not that I blocked myself from posting, it’s that I didn’t block the requests that I wanted to block.

    Part of my problem turned out to be that my Movable Type installation is in a CGI directory, which is a separate document root. Yours and Parker’s don’t appear to be set up that way, so your root .htaccess rules apply for your whole site. I have to have (at least) two different .htaccess files.

    Good luck with fighting spam. I’m glad that we’re all using different methods; it means that it’s harder for the spammers to find one solution that catches all of us off guard.

  10. Ok, I’m starting to get my head around this. Oh – and don’t take my earlier comments as criticism. It’s more a comment on my (lack of) knowledge. It’s fun getting into this stuff and figuring out how it works. Really.

    So, if I’m following this, it’s a little like a blacklist in that it looks for certain terms (in the referrer? the post? the ??) and if they are found it blocks the post. Do I understand this right?

    Also, as I browse my site via FTP, I don’t see any .htaccess file. Is that normal? Once I create one, where should I put it? I’m assuming in the top level folder (www.salguod.net)

    Thanks.

  11. Pingback: How Now, Brownpau?

  12. Wow, sorry Dean, didn’t mean to be a prophetic “voice in the wilderness” about the referrer thing. It brings up an interesting point, though. At what point do we switch from defensively protecting our sites to actively going after people (i.e., turning them in to the feds)? Is it the point where spamming becomes/is replaced by a DOS attack, or is it somewhere sooner? How does our position as the church affect that? I wonder how current anti-spam laws would apply to Trackback spam.

    I hope you get everything straightened out. In Him,

    - Ted