Enjoying so many hits from /., I figured I’d better give the geeks something worthwhile. Here is an article I posted back in late June of this year. For those of you who need to post your email address on your website, this should help:
With apologies to Edwin Starr, and sung to the tune of his populist hymn "War,"
let us put forward in full-throated conviction:
What is it good for?
Absolutely nothing – say it again,
What is it good for?
Anyone who has run a church web site for any period of time has had to deal
with a call from the pastor asking why he’s getting spam for porn sites – well,
at least I hope your pastor is complaining about that !-) How does this happen?
Easy, you put the pastor’s email address up on the web, then indexed the site
with various search engines (even though Google still hasn’t got mine right)
and then you sit back and wait for spambots to come visit your site, slurping
down all your email addresses – later using them to bombard you with an insidious stream of ads for snake oil and get-rich-quick schemes, or attempts to lure you into ogling their dysfunctionally licentious wares.
There are several ways to handle these ‘bots, methods I’ll be talking about
in the next few weeks that I’ve found moderately successful. I say moderately
because there is no foolproof method, especially as church staff give away
their email address to various online vendors and e-zines, unwittingly "opting-in"
to a world of virtual hurt. That said, the damage from bots can be minimized
provided you understand what they are and how they work.
Essentially these ‘bots’ are software programs that act as browsers, sucking
down your content the same way MSIE or Netscrape does. Only these programs toss
out everything except hyperlinks and email addresses. Anyone familiar with
Perl understands how easily this can be accomplished with a combination of LWP, HTML::Entities
and HTML::TokeParser. Regardless of the language, these bots traverse the links
they find and store the email addresses, either to spam you or to sell your
email address to other spammers.
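To make that concrete, here is a rough sketch of what a lazy harvester might boil a page down to. It is written in Python rather than Perl, and the two patterns, the `harvest` function and the sample markup are all made up for illustration; real ‘bots will differ.

```python
import re

# Hypothetical, deliberately naive patterns; real harvesters will differ.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(page_source):
    """Return (links_to_follow, addresses_found) from raw HTML text."""
    hrefs = HREF_RE.findall(page_source)
    # Links that are not mailto: would be queued for further crawling.
    links = [h for h in hrefs if not h.lower().startswith("mailto:")]
    # Anything shaped like an address is kept, wherever it appears.
    addresses = sorted(set(EMAIL_RE.findall(page_source)))
    return links, addresses


# Made-up sample page fragment.
sample = (
    '<p>Contact <a href="mailto:pastor@example.org">the pastor</a> '
    'or see our <a href="/staff.html">staff page</a>.</p>'
)
links, addresses = harvest(sample)
print(links)      # ['/staff.html']
print(addresses)  # ['pastor@example.org']
```

Everything else on the page is thrown away, which is why an address sitting in plain text is such easy pickings.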
The trick then is to either mangle or ‘hide’ your email in plain sight. One way
to do this is to use NUMERIC or HEXADECIMAL encodings, that is, to use codes instead of characters
to represent your email address. The advantage is that when a normal/nice visitor clicks on your link to an
email address, it works just fine, because the browser decodes it. The spambots, however, either send their tripe to an invalid email address because they
didn’t decode it – which raises their operating costs and adulterates any list they’re selling – or they miss harvesting the address altogether because it doesn’t ‘look like’ your average everyday email address.
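For the curious, a minimal sketch of the encoding idea, again in Python. The helper names `encode_entities` and `obfuscated_link` are my own invention for this example, and the address pattern stands in for whatever a naive ‘bot might use. The sketch encodes the "mailto:" prefix along with the address, which is one option among several.

```python
import html
import re

# Assumed naive address pattern, standing in for a lazy harvester.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def encode_entities(text, hexadecimal=False):
    """Replace every character with a numeric character reference."""
    if hexadecimal:
        return "".join(f"&#x{ord(ch):x};" for ch in text)
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscated_link(address, label):
    """Build a mailto: link whose scheme and address are both encoded."""
    href = encode_entities("mailto:" + address)
    return f'<a href="{href}">{html.escape(label)}</a>'


link = obfuscated_link("pastor@example.org", "Email the pastor")
print(encode_entities("a@b", hexadecimal=True))  # &#x61;&#x40;&#x62;
print(EMAIL_RE.findall(link))                    # []
print(html.unescape(link))  # what a browser (or a smarter 'bot) sees
```

The naive pattern finds nothing because the markup no longer contains a literal "@", while `html.unescape` recovers the working link the same way a browser would. That last line is also the catch: any ‘bot willing to decode entities gets the address back, so this raises the bar rather than locking the door.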
This technique was the topic of conversation over at A
List Apart in which Dan Benjamin offers the NUMERIC encoded approach in combination
this approach may not be well suited for your site &/or audience. So instead, what
the “industrious” spammer has taken the time to build a smart flexible ‘bot, then I’m safer using my ‘obfuscated’
address as opposed to hanging one out there in plain text. I also encode the
"mailto:" in a further effort to make email links look-n-feel like
hyperlinks. To wit, I offer you the following little application:
Go ahead, give it a try, then try some others like the one mentioned at A
List Apart. What I’m hoping is that many of us will use a wide variety of anti-spam approaches. My thinking is that if we all used the same obfuscation tool, then spambots would have our lunch and eat it too. The more tools, the more moving targets. The ‘bot-heads, being a lazy and unoriginal bunch, would hopefully ignore us.