Heal Your Church WebSite

Teaching, rebuking, correcting & training in righteous web design.

April 12, 2017
by meandean
Comments Off on Spring Cleaning 2017

Spring Cleaning 2017

As if anyone follows this nearly abandoned blog anymore, here’s what I’m up to:

First, I’m mostly blogging about product management in a Lean and Agile context over at DeanOnDelivery.com with the same sprinkling of technical goodness and wacky humor y’all enjoyed here for about 10 years.

So you can catch up with me there, or perhaps at my Twitter feed @deanpeters. Think of these as places where you can boss me around with your great ideas. It doesn’t cost anything to give it a try.

Second, it’s time for …

Spring Cleaning 2017

Let’s either reboot or retire this blog.

  • I’ve already made private close to 775 out of 1050 blog posts I feel are no longer relevant.
  • The privatized blog posts will eventually be removed altogether and put into a deep freeze reference somewhere.
  • I’ll continue to chip-away at the content until I get things down to about 200 relevant posts.
  • I’ll move any code snippets over to GitHub, linking to them from here.
  • Expect a significant change in formatting, probably something parallax-ish.
  • I’ll likely implement some sort of Slack invite call-to-action button thingy here.
  • I’m doing a lot with Elasticsearch and Azure Search these days, and natural language processing too. I need to figure out how to introduce a little machine learning fun as part of this blog. Dunno yet. Depends on what y’all say in the Slack channel.

Apologies in advance to all those visiting this blog in hopes of updates on XP, Movable Type, DHTML, and other acts of obsolescence.

December 24, 2013
by meandean
1 Comment

Bad idea design poster #10 – Feature Creep

The misguided notion that somehow more is always better.

  • Main Entry: Feature Creep
  • Pronunciation: /fee-cher,kreep/
  • Function: intransitive verb
  • Etymology: Middle English feture crepen, from the act of over-building something
  • Date: December 24, 2009

Remember folks, flee from the temptation to ‘gizmo’ up your site.

Instead, focus on workflow – that is, the things your users want/need to do/learn from visiting your website.

April 7, 2013
by meandean
Comments Off on find-a-bot.sh – a nice little script to ID bots bugging your website

find-a-bot.sh – a nice little script to ID bots bugging your website

Originally published on May 30, 2008; I’ve made some modifications and bumped it up in the display queue.

Having already demonstrated earlier this week how to block spambots and rogue spiders, today I’m completing the lesson with a nice little bash script sample that can help you identify some of these non-browser ‘candidates’ by parsing your access logs and placing the results in an easy-to-read text file.

In other words, this script will selectively find most non-browser user agents that appear in your access logs like this:

24.190.239.220 - - [29/May/2008:05:16:19 -0700] "GET /about HTTP/1.1" 200 628 "-" "Java/1.6.0_06"
79.71.205.134 - - [29/May/2008:00:56:34 -0700] "GET / HTTP/1.1" 200 12888 "-" "Site Sniper Pro"

And turns it into a slightly saner and sorted output like this:

24.190.239.220 [29/May/2008:05:16:19 "Java/1.6.0_06"
79.71.205.134 [29/May/2008:00:56:34 "Site Sniper Pro"

Here is what your bash script might look like on a site running WordPress on a shared host like DreamHost … I’ll explain some of the mechanics afterwards:

#!/bin/bash
#
# step 1 - modify these so you get paths like this:
#   /home/YOURROOT/YOURDOMAIN.COM/...
#
myroot="YOURROOT"
mydomain="YOURDOMAIN.COM"

#
# step 2 - leave alone if these days & formats work for you:
#
TERM=linux
export TERM
tdy=$(date +%d%b%y)
ydy=$(date -d '1 day ago' +%Y-%m-%d)
dby=$(date -d '7 days ago' +%Y-%m-%d)
logfile="access.log.$ydy"

#
# step 3 - modify if you're using something other
#           than WordPress on DreamHost
#
outfile="/home/$myroot/$mydomain/findabot"
logpath="/home/$myroot/logs/$mydomain/http/"
csspath="/home/$myroot/$mydomain/wp-content"

#
# step 4 - mother of all parsing statements, parse to taste
#          (note this version DOES sort)
#
#          remember \ at the very end of a line equals
#          bash line continuation of a command set
#
grep -v "$csspath" "$logpath$logfile" | \
  egrep -v " \"(Mozilla|Opera)\/[0-9]| \"BlackBerry[0-9]{4}" | \
  perl -l -a -n -e 'print $F[0]," ",$F[3]," ",$F[11]," ",$F[12]," ",$F[13]' | \
  sort -n > "$outfile/$ydy.txt"

#
# step 5 - maintain a manageable archive
#
if [ -e "$outfile/$dby.txt" ]; then
  mv -f "$outfile/$dby.txt" "$outfile/bak.txt"
fi

Okay, step 1 basically means you log in to your site via SSH or even FTP and, before navigating anywhere, issue the “pwd” command so you can determine YOURROOT and YOURDOMAIN (though the latter will most likely be your website’s URL).
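
For example, a quick SSH session to sort that out might look something like this; the user name and domain below are placeholders, not values from my setup:

# log in, then print the working directory before going anywhere
ssh exampleuser@example.com
pwd
# typical output on a DreamHost-style shared host: /home/exampleuser
# so myroot="exampleuser", and mydomain is your site's domain, e.g. "example.com"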

Step 2 is how we get date stamps for our input and output files. I found a nice simple example of date variable formatting over on an ExpressionEngine manual – but it works in your bash script just fine.

Also, that line containing “7 days ago” can be modified to indicate how many days’ worth of logs you want to keep active. Similarly, the prior line containing “1 day ago” means you want to parse yesterday’s logs.
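
If you want to sanity-check those formats before wiring them into the script, a quick test at the shell (assuming GNU date, which most Linux shared hosts provide) goes like this:

date +%d%b%y                     # today, e.g. 29May08 ($tdy)
date -d '1 day ago' +%Y-%m-%d    # yesterday, e.g. 2008-05-28 ($ydy, picks which log to parse)
date -d '7 days ago' +%Y-%m-%d   # a week back, e.g. 2008-05-22 ($dby, picks which report to archive)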

Step 3 is basically how I use variables to define file and directory paths based on what I coded for steps 1 and 2.

Step 4 combines all the elements from the above steps and, taking a page out of my April 2nd article entitled ‘How to quickly check your error logs for oddities‘, issues a consecutive stream of grep and/or egrep commands.

Sometimes I leverage the ‘-v’ switch to exclude elements, most notably when I’m excluding known user agent strings for browsers.

This done, a bit of Perl command-line magic is used to parse out the fields we want, after which the selected data is sorted and piped into the output file defined in step 3.
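
To see what that Perl one-liner is actually doing, here’s a minimal sketch you can run against the first sample log entry shown earlier:

# -a autosplits each line on whitespace into @F, so $F[0] is the IP,
# $F[3] is the timestamp, and $F[11] through $F[13] catch a user agent of up to three words
echo '24.190.239.220 - - [29/May/2008:05:16:19 -0700] "GET /about HTTP/1.1" 200 628 "-" "Java/1.6.0_06"' | \
  perl -l -a -n -e 'print $F[0]," ",$F[3]," ",$F[11]," ",$F[12]," ",$F[13]'
# prints: 24.190.239.220 [29/May/2008:05:16:19 "Java/1.6.0_06"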

Step 5 takes into account that logs can get big, so this is where we manage an archive … based on step 2 … for seven days’ worth of entries.

If you’re not familiar with creating bash scripts, you may encounter situations where you need to “chmod” or even “chown” the file to get it to work.
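
For instance, something along these lines usually does the trick on a shared host (the path matches the crontab entry below, and YOURUSER is a stand-in for your shell account):

chmod 755 /home/YOURROOT/find-a-bot.sh        # make the script executable
chown YOURUSER /home/YOURROOT/find-a-bot.sh   # only needed if the file's ownership is off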

The next step – though not documented above – is to test the script and, when you’re sure it’s working, modify your crontab file so your batch runs every night, say at 2:15 AM while you and everyone else are sleeping. Here’s what my crontab entry looks like:

15 2 * * * /home/YOURROOT/find-a-bot.sh > /dev/null

I’ve provided a .txt version of the file you can simply download from here.

Moreover, I’ve created a slightly more complex version of the above for download, for use on a system running something like vBulletin on a root or virtual private server operating with Fedora or RedHat.

The point is, while the above appears a bit complex, I can assure you it’s worth running as it can help you quickly discern over the course of a few days:

  • how often and how hard spambots are sniffing your system
  • how much of your bandwidth is consumed by feed readers versus browsers
  • which feed readers are hammering away at your site, ignoring your <skiphours /> and/or <skipdays /> data
  • how much bandwidth you might save by exporting your sermon’s RSS feeds to a service like FeedBurner
  • what spiders are ignoring your robots.txt file
  • tips on unusual visitors from interesting places, gleaned from unique user agents
  • whether or not some of the comment spam is via “Mozilla-like” agents who botch their user agent string
  • how many of your visitors are infected with spyware
  • how many of your visitors are trying to hide their tracks by visiting you with an anonymous proxy firing blank user agent strings
  • how many spamblogs are leeching your compelling content

Like I said, it will require just a little bash script know-how, so with that, I leave you with these tutorials:

Oh and if you’re nice and leave a comment, I might even email you a link to my own archive of greatest bot hits over the past few days.

Especially if you share your own scripting recipes for spotting bots.

January 7, 2013
by meandean
2 Comments

Turning Spam Pings into a HoneyPot

Originally posted 

As BrownPau reports, the Trackback Ping Spammers have been relentless – expending hours and energy figuring out new ways to waste our bandwidth and destroy the blogosphere. So pardon me if I offer yet another post and yet another approach in an attempt to encourage these crooks to earn an honest living – this time taking a honeypot approach to any successfully posted trackback ping spam.

Wikipedia defines a honeypot as:

… a trap set to detect or deflect attempts at unauthorized use of information systems …

The primary value of a honeypot is in the information it provides, which can be used for things such as detection, early warning and prediction, or awareness.

So here is my thinking: even though my .htaccess solutions are turning away hundreds of trackback attempts each day, one or two are sneaking through. That said, I’ve noticed that most of these attempts, successful or otherwise, are from a somewhat finite set of anonymous/open proxies. Yes, folks, I’m talking about IP blocking, but not in the conventional sense.

Herding Cats

Now I know blocking IPs is like using vice-grips to contain Jello, but remember, security is about layering counter-measures. So using some IP blocking along with some other techniques I’ve discussed earlier continues to harden this site, hopefully to the point of getting the spammer to go away — or at least go bother someone else.

Similarly, they come in bunches, usually early in the morning or, as in this evening’s case, shortly after the start of the Super Bowl. It is for these same reasons that I suspect there will be a spam attack sometime tonight, it being Sunday night.

IP Mining

A few nights back, when my site got hammered, I decided to clean my blog by directly manipulating the database — in this case using phpMyAdmin. My first thought was to generate the names of the offending referrers so I could amend my .htaccess file using the following rather inefficient but gets-the-job-done SQL query:

SELECT DISTINCT x.tbping_blog_name
 FROM mt_tbping AS x, mt_tbping AS y
 WHERE x.tbping_ip = y.tbping_ip
 AND(y.tbping_blog_name LIKE "%texas%" OR
       y.tbping_blog_name LIKE "%poker%");

But then I grinned and thought, “Hey wait, why not let those one or two out of a lucky hundred spin their wheels when they come back for more?” which was immediately followed by “Foo, I don’t want to hand-jam all those addresses from my email into MT.”

Then I grinned even broader after making a backup of my database using mysqldump, and typing in:

INSERT INTO `mt_ipbanlist`
 (`ipbanlist_blog_id`, `ipbanlist_ip`,`ipbanlist_created_on`,`ipbanlist_modified_on`, `ipbanlist_created_by`)
 SELECT `tbping_blog_id`, `tbping_ip`, `tbping_created_on`, `tbping_modified_on`, '99'
 FROM `mt_tbping`
 WHERE tbping_blog_name
 LIKE "%texas%" OR tbping_blog_name
 LIKE "%poker%"

Voila, no more automated spam from the spammer’s favorite anonymous proxies. At this point I thought I might want to block these IPs from some other websites I administer, so I generated my own cut-n-paste for my .htaccess list, then chuckled at:

 SELECT DISTINCT CONCAT( 'Deny from ', `tbping_ip` )
 FROM `mt_tbping`
 WHERE tbping_blog_name
 LIKE "%texas%" OR tbping_blog_name
 LIKE "%poker%"
 ORDER BY `tbping_ip`
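
For those playing along at home, that query spits out a column of ready-made directives; pasted into .htaccess (assuming Apache’s 2.2-style allow/deny directives are available on your host), the result looks something like this, with the IPs shown here being purely illustrative:

# block the spammers' favorite anonymous proxies
Order Allow,Deny
Allow from all
Deny from 192.0.2.44
Deny from 198.51.100.7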

Once I had exhausted all the utility I could think of, then and only then did I:

DELETE
 FROM `mt_tbping`
 WHERE tbping_blog_name
 LIKE "%texas%" OR tbping_blog_name
 LIKE "%poker%";

That was followed by rebuilding my blog from the command line using mt-rebuild.

So where’s the Honeypot?

I haven’t built it yet; I only had enough time to post the above article, not to write the script. So if you feel so compelled to automate the above, then here’s my thinking, with a rough sketch of the ‘close the trap’ half after the list below:

  1. CRONTAB a point in time where you allow your site to get spammed by temporarily renaming the .htaccess file – or better yet, using an .htaccess file that allows one or two well-defined spammer referrers in (e.g. texas-poker).
  2. CRONTAB a time to turn back on all your protections by putting the .htaccess file back in place and then:
    • run the MySQL scripts to insert IP blocks
    • run the MySQL script to clean-up the spam from MT database
    • use mt-rebuild to rebuild your messages sans comment spam
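
Here’s that sketch. Every path, file name, and database credential below is a placeholder (the two .sql files are just the queries from this post saved off to disk), and I’ve left the mt-rebuild step as a comment since its invocation depends on your particular MT install:

#!/bin/bash
# honeypot-close.sh (hypothetical) -- restore protections, ban what was caught,
# purge the spam, then rebuild; kicked off by a crontab entry such as:
#   15 2 * * * /home/YOURROOT/honeypot-close.sh > /dev/null
mydir="/home/YOURROOT/YOURDOMAIN.COM"

# put the strict .htaccess back in place
mv -f "$mydir/.htaccess.strict" "$mydir/.htaccess"

# run the two SQL snippets from this post, saved off into files beforehand
mysql -u YOURDBUSER -p'YOURDBPASS' YOURMTDB < /home/YOURROOT/ban-spam-ips.sql
mysql -u YOURDBUSER -p'YOURDBPASS' YOURMTDB < /home/YOURROOT/purge-spam-pings.sql

# finally, rebuild the blog sans spam with mt-rebuild (invocation depends on your install)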

I think, however, that in the future I’m going to publish a blog and ask the big hitters to link me up. It will mostly post aggregated news, but it will also publish spam hit lists in text and XML formats for easy consumption by nice people. But first I need to get some scripts working.

In the meantime, post anything related to the above scripts or ideas. I’m sure there’s some SQL that could be better written; for example, I noticed that, run more than once, the insert creates duplicates … which means after backing up my data AND making a copy of mt_ipbanlist in the database, I needed to run the following:

DELETE t1
 FROM mt_ipbanlist t1, mt_ipbanlist t2
 WHERE t1.ipbanlist_ip = t2.ipbanlist_ip
 AND t1.ipbanlist_id > t2.ipbanlist_id;

I’m also sure I’ve overlooked some procedures that could be inserted to make the whole thing work better — or at least figure out how blackjack-123.com (64.234.220.141) plays into all this. Of course, if someone could point me to a poisoned and/or booby-trapped mt-tb.cgi, I’d be much obliged.

January 3, 2013
by meandean
Comments Off on Church Marketing Sucks – An Infographic of their Top 10 Posts for 2012

Church Marketing Sucks – An Infographic of their Top 10 Posts for 2012

If you aren’t a regular reader of ChurchMarketingSucks.com, then either you like wandering in the church communications wilderness, or you just haven’t had time to add their RSS feeds to your aggregator … go ahead … I’ll wait.

Now that we’ve taken care of that piece of business, I thought I might bring to your attention their listing of their top 10 posts for 2012. Why? Glad you asked.

As your church &/or charity finalizes your communications strategy for 2013 — assuming your organization plans such things in advance — I thought it might be helpful to create a colorful handout of CMS‘ top 10 list so you could better identify targets for tactical textual content.

Infographic: Top 10 Posts from Church Marketing Sucks for 2012

Once you’re done downloading this infographic, why not show some love and link on over to Church Marketing Sucks and read the rest of their top 10 posts for 2012 article?

Credit & Thanks to:

December 1, 2012
by meandean
2 Comments

Social Media is for engaging in dialogs, not a platform to pound the pulpit!

Dear social media friends & circles, if you use Facebook & Google+ as you would a blog, then don’t get upset if I ‘defriend’ & ‘uncircle’ you.

A Failed Social Media Strategy

Last night I ran out of patience with an individual on Google Plus (G+) who, though they meant well, was entering a series of multi-paragraph posts on their stream, with quotes, photos and all.

  • Basically, they were treating G+ like a blog.

As annoying as this was on my laptop, I ignored the situation. However, spending time on my new Samsung Droid Charge, I found my right flicking finger blistered beyond belief as I had to page past several screens of this user’s tomes to see what anyone else was doing.

As soon as I ‘uncircled’ them, my GPlus mobile user experience went from ARGH to Aaaahhhh.

I’ve similarly ‘defriended’ some on Facebook as well, though for the most part, in those cases I’ve started out by ‘hiding’ all posts by an overzealous associate before going with the nuclear option.

And for what it’s worth, I make good use of Gmail’s filtering capability for spammy newsletters that don’t respect the ‘unsubscribe’ option they offer.

So what’s in it for me?

So what can we learn from the above kvetch-fest? Plenty, glad you asked.

Whether you engage in social media for personal fulfillment or as part of a larger digital marketing strategy, the point of social media is to ENGAGE others IN a DIALOG.

If you want to get all preachy and pound the pulpit and/or go all prose on your organization’s next big shindig, then may I recommend some of the following tactical approaches?

  • create a killer blog post to present your idea or event;
  • give the blog post a ‘pheromone infused’ title;
  • make sure the 1st paragraph has ‘crack-like’ compelling content;
  • add #hashtags to your blog post’s tags;
  • using the above, create a magnetic 100-120 character meme/excerpt to post on Facebook, G+, Twitter, LinkedIn, etc …; and
  • now go take a look at your work on a variety of platforms (mobile, laptop, tablet, etc …), and tweak those that appear annoying, ill-formatted and/or ineffective.

Finally, and more important than any of the above — make sure to follow up with folks who post comments, re-tweet and/or reach out to you based on your awe-inspiring words of wisdom.

Bottom Line

So here’s my call to action:

  • use blogs for providing compelling content;
  • use social media to engage others in dialog;
  • develop a tactical approach that uses the right tool for the job for your digital marketing strategy; and
  • have a digital marketing strategy.

Agree, disagree, have an opinion? Why not force me to practice what I preach and leave a comment?

December 18, 2011
by meandean
6 Comments

How to block spambots by user agent using .htaccess

Originally published May 27, 2008; I’ve bumped this up a bit in the queue after some edits.

Spambots and spiders that ignore the robots exclusion file can kill your site, both in bandwidth and by potentially exposing information you don’t want ‘harvested.’ With that in mind, here is a quick-n-dirty guide to blocking spambots and rogue search engine spiders using .htaccess. First the essential example codeblock, followed by a working example:

essential example codeblock

# redirect spambots & rogue spiders to the end of the internet
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^spambot
RewriteRule ^(.*)$ http://www.shibumi.org/eoti.htm#$1 [R=301,L]

Next, read my article on how to quickly check your error logs for oddities … which should provide you with a list of all sorts of unusual user agents worth blocking.

With said list, all that is left to do is create a working version that, instead of sending people to the end of the internet, blocks them outright – which is probably a better move than sending the traffic elsewhere:

real-world/working example

# block spambots & rogue spiders outright
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector
RewriteRule .* - [F,L]

Note I provide 4 examples:

  1. ^$,
  2. ^EmailSearch
  3. ^Microsoft\ URL
  4. ^Web\ Image\ Collector

All to demonstrate how to use Perl-like regular expressions to parse out the user agent. For example:

  1. ^ – identifies the beginning of the user agent string
  2. $ – identifies the end of the user agent string
  3. \ – that is, a backslash followed by a space, tells the parser to treat the space between words as part of the match
  4. [OR] – is placed after each of the multiple entries, except the last
  5. [NC,…] – is sometimes placed after an entry to match it without concern for upper or lower case

In the process, I’m intentionally blocking empty user agents using .htaccess – “^$” – a search string that uses a regular expression to test for nothing between the beginning “^” and end “$” of a user agent token. Sorry, but if you’re not willing to tell me who/what you are, I’m not willing to show you my content.
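
Since neither example above actually exercises [NC], here’s what a case-insensitive variation of one of those conditions might look like:

# match EmailSearch, emailsearch, EMAILSEARCH, etc.
RewriteCond %{HTTP_USER_AGENT} ^EmailSearch [NC,OR]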

Also, be aware the above requires that you have mod_rewrite installed on your Apache server, and that you have privileges to create your own rewrite rules in your own .htaccess file. If you’re not sure, check with your hosting service and/or system administrator.

In most cases, such privs & access exist – but your mileage may vary, as it might in how your particular .htaccess file actually works in the wild.
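
If you do happen to have shell access, a quick (though host-dependent) way to peek for mod_rewrite yourself is something like this:

# list the loaded Apache modules and look for mod_rewrite
apachectl -M 2>/dev/null | grep -i rewrite
# on some systems the command is apache2ctl or httpd -M instead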

That said, more tomorrow or Thursday on how to create a cron job to list those “unusual user agents” ‘automagically‘ for easy identification and, if needed, anti-spam remediation.