Heal Your Church WebSite

Teaching, rebuking, correcting & training in righteous web design.

Stupid Array Tricks in Perl

Later tonight, I’m going to talk a little bit about security and ethics and things to keep in mind as church members come and go, or as you as a developer move from one church to another.

Meanwhile, I’ve had a brain fade this morning. I forgot how to sort arrays in Perl. Good thing that, way back in June of this past year, I mentioned the Perl Circus as a “Resource Filled” site.

Here is just what I needed to jog my memory:

@arr1 = ("zero", "one", "two", "three", "four");
@arr2 = sort { $a cmp $b } @arr1; # in ASCII order
@arr3 = sort { $b cmp $a } @arr1; # in reverse ASCII order
@arr4 = sort { $a <=> $b } @arr1; # in numeric order (only meaningful when the elements are numbers)
print "@arr2";

RESULT: four one three two zero
DISCUSSION: The sort function takes a block and a source array. It uses the block to determine how to order the array it returns. The variables $a and $b in the sort block are special and can be used to force the ordering from top-down ($a, $b) or from bottom-up ($b, $a). Beware that using cmp will sort according to the ASCII order of the array elements, while <=> will sort according to their numeric value.
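
A quick sketch of why that last warning matters, using made-up sample data (the array names here are only for illustration). The same list of numbers comes back in a different order depending on the operator:

```perl
use strict;
use warnings;

# Hypothetical sample data: a few numbers in no particular order.
my @nums = (10, 9, 100, 1);

# cmp compares the elements as strings, character by character.
my @as_text = sort { $a cmp $b } @nums;

# <=> compares the elements as numbers.
my @as_numbers = sort { $a <=> $b } @nums;

print "cmp: @as_text\n";      # cmp: 1 10 100 9
print "<=>: @as_numbers\n";   # <=>: 1 9 10 100
```

With cmp, "100" lands ahead of "9" because the character "1" sorts before "9".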

DUUUHH, I can’t believe I mentally misplaced something so simple as the above. Actually what I forgot was the difference between sorting alpha and numeric entities. I must be getting old. What am I saying? By most coder standards I am as old as dirt.

So help out this gray covered (but still gargantuan) cranium of mine. How about some comments? Who’s got a page full of good PHP array tricks? Python? I’d say VB.Net, but their collections class is so insanely addictive that I dare not mention it …


  1. Dean – You’ve got to see this hideous site linked from Tim Bednar’s e-Church, at http://www.e-church.com/Blog-detail.asp?EntryID=125.


  2. Speaking of fun with sort, this is an excerpt of perl code from my day job:

    @ipclist = sort {
    ($a =~ /^DI/ and $b !~ /^DI/ and -1) or
    ($a !~ /^DI/ and $b =~ /^DI/ and 1) or
    $toprocess{$a} <=> $toprocess{$b} or
    $a cmp $b;
    } keys(%toprocess);

    You can put an arbitrarily complicated expression in as the sort subroutine; this one sorts file codes, and the rule translates to:
    1. Codes beginning with “DI” first, then
    2. Codes in the order of precedences stored in the %toprocess hash, then
    3. Codes by collating (alphabetical) order.

    Note the perl operators “and” and “or”. I’ve found them to be handy “do what I mean” operators, since they have a lower precedence order than anything else and therefore don’t need all the parentheses that you can end up with when using || or &&.
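
    For anyone who wants to run that sort, here is a self-contained sketch. The file codes and precedence numbers in %toprocess are invented for the example; only the sort block follows the excerpt above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical file codes mapped to made-up precedence numbers.
    my %toprocess = (
        DI02 => 5,
        AB01 => 1,
        DI01 => 5,
        ZZ09 => 1,
        CC03 => 3,
    );

    my @ipclist = sort {
        # Rule 1: a code starting with "DI" sorts ahead of one that does not.
           ($a =~ /^DI/ and $b !~ /^DI/ and -1)
        or ($a !~ /^DI/ and $b =~ /^DI/ and 1)
        # Rule 2: lower precedence number first.
        or $toprocess{$a} <=> $toprocess{$b}
        # Rule 3: fall back to plain string order.
        or $a cmp $b
    } keys %toprocess;

    print "@ipclist\n";   # DI01 DI02 AB01 ZZ09 CC03
    ```

    Each rule only gets a say when the one before it evaluates to false (a tie), which is what chaining with “or” buys you.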

    Not that there’s a reason to do a nasty sort like that in most websites.

    However, the way rule 2 works above is probably worth remembering when you’ve got a complicated sort order to deal with – if you can express the sort order via some kind of lookup table, you can stuff that table into a hash and use it in your sort.

    For example, let’s say that you’ve got a bunch of strings that look like:

    “Genesis 1:9”
    “1John 3:7”
    “Exodus 20:15”

    And say that they’re stored in an array called @verses. Now assume that you also have a hash of all the books of the bible that maps the name to a sequence number; assuming that you always spell things the same, this could be done with:

    @books = qw(Genesis Exodus … Jude Revelation);
    %books = ();
    my $i = 1;
    %books = map {$_ => $i++;} @books;

    To take variant spellings and such into account it’s a slight bit uglier, but still pretty straightforward. Anyway, here’s how you sort the verses:
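
    One possible way to handle the variant spellings, as a rough sketch: keep a second hash of aliases and copy each canonical sequence number across. The shortened book list and the %alias entries are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # A shortened, hypothetical book list; a real one would name every book.
    my @books = qw(Genesis Exodus Leviticus);

    my $i = 1;
    my %books = map { $_ => $i++ } @books;

    # Made-up variant spellings, each pointing at its canonical name.
    my %alias = (
        Gen  => 'Genesis',
        Ex   => 'Exodus',
        Exod => 'Exodus',
        Lev  => 'Leviticus',
    );

    # Give every variant the same sequence number as its canonical book.
    $books{$_} = $books{ $alias{$_} } for keys %alias;

    print "$books{Exod} $books{Exodus} $books{Lev}\n";   # 2 2 3
    ```

    After this, the sort code can look up either spelling in %books and get the same number.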

    my ($ab, $ac, $av, $bb, $bc, $bv, $r);
    $r = qr/^(\w+) (\d+):(\d+)/;

    @verses =
    sort {$a =~ $r; ($ab, $ac, $av) = ($1,$2,$3);
    $b =~ $r; ($bb, $bc, $bv) = ($1,$2,$3);
    $books{$ab} <=> $books{$bb} or
    $ac <=> $bc or
    $av <=> $bv;} @verses;

    Now, there are some ways to speed this up that might be worth looking at if you had a huge list. The list of verses cited in a single sermon, or even a month’s worth of sermons, doesn’t qualify. However, I’ll point it out anyway:

    my ($r, $ab,$ac,$av, %lookup);
    $r = qr/^(\w+) (\d+):(\d+)/;
    %lookup =
    map { /$r/; ($ab, $ac, $av) = ($1,$2,$3);
    $_ => 100*100*$books{$ab} + 100*$ac + $av; } @verses;
    @verses = sort { $lookup{$a} <=> $lookup{$b} } @verses;

    In case the line noise there is tough to follow, what this does is first build a hash called %lookup that translates verses to numbers that will tell you the order of the verses – for example, “Genesis 1:9” becomes 10109, and “Exodus 20:15” becomes 22015. (It’s 10000 times the book number, plus 100 times the chapter number, plus the verse number.) Then, the sort just uses the lookup table. Note that this packing assumes chapter and verse numbers stay below 100, an assumption that Psalms breaks.
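
    Psalms has 150 chapters, and Psalm 119 runs to 176 verses, so a two-digit packing like the one above would sort “Psalms 119:105” after “Proverbs 1:1”. Here is a sketch that widens each field to three digits instead. The verse_key helper, the sequence numbers in %books, and the sample verses are all assumptions made for the example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sequence numbers for just the books used below.
    my %books = (Genesis => 1, Exodus => 2, Psalms => 19, Proverbs => 20);

    my @verses = ('Proverbs 1:1', 'Psalms 119:105', 'Exodus 20:15', 'Genesis 1:9');

    # Pack book, chapter and verse into one number, three digits per field.
    sub verse_key {
        my ($verse) = @_;
        my ($book, $chapter, $num) = $verse =~ /^(\w+) (\d+):(\d+)/
            or die "cannot parse '$verse'";
        return 1000 * 1000 * $books{$book} + 1000 * $chapter + $num;
    }

    # Build the lookup table once, then sort by the precomputed numbers.
    my %lookup = map { $_ => verse_key($_) } @verses;
    @verses = sort { $lookup{$a} <=> $lookup{$b} } @verses;

    print "$_ => $lookup{$_}\n" for @verses;
    ```

    With these inputs “Psalms 119:105” packs to 19119105 and “Proverbs 1:1” to 20001001, so Psalms stays ahead of Proverbs.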

    Now I suppose I should present one more way of doing this, since this is actually the same method that python people use to perform complicated sorts efficiently. (The python sort method is reportedly abysmally slow when given an arbitrary function to sort with.) They call it the “DSU” (Decorate Sort Undecorate) pattern, and it’s basically the same as the lookup table, except that the lookup value is attached to each verse directly.

    That was unclear. I can say it better in code:

    my ($r, $ab,$ac,$av, @decorated);
    $r = qr/^(\w+) (\d+):(\d+)/;
    @decorated =
    map { /$r/; ($ab, $ac, $av) = ($1,$2,$3);
    [100*100*$books{$ab} + 100*$ac + $av, $_]; } @verses;
    @decorated = sort { $a->[0] <=> $b->[0]; } @decorated;
    @verses = map { $_->[1]; } @decorated;

    So basically you construct a new list, @decorated, where each element is a two-element array: [lookup number, verse]. Then you sort the @decorated list. Then, you get a sorted @verses array by pulling it out of the sorted @decorated. You can find this idiom in python in several places – for example, http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52234
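
    A variation on the same idea, sketched with made-up data: instead of packing everything into one number, decorate each verse with its three parsed fields and compare them one at a time. That sidesteps any limit on how large a chapter or verse number can get. The %books entries and sample verses are assumptions for the example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sequence numbers for just the books used below.
    my %books = (Genesis => 1, Exodus => 2, '1John' => 62);

    my @verses = ('1John 3:7', 'Exodus 20:15', 'Genesis 1:9', 'Exodus 3:14');

    # Decorate: pair each verse with its book number, chapter and verse.
    my @decorated = map {
        my ($book, $chapter, $num) = /^(\w+) (\d+):(\d+)/;
        [ $books{$book}, $chapter, $num, $_ ];
    } @verses;

    # Sort: compare the decorations field by field.
    @decorated = sort {
        $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2]
    } @decorated;

    # Undecorate: keep only the original string from each element.
    @verses = map { $_->[3] } @decorated;

    print join(', ', @verses), "\n";   # Genesis 1:9, Exodus 3:14, Exodus 20:15, 1John 3:7
    ```

    The regex still runs only once per verse, which is the whole point of decorating before the sort.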

    For those wanting to show off their perl, this can all be combined without the intermediate @decorated array:

    @verses =
    map { $_->[1]; }
    sort { $a->[0] <=> $b->[0]; }
    map { /^(\w+) (\d+):(\d+)/;
    [10000*$books{$1} + 100*$2 + $3, $_]; } @verses;

    Which appeals to me, and actually looks elegant, but I’m a bit odd.

    You know, it’d be really nice if I could use <pre> or similar tags in these comments. The code looks really ugly with all the indentation ripped out.