Heal Your Church WebSite


Teaching, rebuking, correcting & training in righteous web design.

Link Rot vs. mod_rewrite – round 1

I woke up a bit early today. Bleary eyed, I checked my email to find the following question:

hey dean :)

I’m working on a big redesign of www.anglicanmediasydney.asn.au, and was just wondering if I could ask a quick question about mod_rewrite? … I can? great :) Basically I’m trying to avoid linkrot, so I want to put all my old content into an /old/ dir, and then add rewrite rules to httpd.conf to do something like this:

if URI is invalid
check /old/ dir for file
if file isn’t found serve 404

… so when someone tries to access http://mydomain.com/file.html (which no longer exists), apache checks for http://mydomain.com/old/file.html, and if that doesn’t exist, it serves a 404.

While I’ve found some examples which are close I’m a designer, so I really need it spelt out for me ;)

This is as close as I’ve got (from the rewrite guide), but it doesn’t work:

RewriteEngine On
RewriteCond %{REQUEST_URI} !-U
RewriteRule ^(.+) http://mydomain.com/old/$1

any suggestions? if not don’t worry, i’ll keep looking…

thanks!!

Luke Stevens
Graphic/Web Designer
Anglican Media Sydney

Thanks Luke! What a great question! And may I add, what an excellent approach to something many church web sites go through but don’t adequately address: link rot, in this case brought on by a redesign.

I haven’t hacked a solution myself – not yet – but I too will be facing this problem. Mine will be further complicated by the fact that I want to remove really long and wacky .cgi or .php URLs with arguments and instead leverage the URL path and/or subdomains to provide easier-to-remember URLs. For example:

http://www.redlandbaptist.org/cgi-bin/foo/bar.cgi?arg1=yo&arg2=mamma
becomes:
http://www.redlandbaptist.org/yo/mamma

or perhaps:
http://yomamma.redlandbaptist.org
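As an untested sketch only (not a worked solution), the first mapping might look something like this in mod_rewrite terms. It assumes bar.cgi really does read arg1 and arg2, that every short URL has exactly two path segments, and that the rules sit in the main site configuration or an .htaccess file at the document root; all of that is illustrative guesswork rather than anything from a live setup.

```apache
RewriteEngine On

# Skip anything that is a real file or directory, so two-segment
# paths such as /images/logo.gif are not swallowed by the rule below.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d

# Hypothetical mapping: /yo/mamma -> /cgi-bin/foo/bar.cgi?arg1=yo&arg2=mamma
# This is an internal rewrite, so the address bar keeps the short URL.
# PT hands the result back to URL mapping so a ScriptAlias'd /cgi-bin/ still applies.
RewriteRule ^/?([^/]+)/([^/]+)/?$ /cgi-bin/foo/bar.cgi?arg1=$1&arg2=$2 [PT,L]
```

The subdomain form would need wildcard DNS plus a condition on the Host header, which is a bigger job than this sketch covers.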

There are a couple of ways to skin this cat, as pointed out in several A List Apart (ALA) articles on the topic, one via mod_rewrite, the others programmatically via PHP: “How to Succeed with URLs”, “Slash Forward” and “URLs, URLs, URLs”. Speaking of programmatically, there is “If I Were an InstaPundit” by yours truly.

There is of course Apache’s Module mod_rewrite URL Rewriting Engine page. And as Luke mentions in his email, there is the big kahuna of mod_rewrite documentation, Ralf S. Engelschall’s Apache URL Rewriting Guide, whose intro sorta sums up what we’re both up against:

The Apache module mod_rewrite is a killer one, i.e. it is a really sophisticated module which provides a powerful way to do URL manipulations. With it you can nearly do all types of URL manipulations you ever dreamed about. The price you have to pay is to accept complexity, because mod_rewrite’s major drawback is that it is not easy to understand and use for the beginner. And even Apache experts sometimes discover new aspects where mod_rewrite can help.

In other words: With mod_rewrite you either shoot yourself in the foot the first time and never use it again or love it for the rest of your life because of its power. This paper tries to give you a few initial success events to avoid the first case by presenting already invented solutions to you.

Bottom line? Luke, I don’t have a solution for you … YET. But since it is something I need to work on too, these are the resources I’m going to refer to. Perhaps someone nice out there will throw a solution down as a comment in the interim.

5 Comments

  1. Hey, there’re a few ways you can do this, and a question or two you have to answer. The most important question to answer first is “Do you want this to be an external redirect, or do you want it to be seamless?” In other words, do you want this to send a redirect to the browser and have it fetch the new URL in the /old/ directory (showing the /old/ directory to the user), or do you just want to make it happen behind the scenes? I’ll assume you want behind the scenes, since no one likes to see /old/ in their URL bar. Here’s how to do it with mod_rewrite:

    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{DOCUMENT_ROOT}/old%{REQUEST_URI} -f
    RewriteRule .+ %{DOCUMENT_ROOT}/old%{REQUEST_URI}

    This assumes that the /old/ directory is at your web root. Also, REQUEST_URI is the un-url-decoded request string, so if there are any escaped characters (like spaces) in your filenames, I don’t think it’ll work right. However, if your old files are just like /old/foo.html I don’t think you should have a problem.

    The other way to solve this problem would be to do some magic in a 404 PHP script, but this *should* be able to do what you want. I haven’t gotten to test it thoroughly, however, so be warned.

  2. Thanks very much for your interest Dean, and thanks Keith for 99.9% of the answer! :)

    I had to make one small change to get it to work, and that was change this line:

    RewriteCond %{DOCUMENT_ROOT}/old%{REQUEST_URI} -f

    to this:

    RewriteCond %{DOCUMENT_ROOT}/old%{REQUEST_URI} -U

    (i.e. change the -f to -U).

    Now it works! hooray!! :) This is *such* a relief, I’ve looked (or more specifically, googled) high and low for an answer to this – it’s *great* to finally have one, thanks guys!

  3. One prob… requests for files which don’t exist are now timing out (or eventually getting “zero sized reply”), instead of getting 404s. Is there any way to tell Apache to serve the appropriate ErrorDocument if the above rewrite rules fail?

  4. Why do you insist on using the -U? :) The point is that you want to make sure the file exists. As far as I know, the -U does a sub request, and you might be sending Apache into an infinite loop looking for your URL, because in each request you’re using these mod_rewrite rules again! Either that or you’re seeing a URL with -U that doesn’t actually exist with -f, and then you try to go to the file %{DOCUMENT_ROOT}/old%{REQUEST_URI} which doesn’t actually exist.

    Most importantly, why didn’t the -f work?

  5. Argh, I had some leftover test rewrite stuff at the bottom of my virtual host setup which might have been interfering. May have been why -f wasn’t working, so I cleaned it up and changed it back to your original suggestion, and all is well. Thanks!! :)
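
Pulling the thread together, the setup that came out of these comments can be sketched roughly as follows. This is a variant of Keith’s rules, not a copy: the guard on /old/, the URL-path substitution and the ErrorDocument line are additions for illustration, /404.html is a hypothetical error page, and the block assumes it lives in a <Directory> section or .htaccess for the document root (RewriteBase is only valid there). Treat it as untested.

```apache
RewriteEngine On
RewriteBase /

# Leave requests already under /old/ alone, so the rule
# cannot be re-applied to its own output.
RewriteCond %{REQUEST_URI} !^/old/
# The requested path is not a file in the new site...
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# ...but a file with the same path does exist under /old/.
# -f is a plain filesystem check; -U would issue a subrequest instead.
RewriteCond %{DOCUMENT_ROOT}/old%{REQUEST_URI} -f
# Serve the old copy internally; the browser keeps the original URL.
RewriteRule ^ /old%{REQUEST_URI} [L]

# Requests found in neither place are not rewritten at all,
# so they fall through to Apache's ordinary 404 handling.
ErrorDocument 404 /404.html
```

Because every condition is a simple file test, a request that misses both locations is never rewritten, which is why the 404 arrives promptly instead of timing out as it did with -U.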