The RewriteMap Solution for Multiple Abbreviations
The RewriteRule solution works well for a few abbreviations, but what if you want to abbreviate a large number of links? That's where the RewriteMap directive comes in. This feature allows you to group multiple lookup keys (abbreviations) and their corresponding expanded values into one tab-delimited file. Here's an example map file (/www/misc/redir/abbr_webref.txt):
d	dhtml/
dc	dhtml/column
pg	programming/
h	html/
ht	html/tools/
A rewriting rule consults a map through a lookup construct. The MapName identifies the mapping function between keys and values, using the following syntax:
${ MapName : LookupKey | DefaultValue }
When you use a mapping construct, you generalize the RewriteRule substitution. Instead of a hard-coded value, the map named by MapName is consulted and the LookupKey is looked up. If there is a key match, the mapping function substitutes the expanded value into the substitution string. If there is no match, the rule substitutes the default value or, if no default is given, an empty string.
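To make the lookup semantics concrete, here is a minimal Python sketch that models what the ${MapName:LookupKey|DefaultValue} construct does. It is not Apache code: the names parse_map and lookup are hypothetical, and the sketch assumes each map line holds a key and a value separated by whitespace.

```python
# Illustrative model of a text RewriteMap lookup; not Apache's implementation.

MAP_TEXT = """\
d\tdhtml/
dc\tdhtml/column
pg\tprogramming/
h\thtml/
ht\thtml/tools/
"""


def parse_map(text):
    """Parse key/value lines into a dict, skipping blank lines and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value.strip()
    return table


def lookup(table, key, default=""):
    """Model ${map:key|default}: the mapped value if the key exists, else the default."""
    return table.get(key, default)


abbr = parse_map(MAP_TEXT)
print(lookup(abbr, "pg"))            # programming/
print(lookup(abbr, "zz", "index/"))  # index/ (key not found, default used)
print(repr(lookup(abbr, "zz")))      # '' (key not found, no default)
```

The default argument plays the role of the optional DefaultValue part of the construct; leaving it out yields an empty string, as described above.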
To use this external map file, we'll add the RewriteMap directive and tweak the regular expression correspondingly. The following httpd.conf commands turn rewriting on, show where to look for your rewrite map, and show what the rewrite rule is:
RewriteEngine On
RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt
RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]
The first directive turns on rewrites as before. The second points the rewrite module to the text version of our map file. The third tells the processor to look up the value of the matching expression in the map file. Note that the RewriteRule has two flags appended to it: redirect=permanent (a 301 redirect instead of the default 302) and last. Once an abbreviation is found for this URL, no further rewrite rules are processed for it, which speeds up lookups.
Here we've set the rewrite MapName to abbr and the map file location (text format) to the following:
/www/misc/redir/abbr_webref.txt
The RewriteRule processes requested URLs using the regular expression:
^/r/([^/]*)/?(.*) ${abbr:$1}$2
This regular expression matches a URL that begins with /r/. (The ^ character at the beginning means to match from the beginning of the string.) Then the regular expression ([^/]*) matches as many non-slash characters as it can, stopping at the next slash or at the end of the string. This effectively pulls out the first string between two slashes following /r. For example, in the URL /r/pg/javascript/, this portion of the regular expression matches pg. It also will match ht in /r/ht. (Because no slash follows, it just continues until it reaches the end of the URL.)
The rest of the pattern /?(.*) matches 0 or 1 forward slashes / with any characters that follow. These two parenthesized expressions will be used in the replacement pattern.
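The capture behavior described above can be checked outside the server. The following Python sketch applies the same pattern with Python's re module; Apache uses its own regular expression engine, so treat this as an illustration that assumes the two engines agree on this simple pattern. The /about/ URL is a made-up example of a request that the rule ignores.

```python
import re

# Same pattern as in the RewriteRule above.
PATTERN = re.compile(r"^/r/([^/]*)/?(.*)")

for url in ["/r/pg/javascript/", "/r/ht", "/about/"]:
    match = PATTERN.match(url)
    if match:
        # groups() returns the two parenthesized captures, i.e. $1 and $2
        print(url, "->", match.groups())
    else:
        print(url, "-> no match")

# /r/pg/javascript/ -> ('pg', 'javascript/')
# /r/ht -> ('ht', '')
# /about/ -> no match
```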
The Replacement Pattern
The substitution (${abbr:$1}$2) is the replacement pattern that will be used in the building of the new URL. The $1 and $2 variables refer back (backreferences) to the first and second patterns found in the supplied URL. They represent the first set of parentheses and the second set of parentheses in the regular expression, respectively. Thus for /r/pg/javascript/, $1 = "pg" and $2 = "javascript/". Replacing these in the example produces the following:
${abbr:pg}javascript/
The ${abbr:pg} is a mapping construct that says, "Refer to the map abbr (recall our map command, RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt), look up the key pg, and return the corresponding data value for that key." In this case, that value is programming/. Thus the abbreviated URL, /r/pg/javascript/, is replaced by the following:
/programming/javascript/
Voila! So you've effectively created an abbreviation expander using a regular expression and a mapping file. Using the preceding rewrite map file, the following URL expansions would occur:
"r/dc" becomes "dhtml/column"
"r/pg" becomes "programming/"
The server, upon seeing a matching abbreviation in the map file, will automatically rewrite the URL to the longer value.
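Putting the pattern and the map together, the whole expansion can be modeled in a few lines of Python. This is a sketch of the logic only, not of how Apache implements it: the function name expand is hypothetical, the map is hard-coded as a dict rather than read from the file, and the leading slash is added by hand to match the final path shown in the text.

```python
import re

# The example map from the article, as a dict (assumption: loaded elsewhere).
ABBR = {
    "d": "dhtml/",
    "dc": "dhtml/column",
    "pg": "programming/",
    "h": "html/",
    "ht": "html/tools/",
}

RULE = re.compile(r"^/r/([^/]*)/?(.*)")


def expand(url):
    """Return the expanded path, or None when the rule does not match."""
    match = RULE.match(url)
    if match is None:
        return None
    key, rest = match.groups()
    # Model of ${abbr:$1}$2: an unknown key contributes an empty string.
    return "/" + ABBR.get(key, "") + rest


print(expand("/r/pg/javascript/"))  # /programming/javascript/
print(expand("/r/dc"))              # /dhtml/column
print(expand("/index.html"))        # None
```

Note that an unknown abbreviation still matches the rule and expands to whatever follows it, so a DefaultValue in the map construct may be worth adding in practice.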
But what happens if you have many keys in your RewriteMap file? Scanning a long text file every time a user clicks a link can slow lookups down. That's where binary hash files come in handy.
Binary Hash RewriteMap
For maximum speed, convert your text RewriteMap file into a binary *DBM hash file. This binary hash version of your key and value pairs is optimized for fast lookups. Convert your text file with a DBM tool or the txt2dbm Perl script provided at http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html.
NOTE
Note that this example is specific to Apache on UNIX. Your platform may vary.
Next, change the RewriteMap directive to point to your optimized DBM hash file:
RewriteMap abbr dbm:/www/misc/redir/abbr_webref
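The conversion itself is simple: read each key and value pair from the text map and store it in a hash file. The Python sketch below shows that logic with the standard dbm module. It is only an illustration: the function name text_map_to_dbm is hypothetical, the demo writes to a temporary directory, and the DBM flavor Python picks may not be the one your Apache build reads, so for a real server use the txt2dbm script mentioned above or another tool that matches your installation.

```python
import dbm
import os
import tempfile


def text_map_to_dbm(txt_path, dbm_path):
    """Copy key/value pairs from a text map into a new DBM file; return the count."""
    count = 0
    with open(txt_path, encoding="utf-8") as src, dbm.open(dbm_path, "n") as db:
        for line in src:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, value = line.split(None, 1)
            # DBM files store bytes, so encode both sides explicitly.
            db[key.encode("utf-8")] = value.strip().encode("utf-8")
            count += 1
    return count


# Demo in a temporary directory (paths here are placeholders, not server paths).
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "abbr_webref.txt")
dbm_path = os.path.join(workdir, "abbr_webref")
with open(txt_path, "w", encoding="utf-8") as f:
    f.write("d\tdhtml/\npg\tprogramming/\nht\thtml/tools/\n")

print(text_map_to_dbm(txt_path, dbm_path))  # 3
with dbm.open(dbm_path, "r") as db:
    print(db[b"pg"].decode("utf-8"))  # programming/
```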
That's the abbreviated version of how you set up link abbreviation on an Apache server. It is a bit of work, but once you've got your site hierarchy fixed, you can do this once and forget it. This technique saves space by allowing abbreviated URLs on the client side and shunting the longer actual URLs to the server. The delay using this technique is hardly noticeable. (If Yahoo! can do it, anyone can.) Done correctly, the rewriting can be transparent to the client: the abbreviated URL is requested, the server expands it, and serves back the content at the expanded location without telling the browser what it has done. (With the redirect flag used in the example above, the browser is instead sent to the expanded URL.) You also can use the /r/ prefix or the RewriteLog directive to track click-throughs in your server logs.
This technique works well for sites that don't change very often: you would manually abbreviate your URIs to match the RewriteMap abbreviations stored on your server. But what about sites that are updated every day, or every hour, or every minute? Wouldn't it be nice if you could make the entire abbreviation process automatic? That's where the magic of Perl and cron jobs (or the Scheduled Tasks GUI in Windows) comes in.