Automatic URL Abbreviation
You can create a Perl or shell script (insert your favorite CGI scripting language here) to look for URLs that match the lookup keys in your map file and automatically abbreviate your URLs. We use this technique on WebReference.com's home page. To make it easy for other developers to auto-abbreviate their URLs, we've created an open source script called shorturls.pl. It is available at http://www.webreference.com/scripts/.
NOTE
XSLT gives you another way to abbreviate URLs automatically. Just create the correct templates to abbreviate all the local links in your files.
The shorturls.pl script allows you to abbreviate URLs automatically and exclude portions of your HTML code from optimization with simple XML tags (<NOABBREV>...</NOABBREV>).
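To make the idea concrete, here is a minimal Python sketch of what an auto-abbreviation pass might look like. It is not the shorturls.pl source. The names ABBR, abbreviate_url, and abbreviate_html are hypothetical, the map is a three-entry sample, and the sketch assumes that only double-quoted href and src attributes are rewritten and that the NOABBREV markers are left in place in the output.

```python
import re

# Hypothetical map (abbreviation -> expansion), mirroring the RewriteMap file format.
ABBR = {"d": "dhtml/", "g": "graphics/", "dc": "dhtml/column"}

# Regions wrapped in <NOABBREV>...</NOABBREV> are left untouched.
NOABBREV = re.compile(r"(<NOABBREV>.*?</NOABBREV>)", re.DOTALL | re.IGNORECASE)
# Assumption: only double-quoted href and src attributes are rewritten.
LINK = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def abbreviate_url(url, abbr):
    """Return /r/<key>/<rest> for the longest matching expansion, if that is shorter."""
    if url.startswith("/"):
        candidate = url[1:]          # map expansions are relative to the site root
    elif url.startswith(("http://", "https://")):
        candidate = url              # absolute URLs are matched as written
    else:
        return url                   # leave relative URLs, anchors, mailto:, etc. alone
    best_key, best_exp = None, ""
    for key, expansion in abbr.items():
        if candidate.startswith(expansion) and len(expansion) > len(best_exp):
            best_key, best_exp = key, expansion
    if best_key is None:
        return url
    short = "/r/" + best_key + "/" + candidate[len(best_exp):]
    # Keep the original when abbreviating would not save any bytes.
    return short if len(short) < len(url) else url


def abbreviate_html(html, abbr):
    """Abbreviate link URLs everywhere except inside NOABBREV regions."""
    parts = NOABBREV.split(html)     # odd-indexed parts are the excluded regions
    for i in range(0, len(parts), 2):
        parts[i] = LINK.sub(
            lambda m: '%s="%s"' % (m.group(1), abbreviate_url(m.group(2), abbr)),
            parts[i],
        )
    return "".join(parts)
```

Choosing the longest matching expansion means a link into dhtml/column12/ uses the dc entry rather than the shorter d entry, which saves the most characters.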
Using this URL abbreviation technique, we saved more than 20 percent (5KB) off our 24KB hand-optimized front page. We could have saved even more space, but for various reasons, we excluded some URLs from abbreviation.
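If you want to measure the effect on your own pages, the arithmetic is simple. The following Python sketch uses a hypothetical helper named savings; the byte counts in the usage note are just the rounded figures quoted above, not measurements.

```python
def savings(before_bytes, after_bytes):
    """Return (bytes saved, percent saved) for a page before and after abbreviation."""
    saved = before_bytes - after_bytes
    return saved, 100.0 * saved / before_bytes
```

For a 24KB page that shrinks by 5KB, savings(24 * 1024, 19 * 1024) reports 5120 bytes saved, or a little under 21 percent.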
This gives you an idea of the link abbreviation process, but what about all the other areas of WebReference? Here is a truncated version of our abbreviation file to give you an idea of what it looks like (the full version is available at http://www.webreference.com/scripts/):
b	dlab/
d	dhtml/
g	graphics/
h	html/
p	perl/
x	xml/
3c	3d/lesson
dd	dhtml/dynomat/
ddd	dhtml/dynomat/dialogs/
dc	dhtml/column
...
i	http://www.internet.com/
ic	http://www.internet.com/corporate/
...
jsc	http://www.javascript.com/
jss	http://www.javascriptsource.com/
jsm	http://www.justsmil.com/
...
Note that we use two- and three-letter abbreviations to represent longer URLs on WebReference.com. Yahoo! uses two-letter abbreviations throughout its home page. How brief you make your abbreviations depends on how many links you need to abbreviate, and how descriptive you want the URLs to be.
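A map file in this format (a key, some whitespace, then the expansion, one pair per line) is easy to read into a lookup table. The Python sketch below is one way to do it, under stated assumptions: parse_map is a hypothetical name, lines starting with # are treated as comments, and rejecting duplicate keys is this sketch's own choice for catching typos early rather than anything the map format requires.

```python
def parse_map(text):
    """Parse RewriteMap-style text: one 'key<whitespace>value' pair per line."""
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue                         # skip blank lines and comments
        fields = line.split(None, 1)         # split on the first run of whitespace
        if len(fields) != 2:
            raise ValueError("line %d: expected 'key value'" % lineno)
        key, value = fields
        if key in mapping:
            raise ValueError("line %d: duplicate key %r" % (lineno, key))
        mapping[key] = value.strip()
    return mapping
```

Running a check like this before you install a new map can catch a reused abbreviation before it silently sends visitors to the wrong directory.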
The URL Abbreviation/Expansion Process: Step-by-Step
In order to enable automatic link abbreviation (with shorturls.pl) and expansion (with mod_rewrite), do the following:
1. Create an abbreviation map file (RewriteMap) with short abbreviations that correspond to frequently used and longer directories, separated by tabs. For example:
    d	dhtml/
    g	graphics/
    dc	dhtml/column
    gc	graphics/column
    ...
2. Add the following lines to your httpd.conf file to enable the mod_rewrite engine:
    RewriteEngine On
    RewriteMap abbr txt:/www/misc/redir/abbr_yrdomain.txt
    RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]
3. Try some abbreviated URLs (type in /r/d etc.). If they work, move on to step 4; otherwise, check your map and your rewrite directives. If all else fails, contact your system administrator.
4. Convert your RewriteMap text file to a binary hash file. See http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html for the txt2dbm Perl script.
5. Change the RewriteMap directive above to point to this optimized *DBM hash file:
    RewriteMap abbr dbm:/www/misc/redir/abbr_yrdomain
6. Now your rewrite engine is set up. To automate URL abbreviation, point shorturls.pl to the text version of your RewriteMap file, input your home page template and output your home page, and schedule the job with cron on UNIX/Linux, or the Scheduled Tasks GUI in Windows:
    echo "\nBuilding $YRPAGE from $YRTEMPLATE\n"
    /www/yrdomain/cgi-bin/shorturls.pl $YRTEMPLATE $YRPAGE
That's it. Now any new content that appears on your home page will be automatically abbreviated according to the RewriteMap file that you created, listing the abbreviations you want.
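Before you touch the live server, it can help to check offline what an abbreviated URL should expand to. The Python sketch below mimics the RewriteRule shown above: it applies the same pattern and concatenates the map value for the first group with the second group. It only approximates mod_rewrite's behavior. The names RULE, ABBR, and expand are hypothetical, and the map is a small sample.

```python
import re

# Same pattern as the RewriteRule: /r/<key>/<rest>
RULE = re.compile(r"^/r/([^/]*)/?(.*)")

# Hypothetical sample map (abbreviation -> expansion).
ABBR = {"d": "dhtml/", "g": "graphics/", "dc": "dhtml/column"}


def expand(path, abbr):
    """Return the rewritten target for path, or None if the rule does not match."""
    m = RULE.match(path)
    if m is None:
        return None
    # ${abbr:$1} yields an empty string when the key is not in the map.
    return abbr.get(m.group(1), "") + m.group(2)
```

With this sample map, /r/dc/12/ expands to dhtml/column12/, because the dc entry deliberately omits the trailing slash so that the rest of the directory name can follow it. An unknown key expands to nothing but the remainder of the path, which is a quick way to spot a typo in the map.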
Use Short URLs
You could name your directories using these short, cryptic abbreviations. Using descriptive names for directories and file names has advantages, however, in usability and search engine positioning. Using URL abbreviation, you can have the best of both worlds for high-traffic pages like home pages.
For front page or frequently referenced objects like single-pixel GIFs, logos, navigation bars, and site-wide rollovers, however, you can use short URLs by placing them high in your site's file structure, and using short file names. For example:
/i.gif (internet.com logo)
/t.gif (transparent single pixel gif)
I've seen some folks carry the descriptive-names-at-all-cost idea to extremes. Here's a surreal-world example:
transparent-single-pixel-gif1x1.gif (actual file name)
Some search engine positioning firms sprinkle keywords wherever they are legal, and in some places where they're not. Again, it's a tradeoff. Bulking up your pages with keyword-filled alt values and object names may increase your rankings, but with the advent of backlink-based search engines like Google and Teoma, these practices are fading in effectiveness.
You could even use content negotiation or your srm.conf file to abbreviate file type suffixes. This technique is pretty extreme and seldom used, but perfectly valid. Here's an example:
i.g (.g = .gif, srm.conf directive of AddType image/gif g)
i (content negotiation resolves to i.gif, could later use i.png)