There’s almost no tool I lean on more heavily than my .htaccess file. This invisible file generally sits in the root directory on an Apache served website. From there it seamlessly controls all the web traffic of your site.
Some of the common uses of the htaccess file are to redirect 404 errors to a custom “Page Not Found” page, to restrict access to a directory or file or to prevent directory listings –all very important and useful activities. But my favorite use of the .htaccess file is for URL masking.
In general, if you go to a URL –say http://www.example.com/sweets/taffy.html –it means that there is a file called “taffy.html” in a directory called “sweets” on a server directed to by the domain name “example.com”, subdomain “www”. With a htaccess file, however, you can insert a redirect command that will invisibly redirect incoming traffic to that URL to any other file on your domain.
There are three major reasons for URL masking.
1. As a security measure. If you don’t want everyone who visits your site to know what language you’re programming in or where your files are located, URL masking is your only hope. So “sweets/taffy.html” might really be “cgi-bin/taffy.pl” or “asp/taffy.asp” or “programs/taffy.php”.
2. To make your URLs look better if you have “dynamic pages”, meaning that you are passing variables to the programming through the URL string. For instance, “test.asp?p=5&dir=pizza&s_id=11950″ can become “pizza/11950/page5.html”. This has the added security advantage of concealing the names of the variables being passed through the URL. Of course, you still access the passed variables the same way in the actual programming –through the $_GET or $_REQUEST superglobal for php, or via the Request object for asp.
3. For Search Engine Optimization. This last may be the most common use of URL masking because it’s automatically built into many blogging and shopping cart programs. For whatever reason, search engines tend not to like dynamic URLs with a lot of queries visible in the string, so URL masking can give the search engines the appearance of the orderly directory-and-stable-pages they prefer. And if you’ve ever wondered how that particular piece of magic is being performed by your favorite blog platform, this is probably the answer.
A note of caution –all relative links will parse in relationship to the displayed URL. So if you have a file at “/programming.pl?dir=fox” but it displays as “fox/testprogram.html”, a relative link like “index.html” will take you to “fox/index.html”, not “index.html” in the root directory as you may have intended. That also goes for the page images if they have relative links.
OK –theory is fine, but what about practice? How do you use the htaccess?
First make sure that Apache has mod_rewrite enabled.
Then create a file in the root web directory called “.htaccess” (there may be one there already. If so, you can add on to the end of it, but don’t erase it or you’ll change the way your site functions).
Add these lines of code:
RewriteEngine on
RewriteBase /
After RewriteBase put “/” if you are in the root directory, or the name of the subdirectory if you are in one (which is not generally a good idea –it’s best to have your URL rewrites as close to the root as possible).
Now you can add as many rules as you want, in the following format
RewriteRule ^URL that is being masked$ DisplayedURL
So, the URL that the person will browse to, which should be written relative to the current directory, goes in between a caret (”^”) and a dollar sign (”$”) followed by a space and then the real location of the file.
RewriteRule ^sweets/taffy\.html$ programs/taffy.asp
(Note: The “.” in the first URL must be escaped by placing “\” in front of it, because otherwise it is read by the program as a wildcard matching any character. The second URL displays as written).
The rewrite rules become much more powerful when you add the ability to use regex (’regular expressions’) which allow you to rewrite an unlimited number of URLs with a single rule.
RewriteRule ^sweets/(.*)\.html$ programs/$1.asp
Anything enclosed in parenthesis in the first URL is saved for reuse in the second URL. The first set of parenthesis goes into the variable “$1″, the second in the variable “$2″ and so forth.
As mentioned before, the “.” stands for any non-whitespace character. The * means match as many of those characters as you want. In effect, we’re telling the program to save anything between “sweets/” and “.html” and to place whatever that is in the variable “$1″.
So this new rule will still redirect “sweets/taffy.html” to “programs/taffy.asp”, but it will also redirect “sweets/toffee.html” to “programs/toffee.asp”, and “sweets/chocolate.html” to “programs/chocolate.asp”. Anything that matches the pattern will work!
There’s much more to rewriting URLs, but this little bit is enough to get a lot of valuable work done. Just be careful! Any small mistake in the .htaccess file can make your entire site inaccessible.
