.htaccess Web Site Tips

To start with, you can download an example of the .htaccess file I use. Then follow along below to learn what each of the directives in it does. Note that one of the most common uses of .htaccess files is password-protecting directories; that is not covered here, but there are plenty of other sources for more information on it.

.htaccess is a feature of the Apache web server, most often found on Unix-based hosts, that allows you to override many server settings and control other features for a directory. Also note that on some servers any file that starts with a dot is considered a hidden file and may not show up in your FTP program. You should check the documentation of your FTP software to figure out how to display hidden files. I personally use WS_FTP Pro, and with that software I can type "-la" in the white box under the "mkdir" button on the right and hit return to list all hidden files.

Directory Index

#####################################
# Default Directory Files
#####################################

DirectoryIndex index.html index.htm index.cgi index.php

The DirectoryIndex directive tells the server which files can be used as the default file when a directory is accessed. For example, when you go to http://expressproducts.net/ you are requesting a directory and not an actual file, so this tells the server to look for each of these files in the order listed and serve the first one it finds. Most servers already have something similar set by default, but you can use this to override that setting or to change the order in which the files are tried.

Directory Browsing

#####################################
# Prevent Directory Browsing
#####################################

Options -Indexes

Next I use this directive to disable all directory browsing. The idea here is that there is usually no need for anyone to see everything you have in a certain folder, and when someone is trying to browse a directory they are usually up to no good!
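
If one folder really does need a listing, browsing can be switched back on for just that folder. This is a minimal sketch, assuming the host allows Options to be overridden in .htaccess files; the downloads folder mentioned in the comment is hypothetical.

```apache
# .htaccess placed inside the one folder that should stay browsable
# (hypothetical example: a public downloads folder). It overrides the
# "Options -Indexes" inherited from the parent directory.
Options +Indexes
```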

Protect your .htaccess file

#####################################
# Prevent Access to .htaccess files
#####################################

<Files .htaccess>
order allow,deny
deny from all
</Files>

What this does is just make sure that NO ONE will be able to access your .htaccess file from the web. In my case my .htaccess file is not even in my public-html directory so that also eliminates the problem.
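
The "order" and "deny" lines above are the older Apache 2.2 style. On Apache 2.4 and later they only work if the compatibility module is loaded, so a rough equivalent in the newer syntax, assuming a 2.4 server, would be:

```apache
# Apache 2.4 and later: same effect as "deny from all" above
<Files ".htaccess">
    Require all denied
</Files>
```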

SSI Support

#####################################
# Support SSI on .htm and .html pages
#####################################

AddHandler server-parsed .htm
AddHandler server-parsed .html

The reason for adding this bit of code is that the default configuration of a server will often require that any file containing SSI calls be named with the .shtml extension. This bit of code just tells the server to also look for and execute SSI calls within documents with the .htm and .html extensions. Being a Perl programmer, I often use SSI calls to simplify things in my sites, so it is helpful to enable it on all pages.
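
On Apache 2.x the same result is usually achieved with an output filter rather than the older server-parsed handler. The lines below are a sketch under that assumption; whether the Options line is needed, or even permitted, depends on how the host has configured the server.

```apache
# Allow server-side includes in this directory
# (may already be enabled by the host)
Options +Includes

# Apache 2.x style: pass .htm and .html files through the SSI filter
AddOutputFilter INCLUDES .htm .html
```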

IP Blocking

#####################################
# Misc. Denied IP Addresses
#####################################

<IfModule mod_rewrite.c>
RewriteEngine on

# Date and description just for my record
RewriteCond %{REMOTE_ADDR} ^67\.55\.74\.228$ [NC,OR]
# Thousands of download of commerce-cgi on 10/2
RewriteCond %{REMOTE_ADDR} ^72\.135\.9\.58$ [NC,OR]
# Pounding my bbs on 10/16
RewriteCond %{REMOTE_ADDR} ^208\.101\.11\.59$ [NC]

RewriteCond %{REQUEST_FILENAME} !/banned.htm
RewriteCond %{REQUEST_FILENAME} !/400.htm
RewriteCond %{REQUEST_FILENAME} !/401.htm
RewriteCond %{REQUEST_FILENAME} !/403.htm
RewriteCond %{REQUEST_FILENAME} !/404.htm
RewriteCond %{REQUEST_FILENAME} !/500.htm

RewriteRule .*\.(htm|html|php|cgi|zip|tar|gz)$ /banned.htm [L,R]

If you read through my other tutorials you will see that I am a big believer in monitoring your web site activity. Once you start, you will find out just how many people are actually abusing your site every day. The other part of this is being able to prevent them from accessing your site. One way to do this is by blocking their IP address.

The first 2 lines of this code check that you have mod_rewrite on your server, which should prevent errors if you don't, and then turn it on. The next group of RewriteCond lines lists the IP addresses that are banned (three in this example), and you can just add an additional line for each IP you want to ban. Every line in that group except the last needs the OR flag. The next 6 lines are documents that banned visitors will still be allowed to view. Obviously the banned.htm page needs to be in there, and the others are my error pages. The last line is the one that actually redirects them to the banned.htm file IF the file they are trying to access has one of the extensions listed. I did this because I don't want to block the images used on the banned page, and this was one way of doing that.

Now the normal way to deny access to your site is to simply use "deny from xxx.xxx.xxx.xxx", and that works just fine, but I wanted to go a little further and display a message to the person who was banned, to let them know just in case there was a problem. In the past, when I made a mistake, I would get an email from a user who thought my site was down when they had actually been banned.
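
For comparison, the plain deny approach might look roughly like the sketch below. The address is one of the examples from the list above. The first form is the older Apache 2.2 style and the second is its Apache 2.4 counterpart, so only the one that matches the server should be used. Neither depends on mod_rewrite, so they belong outside the IfModule block.

```apache
# Apache 2.2 style: allow everyone, then deny one address
order allow,deny
allow from all
deny from 67.55.74.228

# Apache 2.4 style: the same policy in the newer syntax
<RequireAll>
    Require all granted
    Require not ip 67.55.74.228
</RequireAll>
```

With either form the banned visitor simply gets the 403 error page instead of the custom banned.htm message.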

One other small point to keep in mind is that the IP address you see in your logs may not be the visitor's real address. It can be forged, or it can belong to a proxy shared by many users, so keep that in mind when banning IP addresses.

Banned User Agents

#####################################
# Deny Useragents
#####################################

RewriteCond %{HTTP_USER_AGENT} ^FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Java.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline.Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC]
RewriteRule ^.*$ - [F]

I have trimmed this code down to just a few examples and if you would like to see the full list then you can download it from the link at the top of the page. Basically what this code does is ban access to your site based on the user agent that is reported in the request header. If it is on this list then they will not have access to your site.

I have recently updated the code above to the code listed below. This takes a much more aggressive approach to blocking. Instead of listing each and every USER AGENT I want to block, I list the ones that are allowed. The ones that are allowed must 'start' with msnbot, Opera, Yahoo, or Mozilla, and then there are some additional exceptions to the Mozilla rule that I am blocking below that. In other words, a request is refused when it has a USER AGENT and that USER AGENT either does not start with one of the allowed names or matches one of the blocked Mozilla variants. This seems to work VERY well for me. It does block some of the lesser search engines that you may want to index your site, but personally I don't think they make any difference in your traffic, and some actually have horribly misbehaved bots. Also, the first line makes an exception for requests that have no USER AGENT listed. The reason I had to add this is that I use PayPal IPN, and when they post to your site it is done without a USER AGENT.

RewriteCond %{HTTP_USER_AGENT} !^$
RewriteCond %{HTTP_USER_AGENT} !^(msnbot|Opera|Yahoo|Mozilla) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla\/4\.0$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla\/5\.0$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla\/4\.0\+ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla\/4\.0\ compatible$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla\/4\.0\ \(compatible\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Charlotte.*searchme\.com [NC]

RewriteRule ^.*$ - [F,L]
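
To let one more agent through, its name can be added to the allowed list. The fragment below is a stripped-down sketch of just the allow-list part, without the extra Mozilla exceptions. "ExampleBot" is a made-up name used only for illustration, and RewriteEngine is assumed to be switched on already, as it is earlier in the file.

```apache
# Let requests with no User-Agent through (PayPal IPN, as noted above)
RewriteCond %{HTTP_USER_AGENT} !^$
# Refuse anything that does not start with an allowed name.
# "ExampleBot" is hypothetical; replace it with the agent to allow.
RewriteCond %{HTTP_USER_AGENT} !^(msnbot|Opera|Yahoo|Mozilla|ExampleBot) [NC]
RewriteRule ^.*$ - [F,L]
```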

Use redirects to hide the technology behind the cart

#####################################
# Redirect URLS
#####################################

RewriteRule \/search\/ /cgi-bin/search.cgi [L,T=application/x-httpd-cgi]

The example here is a standard rewrite, which is a redirect that happens inside the server. When someone accesses "www.example.com/search/" they are sent to "www.example.com/cgi-bin/search.cgi" without knowing about it, because the address shown in the browser does not change. The point I want to make here is that this is also useful as a method to hide the technology that you are using. For example, if I just posted to search.cgi, someone would know that I am using a Perl script to run the search. Even worse than that, if there is another well-known program on the web called search.cgi that has a known flaw in its code, I may end up with a bunch of bots hitting my site trying to exploit my script.
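
One detail worth knowing: inside an .htaccess file, mod_rewrite strips the directory's own path (including the leading slash) before matching. The pattern above expects a slash in front of "search", which fits a file kept above the web root as described earlier. For an .htaccess file sitting directly in the web root, a variant along these lines would probably be needed; this is a sketch under that assumption, with the author's flags kept unchanged.

```apache
# Variant for an .htaccess file located in the web root itself:
# the path being matched is "search/" with no leading slash.
# The optional trailing slash also catches "/search".
RewriteRule ^search/?$ /cgi-bin/search.cgi [L,T=application/x-httpd-cgi]
```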

Another good use for this is if you are moving or renaming a page so that visitors to the old pages are automatically redirected to the proper page.
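
A moved page is a case where the visitor should see the new address, so an external redirect fits better than a silent rewrite. A minimal sketch, assuming the .htaccess file is in the web root; both file names are hypothetical:

```apache
# Send visitors (and search engines) from the old page to the new one.
# R=301 marks the move as permanent; both file names are made up.
RewriteRule ^old-page\.htm$ /new-page.htm [R=301,L]
```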

Referrer Spam

#####################################
# Spam Referrers
# http://www.spywareinfo.com/articles/referer_spam/
#####################################

RewriteCond %{HTTP_REFERER} hackerviet [NC,OR]
RewriteCond %{HTTP_REFERER} insurance [NC,OR]
RewriteCond %{HTTP_REFERER} poker [NC,OR]
RewriteCond %{HTTP_REFERER} 24hours-credit [NC,OR]
RewriteCond %{HTTP_REFERER} baby-casino [NC,OR]
RewriteCond %{HTTP_REFERER} texas-hold-em [NC,OR]
RewriteCond %{HTTP_REFERER} hold-em [NC,OR]
RewriteCond %{HTTP_REFERER} holdem [NC,OR]
RewriteCond %{HTTP_REFERER} doctor-pills [NC,OR]
RewriteCond %{HTTP_REFERER} thesmart-casino [NC,OR]
RewriteCond %{HTTP_REFERER} westvalleyhigh [NC,OR]
RewriteCond %{HTTP_REFERER} highest-credit [NC]
RewriteRule ^.*$ %{HTTP_REFERER} [R,L]

</IfModule>

Referrer spam is where someone forges the referer header and then accesses your site. The thought here is that hopefully you are someone who monitors your log files, and you will see this link and visit their site. Or they hope that your activity reports are publicly viewable on the web, so that a search engine will spider your reports and count more sites linking to them, thereby increasing their ranking in the search engine. The truth is it is just a pain in the butt, so I wrote this little bit of code, which has been very successful. It looks at the referer and, based on the value, it REDIRECTS THEM BACK ON THEMSELVES! HAH! So instead of generating a bunch of bogus traffic to your site, it redirects them back to their own site.

The last line just closes the block that checks whether mod_rewrite is installed, because we are done with the mod_rewrite code now.

Error Page

#####################################
# Error Pages
#####################################

# 400: Bad request
ErrorDocument 400 /400.htm

# 401: Authorization required
ErrorDocument 401 /401.htm

# 403: Forbidden
ErrorDocument 403 /403.htm

# 404: Page not found
ErrorDocument 404 /404.htm

# 500: Internal server error
ErrorDocument 500 /500.htm

This is just your standard error pages code, but I highly recommend using it and then customizing each one of the pages to explain to the user what problem has occurred. Also make sure that these pages have lots of navigation and search features so that, for example, someone who hits the 404 page can then search for the proper page. This may help keep people on your site and generate more sales.

Hope you found this helpful!