WARNING: Be very careful editing your server configuration or .htaccess files. Even a minor typographical error can make your site unusable! Always make a backup copy of any file so you can recover quickly.

Using the Mod Rewrite URL Rewriting Engine to Deal with Bad Robots and Pesky Spambots

One of the greatest features of the Apache server is mod_rewrite. This optional module lets you control URL access in an almost unlimited number of ways. Our task at hand, though, is to protect our server from wasteful accesses that, for a variety of reasons, can drag the server to its knees.

The problems with many robots and spambots can be broken down into a few areas:

  • They either ignore the robots.txt instructions file, or attempt to exploit it to find otherwise unlinked directories.

  • Due to programming errors, they can get caught in loops, attempting to access files that do not exist.

  • If they are what is called multi-threaded, they can launch an almost unlimited number of concurrent connections to your site creating a serious system load.

  • Do you really feel like paying for bandwidth when all somebody is doing is trying to get e-mail addresses out of your pages?

There was a time when I was using browser detection in my Server Side Includes that would basically spill about 200K of garbage down the throat of any spambot that came our way. Okay, I confess that revenge felt good, but when I thought it over I realized that I was placing more strain on our own server and, by providing a huge list of bogus e-mail addresses, was also placing a strain on whatever SMTP server the spammer would eventually hijack. It was then that I decided to start using the RewriteEngine module.

Any visiting spambot or what I feel is a problem robot is directed to:

problem.html
In this page, I explain why they ended up where they did. In the case of people attempting to capture the site for off-line viewing, I try to be of assistance. If somebody thinks enough of BNB to save it, I owe them something in return.

The elegance of this solution is that the offending 'bot never sees anything but that one small page. No matter what URL they request from our site, that is the only page they will ever see. It is handled at the server level and cannot be bypassed.

NOTE: In order to use this feature of the Apache server, you must make sure that the server was built with the mod_rewrite.o module. This is done by adding the following line to the Configuration file before compiling the server:
AddModule modules/standard/mod_rewrite.o
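
On Apache 2.x, where modules are normally loaded at startup rather than compiled in through the Configuration file, the equivalent step is usually a LoadModule line in httpd.conf. The lines below are a minimal sketch, assuming the common module path modules/mod_rewrite.so; your installation may keep it somewhere else.

```apache
# Assumption: Apache 2.x built with dynamic module support.
# The path to mod_rewrite.so varies between installations.
LoadModule rewrite_module modules/mod_rewrite.so

# Rules wrapped in IfModule are skipped, instead of stopping the
# server with a configuration error, when mod_rewrite is missing.
<IfModule mod_rewrite.c>
    RewriteEngine on
</IfModule>
```

Keep in mind that the IfModule wrapper also hides the problem: if the module is absent, the server starts normally but none of the rewrite rules are applied.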

THIS SOUNDS GREAT! HOW DO I DO IT?

As of this writing, here is my little rewrite instruction code:

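RewriteEngine on
# Match known e-mail harvesters and site grabbers by the start of the User-Agent
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon   [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf     [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro  [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker  [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO     [OR]
# ".*" means "anything in between"; a bare "*" would only repeat the preceding "t"
RewriteCond %{HTTP_USER_AGENT} ^Teleport.*28  [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
# Answer every request from a matching agent with the explanation page
RewriteRule ^.*$ problem.html  [L]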
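What this code basically says is that if the HTTP_USER_AGENT string begins with any of the listed values, every request from that client is rewritten to the problem.html page. The ^ anchors each pattern to the start of the string, and [OR] joins the conditions so that a match on any one of them is enough.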
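
Two optional refinements are worth knowing about. The sketch below is a shortened variant, not the configuration used on this site: it adds the [NC] flag so that the User-Agent match ignores upper and lower case, and it uses a negated pattern so that a request for problem.html itself is left alone. It assumes the rules live in the .htaccess file of the directory that contains problem.html.

```apache
RewriteEngine on
# [NC] = case-insensitive match; [OR] = the next condition is an alternative
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf   [NC]
# The leading "!" negates the pattern: rewrite everything EXCEPT problem.html
RewriteRule !^problem\.html$ problem.html [L]
```

The remaining User-Agent patterns can be added in the same way, with [OR] on every condition except the last.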

There is a performance penalty for placing RewriteEngine directives in your .htaccess file, but I recommend doing so for the following reasons.

Since you are most likely not going to be dealing with a lot of spiders at once, and since they are not going to get anywhere anyway, what is called the Chicken & the Egg Problem (the extra work mod_rewrite has to do because .htaccess files are read only after the URL has already been mapped to a filename) is not going to be much of an issue. And as you identify new 'bots, you can add them to the list without having to restart your server.
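
If the performance penalty does matter to you, the same idea can be moved into the main server configuration, at the price of reloading the server after each change. This is a rough sketch, assuming the rules sit at server or virtual host level in httpd.conf and that problem.html is in the document root; only two of the User-Agent patterns are shown.

```apache
# Assumption: server or <VirtualHost> context in httpd.conf, not .htaccess
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
# In server context the pattern is matched against the full URL-path,
# which starts with a slash, so the target is written as a URL-path too.
RewriteRule ^/.* /problem.html [L]
```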

Note: Do NOT place any links to your site on the page the spiders or spambots are being redirected to! You can also protect individual directories by creating an .htaccess file in the directory you would like to forbid access to.
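
As an illustration of the per-directory approach, the following is a minimal sketch of an .htaccess file placed inside a directory you want to close off to the listed 'bots. The directory name is hypothetical, and only two User-Agent patterns are shown. Instead of rewriting to a page, it uses the [F] flag, which makes the server answer with 403 Forbidden.

```apache
# Hypothetical location: /private/.htaccess
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
# "-" means "no substitution"; [F] sends 403 Forbidden to the client
RewriteRule .* - [F]
```

For rewrite directives in an .htaccess file to be honored, the server configuration must allow them for that directory (AllowOverride FileInfo), and the FollowSymLinks option must be enabled there.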



If it looks great, it's by Christine
Some Fine Print
© 1997-2003 BigNoseBird.Com®, Inc. All rights reserved. All other trademarks are the sole property of their respective owners. The products that we recommend are only ones that we use. We have no relationship with any of the authors or their companies. We cannot assume responsibility for their ultimate performance or lack of same. We also cannot assume responsibility for either any programs provided here, or for any advice that is given since we have no control over what happens after our code or words leave this site. Always use prudent judgment in implementing any program- and always make a backup first! For further information, please read our Privacy Statement. We can be contacted at webmaster@bignosebird.com.

