Brainstorms and Raves - Mobile


Behind the Scenes with Apache's .htaccess
07:56 PM - Oct 9, 2005

Although I’m a designer and not a programmer or server-side specialist, for a few years I’ve used Apache’s .htaccess to a limited degree for clients' websites, primarily for simple URL redirects and setting up custom error pages. Now that I can use Apache’s .htaccess for my own websites, I’ve been immersed in learning more about how to use this powerful tool conservatively but effectively to redirect URLs and to combat spammers and bad bots. Today’s post provides links to some of the online sources that I’ve found especially helpful.

First, A Word of Warning

Keep in mind that one little typo or incorrect rule within an .htaccess file can cause an internal server error and take your entire website offline. Especially if you’re new to .htaccess, I highly recommend setting up a test directory where you can try out changes first. Also, always make a backup of your .htaccess file before making any changes. That way, if you do make a typo or other error, you can upload your backup file again to keep your website up and running while you track down the problem.
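
If you edit a copy of your .htaccess file on your own computer before uploading it, a few lines of code can take care of the backup step. Here’s a minimal Python sketch; the function name backup_htaccess and the timestamped file names in a backups folder are placeholders, not anything Apache requires.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_htaccess(path=".htaccess", backup_dir="backups"):
    """Copy an .htaccess file to a timestamped backup before editing it.

    Hypothetical helper: the default file and folder names are assumptions.
    Two backups made within the same second would share a name.
    """
    source = Path(path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # e.g. backups/.htaccess.20051009-195600.bak
    target = target_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, target)  # copy2 also keeps the file's modification time
    return target
```

Run it right before each round of edits; if the edited file takes the site down, upload the most recent backup and keep working on the test copy.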

Many also caution newcomers to .htaccess against getting carried away and creating excessively large .htaccess files. Keep in mind that the server processes this file on every request to your website, so a bloated one can hurt your server’s performance. If you have access to the httpd.conf file on your Apache server, many recommend putting your rules there instead of in .htaccess, especially for better server performance. Many of us on shared servers, myself included, don’t have access to it, though.

I prefer to think of .htaccess as just one of a variety of approaches and tools for managing URLs (especially URL redirecting), managing custom error pages, and combating bad bots and spammers. It’s a fantastic tool, and I’m thrilled to finally be able to use it for my own websites, including this one. (About two months ago, all of my websites moved to a new server.)

Regarding combating bad bots and spammers, .htaccess is one of several tools and approaches that I use. My goal is to keep things simple and block the bad guys without blocking everyone else. No single approach can do it all, though, and bad bots and spammers continually work on ways to get past the blocking approaches discussed online. So far I’m able to block nearly all of the bad bots and spammers, but new ones always come along, so I also watch my logs closely.

On to some website links that I’ve found especially helpful.

Apache Documentation

First, here are several links to the definitive source for Apache 1.3 and Apache 2.0 specifically related to using .htaccess, especially for redirecting URLs and blocking bad bots and spammers.

Apache 1.3
Apache 2.0

How to Use .htaccess, mod_rewrite, and Related (for Apache)

.htaccess Tools

I’ve been scouring the Internet looking for tools that will check .htaccess files for typos or other potential problems. So far I haven’t found anything, although I did find some tools that will help you create .htaccess rules and test user agent strings. They’re listed below.
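
To give a rough idea of what such a checker might look for, here’s a minimal Python sketch (the function name check_htaccess is hypothetical). It flags three common slips: container sections such as <Files> that are opened but never closed, RewriteCond lines with no RewriteRule after them, and patterns that won’t compile. Keep in mind that Python’s regular expression engine is close to, but not identical to, the ones Apache uses (Apache 1.3 uses POSIX extended regular expressions, Apache 2.0 uses PCRE), and this sketch splits arguments on whitespace only, so a clean result is no guarantee that the server will accept the file.

```python
import re

# An opening container line such as <Files "feed-icon.gif"> or <IfModule mod_rewrite.c>
CONTAINER_OPEN = re.compile(r"^<(\w+)\b[^>]*>$")
# A closing container line such as </Files>
CONTAINER_CLOSE = re.compile(r"^</(\w+)>$")
# CondPatterns like -f, -d, <abc, >abc and =abc are tests, not regular expressions.
NON_REGEX_PREFIXES = ("-", "<", ">", "=")


def check_htaccess(text):
    """Return (line_number, message) warnings for a few common .htaccess slips.

    Hypothetical helper; a clean result does not guarantee Apache accepts the file.
    """
    problems = []
    open_blocks = []     # stack of (tag, line_number) for containers not yet closed
    pending_cond = None  # first RewriteCond line not yet followed by a RewriteRule

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments

        closing = CONTAINER_CLOSE.match(line)
        if closing:
            tag = closing.group(1).lower()
            if open_blocks and open_blocks[-1][0] == tag:
                open_blocks.pop()
            else:
                problems.append((number, f"unexpected </{closing.group(1)}>"))
            continue
        opening = CONTAINER_OPEN.match(line)
        if opening:
            open_blocks.append((opening.group(1).lower(), number))
            continue

        parts = line.split()
        directive = parts[0].lower()
        if directive not in ("rewritecond", "rewriterule"):
            continue
        if directive == "rewritecond" and pending_cond is None:
            pending_cond = number
        if directive == "rewriterule":
            pending_cond = None

        # RewriteCond TestString CondPattern [flags]
        # RewriteRule Pattern Substitution [flags]
        if len(parts) < 3:
            problems.append((number, f"{parts[0]} is missing arguments"))
            continue
        index = 2 if directive == "rewritecond" else 1
        pattern = parts[index].lstrip("!")  # a leading ! negates the match
        if pattern.startswith(NON_REGEX_PREFIXES):
            continue
        try:
            re.compile(pattern)
        except re.error as err:
            problems.append((number, f"pattern {pattern!r} does not compile: {err}"))

    if pending_cond is not None:
        problems.append((pending_cond, "RewriteCond has no RewriteRule after it"))
    for tag, number in open_blocks:
        problems.append((number, f"<{tag}> is never closed"))
    return problems
```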

Tools to Generate .htaccess Rules

Try one of these tools to generate rules for redirects, hotlink protection, password protection, or blocking bad bots. At a minimum, you can use them as learning tools to see how something might be handled. Note that they might not handle very complex rules.
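
The simplest redirect these generators produce is a single line using mod_alias’s Redirect directive: an optional status code, the old URL path, then the full new URL. Here’s a minimal Python sketch of such a generator, in the same learning-tool spirit; the function name and the example.com URLs are placeholders, and it only handles the one-page-to-one-page case (no patterns, no paths containing spaces).

```python
from urllib.parse import urlparse


def redirect_rule(old_path, new_url, status=301):
    """Build one mod_alias Redirect line for a page that has moved.

    301 means moved permanently; 302 means a temporary move.
    """
    if not old_path.startswith("/") or any(c.isspace() for c in old_path + new_url):
        # Redirect expects a URL path beginning with "/"; values with spaces
        # would need quoting, which this sketch doesn't attempt.
        raise ValueError("old_path must start with '/' and neither value may contain spaces")
    if urlparse(new_url).scheme not in ("http", "https"):
        raise ValueError("new_url must be a full URL starting with http:// or https://")
    if status not in (301, 302):
        raise ValueError("this sketch only supports 301 and 302")
    return f"Redirect {status} {old_path} {new_url}"


print(redirect_rule("/old-page.html", "http://www.example.com/new-page.html"))
# Redirect 301 /old-page.html http://www.example.com/new-page.html
```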

Tools to Test .htaccess Rules
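
One simple way to test user agent strings is to run them against your ban-list patterns before those patterns go into SetEnvIfNoCase User-Agent lines. Here’s a minimal Python sketch that matches case-insensitively, as SetEnvIfNoCase does. The pattern list is only an example built from bot names that often appear in published ban lists, and the too_broad check is there to catch a pattern that would also block ordinary browsers.

```python
import re

# Example ban-list patterns; check current lists and your own logs before using any.
BAD_AGENT_PATTERNS = [
    r"^EmailSiphon",
    r"^EmailWolf",
    r"^ExtractorPro",
    r"^WebZIP",
]


def matching_patterns(user_agent, patterns=BAD_AGENT_PATTERNS):
    """Return every pattern that matches the user agent, ignoring case."""
    return [p for p in patterns if re.search(p, user_agent, re.IGNORECASE)]


def too_broad(patterns, legitimate_agents):
    """Return the patterns that would also block user agents you want to allow."""
    return [p for p in patterns
            if any(re.search(p, ua, re.IGNORECASE) for ua in legitimate_agents)]
```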

Forums Devoted to Apache .htaccess, mod_rewrite, mod_setenvif, and Related

You’ll find enormously helpful tips and troubleshooting help for using .htaccess, mod_rewrite, mod_setenvif, and related Apache features in these forums. You don’t need to subscribe to read most discussions, although you’ll need to sign up to post questions or comments. Webmasterworld Forums also has subscriber-only areas in addition to its freely available ones.

Using .htaccess to Block Hotlinking, Stop Bandwidth Theft

I absolutely love being able to use .htaccess to prevent other websites from linking directly to my server’s images, CSS, JavaScript, and other files. Here are a couple of tutorials on how to do it.
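
Most tutorials do this with mod_rewrite: one RewriteCond lets requests with an empty referrer through, a second checks that the referrer isn’t your own domain, and a RewriteRule answers requests for protected file types with 403 Forbidden via the [F] flag. To make that logic easier to follow, here’s a minimal Python sketch of the same decision. The domain names and file types are placeholders, and checking the parsed host name is a simplification of the regular expression Apache would use.

```python
import re
from urllib.parse import urlparse

# Roughly the same decision the usual mod_rewrite recipe makes
# (example.com stands in for your own domain):
#   RewriteEngine On
#   RewriteCond %{HTTP_REFERER} !^$
#   RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/ [NC]
#   RewriteRule \.(gif|jpe?g|png|css|js)$ - [F,NC]
ALLOWED_HOSTS = {"example.com", "www.example.com"}
PROTECTED = re.compile(r"\.(gif|jpe?g|png|css|js)$", re.IGNORECASE)


def is_hotlink(path, referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if a request for `path` coming from `referer` should be refused."""
    if not PROTECTED.search(path):
        return False  # not a protected file type
    if not referer:
        return False  # empty Referer: many browsers, proxies, and firewalls strip it
    host = (urlparse(referer).hostname or "").lower()
    return host not in allowed_hosts
```

Letting an empty referrer through is the usual design choice: blocking it would also block visitors whose browsers or firewalls don’t send the header.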

Note that you might wish to allow certain sites to link directly to a specific image, such as an icon for your newsfeeds, while still blocking hotlinking to all your other images. I recently placed my newsfeed icon in a separate directory, and in that directory’s .htaccess file I’ve added a rule using Apache’s <Files> directive that allows hotlinking to that one image only. I’m testing that setup over the next few weeks to see how it goes. I prefer that people download the icon and serve it from their own servers, so if I find other websites abusing hotlinking for that image, it’s easy enough to block them individually and make the rules in that directory’s .htaccess file more restrictive.
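
The exact rule isn’t reproduced here, but the decision a per-file exception makes can be sketched in Python. This is a minimal sketch under assumptions: the paths, the domains, and the separate list of sites singled out for abusing the icon are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical settings: your own domains, the file types to protect,
# the one image anyone may hotlink, and sites singled out for abusing it.
ALLOWED_HOSTS = {"example.com", "www.example.com"}
PROTECTED = re.compile(r"\.(gif|jpe?g|png)$", re.IGNORECASE)
HOTLINK_OK = {"/feeds/feed-icon.gif"}
ICON_ABUSERS = {"abusive-site.example"}


def should_block(path, referer):
    """Return True if the request should be refused with 403 Forbidden."""
    if not PROTECTED.search(path) or not referer:
        return False  # unprotected file type, or no Referer header sent
    host = (urlparse(referer).hostname or "").lower()
    if host in ALLOWED_HOSTS:
        return False  # a request from one of your own pages
    if path in HOTLINK_OK:
        return host in ICON_ABUSERS  # exempt file: block only the sites singled out
    return True  # any other protected file linked from elsewhere
```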

Using .htaccess to Ban Bad Bots and Spammers

Note that some of the Webmasterworld forum links might require a subscription.

Some helpful forum threads:

Weblogs, Wikis, Sites, Sections Devoted to Combating Bad Bots, Spammers

Thoughts on Dealing with Comment, Referral, Trackback Spam

As I mentioned above, no one approach will be totally effective, or even practical, in blocking comment spam, referral spam, or trackback spam. Blocking by IP address or host quickly becomes impractical, as anyone who’s tried to block solely by IP address knows: your ban list grows rapidly, IPs go out of date just as fast, and many of them belong to compromised “zombie” machines. Blocking by user agent can help, but spammers spoof user agents, and you don’t want to block legitimate users. There are known spoofed user agent strings that you can add to your ban list, though, which can help quite a bit. Blocking by referrer can also be helpful, but as with IP lists, your ban list will grow quickly. Blocking by keywords in referrers and hosts can cover most spam referrals and hosts, but I’ve recently found spammers using more legitimate-looking domain names, too. And spammers are always coming up with new ways to get around blocking approaches.
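
Keyword and user agent blocking in .htaccess is commonly done with mod_setenvif: each SetEnvIfNoCase line marks matching requests with an environment variable, and a Deny from env=... line refuses them. Here’s a minimal Python sketch that generates such rules from two lists, assuming the Order/Allow/Deny syntax of Apache 1.3 and 2.0 (later Apache versions replace it with Require). The keywords, patterns, and the variable names spam_ref and bad_bot are hypothetical examples. Short keywords can catch legitimate referrers, so test them against your logs first.

```python
import re

# Hypothetical lists; replace them with what actually shows up in your logs.
REFERER_KEYWORDS = ["casino", "poker", "pharmacy"]
AGENT_PATTERNS = [r"^EmailSiphon", r"^EmailWolf"]


def ban_rules(referer_keywords, agent_patterns):
    """Build rules that deny requests whose Referer contains a spam keyword
    or whose User-Agent matches a known bad pattern."""
    lines = []
    for word in referer_keywords:
        # re.escape keeps characters like "." in a keyword literal;
        # PCRE (Apache 2.0) treats these backslash escapes as plain characters.
        lines.append(f'SetEnvIfNoCase Referer "{re.escape(word)}" spam_ref')
    for pattern in agent_patterns:
        lines.append(f'SetEnvIfNoCase User-Agent "{pattern}" bad_bot')
    lines += [
        "Order Allow,Deny",
        "Allow from all",
        "Deny from env=spam_ref",
        "Deny from env=bad_bot",
    ]
    return "\n".join(lines)


print(ban_rules(REFERER_KEYWORDS, AGENT_PATTERNS))
```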

Largely for these reasons I’ve found it most effective for my own websites to use a combination of several approaches and tools. Each of my websites is different, though, so I don’t do the same things at each site, although there is certainly some overlap.

Here are some helpful articles with ideas and techniques for combating spammers.

Regular Expressions

Learning even a little about regular expressions can be very helpful, and learning more can go a long way toward writing leaner mod_rewrite rules and other rules for your .htaccess files.
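
As one example of what “leaner” can mean, several separate patterns can often be folded into one using grouping and alternation, and jpe?g covers both .jpg and .jpeg. Here’s a minimal Python sketch (the helper name same_matches is hypothetical) that checks whether a combined pattern matches exactly the same test strings as the separate ones. The basic syntax used here (^, $, ?, parentheses, |) works the same way in .htaccess rules.

```python
import re


def same_matches(separate_patterns, combined_pattern, samples):
    """Check that one combined pattern matches exactly the samples that any of
    the separate patterns match (case-insensitively, like SetEnvIfNoCase or [NC])."""
    for text in samples:
        combined_hit = re.search(combined_pattern, text, re.IGNORECASE) is not None
        separate_hit = any(re.search(p, text, re.IGNORECASE) for p in separate_patterns)
        if combined_hit != separate_hit:
            return False
    return True


# Three separate rules' worth of patterns, folded into one.
separate = [r"^EmailSiphon", r"^EmailWolf", r"^ExtractorPro"]
combined = r"^(Email(Siphon|Wolf)|ExtractorPro)"
samples = ["EmailSiphon", "EmailWolf 1.00", "ExtractorPro", "Mozilla/4.0 (compatible; MSIE 6.0)"]
print(same_matches(separate, combined, samples))  # True
```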

Robots.txt

Unfortunately, many bots disregard your robots.txt file or don’t even look at it. Good ones do honor it, though, so it’s worth creating even if the bad bots ignore it.

For my own websites, as long as a bot or spider behaves itself properly, I typically allow it, although I do have some exclusions in my robots.txt file. Known bad bots and spiders, along with any that disregard the rules or otherwise behave badly, are banned from my websites via my .htaccess file.

Here’s some information on how to create and check a robots.txt file for your website.
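
You can also check what a well-behaved bot would conclude from your robots.txt before uploading it. Python’s standard library includes a robots.txt parser, urllib.robotparser; here’s a minimal sketch using a hypothetical file (ExampleBot is a made-up name). It only tells you what a bot that honors robots.txt will do, since bad bots may ignore the file entirely.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: keep every well-behaved bot out of /cgi-bin/,
# and ask one particular bot to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Report whether a bot that honors robots.txt may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for agent, path in [("Googlebot", "/index.html"),
                    ("Googlebot", "/cgi-bin/search.cgi"),
                    ("ExampleBot/1.0", "/index.html")]:
    print(agent, path, allowed(agent, path))
```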

Which Bots or User Agents are Good or Bad?

HTTP Error Codes

Most of us probably know what a 404 error is (page not found), but there are many more HTTP error codes, some for client-side errors like 404 and others for server-side errors like 500 (internal server error). You can create custom error pages with more helpful messages, such as a custom 404 page, and point to them with ErrorDocument rules in your .htaccess file if you wish. You can view this website’s custom 404 error message to see what I mean. Here are some helpful sources for more information about error codes.
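
Each ErrorDocument rule is one line: the status code, then the page to show. Here’s a small Python sketch that writes those lines for a few codes, using the standard library’s http.HTTPStatus for the official reason phrases as .htaccess comments. The /errors/ directory and the 404.html-style file names are hypothetical; use whatever paths your own error pages live at.

```python
from http import HTTPStatus


def error_document_rules(codes, directory="/errors"):
    """Build ErrorDocument lines pointing each status code at a custom page.

    The /errors/<code>.html layout is a hypothetical naming scheme.
    """
    lines = []
    for code in codes:
        status = HTTPStatus(code)  # raises ValueError for codes it doesn't know
        lines.append(f"# {status.value} {status.phrase}")
        lines.append(f"ErrorDocument {status.value} {directory}/{status.value}.html")
    return "\n".join(lines)


print(error_document_rules([403, 404, 500]))
```

Pointing ErrorDocument at a local path like /errors/404.html, rather than a full http:// URL, matters: with a full URL, Apache sends the visitor a redirect instead of the original error status.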

Server Vulnerabilities


Copyright © 2000-2008 Shirley E. Kaiser, M.A., SKDesigns. All Rights Reserved.