As a web site author or operator, you have three choices:
moderate your blogs and guest books so that every post requires your
approval, require registration before allowing posts, or try some automated
way of repelling the spammer's link-loading attempts. Since the first
option involves a huge amount of ongoing maintenance and the second makes life difficult
for honest visitors, I'm a big fan of the automated approach.
With the automated approach, we need to discern
whether or not the poster really is a "guest", and we need to check the content
being posted. Then we can easily repel the attempts to post spam, and optionally we can implement mechanisms to punish the
perpetrators. It's remarkably easy to implement a strong defense without
discouraging real visitors.
Guest or not, here I come...
As you navigate a web site, each time you click
on a hyperlink the web browser sends information to the web server about which
page you came from. This is known as the "HTTP referrer". Since a true
guest to your web site would most certainly have clicked on another page on your
site to reach the guest book, the guest book page should see the referrer value as an
identifiable page from your web site. If it's empty or its value is some unrecognizable
web site or page, we know we've got a spammer in our clutches. We can then
simply reject the post or we can go so far as to entertain ourselves by abusing
the spammer.
For the best defense, check the referrer in two places.
First, check it from the page to
which you POST the submission (the target of your FORM). Second, check it
from the page
that contains the FORM (as long as it's not your home page), to make sure the
visitor arrived on that page from one of your own pages and not from an outside source.
The first angle is the easiest. In ASP you can see the URL of the
referring page with this code:
<% strPageURL = Request.ServerVariables("HTTP_REFERER") %>
Then your code can check strPageURL
to see if it matches the URL of the page containing your form.
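To make the comparison concrete, here's a rough sketch of the matching logic, written in JavaScript so it can be tried outside of ASP. The function name cameFromFormPage and the example.com URLs are my own placeholders, and ignoring the query string and letter case is an assumption that suits a case-insensitive server like IIS; adjust it to your own site.

```javascript
// Hypothetical helper: returns true when the Referer header value points at
// the page that hosts the form. Query strings and fragments are ignored.
function cameFromFormPage(referer, formPageUrl) {
  if (!referer) return false;              // empty or missing header
  let ref, form;
  try {
    ref = new URL(referer);
    form = new URL(formPageUrl);
  } catch (e) {
    return false;                          // not a parseable absolute URL
  }
  // Compare host and path only; lowercasing the path is an assumption
  // that fits servers which treat paths as case-insensitive.
  return ref.host === form.host &&
         ref.pathname.toLowerCase() === form.pathname.toLowerCase();
}

// Example usage with placeholder URLs
console.log(cameFromFormPage("http://www.example.com/guestbook.asp?page=2",
                             "http://www.example.com/GuestBook.asp"));
console.log(cameFromFormPage("", "http://www.example.com/GuestBook.asp"));
```

The same test in ASP would compare strPageURL against your form page's address in the same way.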
The second angle takes a little more work.
In JavaScript the referrer is available in the document.referrer
property. Don't be misled by the different spelling: the HTTP header is spelled
"REFERER" while the JavaScript property is spelled "referrer", and neither is a
typographical error. What we need to do is pass the referrer as a value hidden in your form, by including this
HTML/JavaScript code before the applicable </FORM> tag in the web page
containing your form:
<script language="JavaScript"><!--
document.write('<input type="hidden" name="Referrer" value="');
document.write(document.referrer);
document.write('">');
// --></script>
Sure, we could have checked the referrer
immediately on the form page, but this is more entertaining. By passing along the value to the target of
your form instead of using it right away, you can delay your handling of the
spammer and make the spammer waste time and effort filling out the form.
You can grab this passed-along value of the
form page referrer in ASP when the form is posted by using this ASP code in
the form target page:
<% strReferrer = Request("Referrer") %>
This way, the page that processes your form can
see whether or not the form's page had a valid referrer.
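Here's one way the "valid referrer" decision might look, again sketched in JavaScript rather than ASP. The function name isOwnSiteReferrer and the host list are hypothetical; the idea is simply that an empty value, an unparseable value, or a host that isn't yours all count as failures. Keep in mind the hidden field is filled in by the visitor's browser, so treat it as a strong hint rather than proof.

```javascript
// Hypothetical helper: decides whether the passed-along referrer value
// belongs to one of your own host names (given in lowercase).
function isOwnSiteReferrer(referrer, ownHosts) {
  if (typeof referrer !== "string" || referrer.trim() === "") {
    return false;                          // nothing was passed along
  }
  let host;
  try {
    host = new URL(referrer).hostname;     // URL parsing lowercases the host
  } catch (e) {
    return false;                          // not an absolute URL
  }
  return ownHosts.includes(host);          // exact host match only
}

// Example usage with placeholder host names
const myHosts = ["www.example.com", "example.com"];
console.log(isOwnSiteReferrer("http://WWW.Example.com/index.asp", myHosts));
console.log(isOwnSiteReferrer("http://www.example.com.evil.test/", myHosts));
```

Matching the whole host name, rather than searching for your domain somewhere inside the string, keeps look-alike addresses from slipping through.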
If you wish to be very sophisticated, you can
use a cookie to track how many pages on your site have been read by the
visitor, and choose a minimum number of page hits to consider one of your
visitors to be truly a guest. I haven't seen the need, yet.
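I haven't built this, so take the following as a sketch of the idea only. It works on the raw Cookie header text, and the cookie name pagesSeen, the function names, and the minimum of 2 pages are all made up for illustration.

```javascript
// Reads a hypothetical "pagesSeen" counter out of a Cookie header string.
function readPagesSeen(cookieHeader) {
  const match = /(?:^|;\s*)pagesSeen=(\d+)/.exec(cookieHeader || "");
  return match ? parseInt(match[1], 10) : 0;   // no cookie means 0 pages
}

// Builds the cookie text to send back on each page view (counter + 1).
function nextCookieValue(cookieHeader) {
  return "pagesSeen=" + (readPagesSeen(cookieHeader) + 1) + "; path=/";
}

// A visitor counts as a guest once enough pages have been viewed.
function looksLikeGuest(cookieHeader, minPages) {
  return readPagesSeen(cookieHeader) >= minPages;
}

// Example usage
console.log(nextCookieValue("theme=dark; pagesSeen=3"));
console.log(looksLikeGuest("pagesSeen=1", 2));
```

Every ordinary page would send back the incremented cookie, and the form target would call the guest check before accepting a post.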
Here a link, there a link...
This third angle on spam prevention is a little easier. If the post
contains many instances of "http" and HTML tags, e.g. "<A HREF=...>",
and/or you find those strings in fields meant for names or other plain text, then we
know we've got a spammer. You can use regular expressions to easily count
the occurrences. Here's a function in ASP that takes a regular expression and the
character string to be checked, and returns the number of matches.
Included is a sample expression:
<%
' Returns the number of times strPattern matches within strString.
Function RegExpCount(strPattern, strString)
    Dim regEx, Matches
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    regEx.IgnoreCase = True   ' match tags regardless of letter case
    regEx.Global = True       ' find every match, not just the first
    Set Matches = regEx.Execute(strString)
    RegExpCount = Matches.Count
End Function

intIllegalTags = RegExpCount("<a href|<[a-z]>|</[a-z]>|<br.?>|http", strGuestComments)
%>
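In this example,
strGuestComments
contains the text of the attempted guest book post. If you decide that someone
is permitted to post just one link in your guest book, check whether
intIllegalTags
is greater than 1; otherwise check whether it is greater than 0. Bear in mind
that with the sample expression a single HTML link such as
<a href="http://...">...</a> produces three matches ("<a href", "http" and
"</a>"), so a limit of 1 only lets a bare URL through.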
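If you'd like to experiment with the expression and the threshold before touching your ASP pages, the same counting can be sketched in JavaScript. The names countMatches and isSpam, and the post object with name and comments fields, are hypothetical; the pattern is the sample expression from above.

```javascript
// The sample expression, with the global and ignore-case flags set.
const LINK_PATTERN = /<a href|<[a-z]>|<\/[a-z]>|<br.?>|http/gi;

// Counts how many times a global pattern matches in the text.
function countMatches(pattern, text) {
  const matches = String(text).match(pattern);
  return matches ? matches.length : 0;
}

// Hypothetical policy: no matches at all in the name field, and at most
// maxAllowed matches in the comments.
function isSpam(post, maxAllowed) {
  if (countMatches(LINK_PATTERN, post.name) > 0) return true;
  return countMatches(LINK_PATTERN, post.comments) > maxAllowed;
}

// Example usage
console.log(countMatches(LINK_PATTERN, "see http://example.com"));
console.log(countMatches(LINK_PATTERN, '<a href="http://example.com">x</a>'));
console.log(isSpam({ name: "Bob", comments: "see http://example.com" }, 1));
```

Running a few of your real guest book entries through it is a quick way to pick a threshold that doesn't trip up honest visitors.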
Now that we've got 'em...
Once we've caught the spammer committing a
crime, we
need to choose a punishment. We can simply throw an error message and keep
the post from being entered, or we can issue threats, or we can just waste the
spammer's time. Since the spammers like to waste our time, I enjoy the
idea of wasting
theirs. A critical component in this scheme, since my site is ASP-based,
is this:
http://authors.aspalliance.com/stevesmith/articles/sleeptimer.asp
Steve Smith's SleepTimer is a DLL which, once registered on your
Microsoft IIS-based web
server, allows you to insert processing delays into your web pages. When I detect
spammer activity, I use delays of a minute or more before rejecting their
submission. The more links they attempt to post, the longer the delay before
they get a response.
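The scaling itself is simple arithmetic. Here's a sketch of one way to do it in JavaScript; the function name delaySeconds and the numbers (60 seconds to start, 15 more per extra link, capped at 300) are illustrative assumptions, not the values I actually use. A cap is worth having because every sleeping request keeps a server connection busy.

```javascript
// Hypothetical delay rule: baseSeconds for the first offending link,
// perLinkSeconds more for each additional one, never above maxSeconds.
function delaySeconds(linkCount, baseSeconds, perLinkSeconds, maxSeconds) {
  if (linkCount <= 0) return 0;            // nothing suspicious, no delay
  const wanted = baseSeconds + perLinkSeconds * (linkCount - 1);
  return Math.min(maxSeconds, wanted);
}

// Example usage with the illustrative numbers
console.log(delaySeconds(1, 60, 15, 300));
console.log(delaySeconds(3, 60, 15, 300));
console.log(delaySeconds(100, 60, 15, 300));
```

The result would be handed to whatever routine does the actual sleeping on your server.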
Now the big question is, do you tell them why
their post is being rejected after they've waited? Or do you even tell
them it's been rejected at all? I'm not sure which is more fun - telling
them to piss off or letting them waste more time trying to figure out if things
are working.
Here's a tip - if you're going to keep your spammers waiting,
you should make sure they don't get bored or frustrated too quickly. So
instead of letting them stare at a blank page while your delay is going, you
should give them something to chew on. That should keep them from hitting
the [Back] or [Stop] buttons prematurely. This example displays a creeping
line of asterisks as the seconds go by:
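<%
' Assumes objTimer already holds an instance of the SleepTimer component.
Sub Delay(seconds)
    Dim x
    Response.Write "<p>Working - please be patient"
    For x = 1 To seconds
        Call objTimer.DoSleep(1000)   ' pause one second per asterisk
        Response.Write " *"
    Next
    Response.Write "</p>"
End Sub
%>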
But to use this you need to make sure the first line of
your ASP code contains:
<% Response.Buffer = false %>
Turning off buffering makes sure that the web browser sees
everything as it's happening. Otherwise the server holds back the page's HTML
until the whole page has finished processing, so nothing appears in the browser
until the delay is over. And if
you're using shared borders in Microsoft FrontPage, turn them off. Otherwise your
page gets buffered regardless of the setting in ASP.