SpamBayes is a Bayesian email spam filter, one
of the new generation of spam filters which attempt to intelligently
distinguish spam from legitimate email based on the characteristics and
content of the email.
SpamBayes is also an Open Source project, which
means you can't beat the value. It's absolutely free! Even if you
wanted to pay for it, all you can do is donate to the Python Software
Foundation, or contribute your skills if you're a programmer. You can download SpamBayes from
http://spambayes.sourceforge.net,
and you can read all about the software and its background there too.
By itself, SpamBayes is generally not useful to
90% of PC users. OK, that number is contrived, but by
itself the product is a set of Python scripts and proxies for POP3, IMAP and
Procmail. There's a fair amount of setup effort involved, and submitting
email to be categorized is not the most seamless operation. It's not
exactly user friendly to most folks. But don't despair; that's where the
Outlook Plug-In comes in!
The Outlook Plug-In for SpamBayes neatly
packages all the required run-time components along with extensive hooks into
Microsoft Outlook, and provides it in a simple installer that gets everything
into place without drama and obeys Windows' usual Add/Remove Programs applet
if you later wish to remove it. Once installed, a new toolbar appears in
your Microsoft Outlook with two or three context-sensitive buttons used for
managing SpamBayes and for submitting emails to the categorizer.
Installation
Installation is a snap. Most importantly,
remember this product is for the full-fledged Outlook product, not Outlook
Express. There is a warning about making sure Word is closed if you use
it as your email editor, but frankly it's a good idea to make sure all your
programs are closed. From there on it's just a matter of hitting the
[Next] button a few times.
The first time you run Outlook after installing
SpamBayes, a configuration wizard appears. Unless they have been collecting
junk email recently, most people will choose the first option - "I haven't
prepared for SpamBayes at all." The wizard will offer to create two
folders, "Junk E-Mail" and "Junk Suspects". You may choose your own
folder names if you prefer. These folders are used to store email that
has been categorized by SpamBayes.
The installation will have created the desired
program folder (C:\Program Files\SpamBayes
Outlook Addin\ by default) and a folder for
its database and configuration settings in your Windows user profile folder.
The installer creates no icons on your desktop or in your Programs menu.
The only evidence of its presence is its toolbar in Outlook and an entry in
Windows' Add/Remove Programs control panel applet.
Usage
Although SpamBayes functions immediately based
on some built-in assumptions, it really needs to be trained to be an
effective, personalized tool. Optionally you may also make adjustments
to the balancing act it performs when categorizing your email.
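
One way to picture that balancing act is as two cutoffs applied to a spam
score between 0 and 1. The sketch below is only an illustration, not
SpamBayes' own code: the function name and the cutoff values (0.15 and 0.90)
are assumptions made up for the example, and the folder names are the wizard's
suggested ones mentioned above.

```python
# Hypothetical cutoffs, chosen only for illustration; the real plug-in
# lets you adjust its own thresholds from its manager dialog.
UNSURE_CUTOFF = 0.15
SPAM_CUTOFF = 0.90


def folder_for(score, unsure_cutoff=UNSURE_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0, 1] to the folder a message would land in."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= spam_cutoff:
        return "Junk E-Mail"      # confident spam
    if score >= unsure_cutoff:
        return "Junk Suspects"    # not sure either way
    return "Inbox"                # confident ham, left alone


if __name__ == "__main__":
    for s in (0.02, 0.40, 0.97):
        print(s, "->", folder_for(s))
```

Raising the spam cutoff trades missed spam for fewer legitimate messages
filed as junk; lowering the unsure cutoff sends more borderline mail to the
suspects folder for a human to look at.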
Training amounts to collecting some "ham"
(desirable, legitimate email) in one folder, collecting some "spam" (junk
email) in another folder, and feeding it all to SpamBayes using the SpamBayes
Manager tool. Alternatively, you can train SpamBayes incrementally by
using the [Delete As Spam] and [Recover From Spam] buttons on emails as they
come in.
The training allows for some subjectivity in
the classification of spam. If you decide you really do want to
see ads for penis enlargement, then by all means, you can! And if you
decide that all BCC'd email from your cousin Ralph is junk, SpamBayes can be
trained to know that too.
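
To make the idea of training a little more concrete, here is a toy word-count
Bayesian filter. It is a minimal sketch under my own assumptions, not the
algorithm SpamBayes actually uses: the ToyFilter class, its tokenizer and its
add-one smoothing are all invented for illustration. It does show why the
filter is personal: it only knows what you have told it is ham or spam.

```python
import math
import re
from collections import Counter


class ToyFilter:
    """Toy naive Bayes filter (illustrative only, not SpamBayes itself)."""

    def __init__(self):
        # Per label: number of trained messages containing each token.
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.messages = {"ham": 0, "spam": 0}

    @staticmethod
    def tokens(text):
        # Crude tokenizer: the set of lowercase words in the message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        """Incrementally train on one message labeled 'ham' or 'spam'."""
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def score(self, text):
        """Return a spam score between 0 (ham) and 1 (spam)."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in self.tokens(text):
            # Add-one smoothing so unseen tokens never give a zero.
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp before converting to a probability to avoid overflow.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    f = ToyFilter()
    f.train("cheap pills buy now", "spam")
    f.train("buy cheap watches now", "spam")
    f.train("meeting agenda for monday", "ham")
    f.train("lunch on monday with ralph", "ham")
    print(round(f.score("buy cheap pills"), 3))
    print(round(f.score("monday meeting agenda"), 3))
```

Each call to train() plays the role of one click on a training button: the
counts shift a little, and the next score reflects your own idea of junk.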
Pros & Cons
SpamBayes installs and uninstalls without
fanfare, and appears to do no harm. If you over-train SpamBayes, it will
slow down as the database becomes bloated. Although you can't
incrementally un-train it or pack the database, it's easy enough to start
again from scratch.
Outlook 2000 or later is pretty much all this
plug-in needs. It does not matter whether you use POP or IMAP or connect
to an Exchange server.
Outlook still makes its usual noises and plants
the envelope icon in your Windows System Tray even when SpamBayes has
classified an incoming email as junk and moved it to the specified folder.
So for those of you whose workflow revolves around your inbox activity, errant
disruptions will continue.
There's no evidence of SpamBayes' database
being designed for multiuser access. So if you use Outlook from more
than one place, you may have to train each installation individually, or share
out the database folder, or come up with a creative way to replicate the
database.
SpamBayes won't delete your email. But
that's probably a good thing since it's not foolproof. If you're
hell-bent on letting your email quietly disappear you can design a rule to do
it, but I don't think it's wise.
SpamBayes does not work very well with languages
other than English.
Effectiveness
I installed SpamBayes both at home and at the
office. A fairly large amount of spam comes to both email addresses, due
mostly to my foolishly naive use of those email addresses on Usenet many years
ago. The big difference between the two is that at home, my email comes
into an Exchange server running Vamsoft's Open Relay Filter, which checks email against
six different DNSBLs before letting it through, but at work the email is
subjected to a crude conglomeration of manually maintained IP, domain and text
filters, quietly discarding anything it doesn't like.
As first lines of defense go, the DNSBLs are
truly the way to go. I receive ten to twenty times the spam at work that
I receive at home, after the respective filtering. The interesting part
here may actually boil down to the characteristics of the email that makes it through that first
line of defense.
After two months of training, nearly all the
spam coming to my personal email address is properly categorized. Yet
35% of the spam coming into the office email address still lands in the
"suspected spam" folder and every once in a while I still see legitimate email
landing in the "spam" or "suspected spam" folders. The training database
at the office keeps getting fatter but accuracy does not seem to be improving.
One would think that with at least 10 times the
volume of spam, the office setup would be more accurately trained than the
one at home. But - and this is truly just an assumption - it seems that the
text filters used there as a front-line defense may actually be hobbling
SpamBayes' categorizer by depriving it of some of the more conspicuous,
easily identifiable spam to train on.
Or perhaps the distinction between work-related
email and absolute garbage is too vague!
Conclusion
Bayesian filtering is still in its relative
youth. Likewise, SpamBayes makes no pretense to be a mature product.
It is still in flux, the authors experimenting with various theories and
watching how the spammers respond. But for a product with a fractional
version number it seems to be as good as or better than many shareware and
commercially available competitors.
As with any anti-spam measures, one needs to
keep a sharp eye out so as not to lose legitimate email. SpamBayes does a terrific
job of catching the few emails that make it through my gauntlet of DNSBL
checks at home, and has turned my office inbox from an unusable littered
wasteland back to a genuinely useful work tool.