Bogofilter is a mail filter that classifies mail as spam or ham (non-spam)
by a statistical analysis of the message's header and content (body).
The program is able to learn from the user's classifications and
corrections.
Bogofilter is integrated, or can be integrated, with graphical mailers
such as KDE's KMail, GNOME's Evolution, or Claws Mail (formerly known
as Sylpheed-Claws). Alternatively, it can be run from a mail delivery
agent (maildrop, procmail) script to classify each incoming message as
spam or ham, using wordlists stored in BerkeleyDB. Bogofilter processes
both plain text and HTML. It supports multi-part MIME messages, decoding
base64, quoted-printable, and uuencoded text, and it ignores attachments
such as images.
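The MIME handling described above can be illustrated with a short sketch. This is not Bogofilter's code (Bogofilter is written in C and has its own parser); it is a Python illustration that uses the standard library's email package. The function name extract_text_parts and the SAMPLE message are hypothetical, introduced only for this example.

```python
from email import message_from_string


def extract_text_parts(raw_message: str) -> list[str]:
    """Return the decoded text of every text/* part, skipping attachments.

    Illustrative only: this mimics the idea of decoding text parts and
    ignoring non-text attachments; it is not Bogofilter's parser.
    """
    msg = message_from_string(raw_message)
    texts = []
    for part in msg.walk():
        # Container parts (multipart/*) hold no text of their own.
        if part.is_multipart():
            continue
        # Ignore attachments such as images.
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64, quoted-printable, or uuencode
        # according to the Content-Transfer-Encoding header.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping.
            texts.append(payload.decode("latin-1"))
    return texts


# A hypothetical three-part message: base64 text, quoted-printable HTML,
# and an image attachment that should be skipped.
SAMPLE = "\n".join([
    "From: a@example.com",
    "Subject: test",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/plain; charset=utf-8",
    "Content-Transfer-Encoding: base64",
    "",
    "QnV5IG5vdyE=",
    "--XYZ",
    "Content-Type: text/html; charset=utf-8",
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "<p>Free=20offer</p>",
    "--XYZ",
    "Content-Type: image/png",
    "Content-Transfer-Encoding: base64",
    "",
    "iVBORw0KGgo=",
    "--XYZ--",
    "",
])

if __name__ == "__main__":
    for text in extract_text_parts(SAMPLE):
        print(text.strip())
```

Calling extract_text_parts(SAMPLE) yields the two text parts in decoded form and leaves out the image part. A real filter would go on to tokenize these strings; that step is omitted here.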
The statistical technique is known as the Bayesian technique, and its
use for spam filtering was described by Paul Graham in his article
"A Plan For Spam" in August 2002. Gary Robinson, in his weblog "Rants"
(September 2002), suggested some refinements for improved discrimination
between spam and ham. Bogofilter's primary algorithm uses the f(w)
parameter and the Fisher inverse chi-square technique that Robinson
describes. Paul Graham's later article "Better Bayesian Filtering"
(January 2003) suggests some useful parsing improvements.
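To make the f(w) and inverse chi-square idea concrete, here is a minimal Python sketch of the combining scheme as Gary Robinson published it. It is not taken from Bogofilter's C source. The function names are hypothetical, and the prior values s=1.0 and x=0.5 are illustrative assumptions rather than Bogofilter's defaults.

```python
import math
from typing import Sequence


def robinson_f(p_w: float, n: int, s: float = 1.0, x: float = 0.5) -> float:
    """Robinson's f(w): pull the raw spam estimate p(w) toward a prior x.

    n is how often the token was seen in training and s is the weight of
    the prior. s=1.0 and x=0.5 are illustrative values only.
    """
    return (s * x + n * p_w) / (s + n)


def chi2q(x2: float, df: int) -> float:
    """Upper-tail probability of a chi-square value, for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed-form series that is valid when df is even.
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_indicator(fs: Sequence[float]) -> float:
    """Combine per-token f(w) values into one score in [0, 1].

    Scores near 1 suggest spam, near 0 suggest ham, and near 0.5 mean
    the evidence is weak or conflicting. Each f(w) must lie strictly
    between 0 and 1.
    """
    n = len(fs)
    if n == 0:
        return 0.5
    # Sum logarithms instead of multiplying, to avoid underflow.
    ln_prod_f = sum(math.log(f) for f in fs)
    ln_prod_not_f = sum(math.log(1.0 - f) for f in fs)
    h = chi2q(-2.0 * ln_prod_f, 2 * n)      # close to 1 when f(w) values are high
    s = chi2q(-2.0 * ln_prod_not_f, 2 * n)  # close to 1 when f(w) values are low
    return (1.0 + h - s) / 2.0


if __name__ == "__main__":
    print(fisher_indicator([0.99] * 10))  # strongly spammy tokens
    print(fisher_indicator([0.01] * 10))  # strongly hammy tokens
    print(fisher_indicator([0.5] * 10))   # uninformative tokens
```

A token that was never seen in training (n = 0) gets f(w) = x, so unknown words neither help nor hurt much. For very long messages math.exp can underflow; a production implementation would need to guard against that, which this sketch does not attempt.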
Bogofilter is written in C. Supported platforms: Linux, FreeBSD, Solaris,
OS X, HP-UX, AIX, RISC OS, SunOS, OS/2 …