Exim and Bayesian Filtering with Bogofilter

Configuring Exim for Bogofilter:

This document explains how to mark incoming email as Spam or Ham using Bayesian algorithms. The tool used to analyse the mail is Bogofilter, written in C, by Eric Raymond. This configuration has been successfully tested on a Red Hat 7.3 system with Exim 4.12 and Bogofilter 0.10.2.

Installing and Training Bogofilter:

Firstly, we need to download, compile and install Bogofilter. Since I install most applications compiled from source in their own folder in ‘/usr/local’, this is how I configured it before running ‘make’ and then ‘make install’:

./configure --prefix=/usr/local/bogofilter

I link often-used binaries into /usr/local/bin so that they are on everyone’s path. Here I link ‘/usr/local/bogofilter/bin/bogofilter’ to ‘/usr/local/bin/bogofilter’.
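The link itself is a plain ‘ln -s’. A minimal sketch follows; the helper name ‘link_into_path’ is my own invention, and the paths in the usage comment assume the prefix used above.

```shell
# Hypothetical helper: symlink an installed binary into a directory on the PATH.
link_into_path() {
    # $1 = full path of the installed binary
    # $2 = directory that is already on everyone's PATH
    ln -s "$1" "$2/$(basename "$1")"
}

# Run as root, for the prefix chosen at configure time:
# link_into_path /usr/local/bogofilter/bin/bogofilter /usr/local/bin
```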

Next we create a wordlist directory readable by Exim (I created mine as ‘/etc/bogofilter’) and train Bogofilter on existing messages. I’ve been keeping most of the Spam I’ve gotten recently, so all I had to do was run this:

bogofilter -d /etc/bogofilter/ -I ~/mutt/spam -s

Then give it some valid emails like this:

bogofilter -d /etc/bogofilter/ -I ~/mutt/zaidi -n
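After training, it is worth checking a few single messages by hand. The sketch below assumes Bogofilter’s usual exit codes (0 for Spam, 1 for Ham); check the man page for your version. The function name ‘classify’ and the variables ‘BOGOFILTER’ and ‘WORDLIST_DIR’ are hypothetical, not part of Bogofilter.

```shell
# Hypothetical helper: classify one message file and print a verdict.
classify() {
    # $1 = path to a file holding a single message
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    "${BOGOFILTER:-bogofilter}" -d "${WORDLIST_DIR:-/etc/bogofilter}" < "$1"
    # Assumed exit codes: 0 = Spam, 1 = Ham, anything else = unsure or error.
    case $? in
        0) echo "spam" ;;
        1) echo "ham" ;;
        *) echo "unsure or error" ;;
    esac
}

# Example: classify ~/mutt/one-message
```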

Exim Configuration:

For Exim, all we will do is add an ‘X-Bogosity’ header by piping each message through Bogofilter. The header carries ‘Yes’ for Spam and ‘No’ for Ham, plus a ‘spamicity’ rating.
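To see what a delivered message was tagged with, the verdict can be pulled out of the header with sed. This is only a sketch: the function name ‘bogosity_verdict’ is made up, and it assumes nothing about the header beyond the ‘Yes’ or ‘No’ that follows ‘X-Bogosity:’.

```shell
# Hypothetical helper: print the X-Bogosity verdict of a message read on stdin.
bogosity_verdict() {
    # Stop at the first blank line (end of the headers), and print the
    # first word after "X-Bogosity:" if that header is present.
    sed -n -e '/^$/q' \
           -e 's/^X-Bogosity:[[:space:]]*\([A-Za-z]*\).*/\1/p' | head -n 1
}

# Example: bogosity_verdict < ~/mutt/one-message
```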

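In the Routers section of the Exim config file, we add this just before ‘userforward:’. Every router needs a name; ‘bogofilter_router’ is an arbitrary choice and any unused name will do:

bogofilter_router:
  domains = +local_domains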
  # the next line looks for X-flag to see if the mail has been scanned
  condition = ${if !def:h_X-flag: {true }}
  driver = accept
  transport = bogofilter_filter

Under the Transports section, add this after the ‘remote_smtp’ transport. The transport name must match the ‘transport’ setting of the router:

bogofilter_filter:
  driver = pipe
  command = /usr/bin/exim -oMr spam-scanned -bS
  use_bsmtp = true
  # next line adds the X-flag so we will later know mail has been scanned
  headers_add =  X-flag: true
  transport_filter = "/usr/local/bin/bogofilter -d /etc/bogofilter -lcd -p -e"
  group = exim
  return_fail_output = true
  user = exim
  home_directory = "/tmp"
  log_output = true
  return_path_add = false

Before local delivery, remove the ‘X-flag’ header by adding it to ‘headers_remove’ in the ‘local_delivery’ transport:

local_delivery:
  driver = appendfile
  headers_remove = X-flag
  file = /var/spool/mail/$local_part
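To check that the new router picks up local mail, Exim’s address-testing mode (‘-bt’) can help. As far as I can tell no headers exist in that mode, so ‘X-flag’ is undefined and the new router should accept the address. The sketch below is hedged: the function name and the ‘EXIM’ variable are hypothetical, and it assumes the ‘-bt’ output names the chosen transport as ‘transport = …’.

```shell
# Hypothetical helper: succeed if a local address is routed to the
# bogofilter_filter transport.
routed_via_bogofilter() {
    # $1 = a local address to test
    "${EXIM:-/usr/bin/exim}" -bt "$1" | grep -q 'transport = bogofilter_filter'
}

# Example:
# routed_via_bogofilter someuser@your.domain && echo "scanned on arrival"
```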

Now you have a working system that scans each message passing through it. Once the wordlist database has been trained properly and starts marking messages correctly, you can use your .forward file to separate Ham and Spam. The file must begin with the line ‘# Exim filter’ so that Exim treats it as a filter. Something like the following will do:

# Exim filter
if $header_X-Bogosity: contains "Yes"
then
  save $home/mutt/suspicious
endif

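If you are confident of the system, you could send the message to /dev/null instead of saving it to a folder. Freezing the message is another option, but as far as I know the ‘freeze’ command is only available in a system filter, not in a user’s .forward file.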
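Before trusting the filter with real mail, it can be tried out in Exim’s filter-testing mode (‘-bf’), which reads a message on standard input and reports what the filter would do without delivering anything. A small sketch, in which the function name ‘test_filter’ and the ‘EXIM’ variable are hypothetical:

```shell
# Hypothetical helper: run a filter file against a saved message in test mode.
test_filter() {
    # $1 = filter file (for example ~/.forward)
    # $2 = file holding one sample message, headers included
    "${EXIM:-/usr/bin/exim}" -bf "$1" < "$2"
}

# Example: test_filter ~/.forward ~/mutt/one-message
```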