E-mail security: detecting spam (IV)

Now that we know how Bayesian filtering works, let's look at some programs that use it and see which one is the most useful for us. I'll give a list, and you should choose the most appropriate one for you.

We can split the filtering programs depending on where they work: on the server or on the client. Programs working on the server have some advantages. As they look at more mail messages (they see mail from all users in a system), it is easier and faster to train them. Furthermore, there is only one place to administer, which makes the administrator's task easier. At the same time, the users never receive the spam, so they don't spend additional bandwidth and time on it. On the other hand, these programs are less customizable by the user, who might prefer his own techniques for detecting spam and handling false positives, and if the user doesn't have access to the server he will not be able to install one.

One of the best-known server-side filtering programs is SpamAssassin, which uses several checks to test for spam, one of them being Bayesian filtering. Each of these tests adds to or subtracts from the score of the mail, and at the end of the run the total score determines whether the mail is spam or not. Among others, these tests include mail-header tests, text-content rules, white-lists and black-lists, and collaborative databases, making this program one of the most accurate. It can also be used for client-side filtering, although the installation will not be as easy as with other programs.
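The score-based approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SpamAssassin's real rule set or code: the rule names, the scores, the whitelist and the threshold are all made up for the example.

```python
import re

# Hypothetical whitelist and threshold, chosen only for this example.
WHITELIST = {"boss@example.com"}
THRESHOLD = 5.0

# Hypothetical rules: (name, test function, score).
# Positive scores point towards spam, negative scores towards legitimate mail.
RULES = [
    ("SUBJECT_ALL_CAPS", lambda m: m["subject"].isupper(), 1.5),
    ("MENTIONS_FREE_MONEY",
     lambda m: re.search(r"free money", m["body"], re.I) is not None, 2.5),
    # The Bayesian filter is just one more test contributing to the total.
    ("BAYES_HIGH", lambda m: m["bayes_probability"] > 0.9, 3.0),
    ("SENDER_WHITELISTED", lambda m: m["sender"] in WHITELIST, -5.0),
]


def score_message(message):
    """Run every rule and return the total score and the rules that matched."""
    hits = [(name, score) for name, test, score in RULES if test(message)]
    total = sum(score for _, score in hits)
    return total, hits


def is_spam(message):
    """A message is flagged as spam when its total score reaches the threshold."""
    total, _ = score_message(message)
    return total >= THRESHOLD


suspicious = {
    "sender": "promo@spam.example",
    "subject": "WIN NOW",
    "body": "Get FREE MONEY today",
    "bayes_probability": 0.95,
}
trusted = dict(suspicious, sender="boss@example.com")
```

With these made-up values the same text is flagged when it comes from an unknown sender, but passes when the sender is on the white-list, because the negative score pulls the total under the threshold.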

Another approach to server-side filtering is the one used by ASSP, which works with any kind of mail server: it sits as a proxy in front of the real mail server (receiving the data and passing it on) and filters the mail before it is delivered. It also uses Bayesian filtering and lets you set up white-lists, so you can define addresses that will always be accepted. It can also scan messages for viruses, which drops even more malicious mail.

The last server-side program we are going to see is DSpam. It has some characteristics that differentiate it from other Bayesian filters. Tokens are analyzed not only one by one but also in pairs, which gives it a better picture of whether a mail is spam or not. It also uses Bayesian Noise Reduction and other new approaches to filtering, which promise a high detection rate. It includes a web-based interface for administration, where each user can train it according to personal taste.
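To see what analyzing tokens in pairs means, here is a minimal Python sketch. It is not DSpam's actual tokenizer; the function names and the very simple word-splitting rule are assumptions made for the example.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tokens_with_pairs(text):
    """Return the single tokens followed by every pair of adjacent tokens."""
    words = tokenize(text)
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + pairs


features = tokens_with_pairs("Buy cheap watches")
# features == ['buy', 'cheap', 'watches', 'buy cheap', 'cheap watches']
```

The pairs carry context that single words lose: "cheap" alone may appear in legitimate mail, while "cheap watches" is a much stronger hint. Each pair is then counted and scored by the Bayesian filter just like a single token.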

If we don't have access to the server or we don't want to play with it, we can use a client-based filter installed on our computer, which will analyze the mail once it has been downloaded (or while it is downloading) and flag it as spam or legitimate mail. The advantage of this approach is that it can be tightly integrated into our mail reader, so it may be easier to use.

If we use Thunderbird, it already includes a filter, as we saw in the last post about spam. It is really easy to use: we only have to click a button to tell it whether we think a message is spam, and once it is trained it will automatically move all spam to a predefined folder, or can even delete it automatically (I don't recommend this, in case of false positives).

If we use Outlook instead of Thunderbird, one good option is SpamBayes. This software also uses some new approaches, which are explained on its background page. One interesting characteristic of SpamBayes is that it doesn't have only two states, spam and non-spam, but also a third one, unsure, for when it doesn't know how to classify a message. This way, we can choose what we want to do with such a message: keep it, delete it or use it to train the program. Although it includes a plugin for Outlook, it can also be used with other programs as a proxy, and even on other operating systems like Linux or Mac OS X.
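The three-state idea can be sketched with two cutoffs on the spam probability calculated by the filter. This is an illustration only, not SpamBayes code: the function name and the cutoff values below are assumptions chosen for the example.

```python
# Illustrative cutoffs (assumptions for this sketch, tune them to taste).
NON_SPAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(spam_probability,
             non_spam_cutoff=NON_SPAM_CUTOFF,
             spam_cutoff=SPAM_CUTOFF):
    """Map a spam probability in [0, 1] to 'spam', 'non-spam' or 'unsure'."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability >= spam_cutoff:
        return "spam"
    if spam_probability <= non_spam_cutoff:
        return "non-spam"
    # Everything between the two cutoffs is left for the user to decide.
    return "unsure"


results = [classify(p) for p in (0.05, 0.50, 0.97)]
# results == ['non-spam', 'unsure', 'spam']
```

Moving the two cutoffs closer together shrinks the unsure group but increases the risk of wrong automatic decisions; moving them apart does the opposite.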

To finish this list, we are going to have a look at one of the first mail filters I used. It's called POPFile and it works as a proxy in front of the mail server. Our mail client connects to POPFile and POPFile connects to the mail server, analyzing the mail as it is downloaded. One of the things I like most about it is its ability to classify any kind of e-mail, not only spam. So POPFile can distinguish between work-related mail, mail from our children, or any other classification we want to make. We only have to create the categories and assign some messages to each one to train it, and it will classify the e-mails we receive. It also has a web-based interface to manage all of this.
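Classifying mail into any number of categories is the same Bayesian idea applied to more than two classes. The following Python sketch shows it with a small naive Bayes classifier. It is not POPFile's code: the class name, the categories and the training messages are all invented for the example, and a real filter needs far more training mail.

```python
import math
import re
from collections import Counter, defaultdict


class CategoryClassifier:
    """Minimal multi-category naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.message_counts = Counter()          # category -> trained messages
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, category, text):
        """Assign one message to a category."""
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.message_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the category with the highest log-probability."""
        if not self.message_counts:
            raise ValueError("the classifier has not been trained yet")
        total_messages = sum(self.message_counts.values())
        # One extra slot for unseen words (add-one smoothing).
        vocabulary_size = len(self.vocabulary) + 1
        words = self.tokenize(text)
        best_category, best_score = None, -math.inf
        for category, trained in self.message_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(trained / total_messages)
            for word in words:
                score += math.log(
                    (counts[word] + 1) / (total_words + vocabulary_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


classifier = CategoryClassifier()
classifier.train("work", "meeting agenda for the project deadline")
classifier.train("work", "quarterly report and project budget")
classifier.train("family", "school pickup and dinner on sunday")
classifier.train("family", "grandma birthday party this weekend")
classifier.train("spam", "cheap pills buy now")
classifier.train("spam", "win free money now")
```

After this training, `classifier.classify("project budget meeting")` returns `"work"`. Adding a new category is just a matter of training a few messages under a new name.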

The list of spam filters is quite long and this is only a selection. You will have to see which one fits you best and use it. Remember to always train it properly before you let it take automatic actions on incoming mail, as you could lose some messages otherwise.