E-mail security: detecting spam

If the volume of spam we receive is overwhelming and we can’t keep up with classifying it by hand, we need an automated way to separate spam from legitimate mail. One of the most famous methods was proposed by Paul Graham in an essay called A Plan for Spam, where he described a filter that uses probability to classify each message.

This method relies on training the algorithm beforehand: we feed it spam messages and legitimate mail, telling it which is which. With this data, the algorithm breaks the messages into words and assigns each word two probabilities: one of appearing in a spam message and another of appearing in legitimate mail.
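The training step can be sketched roughly as follows. This is only a minimal illustration, not Graham's exact procedure: the function names `tokenize` and `train`, the simple word-splitting regex, and the add-one smoothing (used so that no word ends up with a probability of exactly zero) are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(spam_messages, ham_messages):
    """Return {word: (p_in_spam, p_in_ham)} estimated from labeled messages.

    Each probability is the smoothed fraction of messages of that class
    which contain the word at least once.
    """
    spam_counts = Counter()
    ham_counts = Counter()
    for message in spam_messages:
        spam_counts.update(set(tokenize(message)))  # count each word once per message
    for message in ham_messages:
        ham_counts.update(set(tokenize(message)))

    n_spam = len(spam_messages)
    n_ham = len(ham_messages)
    word_probs = {}
    for word in set(spam_counts) | set(ham_counts):
        # Add-one smoothing (an assumption of this sketch): avoids 0 and 1.
        p_in_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_in_ham = (ham_counts[word] + 1) / (n_ham + 2)
        word_probs[word] = (p_in_spam, p_in_ham)
    return word_probs


word_probs = train(
    ["buy cheap pills now", "cheap pills online"],
    ["meeting at noon", "lunch now?"],
)
```

With this tiny made-up training set, a word such as "cheap" that only appears in spam gets a high spam probability and a low legitimate-mail probability, while "now", which appears in both, ends up balanced.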

When a new message arrives, it’s broken into words just like the training messages, and the saved probabilities of its words are combined with a formula known as Naive Bayes, which returns a final probability that the mail is spam.
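As a rough sketch of that combination step, assuming we already have a table mapping each word to a pair of probabilities (one for spam, one for legitimate mail): the function name `spam_probability`, the `prior_spam` parameter, and the sample table are hypothetical, and real filters differ in details such as which words they consider. The sum of logarithms is used only to avoid numeric underflow when many small probabilities are multiplied.

```python
import math
import re


def spam_probability(message, word_probs, prior_spam=0.5):
    """Combine per-word probabilities with Naive Bayes.

    word_probs maps word -> (p_in_spam, p_in_ham); unknown words are ignored.
    Returns the estimated probability that the message is spam.
    """
    words = set(re.findall(r"[a-z0-9']+", message.lower()))
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in words:
        if word not in word_probs:
            continue  # no training data for this word
        p_in_spam, p_in_ham = word_probs[word]
        log_spam += math.log(p_in_spam)
        log_ham += math.log(p_in_ham)

    # P(spam | words) = 1 / (1 + P(ham, words) / P(spam, words))
    diff = log_ham - log_spam
    if diff > 700:  # exp() would overflow; the result is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# Hypothetical probabilities, as they might come out of a training step.
sample_probs = {
    "cheap": (0.75, 0.25),
    "pills": (0.75, 0.25),
    "meeting": (0.25, 0.5),
}
p_spammy = spam_probability("Cheap pills", sample_probs)
p_legit = spam_probability("meeting", sample_probs)
```

Here the message with two spam-leaning words scores well above one half, while the message with a word seen mostly in legitimate mail scores below it. A message made only of unknown words simply falls back to the prior.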

Most well-known mail classifiers use at least this method, usually combined with others, which shows how powerful this way of classifying is.

Another approach to classification is the one used by SpamAssassin, which has a series of rules, each of which assigns some points when it matches the mail. The more points a mail accumulates, the more likely it is to be spam, and it is classified as such when its score passes a threshold.
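The idea of rule-based scoring can be illustrated with a small sketch. The rule names, point values, and message fields below are invented for the example and are not SpamAssassin's actual rules; the threshold of 5.0 matches SpamAssassin's default required score, but everything else is an assumption.

```python
import re

# Each rule: (name, points, test function). All rules here are hypothetical.
RULES = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: m["subject"].isupper()),
    ("MENTIONS_FREE_MONEY", 2.5,
     lambda m: re.search(r"free money", m["body"], re.IGNORECASE) is not None),
    ("MANY_EXCLAMATIONS", 1.0, lambda m: m["body"].count("!") >= 3),
    ("NO_SENDER_NAME", 1.0, lambda m: not m.get("from_name")),
]

THRESHOLD = 5.0


def score_message(message, rules=RULES):
    """Return (total points, names of the rules that matched)."""
    hits = [(name, points) for name, points, test in rules if test(message)]
    total = sum(points for _, points in hits)
    return total, [name for name, _ in hits]


def is_spam(message, threshold=THRESHOLD):
    """Classify as spam once the accumulated score reaches the threshold."""
    total, _ = score_message(message)
    return total >= threshold


suspicious = {"subject": "WIN NOW", "body": "Free money!!! Click here", "from_name": ""}
ordinary = {"subject": "Lunch tomorrow", "body": "See you at noon.", "from_name": "Ana"}
```

In this sketch the suspicious message matches all four rules and crosses the threshold, while the ordinary one matches none. Keeping the matched rule names alongside the score makes it easier to see why a mail was flagged and to tune the point values.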

SpamAssassin also uses a Bayesian filter, but that is not its only way to check for spam: spam usually has distinguishable characteristics that make it different enough from legitimate mail to be easily classifiable.

But spammers keep adapting to these measures, modifying the mails they send so the filters don’t detect them as spam, so it’s necessary to keep tweaking the filters and finding new ways to throw spam in the trash.

