E-mail security: detecting spam (II)

As spam filters become more advanced, less spam reaches users' inboxes, and the spammers' business model suffers. Instead of accepting that people don't like receiving spam and would prefer less intrusive forms of advertising, spammers try to work around these filters, sometimes in really clever ways. Spam filters therefore have to be continually modified and adapted so they don't fall for these new tricks.

Since Bayesian filtering is the most commonly used technique, it is the one spammers most often try to evade. As we explained in the previous post, a Bayesian filter calculates the probability that each word comes from spam or from legitimate mail, so spammers modify their messages to raise the probability that they will be classified as legitimate mail.
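To make "the probability of each word" a bit more concrete, here is a minimal sketch of how per-word probabilities can be combined into a single score for the whole message. It assumes the naive-Bayes combining rule popularized by Paul Graham's "A Plan for Spam"; the post doesn't say which formula a given filter uses, and the word probabilities below are made up for illustration.

```python
import math

def combined_spam_probability(word_probs):
    """Combine per-word spam probabilities p(spam | word) into one message score.

    Naive-Bayes combining rule (as in Graham's "A Plan for Spam"):
        P = prod(p) / (prod(p) + prod(1 - p))
    Computed in log space so long messages do not underflow to 0.0.
    Every p must be strictly between 0 and 1.
    """
    log_spam = sum(math.log(p) for p in word_probs)
    log_ham = sum(math.log(1.0 - p) for p in word_probs)
    diff = log_spam - log_ham
    # Numerically stable logistic function: P = 1 / (1 + exp(-diff))
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


# Hypothetical probabilities a trained filter might have learned.
probs = {"cheap": 0.90, "viagra": 0.99, "meeting": 0.10, "agenda": 0.20}

spammy = combined_spam_probability([probs["cheap"], probs["viagra"]])
legit = combined_spam_probability([probs["meeting"], probs["agenda"]])
print(f"cheap viagra   -> {spammy:.4f}")
print(f"meeting agenda -> {legit:.4f}")
```

A message made of strongly "spammy" words ends up very close to 1, and one made of words typical of legitimate mail ends up close to 0. That is exactly the number spammers try to push down.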

One way to do this is to insert random but common words into the spam, so the spam words contribute less to the overall score and the message slips past the filter. Here is an example of a real spam message:

[Image: Spam1]

The actual content of the spam is at the bottom, but the beginning of the e-mail contains several lines of text taken from the novel The Master and Margarita, which try to hide the fact that the message is spam.
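A toy example shows why this padding can work. It assumes a filter that adds up the log-odds of each known word (one common way of writing naive Bayes) and uses invented probabilities. Note that filters such as the one Graham described only score the most "interesting" words, so the padding only helps if the inserted words are strong enough indicators of legitimate mail to be picked; merely neutral words would be ignored.

```python
import math

# Hypothetical p(spam | word) values, invented for illustration only.
SPAM_PROB = {
    "cheap": 0.92, "viagra": 0.99, "order": 0.85,
    # Ordinary literary words that this user mostly sees in legitimate mail.
    "master": 0.30, "said": 0.25, "margarita": 0.20, "evening": 0.15,
}

def log_odds_score(words):
    """Sum of log(p / (1 - p)) over known words: > 0 leans spam, < 0 leans legitimate."""
    return sum(
        math.log(SPAM_PROB[w] / (1 - SPAM_PROB[w]))
        for w in words
        if w in SPAM_PROB
    )

spam_words = ["cheap", "viagra", "order"]
padding = ["master", "said", "margarita", "evening"] * 2  # text pasted from a novel

print(round(log_odds_score(spam_words), 2))
print(round(log_odds_score(spam_words + padding), 2))
```

With these made-up numbers, the spam words alone give a clearly positive score, while the padded message drops below zero and would be treated as legitimate.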

Another way to evade the filters is to send the content as an image. The previous example also uses this technique, and it is a very common one on its own, as this other e-mail shows:

[Image: Spam2]

Although this may look like an HTML e-mail, all the content is actually inside an image. There is no text for the filter to analyze, and without words to compute probabilities from, it is much harder to identify the message as spam. Sometimes this technique backfires on the spammer: it is quite unusual for a legitimate message to contain nothing but an image and a link, so some more advanced filters can still correctly classify it as spam.
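The "only an image and a link" observation can be turned into a simple rule. The sketch below uses Python's standard `email` package to flag messages that contain image parts but almost no readable text. The function name, the five-word threshold and the crude regex-based HTML stripping are all assumptions for illustration; a real filter would combine a rule like this with other evidence rather than rely on it alone.

```python
import re
from email.message import EmailMessage, Message

def looks_image_only(msg: Message, min_words: int = 5) -> bool:
    """Heuristic: True if the mail has image parts but almost no readable text.

    min_words is an arbitrary threshold chosen for illustration.
    """
    has_image = False
    word_count = 0
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype.startswith("image/"):
            has_image = True
        elif ctype in ("text/plain", "text/html"):
            raw = part.get_payload(decode=True) or b""
            text = raw.decode(part.get_content_charset() or "utf-8", errors="replace")
            if ctype == "text/html":
                # Crude tag stripping, not a real HTML parser.
                text = re.sub(r"<[^>]+>", " ", text)
            word_count += len(re.findall(r"[A-Za-z]{2,}", text))
    return has_image and word_count < min_words


# Example: an HTML body that is nothing but a linked image, plus the image itself.
msg = EmailMessage()
msg["Subject"] = "Special offer"
msg.set_content('<a href="http://example.com/"><img src="cid:offer"></a>', subtype="html")
msg.add_attachment(b"\x89PNG...", maintype="image", subtype="png", filename="offer.png")
print(looks_image_only(msg))
```

A received message can be checked the same way after parsing it with `email.message_from_bytes(raw_bytes, policy=email.policy.default)`.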

One last technique is the use of unknown or made-up words to confuse the filter. A Bayesian filter relies on words it has already seen and on how likely each of them is to appear in legitimate mail or in spam. When it finds an unknown word, it cannot tell whether that word belongs to spam or not, so the word tells it nothing and the spam may evade the filter. Let's see an example:

[Image: Spam3]

Instead of "ordering", the message says "orderinq", with a q as the last letter, which looks quite similar to a g. Also, the word Viagra does not start with the letter V but with a backslash and a forward slash, like this: \ /. These two sentences contain more examples, since almost every word has been modified to evade the filters.
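One possible countermeasure is to undo simple obfuscation before looking words up, and to give words that still can't be placed a neutral value. The sketch below does both: it maps look-alike characters back to letters, snaps a near-miss such as "orderinq" to a known word with `difflib.get_close_matches`, and, for words with no statistics at all, uses Gary Robinson's smoothed probability, which gives an unseen word exactly the neutral prior. The look-alike table, the vocabulary, the 0.8 cutoff and the function names are all hypothetical choices for illustration.

```python
import difflib
import re

# Hypothetical look-alike table; a real filter would need a much larger one.
LOOKALIKES = [("\\/", "v"), ("|<", "k"), ("@", "a"), ("0", "o"), ("1", "i"), ("$", "s")]

# Hypothetical vocabulary of words the filter already has statistics for.
KNOWN_WORDS = ["ordering", "viagra", "cheap", "pharmacy", "meeting"]

def canonical_token(token, cutoff=0.8):
    """Undo simple obfuscation, then snap to a close known word if there is one."""
    t = token.lower()
    for fake, real in LOOKALIKES:
        t = t.replace(fake, real)
    t = re.sub(r"[^a-z]", "", t)  # drop leftover punctuation, e.g. in "v.i.a.g.r.a"
    if t in KNOWN_WORDS:
        return t
    close = difflib.get_close_matches(t, KNOWN_WORDS, n=1, cutoff=cutoff)
    return close[0] if close else t

def robinson_probability(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Gary Robinson's smoothed p(spam | word): f = (s*x + n*p) / (s + n).

    spam_hits / ham_hits: training messages of each class containing the word.
    n_spam / n_ham: total training messages of each class (both assumed > 0).
    s is the strength of the prior, x the neutral value an unseen word gets.
    """
    n = spam_hits + ham_hits
    if n == 0:
        return x  # never seen: no evidence either way
    b = spam_hits / n_spam
    g = ham_hits / n_ham
    p = b / (b + g)
    return (s * x + n * p) / (s + n)


for raw in ["orderinq", "\\/iagra", "v.i.a.g.r.a", "zxqwv"]:
    print(raw, "->", canonical_token(raw))
print(robinson_probability(0, 0, 500, 500))
```

The fuzzy match is a trade-off: a lower cutoff catches more disguised words but also risks mapping innocent words onto spam words.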

Sometimes it gets so hard for spammers to be sure their junk will reach the recipient that the messages they send make almost no sense, and it is quite hard to tell what they are really trying to advertise.

[Image: Spam4]

Can you guess what they are trying to say?

If a Bayesian filter checks our e-mail, it is a good idea to keep it updated and trained. This is easy and shouldn't take much of our time, unless we receive huge amounts of e-mail. From time to time, we should check the folder where spam is moved for false positives (legitimate messages that have been classified as spam). If we find any, we must tell the filter that the message is not spam, so it adjusts the probabilities of the words the message contains. Checking this folder regularly is a good idea anyway, so we don't lose any important e-mail that might have been misclassified. Likewise, when spam gets through unfiltered, we shouldn't just delete it: we should tell the filter that the message is spam, so it keeps learning.
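What "telling the filter" does internally can be sketched with simple word counts. The class below is hypothetical and assumes the filter had already learned the message under the wrong label (many filters do this automatically); correcting the mistake then means removing the message from one class and adding it to the other. If your filter only trains on messages you correct by hand, the "untrain" step would not apply.

```python
from collections import Counter

class WordCounts:
    """Per-class word counts for a toy Bayesian filter (hypothetical class)."""

    def __init__(self):
        self.spam = Counter()  # word -> number of spam messages containing it
        self.ham = Counter()   # word -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, words, is_spam):
        unique = set(words)  # count each word once per message
        if is_spam:
            self.spam.update(unique)
            self.n_spam += 1
        else:
            self.ham.update(unique)
            self.n_ham += 1

    def untrain(self, words, is_spam):
        """Undo an earlier train() call for the same message and label."""
        unique = set(words)
        if is_spam:
            self.spam.subtract(unique)
            self.n_spam -= 1
            self.spam += Counter()  # drop words whose count fell to zero
        else:
            self.ham.subtract(unique)
            self.n_ham -= 1
            self.ham += Counter()

    def reclassify(self, words, was_spam):
        """User correction: move a message learned as was_spam to the other class."""
        self.untrain(words, was_spam)
        self.train(words, not was_spam)

    def spam_probability(self, word):
        b = self.spam[word] / self.n_spam if self.n_spam else 0.0
        g = self.ham[word] / self.n_ham if self.n_ham else 0.0
        return 0.5 if b + g == 0 else b / (b + g)


counts = WordCounts()
counts.train(["cheap", "viagra"], is_spam=True)
counts.train(["meeting", "agenda"], is_spam=False)
invoice = ["invoice", "meeting"]
counts.train(invoice, is_spam=True)        # the filter wrongly learned this as spam
print(counts.spam_probability("meeting"))
counts.reclassify(invoice, was_spam=True)  # we mark it as "not spam"
print(counts.spam_probability("meeting"))
```

After the correction, "meeting" no longer counts as evidence of spam, which is why reviewing the spam folder now and then pays off.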

There are other methods of classifying spam that are not based on Bayesian filters; we will look at them in upcoming posts.


19 Responses to “E-mail security: detecting spam (II)”


  1. [GEEKS ARE SEXY] Tech. News

    Spam is truly a plague on society. I manage a 150-user environment, and my users receive around 2000-3000 spam e-mails per day. It was becoming a nightmare. I installed GFI Mail Essentials, a centrally managed anti-spam application for Microsoft Exchange servers, and it solved 99% of my problem. The software combines 10 different methods to detect spam, and it's been incredibly effective so far. If you're interested, I wrote about it here:

    Corporate anti-spam at its best: GFI Mail Essentials

  2. boohiss

    That last example looks like you were viewing it in plain-text mode when it was meant to be viewed in HTML or something. The letters V I A G R A would be a normal sized font and the letters between would be smaller, less visible fonts.

  3. jme giffo

    As well as being a plague on society, spam is also a good education for us at such an early stage of this new world we know as the internet. If we can get this problem sorted now (i.e. in the next 5 to 10 years), we will be able to create a stable base for the future.

    As for the problems with the Bayesian filter described in this blog entry: wouldn't the Bayesian filter pick up random spam words such as {jio1j2 dijj i 1jijd12} over time? In theory the Bayesian filter will work in the long term; we might just have to wait 50 years, lol.

    I look forward to your next blog entry on other methods of classifying spam :)

  4. Brian

    As the President of Usermail.com, I can attest to the work it takes to keep spam out of users' inboxes. One issue we often run into is that we can implement a new technique to catch spam, but if that technique tags just 10 messages that were not spam, it's a failure. There is a very thin line between what people want in their inboxes and what they don't. I applaud the efforts of Openspf, Yahoo, etc., as something needs to be done to get this problem under control.

  5. David Cooper

    I use SpamAssassin, and find it works pretty well. An amazing amount of stuff gets pulled out. But I've stopped bouncing e-mails sent to false addresses on my server. I couldn't believe that my mail server was getting clogged trying to reject the e-mails back to the originator.

    Dave
    Listen to the parrotscience podcast http://www.parrotscience.com/podcast

    And I can’t stand the amount of spam I get in forum posts… does my signature count as spam?

  6. z

    spam is really simple to fix as long as sending email is no longer free.

    money. (one of) the greatest inventions of all time. if i decide my incoming email is $.10 a piece, then let them send all the spam they want – i dont care. 2000 spam email per day is $200/day. not bad at all.

    of course i wont charge my friends, so there'll be a white list, so no money involved there. and if there's someone who's not yet on my white list, heck, ten cents isnt that much of a deal to pay, especially as nine out of ten of these are going to be one time or perhaps even waived on receipt of email.

    there could be other means to send email for the first time so the addressee can add the sender to a white list. like a web form (protected with a turing/captcha type test) on a system known to be benign.

    but bottom line, unless you’re on white list, the only way your message is getting in addressee’s inbox is if you pay postage.

    why do people have such a problem with adding postage to email anyways?

  7. mark

    It's very tough to beat spam. I know because I'm getting more and more spam in Gmail every day.

  8. boobtoob

    try out popfile. it knows how to deal with all of the tricks above.

    free, open source, multiplatform, multilanguage.

    http://getpopfile.com

    enjoy! :)

  9. madelman

    Hey, thanks everyone for your comments. You have given me a lot of good ideas for the next posts.

    Expect some more posts about evasion of the filters and different techniques for classifying spam.

  10. justin

    @ z (comment number 7)

    Creating email postage isn’t a solution.

    The amount people are willing to pay for postage (you suggested 0.10 USD), multiplied by the number of recipients the spammer is trying to reach, creates an additional cost. But that cost is covered by the few people who buy through the spammer's website. So while it's not as lucrative as before, it is still a profit.

    If you want to see an example of companies sending spam with postage, go outside and check your snail-mailbox.

  11. James Heinrich

    I ended up writing my own PHP-based mail filter (http://phpop3clean.sf.net) to catch all the above-mentioned techniques (and more). Writing my own gave me the flexibility to adapt to the latest filter-evasion techniques (almost) as fast as spammers come up with them. The latest almost-impossible-to-filter technique spammers use is image spam (spam message as an image) with randomized pixels throughout the image (generate spam message, but randomize 10 pixels of the image for each recipient the spam is sent to).

  12. Ashok

    If one of the common ways to distribute a spam-run is to use some random insecure desktop machine, isn’t it just a matter of time before the padding they use is derived from real threads the user of that machine has participated in? Hell, you could even make it look like a plausible conversation some of the time.

  13. Jenny

    I was reading an article about how e-greetings can be used to send spam. And since the mail is sent out by the greeting card company, it will actually reach your inbox.

  14. Pete

    Guys, if you actively want to stop the spamming mail servers, google for teergrubing.
    You can set up a Teergrube server simply and all it uses is the spammer’s server bandwidth and resources by keeping the spammer’s email server transmissions open for a few hours if they send a large number of emails to you in a short time.
    It also ‘encourages’ absent admins to look after their servers to tighten them and prevent open relays.
    http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html

  15. Jorge Santos

    There’s a new email protocol proposal called EmailXT that solves the current email problems and adds more features to it. It is still a work in progress but a proof-of-concept application already exists.

  16. DominoG

    But they have a new technique every day.

  1. Article on spam detection
  2. enemieslist.com: Spam News
  3. E-mail security: detecting spam (V), Becoming paranoid
