E-mail security: detecting spam (III)

Before talking about other methods for detecting spam, let’s take a closer look at Bayesian filters and the programs that use this technique to classify mail. This will be a technical post, so it might not interest all of you. In upcoming posts we’ll see some software that uses these filters.

I’m not a mathematician, so I might make a few errors when trying to explain the theory behind the filters. Please forgive me. The Wikipedia article explains it better than I can.

The main formula on which Bayesian filtering rests is:

$$\Pr(\text{spam} \mid \text{words}) = \frac{\Pr(\text{words} \mid \text{spam}) \, \Pr(\text{spam})}{\Pr(\text{words})}$$

which says that the probability of an e-mail being spam given the words contained in it is equal to the probability of these words appearing in a spam message, multiplied by the probability of a message being spam divided by the probability of the words appearing in any message.
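To make the rule concrete, here is a minimal Python sketch for a single word. The numbers (40% of mail is spam, the word appears in 25% of spam and 2% of non-spam) are made up for illustration, and the function name is my own.

```python
def posterior_spam(p_words_given_spam, p_spam, p_words):
    """Bayes' rule: Pr(spam|words) = Pr(words|spam) * Pr(spam) / Pr(words)."""
    return p_words_given_spam * p_spam / p_words


# Hypothetical figures, chosen only for the example.
p_spam = 0.4          # share of all mail that is spam
p_word_spam = 0.25    # the word appears in 25% of spam messages
p_word_ham = 0.02     # the word appears in 2% of non-spam messages

# Pr(words) over all mail, by the law of total probability.
p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)

result = posterior_spam(p_word_spam, p_spam, p_word)
print(round(result, 3))  # about 0.893
```

So with these invented figures, a message containing the word would be spam roughly nine times out of ten.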

Wow, it looks quite complicated. One of the best-known papers about this kind of filtering is A Plan for Spam by Paul Graham. We’ll see some code from it.

Well, to calculate this result we first need to break the message into words, called tokens, from which the probabilities are derived. How this split is done really matters, as it affects the final result. Given the sentence It’s a shame, we could break it into It-s-a-shame, or maybe Its-a-shame, or even It’s-a-shame, and each of them might give different results when used.

Once the message is broken into tokens, we can calculate, for each word, the probability that a message containing it is spam with the following code (written in Lisp, as in the original paper):
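The three ways of splitting that sentence can be sketched in Python like this. The function names and the lowercasing step are my own assumptions, not taken from any particular filter.

```python
import re

sentence = "It's a shame"


def split_on_non_letters(text):
    # The apostrophe acts as a separator: it / s / a / shame
    return re.findall(r"[a-z]+", text.lower())


def drop_apostrophes(text):
    # The apostrophe is removed before splitting: its / a / shame
    return re.findall(r"[a-z]+", text.lower().replace("'", ""))


def keep_apostrophes(text):
    # The apostrophe is treated as part of the word: it's / a / shame
    return re.findall(r"[a-z']+", text.lower())


for tokenizer in (split_on_non_letters, drop_apostrophes, keep_apostrophes):
    print(tokenizer.__name__, tokenizer(sentence))
```

Each tokenizer produces a different set of keys for the hash tables, so the same training mail can lead to different probabilities.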

(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
   (unless (< (+ g b) 5)
     (max .01
          (min .99 (float (/ (min 1 (/ b nbad))
                             (+ (min 1 (/ g ngood))
                                (min 1 (/ b nbad)))))))))
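For those who don’t read Lisp, this is roughly the same calculation in Python. It is my own translation, not code from the paper: good and bad are assumed to be dictionaries mapping each word to its count, and ngood and nbad are the (non-zero) numbers of non-spam and spam messages.

```python
def graham_token_probability(word, good, bad, ngood, nbad):
    """Probability that a message containing `word` is spam, or None if the word is too rare."""
    g = 2 * good.get(word, 0)   # non-spam occurrences are doubled to reduce false positives
    b = bad.get(word, 0)
    if g + b < 5:
        return None             # not enough data to say anything about this word
    spam_freq = min(1.0, b / nbad)
    ham_freq = min(1.0, g / ngood)
    # Clamp to [0.01, 0.99] so that no single word is ever treated as absolute proof.
    return max(0.01, min(0.99, spam_freq / (ham_freq + spam_freq)))


# Hypothetical training data, for illustration only.
good = {"hello": 10}
bad = {"hello": 10, "viagra": 20}
print(graham_token_probability("viagra", good, bad, 100, 100))  # 0.99
print(graham_token_probability("rare", good, bad, 100, 100))    # None
```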

When we have calculated the probabilities for all the tokens in the message, we take the most relevant ones (those whose probability is farthest from 0.5, that is, nearest to 0 or 1). Paul decided to use the 15 most relevant, stores them in a list called probs, and applies the following formula to it:

(let ((prod (apply #'* probs)))
  (/ prod (+ prod (apply #'* (mapcar #'(lambda (x)
                                         (- 1 x))
                                     probs)))))
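Again in Python, as a sketch of my own rather than code from the paper: pick the 15 probabilities farthest from 0.5 and combine them. The function names are made up, and math.prod needs Python 3.8 or later.

```python
import math


def most_relevant(probs, limit=15):
    # Sort by distance from the neutral value 0.5, farthest first.
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:limit]


def graham_combine(probs):
    prod = math.prod(probs)                      # product of p
    inverse = math.prod(1 - p for p in probs)    # product of (1 - p)
    return prod / (prod + inverse)


# Hypothetical token probabilities for one message.
probs = most_relevant([0.99, 0.99, 0.2, 0.5, 0.45])
score = graham_combine(probs)
print(score > 0.9)  # True: two very spammy words outweigh one innocent one
```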

If the result is greater than 0.9, we consider the e-mail spam and classify it as such. So, although the theory may look hard, once implemented it is far easier. Maybe the only problem with this code is that it’s Lisp, which not many people know.

Let’s make this even easier by looking at the source code of Mozilla Thunderbird, the famous open-source mail reader, which includes a Bayesian module to classify mail. The implementation in Thunderbird is slightly different from the original, but the concept remains the same.

The algorithm is implemented in the file mozilla\mailnews\extensions\bayesian-spam-filter\src\nsBayesianFilter.cpp, in the function classifyMessage. It’s implemented in C++, but we’ll look at it in “pseudo-code”. It uses the following variables:

  • mGoodCount: number of non-spam messages classified
  • mBadCount: number of spam messages classified
  • mGoodTokens: hash table with good tokens and number of times they have appeared
  • mBadTokens: hash table with spam tokens and number of times they have appeared

Take care, as the same token might appear in both hash tables with a different number of occurrences. For example, the word hello may well be equally probable in spam and non-spam messages. When the algorithm has not yet been trained, default values are assigned:

if (mGoodCount == 0 || mGoodTokens.count() == 0)
    message is spam
if (mBadCount == 0 || mBadTokens.count() == 0)
    message is not spam

If the algorithm has been trained, it is applied with the following formula (adapted from Bugzilla):

for each token {
 hamcount = number of token appearances in non-spam
 spamcount = number of token appearances in spam
 hamratio = hamcount / mGoodCount
 spamratio = spamcount / mBadCount

 prob = spamratio / (hamratio + spamratio)

 n = hamcount + spamcount
 prob = (0.225 + n * prob) / (0.45 + n)

 distance = abs(prob - 0.5)
 if (distance >= 0.1) {
  token.distance = distance
  token.prob = prob
 }
}
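The loop body above can be sketched as a small Python function. This is my own rendering of the pseudo-code, not the Thunderbird source. The function name and the handling of tokens never seen in training are assumptions.

```python
def token_probability(hamcount, spamcount, good_count, bad_count):
    """Return (prob, distance) for a token, or None if the token is not informative enough."""
    n = hamcount + spamcount
    if n == 0:
        return None  # assumption: tokens never seen in training are skipped
    hamratio = hamcount / good_count
    spamratio = spamcount / bad_count
    prob = spamratio / (hamratio + spamratio)
    # Pull the estimate towards 0.5 when the token has been seen only a few times.
    prob = (0.225 + n * prob) / (0.45 + n)
    distance = abs(prob - 0.5)
    if distance >= 0.1:
        return prob, distance
    return None


# Hypothetical counts: a token seen 10 times, always in spam, with 100 messages of each kind.
print(token_probability(0, 10, 100, 100))
# A token seen equally often in both kinds of mail is discarded.
print(token_probability(5, 5, 100, 100))  # None
```

The constants 0.225 and 0.45 act like a small amount of prior evidence for a neutral probability of 0.5, so a token seen once cannot reach a probability as extreme as one seen a hundred times.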

With this code, we have the probability for each token. The tokens are saved in a list sorted by distance (the absolute difference between the token’s probability and 0.5) and the first 150 elements are taken. A chi-squared (chi2) probability distribution is calculated over them, and if the result is greater than or equal to 0.9 the message is classified as spam.

But we don’t need to know all of this unless we want to write a filter ourselves. There are lots of filters already available that work quite well and reach detection rates of around 99%, sometimes even better than a human.
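Here is a rough Python sketch of this last step. The selection of the 150 strongest tokens follows the description above, but the chi-squared combination is my assumption: it is the Fisher/Robinson style of combining used by several Bayesian filters, and Thunderbird’s exact code may differ. All names are made up, and the probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def chi2q(x2, dof):
    """Upper-tail probability of a chi-squared distribution with an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def classify(token_probs, max_tokens=150, threshold=0.9):
    # Keep the tokens whose probability is farthest from 0.5.
    clues = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:max_tokens]
    if not clues:
        return 0.5, False  # assumption: no evidence means "unsure"
    n = len(clues)
    ln_spam = sum(math.log(p) for p in clues)
    ln_ham = sum(math.log(1.0 - p) for p in clues)
    s = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # close to 1 when the tokens look spammy
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # close to 1 when the tokens look innocent
    score = (s - h + 1.0) / 2.0
    return score, score >= threshold


# Hypothetical message: twenty very spammy tokens and five neutral ones.
score, is_spam = classify([0.97] * 20 + [0.5] * 5)
print(is_spam)  # True
```

A message whose tokens are all neutral gets a score of 0.5 with this scheme, so it stays below the 0.9 threshold.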

