
Re: [User] new spam technique (fwd)



On Thu, 21 Aug 2003 hcrouch@mchsi.com wrote:

> ----------------------  Forwarded Message:  ---------------------
> From:    Rob Funk <rfunk@funknet.net>
> 
> So when Bayesian spam filtering came along, letting everyone automatically 
> detect spam based on the words used within, the spammers retaliated by 
> including random words from a dictionary or just "words" made of random 
> characters.

I've found that, for the most part, once trained, Bayesian filters (at
least bogofilter) will pick up on the random-character words.  I've not
had one of those spams hit my inbox in over a month (that is, they're
properly sorted to my spam box).  Every once in a great while one will
slip by.  The last one that did had a very small body with little text.
I trained bogofilter on it accordingly.

The key with Bayesian filters (at least bogofilter) is to have a good 
amount of spam to prime the pump with.  Otherwise you'll be beating your 
head against the wall.  In addition, I found that I had to tighten 
bogofilter from a "5" (the score that would pop a message into my spam 
box) to a "3", meaning that the spam threshold was set lower.
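
To make the threshold point concrete, here is a minimal sketch in Python 
of word-based scoring with a cutoff.  It is not bogofilter's actual 
algorithm or scale: the TinyBayes class, the 0.9 and 0.6 cutoffs, and the 
training messages are all made up for illustration, and it assumes spam 
and non-spam are equally likely to begin with.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Toy naive-Bayes scorer; a stand-in, not bogofilter's method."""

    def __init__(self):
        self.spam = Counter()   # messages containing each word, spam side
        self.ham = Counter()    # same, for non-spam
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        words = set(tokenize(text))  # count each word once per message
        if is_spam:
            self.spam.update(words)
            self.n_spam += 1
        else:
            self.ham.update(words)
            self.n_ham += 1

    def spam_probability(self, text):
        log_odds = 0.0
        for word in set(tokenize(text)):
            # Add-one smoothing so unseen words never give a zero.
            p_spam = (self.spam[word] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[word] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


f = TinyBayes()
f.train("cheap pills buy now", True)
f.train("buy cheap watches now", True)
f.train("meeting notes for the linux user group", False)
f.train("kernel patch notes attached", False)

borderline = f.spam_probability("buy now notes")
caught_at_strict_cutoff = borderline >= 0.9   # hypothetical cutoff
caught_at_lower_cutoff = borderline >= 0.6    # hypothetical lower cutoff
```

With the stricter cutoff the borderline message stays in the inbox; with 
the lower one it gets sorted as spam, which is the trade-off of setting 
the threshold lower.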

It just happens that I've got an account that does nothing but collect 
spam.  The beauty of it is that all of the spams have been collected via 
the buckshot method.  The email address has never been published, 
anywhere, anytime.  Every email that makes it to my account on that server 
has done so only because of the buckshot method.  The volume of spam these 
people get is astronomical.  Thus, I had a good source of many different 
varieties of spam to get my spam database primed with.

> But now they've gone a step further -- they're now making use of the 
> multipart/alternative MIME type.  Netscape pioneered the use of 
> multipart/alternative to give both a plain-text and an HTML view of a 
> message, with the client choosing the version they prefer.  Now the 
> spammers are sticking reasonable-looking text (e.g. a piece of a history 
> essay) in the text/plain part, and their ad in the HTML part.  Most modern 
> mail readers will just show the HTML, but the spam detector is looking at 
> the whole thing.

So then, where's the problem?  As the writer points out, the filters look 
at the entire text of the email.  The spam words will still score on the 
spamdar and ultimately tag it as spam.  In addition, you can set up 
bogofilter to add anything it classifies as spam to the database 
automatically.  So anytime bogofilter classifies a message as spam, it 
adds the text of that email to the spam database, thus upping the score 
on those words.
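
The classify-then-update loop can be sketched roughly as below.  This is 
a toy in Python, not how bogofilter stores or scores anything: the 
spam_counts table, the looks_like_spam rule, and the min_hits value are 
hypothetical stand-ins chosen only to show the feedback effect.

```python
import re
from collections import Counter

# Hypothetical stand-in for the spam word database.
spam_counts = Counter()


def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def looks_like_spam(text, min_hits=2):
    """Toy rule: spam if enough words were already seen in spam."""
    hits = sum(1 for word in tokenize(text) if spam_counts[word] > 0)
    return hits >= min_hits


def classify_and_learn(text):
    """Classify a message; if it is judged spam, add its words too."""
    verdict = looks_like_spam(text)
    if verdict:
        spam_counts.update(tokenize(text))
    return verdict


# Prime the pump with one known spam.
spam_counts.update(tokenize("buy cheap pills now"))

# Caught on "cheap" and "pills"; "replica" and "watches" are learned.
first = classify_and_learn("cheap pills and replica watches")

# Caught only because of the words learned from the previous message.
second = classify_and_learn("replica watches on sale")
```

The second message shares no words with the original seed spam; it is 
caught only because the first one was fed back in.  The flip side is that 
a wrong verdict gets fed back in the same way.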

I can see what the spammers are trying to circumvent, but ultimately I 
don't think they'll be as successful as they think because their wares 
will still trigger a spambin response. 
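
Here is a rough illustration of why the ad words still get seen, using 
Python's standard email module.  The sample message, the "XYZ" boundary, 
and the all_text_tokens helper are invented for this sketch, and the 
tag stripping is deliberately crude; bogofilter's own MIME and HTML 
handling will differ.

```python
import re
from email import message_from_string

# Made-up two-part message: innocent plain text, ad in the HTML part.
RAW = """\
From: someone@example.com
To: you@example.com
Subject: history notes
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

The treaty ended the long war between the two kingdoms.
--XYZ
Content-Type: text/html; charset="us-ascii"

<html><body><b>Buy cheap pills now</b></body></html>
--XYZ--
"""


def all_text_tokens(raw_message):
    """Collect word tokens from every text/* part of a message."""
    msg = message_from_string(raw_message)
    tokens = []
    for part in msg.walk():
        # Skip the multipart container and any non-text parts.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "us-ascii"
        text = payload.decode(charset, errors="replace")
        # Crude tag removal so only the visible words remain.
        text = re.sub(r"<[^>]+>", " ", text)
        tokens.extend(re.findall(r"[a-z0-9']+", text.lower()))
    return tokens


tokens = all_text_tokens(RAW)
```

The resulting token list holds the essay words and the ad words side by 
side, so a filter fed this list still has "cheap" and "pills" to score.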

> I'm starting to think that the spammer and anti-spammer industries are 
> where all the innovation is happening these days.

Amen.

Sean...


--
Believing I had supernatural powers, I slammed into a brick wall.
	--Paul Simon
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
KG4NRC  http://www.rimboy.com  Your source for the crap you know you need.

