[Coco] Re: Spam

Mon Jan 16 14:46:13 EST 2006

Dennis Bathory-Kitsz wrote:
> At 01:34 PM 1/16/06 +0100, you wrote:
> 
>>Dennis, any clues?
> 
> 
> Nope.
> 
> As I recall, three have slipped through to the maltedmedia list itself in
> the past two years, and none that got through to this list from Yahoo or
> Gmane.
> 
> It was explained to me once, but I don't remember the specifics ...
> something about how the header is formed so it looks like it came from the
> list itself during the forward from the receiving mail server (which runs
> several anti-spam programs) to the distributing list server. I know that it
> happens on Yahoo groups now & then, and from what you say, apparently on
> Gmane (where they are very diligent about catching & squashing these).

This is what was tagged on the message, presumably from gmane:

: X-Spam-Report: 29.0 points;
: *  2.2 RCVD_HELO_IP_MISMATCH Received: HELO and IP do not match,
: but should

| Original-Received: from unknown (HELO 216.92.131.37) (213.85.190.61)
|	by qs281.pair.com with SMTP; 15 Jan 2006 22:26:13 -0000

Resolved qs281.pair.com to 216.92.131.37

This is a trivial-to-detect forgery: the zombie used to do the 
spamming is spoofing the I.P. address of a mail server in the receiving 
domain in its HELO argument.  What should be present is the fully 
qualified domain name of the sending mail server, not an I.P. address.

While many real mail servers will not have correct information here, and 
the RFCs do not require it, this specific condition is a 100% indication 
of spam.

No real mail server would ever say hello with the I.P. address of the 
receiving mail server, so this should be a simple test for rejecting 
spam.  I do not know how hard it would be to implement in a 
mail server.
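
A minimal sketch of such a test, in Python.  The function name and the 
idea of passing in a list of the receiving server's own addresses are 
assumptions for illustration; this is not taken from any mail server or 
from SpamAssassin.

```python
import ipaddress


def helo_spoofs_local_ip(helo_arg, local_ips):
    """Return True if the HELO argument is an IP literal equal to one of our own IPs.

    helo_arg  -- the string the client sent after HELO/EHLO
    local_ips -- addresses of the receiving mail server(s), as strings
    """
    text = helo_arg.strip()
    # Accept both a bare "1.2.3.4" and a bracketed "[1.2.3.4]" address literal.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    try:
        helo_ip = ipaddress.ip_address(text)
    except ValueError:
        # The client sent a host name (or garbage), so this test does not apply.
        return False
    return helo_ip in {ipaddress.ip_address(ip) for ip in local_ips}
```

With the values from the header above, 
helo_spoofs_local_ip("216.92.131.37", ["216.92.131.37"]) is true, while 
a HELO carrying a host name or some unrelated I.P. address is left for 
the other tests to judge.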

The SpamAssassin script obviously does not know about this long-standing 
spammer trick; otherwise it would have noticed that the HELO address was 
the same as the receiving mail server's I.P. and given the message a 
score of 100% spam.  That alone would make the rest of the tests a waste 
of CPU power.

This appears to be an implementation problem with SpamAssassin: it seems 
to always run all of its tests, instead of the minimum needed to 
classify an e-mail as real or spam.
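
To illustrate what "the minimum needed" could look like, here is a 
rough Python sketch of running tests in order and stopping as soon as 
the score crosses the spam threshold.  The classify function, the 
(name, score, predicate) layout, and the 5.0 default threshold are all 
assumptions made for this example, not SpamAssassin's actual design.

```python
def classify(message, tests, threshold=5.0):
    """Run tests in order and stop once the accumulated score reaches threshold.

    tests is a list of (name, score, predicate) tuples; predicate(message)
    returns True when the test hits.  Returns (is_spam, total, hits).
    """
    total = 0.0
    hits = []
    for name, score, predicate in tests:
        if predicate(message):
            total += score
            hits.append(name)
            if total >= threshold:
                # Enough evidence already; skip the remaining tests.
                break
    return total >= threshold, total, hits
```

Putting the cheap, near-certain tests first means the expensive ones 
rarely run.  Note that this only gives the same verdict as running 
everything if no test carries a negative score.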

  *  4.0 RCVD_NUMERIC_HELO Received: contains an IP address used for HELO

Given a large quantity of mail, this is not good enough to use for 
filtering.

  *  0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
  *      [score: 0.5000]

Bayesian filtering has been shown not to be reliable with large 
quantities of mail.

  *  2.0 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
  *      [<http://dsbl.org/listing?213.85.190.61>]

This is a 99.99% indication that the message is spam.  The only way to 
get into the list.dsbl.org is to have a confirmed security problem on a 
machine.

The only way to get off the list.dsbl.org is to have a working 
postmaster or abuse address that can be derived from the rDNS for the 
mail server.  There are a small number of real mail servers that have 
trouble getting off of the list.dsbl.org because they either do not have 
a valid rDNS as required by RFC, or their mail server automatically and 
silently deletes all e-mail to the postmaster or abuse addresses.

So it is possible that real e-mail will be blocked by list.dsbl.org, but 
only if the ISP owning that I.P. space is totally incompetent.

  *  3.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in
  *      bl.spamcop.net
  *      [Blocked - see <http://www.spamcop.net/bl.shtml?213.85.190.61>]

This is only an 80 to 90% chance of spam, because of people accidentally 
self-reporting, multi-hop exploits, and rare parsing errors by the 
robot.  But this is good enough to trigger a content scanner to look for 
other spam clues.

  *  4.0 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL
  *      [213.85.190.61 listed in sbl-xbl.spamhaus.org]

This is a 100% indication that the e-mail is spam.  In the several years 
that the XBL has been in existence, I have not heard of an incorrect 
listing.  Once a message hits this test, there is no point in running 
any other test.
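
For anyone who has not seen how these RBL tests work: a DNS blocklist 
is queried by reversing the octets of the I.P. address, appending the 
list's zone, and doing an ordinary address lookup.  A small Python 
sketch follows; the function names are made up for illustration, and 
the zone is simply whatever list you want to ask.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blocklist answers for this address (needs working DNS)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No answer means not listed.  A resolver failure also lands here,
        # so this sketch fails open rather than blocking mail.
        return False
    return True
```

For the relay in this report, dnsbl_query_name("213.85.190.61", 
"sbl-xbl.spamhaus.org") gives 61.190.85.213.sbl-xbl.spamhaus.org.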

  *  0.4 URIBL_AB_SURBL Contains an URL listed in the AB SURBL blocklist
  *      [URIs: bestratewww1.com]
  *  2.5 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
  *      [URIs: bestratewww1.com]
  *  1.5 URIBL_WS_SURBL Contains an URL listed in the WS SURBL blocklist
  *      [URIs: bestratewww1.com]
  *  3.2 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
  *      [URIs: bestratewww1.com]
  *  4.0 URIBL_SC_SURBL Contains an URL listed in the SC SURBL blocklist
  *      [URIs: bestratewww1.com]

This is a strong indication of spam, but this test should only be run 
if some other problem was found with the message.  While there is a 
99.9999% chance that the message contains spam, it could also be a 
real message discussing spam.

These tests have also been mostly obsolete for two years now.  The SURBL 
lists are 8 hours behind the latest throwaway domains that the spammers use.

SpamAssassin 3.0 has a more accurate test: it looks up the I.P. address 
of the hosts in the URLs in the spam.  If the I.P. address does not 
exist, as is the case for this one, the message is either spam or 
contains a typo.  If the I.P. address is in sbl-xbl.spamhaus.org, then 
the message definitely contains spam.
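
A rough Python sketch of that idea, not SpamAssassin's own code: pull 
the host names out of the URLs in the body, then try to resolve each 
one.  The regular expression is deliberately crude and the function 
names are assumptions for illustration.

```python
import re
import socket
from urllib.parse import urlsplit

# Crude matcher for http/https URLs in plain text.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def url_hosts(body):
    """Return the set of lower-cased host names found in URLs in the body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # Malformed URL; ignore it in this sketch.
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def resolve_hosts(hosts):
    """Map each host to its IPv4 address, or None if it does not resolve."""
    result = {}
    for host in hosts:
        try:
            result[host] = socket.gethostbyname(host)
        except socket.gaierror:
            result[host] = None
    return result
```

A host that maps to None is the "does not exist" case; any address 
that does come back could then be checked against a blocklist.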

  *  2.3 LONGWORDS Long string of long words

Not sure how good this test is.

-John
wb8tyw at qsl.network
Personal Opinion Only