Yenya's World

Mon, 08 Jan 2007

Spam in 2007

I've came across an interesting case of mail misclassified by our DSpam filter. Some of the reasons given by DSpam were the following:

Date*2007+11, 0.99000,
Received*Jan+2007, 0.99000,
Date*2007, 0.99000,
Received*2007+11, 0.99000,
Received*2007, 0.99000,

It seems that when DSpam has been initially trained, all mail which contained the string "2007" in Date: or Received: headers was spam (obviously - only spam or severly misconfigured mail servers had the system date that much in the future).

The question is, what is the correct solution of this problem: should the four-digit number in those two headers be a hard-coded exception? Should the DSpam use a higher-level information (like SpamAssassin does), such as "Date: is more than 36 hours in the future"? Or maybe should users every year on January 1st send few messages to the DSpam training address?

Section: /computers (RSS feed) | Permanent link | 1 writebacks