Mon, 08 Jan 2007
Spam in 2007
I came across an interesting case of a message misclassified by our DSpam filter. Among the reasons DSpam gave were the following:
Date*2007+11, 0.99000, Received*Jan+2007, 0.99000, Date*2007, 0.99000, Received*2007+11, 0.99000, Received*2007, 0.99000,
It seems that when DSpam was initially trained, all mail containing the string "2007" in the Date: or Received: headers was spam (obviously: only spam or severely misconfigured mail servers had the system date that far in the future).
The question is: what is the correct solution to this problem? Should the four-digit year in those two headers be a hard-coded exception? Should DSpam use higher-level information (as SpamAssassin does), such as "Date: is more than 36 hours in the future"? Or should users send a few messages to the DSpam training address every year on January 1st?
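The second option, a higher-level test such as "Date: is more than 36 hours in the future", could look roughly like the minimal Python sketch below. It is not SpamAssassin's or DSpam's actual code; the function name date_too_far_in_future and the FUTURE_LIMIT constant are hypothetical, and the 36-hour threshold is just the figure quoted in the question.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical threshold, taken from the "36 hours in the future" rule quoted in the text.
FUTURE_LIMIT = timedelta(hours=36)


def date_too_far_in_future(date_header, now=None):
    """Return True if the Date: header lies more than FUTURE_LIMIT ahead of `now`.

    Hypothetical helper for illustration. Unparseable dates return False here;
    a real filter would probably score them with a separate rule.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on garbage input.
        return False
    if sent is None:
        return False
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent - now > FUTURE_LIMIT


if __name__ == "__main__":
    now = datetime(2007, 1, 8, 12, 0, tzinfo=timezone.utc)
    # A legitimate message sent in 2007: not flagged.
    print(date_too_far_in_future("Mon, 8 Jan 2007 10:00:00 +0100", now))
    # A message dated four days ahead: flagged.
    print(date_too_far_in_future("Fri, 12 Jan 2007 10:00:00 +0000", now))
```

Because such a rule compares the header with the current clock instead of learning the literal year, it keeps working on January 1st without any retraining; the cost is that it is a hand-written rule rather than something the statistical filter learns by itself.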