Amiga.org

The "Not Quite Amiga but still computer related category" => Alternative Operating Systems => Topic started by: OSS542 on September 19, 2004, 04:52:24 PM

Title: Mozilla Bayesian Junk Filter becoming much less effective ?
Post by: OSS542 on September 19, 2004, 04:52:24 PM: I've seen Mozilla's junk mail filter praised quite a bit in the past, and many people claim it catches 95% or more of incoming spam. Recently, however, my Mozilla filter seems to be getting less and less effective. It has never caught more than about 80%, and lately it catches less than 50%. Originally I trained the filter only on incoming mail it had misjudged, which got it to perhaps 80%. When it dropped to about 50%, I tried retraining it on my recently saved mail (ham and spam, about 2000 messages of each). That has made no noticeable difference for new incoming mail. Has anyone else noticed this, or am I perhaps doing something wrong? It just does not seem as good as it should be.
I only get about 40 or 50 a day through our corporate network, and don't have a "hotmail" addy.

Regards,
OSS542
Title: Re: Mozilla Bayesian Junk Filter becoming much less effective ?
Post by: corsavert on September 19, 2004, 06:38:01 PM: Yeah, the spammers are getting smarter. They pad the spam with all kinds of valid-looking text, so the Bayesian filters no longer see a high enough proportion of spammy content in the message to filter it.
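
To illustrate the padding effect described above, here is a rough sketch of a generic naive Bayes word scorer. It is not Mozilla's actual implementation; the function names, the smoothing, the equal priors, and the tiny sample messages are all assumptions made up for the example.

```python
import math
from collections import Counter

def train(messages):
    """Count how many messages each token appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence in log space (naive Bayes, equal priors assumed)."""
    log_odds = 0.0
    for token in set(text.lower().split()):
        # Add-one smoothing so a token missing from one class does not zero the product.
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical training mail, invented for this sketch.
spam = ["cheap pills online", "cheap loans online now", "win cheap pills"]
ham = ["meeting agenda for monday", "project report attached", "lunch on monday"]
spam_counts, ham_counts = train(spam), train(ham)

plain = "cheap pills online"
# Same spam, padded with words that the filter has only seen in ham.
padded = plain + " meeting agenda project report monday lunch attached"

p_plain = spam_probability(plain, spam_counts, ham_counts, len(spam), len(ham))
p_padded = spam_probability(padded, spam_counts, ham_counts, len(spam), len(ham))
print(f"plain:  {p_plain:.3f}")
print(f"padded: {p_padded:.3f}")
```

In this toy data the padded message scores well below the plain one, because every ham-looking word pulls the combined odds toward ham. Real filters weigh tokens differently, so take the numbers as illustrative only.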
Title: Re: Mozilla Bayesian Junk Filter becoming much less effective ?
Post by: OSS542 on September 20, 2004, 03:21:32 AM: A further update:

I have rerun the junk mail controls on my stored "spam", and noticed that all the wrong determinations are on spam received in the last day or so. The filter does not seem to adapt very well to newly received spam. I tried training on the saved mail (both ham and spam), retesting on the same mail, and correcting the results. I repeated this loop (about 5-10 iterations) until I got a perfect score, which surprised me. It seems to learn the saved mail very well, but does not recognize new incoming junk based on what it has learned.
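
The train, retest, and correct loop described above can be sketched roughly as follows. This is a toy stand-in, not Mozilla's filter: the TinyFilter class, the zero threshold, the choice to ignore never-seen words, and the sample messages are all assumptions. It shows how a filter can reach a perfect score on the mail it was trained on while still missing new spam whose words it has never seen.

```python
import math
from collections import Counter

def tokens(text):
    return set(text.lower().split())

class TinyFilter:
    """Hypothetical minimal word-count filter, for illustration only."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        self.counts[label].update(tokens(text))
        self.totals[label] += 1

    def is_spam(self, text):
        log_odds = 0.0
        for t in tokens(text):
            # Assumption: words never seen in training carry no evidence.
            if t not in self.counts["spam"] and t not in self.counts["ham"]:
                continue
            p_spam = (self.counts["spam"][t] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][t] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return log_odds > 0  # flag only when spam evidence outweighs ham evidence

def train_until_clean(f, labelled, max_rounds=10):
    """Retest the saved mail, learn from each mistake, repeat until none remain."""
    for round_no in range(1, max_rounds + 1):
        mistakes = 0
        for text, label in labelled:
            if f.is_spam(text) != (label == "spam"):
                f.learn(text, label)
                mistakes += 1
        if mistakes == 0:
            return round_no
    return None  # did not converge within max_rounds

def accuracy(f, labelled):
    hits = sum(f.is_spam(text) == (label == "spam") for text, label in labelled)
    return hits / len(labelled)

# Invented sample mail: the new spam shares no words with the saved mail.
saved = [
    ("cheap pills online", "spam"),
    ("cheap loans approved today", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
new_spam = [
    ("refinance mortgage rates lowest", "spam"),
    ("replica watches discount sale", "spam"),
]

f = TinyFilter()
rounds = train_until_clean(f, saved)
saved_score = accuracy(f, saved)
new_score = accuracy(f, new_spam)
print("rounds:", rounds, "saved:", saved_score, "new:", new_score)
```

With this made-up data the loop ends with every saved message classified correctly, yet both new messages get through, since none of their words appeared in training. A score measured on the training mail itself says little about how the filter handles tomorrow's spam.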