SA Training... It works!


Justec
11-14-2007, 08:29 AM
For anyone who has clients that complain about spam I would definitely get them to use the Learn SPAM / HAM feature. I have one client that has been using it for a little while and it was working well. After I did a server move and didn't copy the training DB to the new server their spam shot up. After a month of training on the new server they are barely getting any spam to their inboxes.

tiger
11-14-2007, 11:25 PM
How many spam and ham messages were trained? (250?)
And do the messages need to be kept on the server forever for it to work?
If so, can you say how much space the trained spam messages took?

How about image spam (i.e., the spam content is in the image only)?

I guess SpamAssassin can't do anything about this unless there is some special plugin for it (I read somewhere that there is one, but I don't remember the details). Does iworx use such a plugin with their SA setup?


thanks

Justec
11-15-2007, 10:29 AM
A lot of training. One of the users has 200 Ham and 1800 spam trained.

The messages are kept on the server until the nightly training is run and then they are deleted.

I don't think there is anything for image emails at this point.

tiger
11-16-2007, 02:23 AM
Sorry to bother you again. Can you say how much space the training DBs take (especially for that user with 1800! spam messages trained)?

When I was receiving spam at guessable addresses like webmaster@, info@, mail@, etc. (I have filtered them now), the majority of it was image spam (probably about 80%). So I wondered how SA would handle that.

After googling, I think this was the place where I came to know about it: http://blog.fastmail.fm/2006/11/23/more-servers-installed-to-deal-with-spam-load/. The plugin is FuzzyOcr (http://fuzzyocr.own-hero.net/).

Justec
11-16-2007, 10:54 AM
Things to keep in mind:
*You can only train 250 messages in each folder per day.
*You must have at least 200 spam messages and 200 ham (non-spam) trained
before the Bayesian filter will start having an effect.
*You must train both spam and ham (non-spam) for the Bayesian system to be
effective, and you should try to keep around the same number of each
trained at a time.
*You can create an IMAP folder named "Spam", and messages tagged by
SpamAssassin will be delivered to this folder rather than your INBOX.
Someone from Iworx will need to comment on the image spam stuff; I haven't looked into it at all.

As for the size of the actual DB for SA training, I don't know. It is stored in an Iworx database, and I don't like to mess around in there if I don't have to.
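
As a rough illustration of the training limits listed above, here is a minimal Python sketch. It is not InterWorx or SpamAssassin code; the function and constant names are made up for this example, and the only numbers used are the ones from this thread (250 messages per folder per day, at least 200 spam and 200 ham before Bayes has an effect).

```python
import math

# Figures taken from the list above; the names are hypothetical.
MIN_TRAINED = 200        # minimum trained spam AND minimum trained ham
DAILY_FOLDER_CAP = 250   # messages trained per folder per nightly run


def bayes_active(spam_trained: int, ham_trained: int) -> bool:
    """True once both spam and ham have reached the minimum trained count."""
    return spam_trained >= MIN_TRAINED and ham_trained >= MIN_TRAINED


def days_to_train(message_count: int) -> int:
    """Nightly runs needed to train message_count messages from one folder."""
    return math.ceil(message_count / DAILY_FOLDER_CAP)


def spam_to_ham_ratio(spam_trained: int, ham_trained: int) -> float:
    """How lopsided the training is; 1.0 means equal spam and ham."""
    if ham_trained == 0:
        return math.inf
    return spam_trained / ham_trained


if __name__ == "__main__":
    # The user mentioned earlier in the thread: 200 ham and 1800 spam.
    print(bayes_active(1800, 200))
    print(days_to_train(1800))
    print(spam_to_ham_ratio(1800, 200))
```

With the numbers from the earlier post (200 ham, 1800 spam) the filter is just over the minimum on the ham side, and the training is well away from the "around the same number of each" advice.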

tiger
11-17-2007, 01:02 AM
.. actual DB for SA training ... This is stored in an Iworx database

I thought it would be under the siteworx/mailbox account (shouldn't it be that way?).

Now, what if a siteworx user who has a lot of messages trained moves to another host?
Would they need to do the training again?

tiger
11-17-2007, 11:11 PM
Any reply from iworx guys?

IWorx-Tim
11-17-2007, 11:51 PM
I thought it would be under the siteworx/mailbox account (shouldn't it be that way?).

Now, what if a siteworx user who has a lot of messages trained moves to another host?
Would they need to do the training again?

As of right now that's how it would have to be, yes, but the guys are gonna create a way to back up those settings. In the meantime, if you absolutely have to, you can back up and import just those tables from the iworx databases. I don't have the table name in front of me right now, but if you need it one of the devs can provide it. I believe Justin did this when he moved from CentOS 4.5 to 5. As long as those tables are all you touch and you don't change anything else in there, it shouldn't hurt anything.

tiger
11-18-2007, 04:58 AM
but the guys are gonna create a way to back up those settings
Thanks.

And it's not urgent. I just wondered about the current implementation.

I think it should be part of the siteworx backup system (even if it will be provided in nodeworx backup).

IWorx-Socheat
11-18-2007, 09:39 AM
I think it should be part of the siteworx backup system (even if it will be provided in nodeworx backup).

Actually, it was added in InterWorx 3.0. :) The BayesDB and Horde addressbook are exported for each mail user, and will be restored during import.