A quick introduction first...
I registered my domain name and started this website back in 2001.  Over these years I have been through two hosting companies and three different hosting platforms. 
After recently moving my domain/site to a new host (due to my original hosting company going from outstanding to lousy), my domain is now managed through cPanel.

So far I think this is my favorite site management platform that I have used.  I'm not certain which management platform I originally had with my first hosting company,
but there were no built-in tools for SpamAssassin.  In fact, it was only added to my domain's server after asking my hosting company to install it for me, which they
graciously did (this is back when they were "outstanding").  The downside was that everything having to do with SpamAssassin was unsupported and I had to work with
it manually.  The upside was that I learned quite a bit about how it worked during that time and had everything running very smoothly as far as my spam handling was
concerned.

When my original hosting company got sold, two things happened:  One, support quality went downhill fast, and two, my domain was moved to the Plesk platform. 
While I was pleased to see that Plesk made training SpamAssassin pretty easy, it offered no other options as far as tweaking scores or filtering to different mailboxes
based on spam score.

As I began searching for a new hosting company, that is when I started reading more about cPanel and saw lots of various demos on the web.  I felt that it looked easy
to use and, most importantly, it not only supported SpamAssassin but also seemed to let you customize things a bit.  I also saw that cPanel allowed for some
pretty flexible rules-based mailbox filtering.  Both of these were important to me, and once I found a hosting company (with which I am very happy), I immediately
started to get back to handling my spam the way I used to:  Using multi-level spam filtering and training.

I've been to many sites discussing SpamAssassin and have seen some people complaining that it doesn't work very well.  "Out of the box", I think it is okay, but by
training SpamAssassin and tweaking settings, you really can get it working pretty well for you.  I originally wrote the tutorials on this page at the request of my current
host provider after explaining to him how I handle my spam.  I decided to also post them here where they can (hopefully) easily be found in a Google search, and I also
have a forum here where people may get more help if needed.  I am posting this on my site in the hope it will help others, but I don't and can't "officially" support it like I
do for the software that I write.

With the introduction out of the way, on to the tutorials.  I have divided it into two parts:  Multi-level filtering and training.  I use both, but you can choose to follow
whichever you want, or simply use them to gain a little more info and insight into handling spam with SpamAssassin and come up with a plan of your own.

Taking control of your spam:  Part 1 - Sorting by spam level
Written by Kevin Dommer

Although cPanel allows some basic configuration of SpamAssassin to help you handle your spam, it is a bit limited.  I’ve seen some cPanel demos around
the web and some seem to have features and options available that others do not.  I’m not sure about the discrepancy, but this page aims to help other
cPanel users take better control of their spam.  For example, I have seen a “Spam Box” option in some cPanel demos but I don’t have this option in my
cPanel with my hosting company.  But not to worry, because you can still set up a “spam box”, and that is just one of the ways you can help sort your spam. 
While the instructions on this page work with my particular cPanel setup, I imagine they should work with most cPanel setups offered by various hosting
companies.

I am going to try to write this in as clear a way as possible, but there is only so much you can do with a technical document and I am not a technical writer.
Before I begin, the examples I will give are just a guideline.  There are several ways that one can achieve handling of spam.  Some ways are easier, some are
harder.  I’ve been using SpamAssassin for around 9 years at the time of this writing and this is how I have come to prefer handling my spam.  Over the years I
have gathered bits of information from various online sources on how to do some of these things, and some of it I have come up with myself through
experimentation.  You can follow this to the letter or change things a bit to suit your own needs or tastes.

This tutorial is presented with some assumptions as listed below.  I am not going to go into too much detail about these things because they are something
that should be understood to a certain degree if you decide you want to follow this tutorial.  In some cases my instructions may be enough even if you are
treading in new territory but prior knowledge or experience in these things may help.

ASSUMPTIONS:
1) You know what SpamAssassin is and have a basic idea of how it works (rule-based spam filtering which assigns a spam score to every email it scans).  For
more information, see the official SpamAssassin website at http://spamassassin.apache.org.
2) You are familiar with cPanel and know how to get around in it.
3) You know how to access and edit plain text files on your site (whether through cPanel’s File Manager or by using your favorite FTP application) and also
how to adjust file permissions (CHMOD).  These will come into play more for bayes training.
4) You know how to set up email accounts on your site.
5) You know how to access the email in your accounts (both from your PC and through Webmail).
6) You understand that this is merely a guide and no guarantees are made as to the results you will receive.  Incorrect or careless settings could possibly
result in missing emails!  I assume no liability for any problems or damages that may arise from following this guide.
7) The settings I use are for a single domain (no sub-domain) and although we have several email addresses, they are all for a single household and
therefore we have no issues with privacy in the sense that we don’t mind if someone else accidentally sees an email not intended for them.  (This can
happen when going through a “spam box” and when training SpamAssassin’s Bayesian database in part 2).  There are steps you can take to minimize this,
but it is likely to happen at some point.  This tutorial is not meant for resellers or managers of websites with various clients who have individual (private)
email accounts through your website.  I will include some additional info on how to maintain a bit of privacy while still training your bayes database.
8) Advanced settings and adjustments like this are outside of the normal scope of most web host providers’ support.  While they MIGHT help with minor
issues, they should not be expected to do so.  You may try our forum here if you’d like more help with anything on this page.

Setting up your multi-level spam handling
The goals:
* We will designate a minimum score for SpamAssassin to mark an email as spam.
* We will designate a slightly higher score and set up a filter to send those messages to another mailbox (a “spam box”) so they do not come into
your inbox every time you check your email.
* We will designate an even higher score and set up another filter to send those messages to yet another mailbox, with the eventual goal being
that those very high scoring spam mails will be deleted without you ever having to look at them (optional but recommended after taking some
time to verify your settings).

How To:

1) Set the number of hits required before a mail is considered spam.  I believe the default is 5.  Over time you may want to adjust this number up or down. 
For our purposes here, this number is not critical, but ideally it will only flag a message as spam if it really is in fact spam.
     a) In cPanel, click on the SpamAssassin shortcut
     b) Click the ‘Configure SpamAssassin’ button
     c) For required_score, enter 5
     d) Click ‘Save’ at the bottom to save the changes

2) Back in the SpamAssassin configuration page you will notice that there is a Spam Auto Delete option (and you may have a Spam Box option depending on
your host provider).  While at some point the plan is to use these features, I do not recommend setting them here!  We will do this manually later as it will
give us more control, and even after that I do not recommend changing the settings here because it can cause problems with the filters we will create.  In
short, do not change any of these ‘Spam Auto Delete’ or ‘Spam Box’ settings here.  With that warning out of the way, let’s set up some new email accounts
and some account-level mail filters!
     a) Back at your main cPanel page, click on the Email Accounts shortcut
     b) Create a new email account called spam (you can use any name you want, but I will refer to it as spam in this tutorial).  This is where our mid-range
scoring spam will get redirected to.
     c) Create another new email account called spam2 (you can use any name you want, but I will refer to it as spam2 in this tutorial).  This is where our high
scoring spam will get redirected to, and the goal is to eventually delete this account and have the emails being sent here get immediately deleted instead. 
Emails scoring this high are always spam, so straight to the trash they will go (eventually)!
     d) Back at your main cPanel page, click on the Account Level Filtering shortcut
     e) NOTE:  If you already have other account level filters in place, you will need to decide in what order you want them processed.  The order they are
listed in the filter list is the order that they are processed when your mail comes in.  Incorrect filter order can cause unexpected results!  I have no way of
knowing what other filters you might have so use your best judgment, but generally speaking I would think that you would want these new spam filters
first.  We will assign numbers to the beginning of our new filters to help remind us what order they need to be in (very important).  cPanel always puts the
last filter you edit at the end of the list.  If they end up out of order, go back to a filter you want to move down and edit then activate it to move it down.
     f) Click the ‘Create a New Filter’ button.  This will be our filter for very high spam.  Initially you want to set this pretty high (we will start with 15), but you
will bring this down quite a bit over time.
          *) Filter Name:     #1: Spam High
          *) Rules:     [Spam Bar] [Contains] +++++++++++++++      (note that is 15 + signs)
          *) Actions:     [Deliver to folder]
          *) Click the dropdown box that appears, click the + sign next to your domain name, then click on spam2.  The box should then say /yoursite.com/spam2
          *) Click the [+] button off to the right to add another action.
          *) For the second action, choose [Stop Processing Rules].  If you don’t do this, then high spam will be caught again in the next filter and routed to your
mid-range spam box rather than the high spam box.  We don’t want that to happen!
          *) Click Activate to activate the filter then click to go back to the main filter page.
     g) Click the ‘Create a New Filter’ button again.  This will be our filter for mid-range spam.  This should be a number higher than the minimum spam score
you set earlier, but not too much higher.  If you used the default of 5 earlier, maybe set this to 7 (+++++++).  The idea is that any spam below this number
(spam score of 6.9 or lower) will go to your inbox as normal because it may not really be spam and you don’t want to miss it.  Spam with a score between
this number and your “high” number is most likely spam but we can’t really be sure, so we will redirect it to our mid-range spam box and check it
periodically.
          *) Filter Name:     #2: Spam Mid
          *) Rules:     [Spam Bar] [Contains] +++++++      (note that is 7 + signs)
          *) Actions:     [Deliver to folder]
          *) Click the dropdown box that appears, click the + sign next to your domain name, then click on spam.  The box should then say /yoursite.com/spam
          *) Click Activate to activate the filter then click to go back to the main filter page.
     h) Now back on your main filter page, be sure that it lists your two new filters and that they are in the correct order.  Remember that if you edit one, it
may change the order on you.  Edit the other one and activate again to bring it down to the bottom.  #1 (Spam High) should be first in the list and #2 (Spam
Mid) should be second.


You might be thinking:  So what did I just do and what is going to happen to all of my email?  Here’s a quick breakdown:

1) All emails determined to be “ham” (not spam) will be delivered to whichever mailbox they were originally intended for.  Email sent to
you@yourdomain.com will still arrive in your inbox.  Email sent to otheryou@yourdomain.com will still arrive in that inbox.
2) Any emails that SpamAssassin flags as spam with a score below the number of + signs you designated in your second filter (mid-range spam) will still
arrive in their originally intended mailbox as mentioned above, with the exception that the subject line will be modified to say that it is spam.  Don’t panic
if this happens to a legitimate email.  We can train SpamAssassin later, and so long as it is a relatively low spam score there really is no harm done anyway. 
After all, the email still arrived in your inbox, right?
3) Any emails that SpamAssassin flags as spam with a score at or above the number of + signs you designated in your second filter and below the number of
+ signs you designated in your first filter will be routed to your spam mailbox.  This puts it all in a handy spot that you can check periodically to make sure
you didn’t miss out on an email that was incorrectly flagged as spam for some reason.  Should you ever find a legitimate email here you can easily forward it
to its original recipient (i.e. you) through webmail or however you access this mailbox and it should then arrive in your normal inbox.  We will also use the
email here to train SpamAssassin in the next part of this tutorial.  You can add this account to your normal email client (such as Outlook Express) if you'd like
to check it regularly, or check it only through webmail (which is what I do) so that you don't have to look at it every day.
4) Any emails that SpamAssassin flags as spam with a score at or above the number of + signs you designated in your first filter will be routed to your spam2
mailbox.  These will be high-scoring spam and as already mentioned, eventually the goal is to delete these without ever seeing them.  Again, you can add
this account to your normal email client (such as Outlook Express) if you'd like to check it regularly, but emails making it to this mailbox are almost always
spam, so there really is no need to check them regularly.
5) Note that SpamAssassin only examines messages below a certain size in order to prevent it from choking on large emails and slowing the server down (I
don’t remember the exact size).  What this means is that emails with large attachments or lots of images don’t get scanned by SpamAssassin at all.  Because
of this, some spam can slip right through with ease (this also applies to emails from people in your blacklist).  Fortunately, very little spam mail is ever larger
than the imposed size limit.
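
The breakdown above can be sketched in a few lines of Python.  This is only an illustration of the sorting logic, not what cPanel actually runs.  It assumes the Spam Bar holds one + sign per whole point of spam score, and the function names (spam_bar, route) are made up for this example.  The defaults match the conservative numbers used in this tutorial (required_score 5, mid-range 7, high 15).

```python
def spam_bar(score: float) -> str:
    """Build the spam bar: one '+' per whole point of score (assumed format)."""
    return "+" * max(0, int(score))


def route(score: float, required: float = 5.0, mid: int = 7, high: int = 15) -> str:
    """Return where a message with this score would end up."""
    bar = spam_bar(score)
    # Filter #1 (Spam High): checked first, then processing stops.
    if "+" * high in bar:
        return "spam2"
    # Filter #2 (Spam Mid): only reached if filter #1 did not match.
    if "+" * mid in bar:
        return "spam"
    # Flagged as spam by SpamAssassin but below the mid-range filter.
    if score >= required:
        return "inbox (subject tagged as spam)"
    return "inbox"


for score in (2.0, 5.5, 6.9, 7.0, 14.9, 15.2):
    print(score, "->", route(score))
```

Note that a score of 6.9 still lands in the inbox because its spam bar only has 6 + signs, which is why the mid-range filter is described as catching scores of 7 and up.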

So what kind of immediate results can you expect from all of this?  Well, I almost guarantee that you will continue to receive spam in your inbox when you
check your email.  Some should be properly flagged as spam, some may be flagged as spam but it is really a legitimate email, some spam will not be flagged
as spam at all.  In other words, you likely won’t see much of an immediate change.  This is where the tweaking begins!

In my examples above, I purposely suggested very conservative numbers for your required_score as well as the required spam levels for your two email
filters.  This is to prevent you from missing any important emails, but as a result, your multi-level spam handling will not be very effective until you tweak
the numbers, so read on as we get into that.  After careful testing of my PERSONAL spam situation, extensive bayes training and tweaking of scores assigned
to specific SpamAssassin tests, my personal settings are:  required_score 3.7, Mid-range spam score (minimum score to get placed into my spam box) is 5,
and all emails with a score of 8 and higher get deleted.  Do not use these settings for yourself!  You really must take your time and do lots of bayes training
before you can begin to tighten things down like this.

When email is flagged as spam, you can see what kind of score it got by examining the email headers.  Using this information, you can then further tweak
your required_score number as well as adjust the levels at which your spam gets sorted to your other mailboxes.  You may choose to also set up your email
client to check messages in your spam and spam2 accounts, but I prefer to do that through webmail.  Initially you will probably find much more spam in your
normal email account’s inbox than in the spam and spam2 accounts because the scores required for making it to the spam and spam2 account are pretty
high.
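
If you save a message as plain text, pulling the score out of the headers can be sketched like this in Python.  It assumes the message carries a standard SpamAssassin X-Spam-Status header of the form "Yes, score=7.2 required=5.0 tests=...".  Your host may format its headers differently, so check a real message first.  The function name spam_score and the sample message are made up for this example.

```python
import re
from email import message_from_string


def spam_score(raw_message: str):
    """Return the score from the X-Spam-Status header, or None if not found."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = re.search(r"score=(-?\d+(?:\.\d+)?)", status)
    return float(match.group(1)) if match else None


# A made-up sample message for illustration only.
sample = (
    "From: someone@example.com\n"
    "Subject: ***SPAM*** hello\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=BAYES_99,RCVD_IN_BL_SPAMCOP_NET\n"
    "\n"
    "body text\n"
)
print(spam_score(sample))
```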

You can also tweak scores for certain SpamAssassin tests, which will help increase the effectiveness of your multi-level filtering.   For example, a lot of
spam I receive gets points added for a test called “RCVD_IN_BL_SPAMCOP_NET”.  This particular SpamAssassin test looks at a public internet blacklist to see
if the email is from a known spammer.  While I suppose anything is possible, a hit on this test almost guarantees that it is spam.  cPanel allows you to tweak scores for
tests and it is easy to do.  I forget what the original score for this particular test is, but I have increased it in my SpamAssassin configuration file.  Here’s how:

From your main cPanel screen, click on the SpamAssassin icon then choose ‘Configure SpamAssassin’.  There should be some blank boxes next to labels
called score.  For this example, type (or copy & paste) the following into that box:

RCVD_IN_BL_SPAMCOP_NET 3.5

Save your changes, then go back to your configuration and you should see your new test score has been set.  The next time an email comes in and gets a hit
on that test, it will now get 3.5 points added to the score.  Obviously this increases the likelihood that this spam mail will be bumped up into your mid-
range spam box.  There are several other tests that I have adjusted the scores on, including Bayes tests, but the points you assign to these tests should
always be slowly tweaked over time.  While a hit on the test above just about ALWAYS indicates spam, most tests are less reliable:  just because a test gets a hit on a
piece of spam, it does not mean that a hit ALWAYS indicates spam.   It is also a good idea (before you begin to get more aggressive with your spam filtering) to
utilize the whitelist option in SpamAssassin to whitelist all of your friends and other important email addresses that you want to prevent from getting
flagged as spam.  Whitelisting an address automatically assigns a score of -100 to the email, thus all but eliminating the possibility of a false-positive.  You can
easily add email addresses to your whitelist and blacklist through the ‘Configure SpamAssassin’ page in cPanel.
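
To see why raising one test score matters, here is a small Python sketch of the arithmetic.  A message's spam score is the sum of the scores of the tests it hits, so overriding one test can push the total across a filter threshold.  Apart from RCVD_IN_BL_SPAMCOP_NET and the 3.5 override, everything here is hypothetical:  the other test names and all of the "default" scores are invented purely for illustration.

```python
def total_score(hits, default_scores, overrides):
    """Sum the score of every test hit, preferring your own overrides."""
    return sum(overrides.get(test, default_scores[test]) for test in hits)


# Hypothetical default scores, NOT SpamAssassin's real values.
default_scores = {
    "RCVD_IN_BL_SPAMCOP_NET": 1.0,
    "EXAMPLE_TEST_A": 2.5,
    "EXAMPLE_TEST_B": 2.0,
}
hits = ["RCVD_IN_BL_SPAMCOP_NET", "EXAMPLE_TEST_A", "EXAMPLE_TEST_B"]

before = total_score(hits, default_scores, overrides={})
after = total_score(hits, default_scores, overrides={"RCVD_IN_BL_SPAMCOP_NET": 3.5})
print(before, after)  # 5.5 8.0
```

With the made-up defaults, the message scores 5.5 and stays in the inbox (tagged as spam).  With the override it scores 8.0, which is enough to reach a mid-range spam box set at 7.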

As each day passes, you will get ham and spam coming in (just as you already have been).  Your job now is to look at all of them and see what kind of scores
your spam is getting and what kind of scores your ham (non-spam) is getting.  Then SLOWLY adjust your required_score as well as the number of + signs that
determine how to route the spam with your filters.  It is important to resist the urge to use aggressive numbers right away as this will only lead to increased
false-positives.  My personal goal is all legitimate email and little to no spam making it to my normal inbox, less than 20-25 spams making it to my mid-range
spam box (per WEEK) with no false-positives, and all the rest making it to my high spam box (which is actually just deleted now), and after a couple months
of tweaking and BAYES TRAINING, I have reached that goal.  I will cover bayes training in another installment.  It is a bit more complicated and involved than
what we’ve covered here but the rewards can be quite worth it.
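
If you jot down the scores you see for a week or two, a quick check like the following Python sketch can help you decide whether a proposed pair of filter levels is safe.  The function name review_thresholds and the sample scores are made up for illustration, and you should substitute the scores from your own mail.

```python
def review_thresholds(ham_scores, spam_scores, mid, high):
    """Summarize how a proposed mid/high pair would treat observed mail."""
    return {
        "highest_ham_score": max(ham_scores),
        # Legitimate mail that would be routed away from the inbox.
        "ham_sent_to_spam_box": sum(1 for s in ham_scores if s >= mid),
        # Spam that would still arrive in the normal inbox.
        "spam_left_in_inbox": sum(1 for s in spam_scores if s < mid),
        # Spam that would reach the high spam box (or be deleted).
        "spam_above_high": sum(1 for s in spam_scores if s >= high),
    }


# Made-up sample scores for illustration only.
ham = [-1.2, 0.0, 0.4, 2.1, 3.3]
spam = [4.8, 6.2, 7.5, 9.1, 16.4, 22.0]
print(review_thresholds(ham, spam, mid=7, high=15))
```

As long as ham_sent_to_spam_box stays at zero with a comfortable gap above highest_ham_score, you can consider lowering the levels a little and checking again.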

Earlier I mentioned that the eventual goal for the high scoring spam was to get rid of it altogether and never see it.  Also as mentioned, I am at that point
now, but you must give it time and wait until you are CERTAIN that nothing legitimate ever makes it to that last high spam box.  Once you are sure of that,
you can make these final changes below.  This will cause all of these high-scoring spam emails to get deleted immediately.  Warning:  You will never see
them and there is no way to ever get them back.  If you later decide you do not want to automatically delete high scoring spam anymore, change the rules
back to how you had them set before (as described above).

1) From your main cPanel screen, go into Account Level Filtering and click ‘Edit’ next to your first (#1 Spam High) filter.
2) For the first action (Deliver to folder), change it to [Discard Message] and click Activate.  Be sure to leave the [Stop Processing Rules] action in place.
3) Go back to the main filters page.  You will notice that cPanel has now moved your “#1” filter below #2.  As mentioned earlier, that’s not what we want and
that will allow all those high spams into your mid-range spam box.  To fix this problem, click on ‘Edit’ next to the “#2” filter to bring up the filter’s settings. 
Click Activate and go back to your filter page.  They should now be in the correct order.
4) If you are sure you will no longer ever want to keep that high scoring spam again, go ahead and delete the spam2 email account.

I should also mention that no matter how you do it (whether directly through your favorite email client or Webmail), you might want to periodically empty
the mail out of your Spam and Spam2 mailboxes, especially if you are concerned about a mailbox size quota.  If you are not doing any bayes training, there
is no need to keep this extra spam at all once you are done checking for false-positives and determining the scores and if/how you want to tweak your
settings.  If you will do bayes training, you will want to hang on to them in order to feed them into SpamAssassin.  On that note, there is a setting you can
adjust in SpamAssassin to automatically learn spam over x score.  This is what I do with the really high spam that my filter deletes.  I never see it so I can’t
use it to manually train SpamAssassin, but I don’t need to because SpamAssassin automatically learns it as spam for me as soon as it comes in!

See part 2 right below for information on Bayes Training in SpamAssassin.  This allows you to teach SpamAssassin what is legitimate email (“ham”) and what
is spam.  It takes a while before the bayes filter kicks in (SpamAssassin does not use bayes tests until it has learned at least 200 ham and 200 spam
messages), but once it does, SpamAssassin’s accuracy goes up pretty quickly.  What’s more, you can assign higher scores to bayes tests, such as “BAYES_99”,
which means that SpamAssassin is 99% sure that the email is spam based on bayes testing.  Armed with that, you can assign it a higher score and get it out of
your inbox (and possibly even have your #1 filter (Spam High) delete it automatically should you so choose).
Taking control of your spam:  Part 2 - Bayes Training in SpamAssassin
Written by Kevin Dommer

Even if you don’t plan to incorporate my multi-level spam handling in the first part of this guide, you might want to read through it as a bit of an introduction
to this part.

I won’t get into a lengthy discussion on what Bayes filtering does or how it works but I will provide a brief example (although it may not be 100% accurate).

SpamAssassin works by comparing a database of “rules” to the email messages it scans.  An example might be the word “Viagra”.  If an email has the word
Viagra in it, it might assign a small amount of points to the email based on a built-in rule.  If the email scores enough points (from additional rules) to meet
the minimum required score to be marked as spam, it does so.  This is a VERY basic example, and the rules that SpamAssassin uses are actually much more
complex than this (and there are lots and lots of built-in rules).  But continuing to use this example, let’s say you never get legitimate emails containing the
word Viagra.  Bayes (or Bayesian) filtering is based on statistics.  After training SpamAssassin on several hundred ham and spam samples, another email
coming in with the word Viagra might also cause SpamAssassin to assign even more points based on a bayes test.  Since SpamAssassin has never seen a ham
message with that word, it might add points for a test called BAYES_99 (meaning that SpamAssassin is 99% certain that this email is spam from a statistical
perspective).  On the other hand, if you have a buddy that often sends you emails with the word Viagra in it, and you have taught SpamAssassin that those
emails from your buddy are ham (which, technically you should because they are not spam), then SpamAssassin might think:  Of all of the emails trained so
far that contained this word, 20% of them were trained as ham and 80% were trained as spam, therefore my guess is that there is an 80% chance that this
email is spam.  But it doesn’t stop there.  Bayes scanning looks at many different elements of an email and determines what is important and what is not. 
So that 99% chance may actually ultimately become an 80% chance of being spam if other elements of the email suggest that it may also be legitimate
(ham).  Again, this is a very crude example, but it should be enough to give you an idea of how bayes filtering works, and why bayes training can increase
SpamAssassin’s accuracy considerably.  It should also be obvious that it is also important to train it often and train it accurately.
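
The crude example above can be sketched in Python as follows.  To be clear, this is a textbook-style naive Bayes combination used only for illustration, and it is NOT how SpamAssassin actually computes its bayes results (its real method is more involved).  The function names and counts are made up.

```python
from math import prod


def token_spam_probability(spam_count: int, ham_count: int) -> float:
    """Fraction of trained messages containing this word that were spam."""
    total = spam_count + ham_count
    if total == 0:
        return 0.5  # never seen before: no evidence either way
    return spam_count / total


def combine(probabilities) -> float:
    """Combine per-word probabilities into one overall spam probability."""
    spam_like = prod(probabilities)
    ham_like = prod(1 - p for p in probabilities)
    return spam_like / (spam_like + ham_like)


# A word trained 80 times in spam and 20 times in ham (the 80% case).
print(token_spam_probability(80, 20))

# A very spammy word (0.99) next to a word that usually appears in ham (0.2)
# pulls the overall estimate below 0.99.
print(combine([0.99, 0.2]))
```

The second result shows the point made above:  one strongly spammy word does not decide the outcome on its own if other parts of the email look legitimate.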

There are several different approaches to take when it comes to training SpamAssassin.  Unfortunately, cPanel doesn’t offer a direct and easy way to do
this.  Fortunately though, it can be done with a little custom configuration.  Bear in mind though that training is pretty much an ongoing thing.  You can
probably stop doing so after a while, but as spammers get smarter and spam contents change, training once again becomes a necessity.  And although a
well-trained SpamAssassin installation can be very effective, you will always have a few that slip through anyway.

As mentioned in part 1, there may be some privacy issues when it comes to training.  In order to properly train SpamAssassin, you must hand-sort all of the
emails that come into your domain.  In my situation this is not a problem because although I have email addresses for myself, my wife and a small software
business, we openly view each other’s emails all of the time.  Personally, I have forwarders set up to forward copies of ALL email that comes into my
domain into one central mailbox that I can use for spam training.  This simplifies things for me but this leaves no privacy as I can see all emails that come in
(whether they are for me or my wife).  You can take this route, or you can do your training on an individual basis.  In the latter case, it will be up to each
email account user to sort his or her own email.

---------------------------------------------------
DECISION TIME:
If you decide to forward all of your many email accounts to one for the purpose of making your training easier, you can follow the steps below.  Otherwise
just skip to Preparing for Bayes Training.

To forward all emails to one separate mail account for training:
     1) Create a new email account.  For the purpose of this tutorial we will call it backup, but you can call it whatever you want.  This is the account you will
use for sorting your ham & spam for use by the training script.
     2) Click on the Forwarders shortcut in cPanel.
     3) Click the Add Forwarder button
     4) In the first field, enter an existing email address on your domain to forward
     5) For the destination, enter your newly created address, such as backup@mydomain.com and save your changes.
     6) Repeat the steps to forward each of your existing email accounts to your new “backup” account.  It is not necessary to do this with your spam or spam2
addresses since spam will be automatically sorted to those by your filters and the messages there can be easily trained as there shouldn’t be much to sift
through.
---------------------------------------------------

Preparing for Bayes Training
The goals:
* We will manually edit our SpamAssassin configuration file to make some changes to the way SpamAssassin uses bayes.
* We will create special folders in our mailboxes (through Webmail) that will be used to sort ham and spam.
* We will install a small CGI script onto our site that will automatically teach SpamAssassin the emails you have sorted.
* We will set up a cron job to run the CGI script every week.

How To:

1) Edit the SpamAssassin configuration file.  This can be done directly through cPanel or by using your favorite FTP program.  For the sake of simplicity, I will
explain how to do it through cPanel.  Depending on your host provider or cPanel version, these directions may not be exact but it should lead you in the
right direction.  Some host providers may already have these settings in your default SpamAssassin config file.  If the settings below are already available in
your SpamAssassin Settings through cPanel, then manually editing this file as outlined below is not necessary.
     a) In cPanel, click on the File Manager shortcut and select ‘Home Directory’.  Make sure the option to show hidden files is selected then click Go
     b) In the left pane, click on .spamassassin
     c) Click on user_prefs to highlight it then click on the Edit button near the top of the browser window
     d) Confirm that the encoding type is set to us-ascii then click on Edit
     e) Create some blank space in the file by pressing [Enter] at the very beginning of a line (being careful not to mess up anything already in there), then
copy and paste the following into the file:

use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -2.0
bayes_auto_learn_threshold_spam 15.0

     f) NOTE:  the nonspam threshold should be low enough so that there is no possibility that a SPAM with a low score can be auto-learned as ham.  Some
spam comes in at a 0, so you definitely want it below 0.  I am recommending -2.0 to be conservative as it is, but you can set it even lower if you want to
ensure that it definitely can’t auto-learn spam as ham (such as -20.0).  You will be manually training your ham anyway, and after the bayes filters kick in you
can bring this back closer to -2 since SpamAssassin should be more accurate on scoring spam.
     g) NOTE: the spam threshold should be high enough so that there is no possibility that a HAM that happened to get flagged with a high spam score can be
auto-learned as spam.  You may notice that the number I gave for an example happens to be the same as the level I recommended for your ‘high spam’ filter
in part 1.  This will be handy for when the time comes that you decide to outright delete those high scoring emails.  If this number matches the level at
which you delete high scoring spam, it will automatically be learned as spam before deleting it and you won’t miss out on training any emails.  If you want
to be extra cautious with this setting for now, set it to something higher like 30.0. You will be manually training all of your spam for now anyway.
     h) use_bayes 1 simply tells SpamAssassin to use bayes testing.  This will not do anything until you have learned 200 ham and 200 spam messages.  You
may leave it on, but if for some reason in the future you wish to disable it, just change the 1 to a 0.
     i) bayes_auto_learn 1 simply tells SpamAssassin to automatically learn emails as ham and spam as they are scanned, based on the thresholds that you
have set.  If you want to turn this off, change the 1 to a 0.
    j) When finished making your changes, click on the Save Changes button in the upper-right hand corner of the window.
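
To make the two thresholds a bit more concrete, here is a small Python sketch of the decision they describe.  It is only a simplified model and not
SpamAssassin's actual code:  the function names are made up for illustration, the handling of scores that land exactly on a threshold is an assumption, and
the real auto-learner uses its own score calculation and some extra checks before it learns anything.  The 200 message minimum is the one mentioned above.

```python
# Simplified model of the auto-learn settings shown above.
# Function names are hypothetical; this is not SpamAssassin's own code.

NONSPAM_THRESHOLD = -2.0   # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 15.0      # bayes_auto_learn_threshold_spam
MIN_TRAINED = 200          # messages of each type needed before bayes is used


def auto_learn_decision(score, nonspam=NONSPAM_THRESHOLD, spam=SPAM_THRESHOLD):
    """Return 'ham', 'spam', or None (not auto-learned) for a message score."""
    if score < nonspam:
        return "ham"
    if score >= spam:
        return "spam"
    # Anything between the two thresholds is left for manual training.
    return None


def bayes_is_active(ham_learned, spam_learned, minimum=MIN_TRAINED):
    """Bayes testing only kicks in once enough ham AND spam have been learned."""
    return ham_learned >= minimum and spam_learned >= minimum


if __name__ == "__main__":
    for score in (-3.5, 0.0, 7.2, 15.0, 31.4):
        print(score, auto_learn_decision(score))
```

The wide gap between the two numbers is the point:  everything that falls in the middle is simply skipped by auto-learning and waits for you to sort it by hand.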

2) Create special learning folders in each email account that you will use for training. If you want to allow each email account holder to manage their own
sorting (for privacy issues), then the following steps need to be done for each email account in your domain.  At a minimum, you will also need to do this for
your spam mailbox (mid-range spam).  You can choose to have SpamAssassin automatically learn all of the high spam (which ends up in the spam2 mailbox)
by matching the  auto-learning threshold setting in your configuration file as mentioned above.  But if you prefer to train those manually then the following
steps need to be performed for that mailbox too (probably a good idea at first).  If you choose to forward all emails from all accounts to one other account
and use that for all of your sorting and learning then you need to do the following for that account as well.  If you do NOT intend to do sorting and training
on individual accounts, you really don’t need to do the following for those but it only takes a few seconds and it is harmless to have these extra folders
whether you use them or not (you may want to do it individually someday).

IMPORTANT NOTE:  Most people have their email clients on their PC set to
download new messages then delete them from the server.  If you are leaving the training up to each individual (or sorting email individually for each email
account), then you will need to change your email client (such as Outlook Express) to leave the messages on the server so that they are still there for you to
sort.  Once sorted, you can then delete them from the server through webmail, but you must be sure you don’t accidentally delete an email in Webmail that
hasn’t been downloaded into your normal email client yet.  I know this may sound confusing and it can be a bit of a hassle.  This is one reason why I chose to
forward (copy) all incoming emails to a separate email account that I use for training as outlined above.  Since these are copies and nobody ever accesses
these messages outside of webmail, there is no issue and no changes to Outlook Express are necessary.  But this option is also where the privacy issue
comes into play.  If you do not understand this or need clarification, please post in the forum before proceeding.
     a) Going through each email account one at a time, log into your account through Webmail and go into Horde.  I only use Horde so these instructions are
based on Horde.  While I am sure these things can be done in other webmail clients, I am not familiar with them.
     b) Proceed to your inbox in Horde then click on the ‘Folders’ button at the top of the window.
     c) Click where it says ‘Choose Action’ and select Create.  Your browser may prompt you to allow a script to run before allowing this action.
     d) A small window will come up where you can enter a new folder name.  Type the following and click OK:  learn_ham
     e) Repeat the process above to create another new folder and name that one learn_spam
     f) Once your new folders are created, you can log out and log back in as the next user and repeat the process of creating your new learning folders for
each email account that you wish to sort spam in.
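
If you have shell or script access and want to double-check that the folders really exist for every mailbox, something like the following Python sketch
could do it.  Please treat the path layout as an assumption:  it supposes that mail is stored in MailDir format under ~/mail/<domain>/<mailbox>/ and that
each webmail folder shows up as a dot-prefixed directory (for example .learn_ham).  Your host may lay things out differently.  The function names and the
example domain and mailbox names are placeholders.

```python
from pathlib import Path


def learning_folders(home, domain, mailbox):
    """Return the ASSUMED MailDir paths of a mailbox's two learning folders."""
    base = Path(home) / "mail" / domain / mailbox
    return {name: base / f".{name}" for name in ("learn_ham", "learn_spam")}


def missing_folders(home, domain, mailbox):
    """List the learning folders that do not exist yet for this mailbox."""
    folders = learning_folders(home, domain, mailbox)
    return [name for name, path in folders.items() if not path.is_dir()]


if __name__ == "__main__":
    # "mydomain.com", "backup" and "spam" are example values only.
    for box in ("backup", "spam"):
        print(box, "is missing:", missing_folders(Path.home(), "mydomain.com", box))
```

An empty list for a mailbox means both learning folders were found.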

3) Next we need to install a CGI script that will automatically feed the sorted emails into SpamAssassin’s learning module (sa-learn).  To give credit where
credit is due, I use a modified version of a script created by Ian Douglas (http://iandouglas.com/spamassassin-trainer/).  From his website you can download
a CGI script that is customized to your needs.  Originally I used his script but I decided that I wanted mine to work a bit differently so I modified his to come
up with something a bit more suited to my needs.  The choice is yours on how you want to proceed with the actual training, but the rest of this guide will be
referring to my custom version of his script.  It should be noted that his original script works with both MailDir and Mbox mail formats.  I don’t know enough
about cPanel installations to know which host providers use which, but MY script ONLY works with MailDir, which is what my site uses.  If you are uncertain,
contact your hosting provider or search the internet for more information on how to determine your mailbox type.  (I may add this info here at a future
date).
     a) Download the salearning.cgi script here.  Right-click the link to save it to your computer then unzip it to extract the CGI script.
     b) CGI scripts are plain text files with a CGI extension.  It is recommended to use Notepad or a similar "basic" text editor to edit them.  Before you can
use the script, you will need to edit it as follows:
     c) Scroll down to the MAIN SETUP area and edit the variables to match your information.  You will need to enter your domain name (such as
mydomain.com) and your cPanel username.  There is also an option to automatically delete the messages after learning is done.  I recommend “Y” since it
cleans things up nicely when it is finished.  If you choose “N” then you will need to manually go into your learning folders through webmail every time you
finish training so that you can manually remove the already-learned emails.
     d) Scroll down a bit further and you will see a boxed off section that begins with “Specify users to scan mail for…….”.  There are already two set up: 
backup and spam.  These are what I use (again, I use the “backup” option, where all of the incoming email gets forwarded to that single address) but if you
are going to do your sorting and scanning on each mailbox, then you need to create new lines just like the ones shown, substituting the mailbox name in
quotes with your own mailbox name.  Repeat for each mailbox you are scanning.  Be sure that you have previously created the learn_ham and learn_spam
folders, otherwise the script may error out when executed.  It doesn’t matter if the learning folders are empty or not, just so long as they exist.
     e) Once you are done editing the file, save it then upload it to the .spamassassin folder found in the root level of your site.  To do so, go into the File
Manager through cPanel and browse to the .spamassassin folder (as previously mentioned when editing the user_prefs file), then click the button at the top
of the window to upload a file.  Browse your computer for the salearning.cgi file that you edited and upload it to the .spamassassin folder on your site.  You
may need to ensure that the permissions are set correctly on the CGI script.  At a minimum, they should be set to 750.  To check them, right-click on the CGI
script in the File Manager and choose Change Permissions from the menu (CHMOD).
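
For anyone wondering what a trainer script like this boils down to, here is a rough Python sketch of the idea.  To be clear, this is NOT the salearning.cgi
script and not Ian Douglas's original either.  It only shows the general approach of pointing sa-learn at each learning folder with its --ham or --spam
option.  The MailDir layout (~/mail/<domain>/<mailbox>/.learn_ham with cur and new subfolders) is an assumption about the server, and the function names,
domain and mailbox names are placeholders.

```python
import subprocess
from pathlib import Path


def sa_learn_commands(home, domain, mailboxes):
    """Build one sa-learn command per learning folder (ASSUMED MailDir layout)."""
    commands = []
    for mailbox in mailboxes:
        base = Path(home) / "mail" / domain / mailbox
        for kind in ("ham", "spam"):
            # A MailDir folder keeps its messages in the "cur" and "new" subfolders.
            for sub in ("cur", "new"):
                folder = base / f".learn_{kind}" / sub
                commands.append(["sa-learn", f"--{kind}", str(folder)])
    return commands


def run_training(home, domain, mailboxes, dry_run=True):
    """Print the commands (dry run) or actually hand the folders to sa-learn."""
    for cmd in sa_learn_commands(home, domain, mailboxes):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example values only; nothing is learned while dry_run is True.
    run_training(Path.home(), "mydomain.com", ["backup", "spam"], dry_run=True)
```

Leaving dry_run set to True just prints the commands, which is a harmless way to check that the paths look right for your account before anything is learned.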

4) Create a cron job to automatically feed your ham and spam into sa-learn every week.
     a) In cPanel, click on the Cron Jobs shortcut.
    b) It’s a good idea to have the cron manager send you an email every time the script runs so go ahead and enter your email address here if it isn’t there
already.
     c) Under the Add New Cron Job section, enter the following:
Minute:  0
Hour:  1
Day:  *
Month:  *
Weekday:  1
Command:  ~/.spamassassin/salearning.cgi
     d) Click the Add New Cron Job button and all should be well.  The settings above will cause the cron job to run every Monday at 1:00 AM server time
(in other words, late Sunday night).  This gives
you all week to sort your ham and spam at your leisure.  The nice thing about it is that if you don’t have time, the job will still run but there will be no harm
done….. it simply won’t learn anything.  If you’d like to test your cron job, you can edit it and temporarily set it up to run every minute (using the default
option).  Wait a minute and check your email to see that you got the results.  If all is well, be sure to change it back to the settings above (or whatever your
preference is).
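
If the five schedule fields look cryptic, this little Python sketch shows how they are read.  It is a deliberately tiny matcher written for this guide,
not a real cron implementation:  it only understands * and single numbers (no ranges, lists or step values), and the function names are made up.  In cron's
weekday field, 0 is Sunday and 1 is Monday.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field; only '*' and a single integer are supported here."""
    return field == "*" or int(field) == value


def cron_matches(expr, when):
    """Return True if the five-field cron expression fires at this minute."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7   # cron counts Sunday as 0, Monday as 1
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day)
            and field_matches(month, when.month)
            and field_matches(weekday, cron_weekday))


SCHEDULE = "0 1 * * 1"   # Minute Hour Day Month Weekday, as entered above

if __name__ == "__main__":
    monday_1am = datetime(2024, 1, 1, 1, 0)          # 1 January 2024 was a Monday
    sunday_midnight = datetime(2023, 12, 31, 0, 0)   # the Sunday before it
    print("Monday 01:00 ->", cron_matches(SCHEDULE, monday_1am))
    print("Sunday 00:00 ->", cron_matches(SCHEDULE, sunday_midnight))
```

Written as a single crontab line, the same job would be 0 1 * * 1 followed by the command.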


So what’s next?  All you need to do is sign in to your email account (via webmail) every few days or once a week (whenever you want) and go through the
messages in your inbox.  Select all of the ham (good email) then using the move feature in Horde, choose the learn_ham folder to move it to.  The messages
will disappear from the inbox and be moved to the learn_ham folder.  Next, do the same for your spam:  Select all of the spam messages and move
them to the learn_spam folder.  If there are some that you are not sure what to do with, just delete them without moving them.

It takes a while to build up a large database of ham and spam in SpamAssassin but once the bayes filtering kicks in, you should notice a drop in spam (or at
least more accurate classification).  After that, you can begin to tweak some of your scores a bit more to tighten things up a bit.

I hope this helps and please direct any questions or comments to the forum located on this site.  While I will offer help there, I likely will not offer help via
email.