The content filtering functions of JSpamFilter are handled through the Filter File and the settings in it, which determine the SPAM scoring. The Filter File is structured for ease of use, simplicity, efficiency and speed. A relational database look-up feature is in development.
This section and the next include several examples that illustrate conditional searching with the Filter File.
- The filter is applied in all cases except the DNSBL "REFUSE" ("*") setting, so you can "unblock" messages that are on "Block" DNSBLs through the use of negative scores. Filter settings cannot un-refuse REFUSE ("*") DNSBLs.
- You can search for certain keyphrases only if one or more "root" keyphrases are found. This improves accuracy and efficiency by skipping searches for words that only bode evil when found in context. A bad analogy is to think of it as "progressive slots" on keywords.
- The "d00d-speak" feature of the Anti-Obfuscation filter performs a repeated-letter check: the filter collapses repeated letters and rescans the message against the terms in filter.txt. This picks up filterable terms where the spammer tried to obscure the meaning by repeating letters. The next section includes a full description of the anti-obfuscation filters.
- JSpamFilter performs recursive content searches to defeat attempts to obscure MIME content by nesting MIME boundaries.
- JSpamFilter will even decode messages that are "Base64 encoded", which defeats other content filters that rely on the message being sent in plain ASCII text.
- Full support for binary message transfers (BDAT).
- Full Unicode support is provided for messages and filter terms. (Use "FilterFileEncoding=charset" in JSpamFilter.conf to select the character encoding for the filter file).
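The repeated-letter check can be sketched as follows. This is a minimal illustration, not JSpamFilter's implementation: RepeatedLetterCheck is a hypothetical name, and the sketch assumes that each filter term is collapsed the same way as the message text (so both "guaranteed" and "guaaaaranteeeed" become "guaranted") and that matching ignores case.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of a repeated-letter ("d00d-speak") check.
 * Assumption: the message text and the filter term are collapsed the same
 * way before comparing; JSpamFilter's real rules may differ.
 */
final class RepeatedLetterCheck {

    /** Collapses every run of the same letter to a single letter. */
    static String collapseRepeats(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int prev = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // Drop a letter identical to the one just seen; keep everything else.
            if (!(Character.isLetter(cp) && cp == prev)) {
                out.appendCodePoint(cp);
            }
            prev = cp;
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** True if the collapsed, lower-cased message contains the collapsed, lower-cased term. */
    static boolean matchesCollapsed(String message, String term) {
        String m = collapseRepeats(message.toLowerCase(Locale.ROOT));
        String t = collapseRepeats(term.toLowerCase(Locale.ROOT));
        return m.contains(t);
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("guaaaaranteeeed"));                    // guaranted
        System.out.println(matchesCollapsed("GUAAARANTEEED income", "guaranteed")); // true
    }
}
```

Lower-casing before collapsing means mixed-case runs such as "Ee" are also reduced to one letter.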
The basic format for a Filter File entry is:

// [comment]
[score] [word or phrase]
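The entry format above can be sketched as a small parser. This is a hypothetical illustration, not JSpamFilter's own code: FilterFileParser and FilterEntry are made-up names, and the sketch assumes that "//" lines and blank lines are ignored, that the first space separates the score from the phrase, and that everything after that space (including extra leading or trailing spaces) is the phrase. The load method reads the file with an explicit character set, in the spirit of the FilterFileEncoding setting.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** One parsed filter-file entry: a score and the phrase to search for (hypothetical type). */
record FilterEntry(int score, String phrase) {}

/** Hypothetical parser for the "[score] [word or phrase]" format. */
final class FilterFileParser {

    static List<FilterEntry> parse(List<String> lines) {
        List<FilterEntry> entries = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank() || line.stripLeading().startsWith("//")) {
                continue; // comment or empty line
            }
            int sep = line.indexOf(' ');
            if (sep <= 0) {
                throw new IllegalArgumentException("Expected '[score] [phrase]': " + line);
            }
            int score = Integer.parseInt(line.substring(0, sep)); // may be negative, e.g. -300
            String phrase = line.substring(sep + 1);             // kept verbatim, spaces included
            entries.add(new FilterEntry(score, phrase));
        }
        return entries;
    }

    /** Reads the filter file using the given encoding (assumed to mirror FilterFileEncoding). */
    static List<FilterEntry> load(Path file, Charset encoding) throws IOException {
        return parse(Files.readAllLines(file, encoding));
    }

    public static void main(String[] args) {
        List<FilterEntry> entries = parse(List.of(
                "// money-making phrases",
                "20 second income",
                "-300 friend@some-other-mail-server.com"));
        entries.forEach(System.out::println);
    }
}
```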
JSpamFilter will look for all occurrences of a particular string, even inside other strings. If you want to search for a term in isolation and make sure
that you do not pick up any "outer" strings, add leading and trailing spaces. For example, suppose you want to search for the string "reed" and assign it a score of 50.
If you made the following entry in the filter file:
50 reed
you would also score the word "freedom", because "freedom" contains the string "reed".
To avoid this, add a leading and a trailing space to the phrase (i.e., a regular space before and after the word reed) as follows:
50  reed 
(Note the extra space between the score and "reed", and the trailing space after "reed"; both are easy to miss.)
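The difference between the two entries can be checked with plain substring matching. SubstringMatchDemo is a hypothetical name, and the sketch assumes case-insensitive matching (as the scoring example in this section suggests); it is not JSpamFilter's own matcher.

```java
import java.util.Locale;

/** Illustrates plain substring matching versus a space-padded entry (hypothetical demo). */
final class SubstringMatchDemo {

    /** Case-insensitive "appears anywhere" test; case-insensitivity is an assumption. */
    static boolean found(String message, String phrase) {
        return message.toLowerCase(Locale.ROOT).contains(phrase.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String message = "Let freedom ring";
        System.out.println(found(message, "reed"));                 // true: "reed" is inside "freedom"
        System.out.println(found(message, " reed "));               // false: no standalone word "reed"
        System.out.println(found("Cut the reed short", " reed "));  // true
    }
}
```

Under this sketch's plain substring matching, a space-padded entry also misses the word when it sits at the very start or end of the text or is followed by punctuation (for example "reed."), so such entries trade some recall for precision.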
Note: you will need to provide the full path to the filter file in your jspamfilter.conf file:
FilterFile=c:\JSpamFilter\filter.txt
(The value is the full path to the filter.txt file.)
A snippet of the file might look like this:
30 EMAIL ADDRESSES
50 ONE TIME MAILING
30 CALL IMMEDIATELY
20 TO ORDER
10 TOLL FREE
A sample filter.txt is included in the distribution zip file. (PLEASE SEE NOTE ABOVE ON FILTER FILE ORGANIZATION)
To test your filter's effectiveness and speed, save the full source of the
e-mails you want to score into files named "*.mail" (for example,
"spam01.mail", "spam02.mail", "notspam01.mail"), then score all of the
messages in a batch with the command:
java -jar FilterTest.jar *.mail
(Your filter file must be named "filter.txt", and must be in the
current directory.)
If you look at the message headers of inbound mail, you'll notice that JSpamFilter
adds its own X-JSpamFilter-Version header; as of version 3.0, that header also
includes the score and the elapsed time (in milliseconds), like this:
X-JSpamFilter-Version: 3.8 Enterprise (Modest Software) (Score: 50, 32ms)
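If you want to collect scores from delivered mail, the score and timing can be pulled out of the header with a regular expression. JSpamFilterHeader is a hypothetical helper, and its pattern is based only on the sample header above; other versions may format the header differently. The \s* in the pattern also tolerates a header that has been folded across two lines.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts score and elapsed time from an X-JSpamFilter-Version value. */
final class JSpamFilterHeader {

    record Result(int score, long elapsedMillis) {}

    // Matches "(Score: 50, 32ms)"; \s* also accepts a CRLF + space header fold.
    private static final Pattern SCORE =
            Pattern.compile("\\(Score:\\s*(-?\\d+),\\s*(\\d+)ms\\)");

    static Optional<Result> parse(String headerValue) {
        Matcher m = SCORE.matcher(headerValue);
        if (!m.find()) {
            return Optional.empty(); // header without score information
        }
        return Optional.of(new Result(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        String value = "3.8 Enterprise (Modest Software) (Score: 50,\r\n 32ms)";
        System.out.println(parse(value)); // Optional[Result[score=50, elapsedMillis=32]]
    }
}
```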
If you discuss SPAM filtering with a colleague via e-mail, it's easy to get
caught by your own filters. It is possible to use negative numbers
in your filter.txt to allow those messages to pass the filter, like this:
-300 friend@some-other-mail-server.com
Every word or phrase found will increase the score for that message by the [score]
provided. For example:
20 second income
50 guaranteed money maker
30 earn up to
JSpamFilter will search for "second income" at 20 points, "guaranteed money maker" at
50 points [so for me, that phrase reads "guaranteed JSpamFilter tag"], and "earn up to"
at 30 points. So, if you received a message as follows:
Subject: A Second Income From Home!
Make $$$Millions$$$ Selling Anything From Home!
THIS IS A GUARANTEED MONEY MAKER!!!
EARN UP TO $14,243.87 A WEEK!!!
CLICK HERE NOW!!!!!!!
This message would score 100 points (20 + 50 + 30). If, for instance, the Block threshold were set to 100 points,
this message would be blocked from entering the server.
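The arithmetic above, together with the negative-score technique from earlier in this section, can be reproduced with a minimal scorer. This is a sketch under stated assumptions, not JSpamFilter's scoring engine: ScoringSketch and BLOCK_THRESHOLD are hypothetical names, matching is assumed to be case-insensitive, each entry is assumed to count at most once per message, and a message is treated as blocked when its score is at or above the threshold.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical scorer: sums the scores of every filter entry found in the message. */
final class ScoringSketch {

    record Entry(int score, String phrase) {}

    static final int BLOCK_THRESHOLD = 100; // example value from the text

    static int score(String message, List<Entry> filter) {
        String text = message.toLowerCase(Locale.ROOT);
        int total = 0;
        for (Entry e : filter) {
            if (text.contains(e.phrase().toLowerCase(Locale.ROOT))) {
                total += e.score(); // each entry counted once; negative scores lower the total
            }
        }
        return total;
    }

    static boolean blocked(int score) {
        return score >= BLOCK_THRESHOLD; // assumption: reaching the threshold blocks
    }

    public static void main(String[] args) {
        List<Entry> filter = List.of(
                new Entry(20, "second income"),
                new Entry(50, "guaranteed money maker"),
                new Entry(30, "earn up to"),
                new Entry(-300, "friend@some-other-mail-server.com"));

        String spam = String.join("\n",
                "Subject: A Second Income From Home!",
                "Make $$$Millions$$$ Selling Anything From Home!",
                "THIS IS A GUARANTEED MONEY MAKER!!!",
                "EARN UP TO $14,243.87 A WEEK!!!",
                "CLICK HERE NOW!!!!!!!");
        int s = score(spam, filter);
        System.out.println(s + " blocked=" + blocked(s)); // 100 blocked=true

        String fromFriend = "From: friend@some-other-mail-server.com\n" + spam;
        int f = score(fromFriend, filter);
        System.out.println(f + " blocked=" + blocked(f)); // -200 blocked=false
    }
}
```

The second message shows why a large negative entry works as an allow rule: the same three spam phrases still match, but the -300 entry pulls the total well below the threshold.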