03 August 2005
Filtering spam in Novell Evolution
When I switched to Novell Evolution, finding an anti-spam solution became a top priority. Having warmed to Evolution after noticing that its interface was no longer an imitation of Microsoft Outlook, I quickly learned to appreciate its centralized mail and business tools. Spoiled by Mozilla Thunderbird's built-in spam detection, I wanted some equivalent in Evolution.
Evolution's filtering tools for handling incoming messages provide the raw material for spam detection. The difficulty lies in knowing which characteristics of incoming mail should be treated as signs of spam. Information I gleaned from the Internet was only moderately useful; most of it was incomplete, obsolete, or inaccurate.
To find a solution, I pored over the headers of messages that Mozilla Thunderbird had detected as spam. From this research, I isolated the most common characteristics of spam and built several filters without leaving Evolution. Wanting to improve spam detection further, I spent several evenings testing different sets of instructions for linking Evolution with SpamAssassin through a filter until I found one that worked. Taken together, these filters gave me all the spam protection I needed, removing my last obstacle to using Evolution.
Evolution's filtering tools
Evolution's filter rules are created from Tools > Filters > Add. Each rule consists of If statements, which set the conditions under which the rule applies, and Then statements, which specify what happens when those conditions are met. You create both kinds of statement by clicking the Add button in the appropriate pane, then choosing from drop-down lists or typing in fields. By default, a rule applies only when all of its If statements are met, but you can set the Execute action drop-down list to if any criteria are met instead. Give each rule a descriptive name, so you can tell what it does without opening it.
If statements offer many possibilities for spam filters. They can examine a specific message header, such as Sender, Subject, or Recipients, or the message body. They can match an exact phrase with the is building block, or part of a phrase with contains, starts with, or ends with. You can also introduce a degree of fuzziness with sounds like or regular expressions. Many of these search patterns also have an opposite, such as is not. If none of these patterns, alone or in combination, does what you want, you can select Pipe to Program to call a shell command, such as grep.
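To make the logic of If statements concrete, here is a small Python model of how conditions combine. It is a hypothetical sketch, not Evolution's implementation: the operator names mirror the building blocks described above, the comparisons are case-sensitive (Evolution's own matching may not be), and the header dictionary stands in for a parsed message.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical model of If-statement building blocks.
# Each operator takes (header_value, pattern) and returns True on a match.
# Simplification: all comparisons are case-sensitive.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "is": lambda value, pattern: value == pattern,
    "is not": lambda value, pattern: value != pattern,
    "contains": lambda value, pattern: pattern in value,
    "does not contain": lambda value, pattern: pattern not in value,
    "starts with": lambda value, pattern: value.startswith(pattern),
    "ends with": lambda value, pattern: value.endswith(pattern),
    "matches regex": lambda value, pattern: re.search(pattern, value) is not None,
}

# A condition is (header name, operator name, pattern).
Condition = Tuple[str, str, str]


def rule_matches(headers: Dict[str, str], conditions: List[Condition],
                 mode: str = "all") -> bool:
    """Return True if the conditions hold for the given headers.

    mode="all" models the default (every If statement must match);
    any other value models "if any criteria are met".
    A missing header is treated as an empty string.
    """
    results = (
        OPERATORS[op](headers.get(field, ""), pattern)
        for field, op, pattern in conditions
    )
    return all(results) if mode == "all" else any(results)


headers = {"Subject": "Click here now", "Sender": "bob@example.com"}
conditions = [("Subject", "contains", "Click here"),
              ("Sender", "ends with", "@example.org")]
print(rule_matches(headers, conditions))              # False: only one condition holds
print(rule_matches(headers, conditions, mode="any"))  # True
```

The same two conditions give different answers depending on the mode, which is why the Execute action setting matters as much as the conditions themselves.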
By contrast, you'll need only a limited number of Then statements. Before you write them, create a Spam folder. Evolution comes with a Junk folder, but in Debian you can't use it when writing a filter rule from the GUI, not even if you modify /home/[user]/.evolution/mail/filters.xml in a text editor. For each spam rule, use three Then statements in this order:
Move to Folder Spam
Set Status Read
Stop Processing
This set of Then statements delivers the email to the Spam folder and marks it as read, so you aren't distracted by the appearance of unread messages in the folder. It then keeps other filters from being applied to the same message and delivering it to other folders. Once you've tested your rules together, you may want to change the second statement to Set Status Deleted. However, until you're confident with your set of rules, this choice may cause you to lose legitimate messages.
While testing, you may want to give each rule a Then statement that assigns a color or plays a sound, so you can see how many messages it catches.
Completed rules are listed in the Tools > Filters window. Evolution processes them in order, from top to bottom, so a rule's position in the list can affect how useful it is. For example, a rule that moves messages from person@address.com to a particular folder will never take effect if a rule processed before it deletes all mail whose sender address contains @address.com. Since combinations of rules can have unexpected results, the Filters window includes buttons for moving a selected rule up or down in the list.
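The ordering problem can be reproduced with a toy simulation. This Python sketch is hypothetical: each rule is reduced to a test on the sender address, a Move to Folder target (or deletion), and a Stop Processing flag. It only illustrates top-to-bottom evaluation, not how Evolution stores or runs rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for one filter rule."""
    name: str
    matches: Callable[[str], bool]  # the If statements, reduced to a test on the sender
    folder: Optional[str] = None    # Move to Folder target; None models a deleting rule
    stop: bool = True               # whether the rule ends with Stop Processing


def deliver(sender: str, rules: List[Rule]) -> str:
    """Apply rules from top to bottom and return where the message ends up."""
    destination = "Inbox"
    for rule in rules:
        if rule.matches(sender):
            destination = rule.folder if rule.folder is not None else "Deleted"
            if rule.stop:
                break  # Stop Processing: later rules never see the message
    return destination


# The example from the text: a per-person rule and a domain-wide deleting rule.
friend = Rule("Friend", lambda s: s == "person@address.com", folder="Friends")
purge_domain = Rule("Purge domain", lambda s: "@address.com" in s)

print(deliver("person@address.com", [purge_domain, friend]))  # Deleted
print(deliver("person@address.com", [friend, purge_domain]))  # Friends
```

Swapping the two rules is the only change between the calls, yet the message lands in a different place, which is exactly what the up and down buttons let you adjust.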
When you first write a list of rules, you'll probably need to debug them. If you do, opening View > Message Display > Show Email Source can help by showing the message headers that the normal view conceals.
Creating basic spam filters
The simplest spam filters in Evolution are whitelists and blacklists, that is, lists of addresses from which you will or will not accept email. For a particular person, the If statement is Sender is [email address]. For an entire domain, it can be either Sender contains [domain name] or Sender ends with [domain name]. If you set each rule to execute when any criteria are met, you need only two rules, each holding a list of addresses or domains.
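A whitelist or blacklist rule set to execute when any criteria are met behaves like the Python function below. This is an illustrative sketch under the assumption that the list entries are plain addresses and domain suffixes; the function and parameter names are hypothetical. Including the @ (or a leading dot) in domain entries keeps ends with from matching look-alike domains such as evil-example.org.

```python
from email.utils import parseaddr
from typing import Set


def sender_on_list(from_header: str, addresses: Set[str], domains: Set[str]) -> bool:
    """Return True if the sender's address, or its domain, is on the list.

    Models one rule executed when any criteria are met:
    "Sender is [address]" for each address and
    "Sender ends with [domain]" for each domain suffix.
    Comparison is case-insensitive, since addresses usually are in practice.
    """
    _, address = parseaddr(from_header)  # strips the display name
    address = address.lower()
    if address in {a.lower() for a in addresses}:
        return True
    return any(address.endswith(d.lower()) for d in domains)


whitelist = {"alice@example.org"}
print(sender_on_list("Alice <Alice@Example.org>", whitelist, set()))        # True
print(sender_on_list("bob@evil-example.org", whitelist, {"@example.org"}))  # False
```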
For most people, a whitelist is easy to set up. Place the whitelist at the top of the filter rules and end it with Stop Processing, and mail from the people on it will never be caught by a spam rule further down. By contrast, because spammers frequently change addresses, a blacklist is likely to require constant updating, which defeats the purpose of the rules: you end up spending far more time dealing with spam than you care to.
For this reason, rules that identify characteristics of spam rather than its sources are likely to be more useful. For example, messages that list your own address as both sender and recipient are likely to be spam, so you could send them to the Spam folder with two If statements: Sender contains [email address] and Recipients contains [email address].
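The sender-equals-recipient test translates directly into a few lines of Python using the standard email package. In this sketch the function name is hypothetical, and treating Recipients as the To and Cc headers together is an assumption; like the rule above, it uses a contains-style comparison rather than exact equality.

```python
from email import message_from_string


def sent_to_self(raw_message: str, my_address: str) -> bool:
    """True if my_address appears in both the From header and the recipients.

    Assumption: "Recipients" is modelled as To plus Cc.
    Mirrors two If statements combined with the default all-criteria mode.
    """
    msg = message_from_string(raw_message)
    me = my_address.lower()
    sender = msg.get("From", "").lower()
    recipients = " ".join(msg.get_all("To", []) + msg.get_all("Cc", [])).lower()
    return me in sender and me in recipients


spam = "From: me@example.com\nTo: me@example.com\nSubject: hi\n\nbody\n"
print(sent_to_self(spam, "me@example.com"))  # True
```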
Similarly, these two If statements create a filter based on size: Size (kb) is greater than 60 and Attachments do not exist. Since 60 kilobytes of plain text is roughly 9,000 words, that limit should accommodate any mailing list digests you receive. If you don't subscribe to any digests, you can lower it to 20 or even less. Either way, you can usually assume that a larger message without attachments is spam full of graphics.
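The 9,000-word estimate holds up: if a kilobyte is 1,024 bytes, 60 KB is 61,440 bytes, and at six to seven bytes per English word (letters plus a space or punctuation) that is roughly 8,800 to 10,200 words. The size rule itself can be sketched in Python as follows. The 1,024-byte kilobyte and the use of Content-Disposition: attachment to recognize attachments are assumptions; Evolution may count size and attachments differently.

```python
from email import message_from_bytes

LIMIT_KB = 60  # threshold from the text; lower it (e.g. to 20) if you get no digests


def large_without_attachments(raw_message: bytes, limit_kb: int = LIMIT_KB) -> bool:
    """True if the message exceeds limit_kb and has no attachment parts.

    Assumptions: 1 KB = 1,024 bytes, and a part counts as an attachment
    when its Content-Disposition is "attachment".
    """
    msg = message_from_bytes(raw_message)
    size_kb = len(raw_message) / 1024
    has_attachment = any(
        part.get_content_disposition() == "attachment" for part in msg.walk()
    )
    return size_kb > limit_kb and not has_attachment


small = b"From: a@example.com\nSubject: hi\n\nshort body\n"
print(large_without_attachments(small))  # False
```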
Other useful spam filters include a search for:
Words and phrases likely to be evidence of spam, such as click here or Cialis
Windows executables as attachments. This search could be a list of If statements containing the extensions of Windows executables, or simply Pipe to Program grep -i 'name=.*\.\(exe\|scr\|bat\|pif\)'.
A Date header more than 96 hours earlier than the time the message was received. Such a date could indicate a relayed message.
An empty Reply-To header
Many of these searches can be defined by If statements within a single rule.
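The checks in the list above can be prototyped together in Python before you commit them to Evolution rules, or called from a script through Pipe to Program. This is a hypothetical sketch: the phrase list is only a starting point, the executable pattern is a Python translation of the grep search for .exe, .scr, .bat, and .pif names, and an empty Reply-To is taken to mean a header that is present but blank (many legitimate messages have no Reply-To at all). If you wire such a script to Pipe to Program, the assumption is that it reads the message on standard input and signals a match through its exit status, as grep does.

```python
import re
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import List

# Translation of the grep search: an attachment name ending in a Windows executable extension.
EXE_NAME = re.compile(r"name=.*\.(exe|scr|bat|pif)", re.IGNORECASE)
SPAM_PHRASES = ("click here", "cialis")  # hypothetical starter list; extend from your own spam
MAX_AGE = timedelta(hours=96)


def spam_signs(raw_message: str, received: datetime) -> List[str]:
    """Return the names of the checks that fire; an empty list means none did.

    `received` must be timezone-aware (for example, in UTC).
    """
    msg = message_from_string(raw_message)
    signs = []
    if any(phrase in raw_message.lower() for phrase in SPAM_PHRASES):
        signs.append("phrase")
    if EXE_NAME.search(raw_message):
        signs.append("windows-executable")
    date_header = msg.get("Date")
    if date_header:
        try:
            sent = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):  # unparsable dates raise one or the other
            sent = None
        if sent is not None:
            if sent.tzinfo is None:  # "-0000" yields a naive datetime; assume UTC
                sent = sent.replace(tzinfo=timezone.utc)
            if received - sent > MAX_AGE:
                signs.append("old-date")
    reply_to = msg.get("Reply-To")
    if reply_to is not None and not reply_to.strip():  # present but blank
        signs.append("empty-reply-to")
    return signs
```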
Depending on your correspondents, you may also want to filter by character set (charset). For example, none of my correspondents write to me in Japanese, Korean, or Cyrillic characters, yet I regularly receive spam in those character sets, so filtering on them works for me. I also filter messages that declare no character set, since that can also be a sign of spam.
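A charset filter along these lines can be sketched with the standard email package, which reports the declared charset of every MIME part. The blocklist below is a hypothetical example of common Japanese, Korean, and Cyrillic charset labels; check the headers of your own spam with Show Email Source before relying on it.

```python
from email import message_from_string
from typing import Optional

# Hypothetical blocklist of Japanese, Korean, and Cyrillic charset labels.
BLOCKED_CHARSETS = {
    "iso-2022-jp", "shift_jis", "euc-jp",   # Japanese
    "euc-kr", "ks_c_5601-1987",             # Korean
    "koi8-r", "windows-1251", "iso-8859-5", # Cyrillic
}


def charset_verdict(raw_message: str) -> Optional[str]:
    """Return "blocked-charset", "no-charset", or None if the message passes."""
    msg = message_from_string(raw_message)
    # get_charsets() lists one entry per part; parts without a charset give None.
    charsets = {c.lower() for c in msg.get_charsets() if c}
    if not charsets:
        return "no-charset"
    if charsets & BLOCKED_CHARSETS:
        return "blocked-charset"
    return None


print(charset_verdict('Content-Type: text/plain; charset="ISO-2022-JP"\n\nbody\n'))  # blocked-charset
```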
You can also search for HTML tags that are more likely to be used in spam. Filtering out all HTML email by searching for text/html seems a bit drastic, although some purists might consider it. More practically, you might consider setting up If statements that search the message body for:
large and extra large fonts (font size="+)
tables (tbody)
blue or yellow text (#0000CC, #FFFF00)
Even if you have correspondents who insist on HTML email, these tags are still unlikely to appear in the average legitimate message. If necessary, add such correspondents to your whitelist.
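The body searches listed above amount to a handful of case-insensitive pattern matches. This Python sketch is illustrative only: the patterns come from the list, while allowing optional whitespace and quotes around the font size value is an assumption about how spam HTML is typically written.

```python
import re
from typing import List

# The markers from the list above, matched case-insensitively.
HTML_MARKERS = {
    "font size=+": re.compile(r'font\s+size\s*=\s*"?\+', re.IGNORECASE),
    "tbody": re.compile(r"tbody", re.IGNORECASE),
    "#0000CC": re.compile(r"#0000cc", re.IGNORECASE),
    "#FFFF00": re.compile(r"#ffff00", re.IGNORECASE),
}


def html_markers(body: str) -> List[str]:
    """Return the names of the spam-prone HTML markers found in the body."""
    return [name for name, pattern in HTML_MARKERS.items() if pattern.search(body)]


print(html_markers('<font size="+2" color="#0000CC">BUY</font>'))  # ['font size=+', '#0000CC']
```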
Evolution filters cannot check for all the signs that anti-spam software can detect. For instance, they cannot, as far as I can figure, assess the percentage of the message body that is in HTML, or determine that Microsoft Outlook is falsely identified as the mailer. Nor can Evolution filters evaluate the likelihood that a message is spam. Yet, with ingenuity and a study of results obtained from other anti-spam software, you might manage to filter 90-95% of your spam without any other measure.