Nopam, no spam.
Nopam: stop spam with no pain :-)
Overflowing spam brings plenty of pain. Email users get headaches from checking mailboxes stuffed with spam, and if they adopt a spam filter, they may be heartbroken when legitimate mail is deleted by mistake.
Meanwhile, corporate users waste a great deal of time and energy searching for the right solution, yet most detection software is expensive and flawed, which is just as distressing.
Currently available anti-spam techniques include blacklists, whitelists, rule-based filtering, and Bayesian content filtering. Most anti-spam filters combine these techniques, and the Bayesian filter is the most important of them.
The basic principle of Bayesian content filtering is as follows: assuming that normal and junk emails can be divided into two groups, a Bayesian filter uses a statistical approach to learn the probabilities of the words, phrases and other characteristics that distinguish spam. These keyword features can then be used to estimate how likely a new email is to be spam.
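To make the "statistical approach" concrete, here is a minimal sketch of the classic naive Bayes combination step, assuming tokens are independent and spam and normal mail are equally likely a priori. It is a textbook illustration, not part of Nopam (which, as described below, avoids content training); the function name `combine_spam_probability` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: combine per-token probabilities p[i] = P(spam | token i)
 * into one message score with the naive Bayes rule
 *     P = prod(p) / (prod(p) + prod(1 - p)),
 * assuming independent tokens and equal spam/normal priors.
 * Rewritten as P = 1 / (1 + prod((1 - p) / p)) so only one running product
 * is kept. Callers are expected to pass a handful of the most decisive
 * tokens (Paul Graham's filter used 15), keeping the product in double range. */
double combine_spam_probability(const double *p, size_t n)
{
    double ratio = 1.0; /* running product of (1 - p_i) / p_i */
    for (size_t i = 0; i < n; i++) {
        double q = p[i];
        /* clamp so a single token cannot force the score to exactly 0 or 1 */
        if (q < 0.01) q = 0.01;
        if (q > 0.99) q = 0.99;
        ratio *= (1.0 - q) / q;
    }
    return 1.0 / (1.0 + ratio);
}
```

For example, two tokens each scored 0.9 combine to about 0.988, while a 0.2 and 0.8 pair cancel out to 0.5; with no tokens at all the score stays neutral at 0.5.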
However, these content filtering techniques still have defects. For example, it is not easy to build an email sample that is large and precise enough, not to mention the high risk of prejudgment. Suppose many spams carry the subject "Very Important" while the normal sample happens to contain none; the trained content filter could then delete a legitimate email titled "Very Important" by mistake, which could cause great damage. Note that these are real cases.
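The prejudgment risk follows directly from how per-token probabilities are usually estimated from training counts. The sketch below uses a simplified frequency-ratio estimate in the spirit of common Bayesian filters; the function name and the 0.01/0.99 clamp are illustrative assumptions, not details from any particular product.

```c
/* Hypothetical helper: estimate P(spam | token) from training counts.
 * spam_hits / ham_hits: number of spam / normal training messages that
 * contain the token; n_spam / n_ham: size of each training sample.
 * A token that never appears in the normal sample (e.g. a subject such as
 * "Very Important" that happened to be missing from it) is pushed straight
 * to the clamp ceiling: the prejudgment problem described above. */
double token_spam_probability(unsigned spam_hits, unsigned n_spam,
                              unsigned ham_hits, unsigned n_ham)
{
    double fs = n_spam ? (double)spam_hits / n_spam : 0.0; /* freq in spam */
    double fh = n_ham ? (double)ham_hits / n_ham : 0.0;    /* freq in ham  */
    if (fs + fh == 0.0)
        return 0.5;          /* token never seen in training: neutral */
    double p = fs / (fs + fh);
    if (p < 0.01) p = 0.01;  /* clamp away from certainty */
    if (p > 0.99) p = 0.99;
    return p;
}
```

With 30 hits in 1,000 spams and none in 1,000 normal mails, the token scores 0.99, so any legitimate "Very Important" mail starts out looking like spam, regardless of how rare the phrase actually was in the spam sample.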
Likewise, we found that many misjudgments by content filters involve either shopping e-newsletters that contain spam-like features, or innocent emails that happen to mention sensitive topics such as sex or drug therapies. Installing an anti-spam system also always involves complicated setup, such as defining whitelists, blacklists or thresholds. What causes users the most headaches is that anti-spam filters also require feedback reports to train the system, for example asking users to choose among several detection intensity levels, or asking administrators to set a score threshold above which spam is deleted. Doesn't this force users into an impossible choice between a clean mailbox and the risk of losing legitimate mail?
Design Principles
1. Deliver a sufficiently high spam detection rate, keep the mis-deletion rate as low as possible, and minimize the cost of any mistake.
2. The system must be fast and run invisibly, with no effect on sending or receiving email and no added delay.
3. The system must offer extremely high stability and reliability: no errors during transfer, and guaranteed protection against mail loss.
4. Installation must be easy: no configuration (e.g. whitelists or blacklists) or training is needed, and the system works automatically once installed.
5. The system must work for all regions and languages. We therefore considered how to build a method that needs no training on normal/spam mail samples and no painful learning period, and that adapts well to every language and region. To meet this precondition, a technique that captures the critical features of spam is essential, and those features must be independent of mail content, language and region.
Looking into how these cases arise, we concluded that "the distinction between spam and normal mail depends on conduct rather than content."
Spammers typically use all kinds of disguise ("fake-simulation") techniques to evade blocking schemes. However, the different fake-simulated variants of one spam still share certain similarities. In other words, the obvious common features that distinguish spam from legitimate messages are that it is fake-simulated, sent in bulk, and similar across its variants, and none of these features depends on content, language or region.
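The idea that disguised variants of one spam remain measurably similar can be illustrated with a standard near-duplicate measure. The sketch below is an assumption for illustration, not Nopam's actual method: it hashes every pair of consecutive words (a case-folded 2-word "shingle") with FNV-1a and computes the Jaccard similarity |A ∩ B| / |A ∪ B| of the two shingle sets. The names `text_similarity`, `shingles` and `MAX_SHINGLES` are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SHINGLES 2048 /* hypothetical cap on shingles per message */

/* FNV-1a over a byte range, lower-cased so case tricks do not matter. */
static uint64_t fnv1a_lower(const char *s, size_t len, uint64_t h)
{
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)tolower((unsigned char)s[i]);
        h *= 1099511628211ULL;
    }
    return h;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Hash each pair of consecutive words in text; returns the number of
 * distinct shingle hashes written to out, sorted ascending. */
static size_t shingles(const char *text, uint64_t *out, size_t cap)
{
    const char *w[2] = {NULL, NULL}; /* previous and current word */
    size_t len[2] = {0, 0};
    size_t nwords = 0, n = 0;
    const char *p = text;
    while (*p && n < cap) {
        while (*p && isspace((unsigned char)*p)) p++;
        if (!*p) break;
        const char *start = p;
        while (*p && !isspace((unsigned char)*p)) p++;
        w[0] = w[1]; len[0] = len[1];
        w[1] = start; len[1] = (size_t)(p - start);
        if (++nwords >= 2) {
            uint64_t h = fnv1a_lower(w[0], len[0], 14695981039346656037ULL);
            h = fnv1a_lower(" ", 1, h);
            out[n++] = fnv1a_lower(w[1], len[1], h);
        }
    }
    qsort(out, n, sizeof *out, cmp_u64);
    size_t m = 0; /* drop duplicate shingles */
    for (size_t i = 0; i < n; i++)
        if (m == 0 || out[m - 1] != out[i]) out[m++] = out[i];
    return m;
}

/* Jaccard similarity of the shingle sets of two texts, in [0, 1]. */
double text_similarity(const char *a, const char *b)
{
    uint64_t sa[MAX_SHINGLES], sb[MAX_SHINGLES];
    size_t na = shingles(a, sa, MAX_SHINGLES);
    size_t nb = shingles(b, sb, MAX_SHINGLES);
    if (na == 0 && nb == 0) return 1.0;
    size_t i = 0, j = 0, inter = 0;
    while (i < na && j < nb) { /* merge-style intersection of sorted sets */
        if (sa[i] == sb[j]) { inter++; i++; j++; }
        else if (sa[i] < sb[j]) i++;
        else j++;
    }
    return (double)inter / (double)(na + nb - inter);
}
```

For example, "buy cheap meds now" and "buy cheap meds today" share two of their four distinct shingles and score 0.5, while changing only letter case leaves the score at 1.0. Because the measure looks at word sequences rather than meaning, it works the same way in any language that separates words with spaces.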
Building on these features of spam and on our expertise in network communication, search engines, similarity measures and algorithm design, we applied the similarity comparison and fake-simulation analysis techniques used in search engines to study the conduct of spam. We then developed a relational closure analysis technique that protects normal mail from being deleted by mistake and keeps the mis-deletion rate as low as possible.
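The text does not describe how relational closure analysis works internally. One plausible reading, offered purely as an assumption, is to take the transitive closure of the "is similar to" relation, so that if variant A resembles B and B resembles C, all three land in one group even when A and C differ more, and then to treat only large groups as bulk mail. The hypothetical `flag_bulk_clusters` below does this with a union-find over a precomputed n×n similarity matrix; `threshold` and `min_bulk` are illustrative parameters, not Nopam settings.

```c
#include <stddef.h>
#include <stdlib.h>

/* Union-find lookup with path halving: returns the representative of x. */
static size_t uf_find(size_t *parent, size_t x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Group n messages by the transitive closure of the relation
 * "sim[i*n + j] >= threshold", then set flags[i] = 1 for every message whose
 * group has at least min_bulk members (0 otherwise).
 * Returns the number of flagged messages, or -1 on allocation failure. */
long flag_bulk_clusters(const double *sim, size_t n, double threshold,
                        size_t min_bulk, int *flags)
{
    size_t *parent = malloc(n * sizeof *parent);
    size_t *size = calloc(n, sizeof *size);
    if (n && (!parent || !size)) { free(parent); free(size); return -1; }
    for (size_t i = 0; i < n; i++) parent[i] = i;
    for (size_t i = 0; i < n; i++)          /* merge every similar pair */
        for (size_t j = i + 1; j < n; j++)
            if (sim[i * n + j] >= threshold) {
                size_t a = uf_find(parent, i), b = uf_find(parent, j);
                if (a != b) parent[a] = b;
            }
    for (size_t i = 0; i < n; i++) size[uf_find(parent, i)]++;
    long flagged = 0;
    for (size_t i = 0; i < n; i++) {
        flags[i] = size[uf_find(parent, i)] >= min_bulk;
        flagged += flags[i];
    }
    free(parent);
    free(size);
    return flagged;
}
```

Requiring a minimum group size ties the decision to conduct (volume) rather than content: a single unusual legitimate mail forms a group of one and is not flagged as long as `min_bulk` is above 1.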
For the implementation, besides careful algorithm and data-structure design, we wrote the system in C for high-speed performance (note that many anti-spam systems are written in higher-level languages such as Perl). In our performance tests, Nopam handled more than 1 million emails per day on an Intel Pentium 4 computer.
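To put the throughput figure in perspective, and assuming the load is spread evenly over 24 hours (real mail traffic is burstier, so peak rates are higher), it corresponds to the following average rate and per-message time budget:

```latex
\frac{1\,000\,000\ \text{emails/day}}{86\,400\ \text{s/day}} \approx 11.6\ \text{emails/s},
\qquad
\frac{86\,400\ \text{s}}{1\,000\,000\ \text{emails}} = 86.4\ \text{ms per email}
```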
Independence and Extensibility of the Nopam System
At the beginning of system design, we considered two alternatives:
1) adopt open-source software such as Postfix or Amavis as our communication system, or
2) develop the program modules ourselves.
After weighing maximum system performance against future extensibility, we decided to develop all program modules ourselves, so that we fully master the critical knowledge in every dimension.
For future extension, QS could archive all email passing through it in a built-in database, enabling advanced functions such as long-term backup, search, protection of important archives, data-mining analysis and knowledge-base management.
Looking forward, Nopam will not only relieve users' pain (abolishing the harmful) but also bring them happiness in the email world (initiating the useful).