Spam filtering best practice

Most email users are delighted when, after they complain to their ISP about junk email, the ISP throws a switch and suddenly, voilà, no more spam. But the problems then do not become apparent for some months, because the ISP's clients don't know what wanted emails they have missed. "Hands up who isn't here!" is clearly not the best way of handling wanted email.

The AntEspam team have unique insights into spam and stopping junk email in the UK and Europe. They reverse-engineer the spammers' software responsible for sending out much of the junk email, and analyse the linguistics of non-automated spam. Their unique filters and methods derive from analysing junk emails on the edge of determinability, constantly identifying common features to incorporate into spam-stopping algorithms.

The common mistakes - how not to block spam

Many ISPs rely on block lists. Osirusoft was a particularly bad example: it mis-identified so many sources of email as spam that it was forced out of business. The service relied on a well-meaning but misguided individual whose methods were flawed enough to make people angry, although Osirusoft was perhaps shut down more on account of inappropriate use of his data than of the data itself. Those ISPs who thought, and still think, that they can simply identify certain servers as only ever sending spam are mistaken. The fallacy of IP blocking, and of the internet providers who use it, is the assumption that genuine email and spam can be told apart with certainty. Other services identify "open relays" and technical attributes likely to lead to a server being misused for spam, but again, ISPs that block email absolutely on the basis of those services' results are misguided.

The only compilation of IP blacklists that has validity is Spamcop, because its data is compiled in real time from spam-sending servers sending mail to spam-trap addresses. We used to provide such facilities to Spamcop, but the forged addresses in viral emails are a potential concern to which we did not want to contribute.
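A DNS blocklist such as Spamcop's is queried by reversing the octets of the sending IPv4 address and looking the result up under the list's zone; any answer in 127.0.0.0/8 means "listed". The minimal sketch below assumes the public zone bl.spamcop.net and, in keeping with the warning above against blocking absolutely, treats a listing as one weighted piece of evidence rather than a verdict. The weight is a hypothetical value, and the live lookup needs network access.

```python
import ipaddress
import socket

SPAMCOP_ZONE = "bl.spamcop.net"
HYPOTHETICAL_LISTING_WEIGHT = 2.5  # illustrative weight, not a published value


def dnsbl_query_name(ip: str, zone: str = SPAMCOP_ZONE) -> str:
    """Build the DNSBL query name: IPv4 octets reversed, followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str = SPAMCOP_ZONE) -> bool:
    """Return True if the zone answers with a 127.x.x.x address (a listing)."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        # NXDOMAIN (not listed) and network failures both land here;
        # in either case we add no evidence rather than block.
        return False
    return answer.startswith("127.")


def listing_score(listed: bool) -> float:
    """Treat a listing as evidence to add to a score, never as an absolute block."""
    return HYPOTHETICAL_LISTING_WEIGHT if listed else 0.0
```

For example, `dnsbl_query_name("192.0.2.1")` builds `1.2.0.192.bl.spamcop.net`; the score from `listing_score(is_listed(ip))` would then be combined with content tests.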

So people are beginning to learn that the "Hands up who isn't here!" approach and approaches that assume certainty are not good ideas.

Some consider the answer to spam to lie in regulation. This risks threatening the right to free speech and is not justified by the nuisance of spam.

Many people in business think that they can install a desktop solution to spam.

Whilst such software is not a particularly helpful solution on its own for business email addresses, when combined with the power of the AntEspam.co.uk filters it could take on a new dimension of usefulness . . .

Another method of spam filtering, also operated by the Qurb system, is "Challenge-Response" (known as C-R in the trade), in which every email from a non-whitelisted source triggers a challenge email asking the sender to confirm their identity. The system is undesirable and does not work: it leads to unnecessary emails being sent, to missed wanted email, and potentially to your email address being blacklisted by the DNS blacklists, including Spamcop, who comment:

Using systems such as these can get your email address blocked, so none of your emails will be received by others using DNS blocking systems . . .

In contrast, spam can be tackled effectively by intelligently applied filters, and the reality is that widespread adoption of decent spam filtering practices, and the effective blocking of spam that follows, will cause spammers to lose interest and go away . . . In fact, we foresee the end of spam . . .

The three principles of good spam filtering practice


One should not try to produce a one-size-fits-all anti-spam protection system that handles normal private email together with email sent to addresses for public use, which are often exposed on webpages. Exposed addresses require the cutting edge of spam protection, which is entirely inappropriate for mailboxes known only within an organisation or a small group of people.

The AntEspam.co.uk approach to spam filtering

The secret of spam filtering is in quantum physics, cosmology and set theory.

The solution depends on an understanding of the finite and the infinite. Do we live in a closed or an open system? Is our universe finite, infinite, or circular? Are we deceived by what we see?

SPAM FILTERING BEYOND BAYES

Many have extolled the virtues of automatic, self-learning Bayes programs to stop spam, whilst many others have appreciated their fallacies and many debate them. Bayes provides a lazy solution, relying on endless number crunching rather than the engagement of applied intelligence, and it can take thousands of CPUs to process in server farms, requiring significant energy consumption with implications for global warming. It can be successful, but, in common with Support Vector Machine analysis, its statistical approach can be and is attacked by spammers. The attraction of Bayes lies in automation and the thought that you can install it and forget about it. "Hands up who's not here!"

The reality is that complexity breeds complexity and any automatic process is only as clever as a robot. Underlying the whole issue of spam, there are assumptions. These include


We challenge these assumptions. We have libraries of all the current techniques the spammers use to fool Bayes and other filters. We gather libraries of material from emails on the edge of determinability and so are constantly sensitive to the spammers' latest ruses. We use the spammers' reliance on those techniques to identify the spam when it is not obvious from the spam content itself. We rate the email, and if the score is borderline but there are spammy indications, we have another library of further hallmarks against which an absolute judgment can be made. We call this library "PASSIONFRUITS", and whilst we do not believe others have come close to the detail of our libraries, we believe this library of criteria to be unique. It is the fruit of a passion for accuracy in the identification of spam . . .
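The two-stage judgment described above (score first, then consult a further library only for borderline emails with spammy indications) might be sketched as follows. This is not the PASSIONFRUITS library itself, whose contents are not published here: the thresholds and hallmark phrases are hypothetical placeholders.

```python
# Hypothetical thresholds: the real scoring bands are not given in the text.
BORDERLINE_LOW = 4.0
SPAM_THRESHOLD = 8.0

# Hypothetical second-stage hallmarks standing in for a PASSIONFRUIT-style library.
SECOND_STAGE_HALLMARKS = ("act now", "limited time", "verify your account")


def classify(score: float, spammy_indications: bool, text: str) -> str:
    """Return 'spam' or 'ham' using a first-pass score plus a second-stage check."""
    if score >= SPAM_THRESHOLD:
        return "spam"  # clear-cut: no second look needed
    if score >= BORDERLINE_LOW and spammy_indications:
        # Borderline with spammy indications: consult the further hallmarks.
        lowered = text.lower()
        if any(hallmark in lowered for hallmark in SECOND_STAGE_HALLMARKS):
            return "spam"
        return "ham"  # borderline, but no further hallmarks found
    return "ham"
```

The design point is that the second library is consulted only in the narrow borderline band, so it can afford to be more detailed without slowing down the bulk of obvious decisions.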

OUR METHODOLOGY: SPAM EQUALS PLUTONIUM

Like many, we started by using SpamAssassin on our server. The system examines email characteristics and assigns scores to them, which are added together to give a result. It looks at all features of an email, from technical aspects to specific content. It also provides opportunities for enhancement, and our configuration file is in excess of 700 KB.
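The additive idea behind SpamAssassin can be illustrated in a few lines. Real SpamAssassin rules are written in its own configuration syntax, not Python; the sketch below only mimics the principle that each matching rule contributes its score and the total is compared with a threshold. The rule names (prefixed HYPO_), patterns and scores are invented for illustration; the 5.0 threshold is SpamAssassin's default `required_score`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    score: float


# Hypothetical example rules, not taken from any real SpamAssassin ruleset.
RULES = [
    Rule("HYPO_FREE_MONEY", re.compile(r"\bfree money\b", re.I), 2.0),
    Rule("HYPO_ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z0-9 !]+$", re.M), 1.5),
    Rule("HYPO_CLICK_HERE", re.compile(r"\bclick here\b", re.I), 0.8),
]

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def additive_score(message: str) -> tuple[float, list[str]]:
    """Sum the scores of every rule whose pattern matches the raw message."""
    hits = [rule for rule in RULES if rule.pattern.search(message)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(message: str) -> bool:
    """Flag the message when the summed score reaches the threshold."""
    return additive_score(message)[0] >= REQUIRED_SCORE
```

For the message `"Subject: FREE MONEY NOW!\n\nClick here for free money."` all three rules fire for a total of about 4.3, which still falls short of 5.0: a small example of how purely additive scores can leave obvious junk just under the line.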

After applying the standard system to protect a growing number of clients who share a common business and whose email addresses are exposed on webpages, we embarked upon a programme of enhancements which has far outstripped the capabilities of the vanilla system.

Because the webmaster needed to monitor the effectiveness of the promotional work he was doing for the clients, and because those addresses were used only for the business and not for private correspondence, mail to them was copied to the webmaster. Our experience, and the filters we have developed, result from five years or so of working on the hundreds of spam-traps that this produced.

We were able to examine both the spam traps, which received wanted and unwanted emails, and the spam bin for unwanted emails, which inevitably would include some wanted emails from time to time.

With a normal configuration of clients and spam bins, the webmaster sees none of the false positives, and the clients each have to waste time reinventing the wheel, working out individual alterations to their filters as solutions to the same problems, with varying levels of perception and success.

In contrast, we were in a uniquely privileged position, overseeing email to a global community of mailboxes sharing a common spam bin, within which spam characteristics became obvious.
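The advantage of a shared spam bin is that a feature turning up across many unrelated mailboxes stands out, whereas a single client sees it only once. A minimal sketch, assuming each bin entry has already been reduced to a list of features (phrases, header traits and so on; the extraction step is not shown), counts how many distinct mailboxes each feature reached:

```python
from collections import defaultdict


def common_features(bin_entries, min_mailboxes=3):
    """bin_entries: iterable of (mailbox, features) pairs from one shared spam bin.

    Returns (feature, mailbox_count) pairs for features seen in at least
    min_mailboxes distinct mailboxes, most widespread first.
    """
    seen_in = defaultdict(set)
    for mailbox, features in bin_entries:
        for feature in set(features):  # count each feature once per email
            seen_in[feature].add(mailbox)
    widespread = [(f, len(boxes)) for f, boxes in seen_in.items() if len(boxes) >= min_mailboxes]
    # Sort by breadth (descending), then alphabetically for a stable order.
    return sorted(widespread, key=lambda item: (-item[1], item[0]))
```

Counting distinct mailboxes rather than raw occurrences is a deliberate choice here: a phrase repeated many times to one address says less about a spam campaign than the same phrase arriving at many addresses.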

It soon transpired that up to 90% of spam

Once one has cracked the nature of these, identifying spam reliably is not simple but at least becomes easier.

In essence, we analyse emails and then apply sets of filters on quantum physical principles. We have a number of libraries of words, phrases, linguistics and grammar against which we test the email. These are split into rules named after fruits, which are broadly themed, overlap, and all carry very different classes of values.
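Overlapping, themed libraries with different classes of values can be represented as named phrase sets, each carrying its own weight. In the sketch below the fruit names come from this page, but the phrases and weights are invented placeholders; the point is only that one phrase may belong to several libraries and be valued differently in each.

```python
# Hypothetical libraries: names from the text, contents and weights invented.
LIBRARIES = {
    "APPLES": ({"unsubscribe here", "special offer"}, 1.0),
    "CHERRIES": ({"special offer", "wire transfer"}, 3.0),
    "ORANGES": ({"dear friend", "wire transfer"}, 0.5),
}


def library_hits(text):
    """Return {library: (matched_phrases, contribution)} for every library that fires."""
    lowered = text.lower()
    hits = {}
    for name, (phrases, weight) in LIBRARIES.items():
        matched = sorted(p for p in phrases if p in lowered)
        if matched:
            # Each matched phrase contributes the library's own weight.
            hits[name] = (matched, weight * len(matched))
    return hits
```

For "Dear friend, a special offer awaits", "special offer" fires in both APPLES (1.0) and CHERRIES (3.0), while "dear friend" fires in ORANGES (0.5).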

Having identified the spammers' methodology or linguistic theme and assigned a probability to the email having arisen from any one of them, one can then multiply this by the probability of the email having been sent from a known spam source or group of sources. Rather than being added together, the probabilities are effectively multiplied, and this probability multiplication approach enables us to be particularly certain about what is spam.
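One standard way to make "multiplying rather than adding" precise is to multiply the chances that each test is wrong. If test i says spam with probability p_i, and the tests err independently (an assumption that real filters only approximate), the chance that every test is wrong is the product of (1 - p_i), so the combined confidence is 1 minus that product. This is the "noisy-OR" combination; it is offered as an illustration, not as the exact AntEspam procedure. Unlike simple addition (0.9 + 0.8 = 1.7, which is not a probability), it always stays between 0 and 1.

```python
import math


def combine_independent(probabilities):
    """Combine per-test spam probabilities by multiplying the chances that each
    test is wrong, assuming the tests err independently (noisy-OR)."""
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    chance_all_wrong = math.prod(1.0 - p for p in probabilities)
    return 1.0 - chance_all_wrong
```

A content theme rated 0.9 and a source rated 0.8 combine to about 0.98, because both would have to be mistaken (0.1 × 0.2 = 0.02) for the email to be wanted.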

Combining probabilities in this way lets us identify real spam with confidence. Probabilities derived from different types and classes of test add to the confidence level, akin to atomic structure, in which electrons reside in defined shells according to their number and energy level. If the analysis shows enough layers of probability, like electron shells, we know we have found a spam as massive as a dangerous plutonium atom and that we don't have to go near it. This means we only need to inspect the lighter atoms in our spam bins, which increases the probability of finding that tiny wanted hydrogen atom which needs to rise above the rest to be forwarded to its intended recipient.
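The "shells" analogy suggests counting how many distinct classes of test fired, rather than how many rules fired in total. The sketch below assumes a hypothetical minimum of three classes for a "heavy" verdict and hypothetical class names; it then orders the remaining bin lightest first, so a reviewer meets the likeliest wanted emails earliest.

```python
def triage(hits_by_class, min_shells=3):
    """hits_by_class maps a test class (e.g. 'content', 'source', 'linguistic')
    to how many rules of that class fired. Any class with a hit fills one shell."""
    shells = sum(1 for count in hits_by_class.values() if count > 0)
    return "heavy: skip review" if shells >= min_shells else "light: review"


def review_order(bin_items):
    """Sort (message_id, score) pairs so the lightest emails come first."""
    return sorted(bin_items, key=lambda item: item[1])
```

Note that nine hits from one class still leave an email "light": only agreement across different kinds of test makes it heavy enough to skip.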

RESPONSIBLE ISPs HERALD THE BEGINNING OF THE END OF SPAM

The spam problem is gradually being eased by certain of the "big player" ISPs, who are making it easier to identify emails sent genuinely through their systems. As ISPs harden against spam, the spammers' automated systems are being pushed to the margins, with ever fewer sources from which, and through which, they can send.

AOL gets the prize for spam reduction:

  1. we can confidently rely on no spam coming from AOL
  2. AOL have adopted consistency through their servers which eases identification of real senders
  3. Whilst AOL's spam filtering processes are less than perfect, they cooperate with other ISPs in identifying reliable sources, ensuring better services between AOL customers and between them and other reliable ISPs

Once responsible ISPs have helped identify genuine outgoing mail, it is up to the spam filters to do the rest.

The nightmare ISP has been Yahoo, which hosts the bulk of genuine free email accounts. Much spam has originated from or through Yahoo, their reduction initiatives having been flawed, and many of our clients have lost business on account of Yahoo wrongly diagnosing wanted emails as spam. Yahoo do not apply methods consistently across their servers, and they supply services to third parties such as BT Internet, which makes sensible filtering based on the Yahoo initiatives impossible. They have tried to make it easier to identify a genuine Yahoo email but have not adopted the approach universally across their domains. In theory, the only spam we should get from Yahoo is spam that cannot be greatly automated, usually sent by overpayment and 419 scammers, but attempts to implement filtering based on Yahoo email structures have resulted in many false positives because of their inconsistency. Content filtering is the cornerstone of defence against junk email.

Yahoo, however, are now taking a zero tolerance approach to scammers and closing their accounts very rapidly and this means that we are beginning to be able to take a more relaxed approach to filtering Yahoo emails.

HOW WE ACHIEVE RADIOACTIVE REPROCESSING OF SPAM

Many of our spam diagnoses are given on our example spam email pages. If you are familiar with spam and conventional scoring systems, you will see how our filters catch emails that get through normal configurations.

Ordinary spam filters commonly block between 1% and 10% of wanted emails. These are the sort of filters crudely applied in "Hands up who's not here!" systems. Although partially successful and academically sound, Bayesian heuristics often cannot work effectively. They are often applied only to single words rather than groups of words, which are more easily, and often more effectively, spotted by human than by machine. They also contain assumptions that the spammers have circumvented. We look at the ways in which spammers avoid the conventional filters, which they can do because the commercial computer industry merely mass-markets mass production. The mass involved becomes such a behemoth that little innovation appears to be desired. The spammers, in contrast, are constantly innovating, and so endlessly evade the conventional filters while peddling their wares. Only when spamming ceases to be effective will spammers cease to bother to spam . . .
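The limitation of word-only statistics can be eased by also feeding a filter adjacent word pairs, so that a phrase such as "limited offer" becomes a token in its own right. The minimal tokeniser below is an illustration of that idea only; it is not how any particular Bayesian product tokenises mail, and its simple regular expression ignores punctuation and HTML.

```python
import re


def tokens_with_pairs(text):
    """Lower-cased word tokens plus adjacent word pairs (bigrams), so that a
    statistical filter can weigh phrases as well as single words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs
```

For "Act now, limited offer" this yields the four words plus "act now", "now limited" and "limited offer". Pairs multiply the vocabulary a filter must learn, which is one reason purely automatic systems tend to stop at single words.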

However, the failure of the industry at large to match the level and speed of the spammers' innovation makes our task easier, as the spammers have become more complacent. Even where we see the spammers evolving, the number of permutations available to them is decreasing, so whilst many see the problem getting worse, we foresee that, if it is properly dealt with, the end of spam is nigh . . .

A training in physics drums in the maxim "Spot the Assumption" and like bloodhounds we follow the trail and publicise them to our SpamInsights subscribers . . . The latest of our discoveries is our PASSIONFRUIT filters which enable a positive identification to be confirmed absolutely and a marginal email to be determined confidently.

You or your company could follow the trail . . . but without the advantage of our long experience you'd end up spending fortunes reinventing and duplicating the wheel. Over the years we have accumulated libraries of filters, words and phrases, together with safe and effective probability ratings to apply.

Spam filtering can be a cloak and dagger affair between spammers and the filters. Our services of filtering and advice take us to the cutting edge of knowing what the spammers are doing.

OUR CONCEPTS AND CLASSES OF FILTER

Looking through our resources of example spam emails you'll see banks of colourful and appetizing filters that feed our engines - APPLES, CHERRIES, ORANGES, PIPS and PRICKLYPEARS to name a few, which sometimes get combined into TARTS and a GLUTTON or two at a FEAST.
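Combining base filter classes into composite rules is similar in spirit to SpamAssassin's "meta" rules, which fire on a boolean combination of other rules. The page does not define TARTS or GLUTTONS, so the definitions below are purely hypothetical: a TART is assumed to need both APPLES and CHERRIES, and a GLUTTON any four fruit classes.

```python
# Hypothetical composition rules: TART and GLUTTON are named in the text but
# not defined, so these definitions are assumptions for illustration only.
def composite_rules(fired):
    """fired: set of base fruit classes that matched. Returns composite names."""
    composites = []
    if {"APPLES", "CHERRIES"} <= fired:  # both classes present
        composites.append("TART")
    if len(fired) >= 4:  # broad agreement across many classes
        composites.append("GLUTTON")
    return composites
```

A composite can then carry its own score, rewarding the co-occurrence of evidence rather than just the sum of its parts.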

If we don't like what we see, we blast it into outer cyberspace with our PHAZOR (you'll see OUTPHAZORED), but an email that ends up in the spam bin has to be pretty bad to be blasted beyond the bin into oblivion. We cease to look at emails scoring beyond 800, but we keep some control spam-traps to monitor example emails scoring beyond 4000.
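The score bands just described can be written as a small routing function. The 800 and 4000 limits come from the text; the threshold at which mail enters the spam bin is not stated, so it appears below as a hypothetical constant.

```python
# 800 and 4000 come from the text; the spam-bin threshold is hypothetical.
HYPOTHETICAL_SPAM_BIN_THRESHOLD = 100
NOT_INSPECTED_ABOVE = 800
CONTROL_TRAP_ABOVE = 4000


def band(score, is_control_trap=False):
    """Route an email by score: deliver, inspect in the bin, or stop looking."""
    if score < HYPOTHETICAL_SPAM_BIN_THRESHOLD:
        return "deliver"
    if score <= NOT_INSPECTED_ABOVE:
        return "spam bin: inspect"
    if score > CONTROL_TRAP_ABOVE and is_control_trap:
        # Only control spam-traps keep the extreme cases for monitoring.
        return "control trap: monitor"
    return "outphazored: not inspected"
```

Only the middle band needs human attention, which is where any misclassified wanted email is most likely to be found.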

Beyond our gluttonous tarts, and phazors, we identify a further layer of consideration for analysis by yet further sophisticated banks of PASSIONFRUIT filters which work on wholly new criteria.

Care and experience are necessary, together with knowledge of the assumptions underlying each component test, analysis and ruleset. The fuzzy-logic, combinational, multiple-probability approach is littered with checks and balances so that, as a matter of probability, no single series of factors leads to a series of unfortunate events ending in the loss of a wanted email.
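One simple check and balance of this kind is to cap how much any single class of test can contribute, so that no one factor can condemn an email on its own. In the sketch below the cap (half the spam threshold) and the class names are hypothetical choices; with that cap, at least two independent classes must agree before the threshold can be reached.

```python
def capped_total(class_scores, threshold, max_share=0.5):
    """Sum per-class scores, limiting each class to max_share of the threshold."""
    cap = threshold * max_share
    return sum(min(score, cap) for score in class_scores.values())


def is_spam(class_scores, threshold, max_share=0.5):
    """Spam only if the capped total still reaches the threshold."""
    return capped_total(class_scores, threshold, max_share) >= threshold
```

With a threshold of 10, a single class scoring 100 is held to 5 and the email is not condemned, whereas two classes scoring 6 each reach 10 between them.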


(Please note: the naming of classes of filters as FRUIT of various kinds and as PHAZORS, as part of our sorting and categorisation process, is our Copyright.)


We see so much spam and have developed filters for so many years that we are able to supply expert consultancy advice on spam reduction issues. Please contact us to enquire.


We are not a marketing company run on the American principles we see in spam, so we are not going to tell you what we are going to say, say what we want to tell you and then tell you what we have said.

We hope that, having read about how we go about what we do and browsed our pages of example emails that we catch, you'll appreciate that we have not wasted your time on the easily caught Viagra type of rubbish, that we are forcing the spammers into the margins, and that we know what we are doing!


Further links for interesting reading: