Most email users are delighted when, after they complain to their ISP about junk email, the ISP throws a switch and, voilà, the spam stops. The problems, however, do not become apparent for some months, because the ISP's clients cannot know which wanted emails they have missed. "Hands up who isn't here!" is clearly not the best way of handling wanted email.
The AntEspam team have unique insights into spam and stopping junk email in the UK and Europe. They reverse-engineer the spammers' software, which is responsible for sending out much of the junk email, and analyse the linguistics of non-automated spam. Their unique filters and methods derive from analysing junk emails on the edge of determinability, constantly identifying common features to incorporate into spam-stopping algorithms.
The common mistakes - how not to block spam
Many ISPs rely on block lists. Osirusoft was a particularly bad example: it mis-identified so many sources of email as spam that it was forced out of business. The service relied on a well-meaning but misguided individual whose methods were flawed enough to make people angry. ISPs who thought, and still think, that they can simply identify certain servers as only ever sending spam are mistaken. However, perhaps Osirusoft was shut down more on account of inappropriate use of his data than because of the data itself. The fallacy of IP blocking, and of the internet providers who use it, is the assumption that genuine email and spam can be told apart with certainty. Other services identify "open relays" and other technical attributes likely to lead to a server being misused for spam, but again, ISPs who block email absolutely on the basis of those results and services are misguided.
The only compilation of IP blacklists that we consider valid is Spamcop. This is because its data is compiled in real time from spam-sending servers hitting spam-trap addresses. We used to provide such spam-trap facilities to Spamcop, but the forged sender addresses in virus-generated emails are a potential concern to which we did not want to contribute.
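For readers unfamiliar with how a DNS blacklist such as Spamcop's is consulted, the minimal sketch below shows the standard DNSBL query: the IPv4 octets are reversed and prepended to the list's zone, and if that name resolves, the address is listed. In keeping with the argument above, a hit is treated as one piece of evidence added to a score, not as grounds for outright rejection. `bl.spamcop.net` is Spamcop's public zone; the function names and the weight of 2.0 are hypothetical.

```python
import ipaddress
import socket

SPAMCOP_ZONE = "bl.spamcop.net"  # Spamcop's public DNSBL zone


def dnsbl_query_name(ip: str, zone: str = SPAMCOP_ZONE) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = SPAMCOP_ZONE) -> bool:
    """True if the DNSBL returns an A record for this IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN, but also resolver failures, end up here: treated as "not listed".
        return False


def dnsbl_evidence(listed: bool, weight: float = 2.0) -> float:
    """Hypothetical weight: a listing adds to a spam score instead of rejecting."""
    return weight if listed else 0.0
```

For example, `dnsbl_query_name("192.0.2.1")` builds `1.2.0.192.bl.spamcop.net`. A production filter would also distinguish a resolver failure from a genuine "not listed" answer, which this sketch deliberately does not.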
So people are beginning to learn that the "Hands up who isn't here!" approach and other approaches that assume certainty are not good ideas.
Some consider regulation to be the answer to spam. Regulation is capable of threatening the right to free speech, and that threat is not justified by the nuisance of spam.
Many people in business think that they can install a desktop solution to spam.
Much desktop software operates nothing more sophisticated than systematic whitelisting, in which people receive only the sort of email they would expect from people they already know. "Hands up who isn't here!"
For people in business, every enquiry from a new contact is put into the junk folder, from which they have to fish it out from among all the Viagra. In deleting the increasing numbers of junk emails, they'll delete their incoming business too.
No one method of spam protection is perfect: if the Viagra and other junk emails had already been removed, the desktop software would identify the new enquiries!
Whilst such software is not a really helpful solution in itself for business email addresses, when combined with the power of the AntEspam.co.uk filters its purpose and usefulness could take on a new dimension . . .
Another method of spam filtering, also used by the Qurb system, is "Challenge-Response", known as C-R in the trade, in which the sender of every email from a non-whitelisted source is sent a challenge email asking them to confirm that they are genuine. The system is undesirable and does not work: it leads to unnecessary emails being sent, to wanted email being missed, and potentially to your email address being blacklisted by DNS blacklists, including Spamcop, which comments:
"Description: This "selfish" method of spam filtering replies to all email with a "challenge" - a message only a living person can (theoretically) respond to. There are several problems with this method which have been well known for many years. . . . Solution: Do not use challenge/response filtering. Although it may stop most unwanted email for the person shielded by it, it generates more unwanted email for others. Since more and more sites will rightly block these challenge emails, you can never be sure they will reach their target even when they are not misdirected themselves. So these systems will lose legitimate mail in an attempt to stop unwanted mail."
Using systems such as these can get your email address blocked, so none of your emails will be received by others using DNS blocking systems . . .
In contrast, spam can be tackled effectively by intelligently applied filters, and the reality is that widespread adoption of decent spam filtering practices, by blocking spam effectively, will cause spammers to lose interest and go away . . . In fact, we foresee the end of spam . . .
The three principles of good spam filtering practice
Rejected email should be sent to a spam bin for inspection.
The classification of an email should be treated as an uncertainty rather than a certainty. No one method of analysis is likely to be wholly correct; any particular method is likely to be only partially, probabilistically right, so no single method should be relied upon on its own.
One should not try to produce a one-size-fits-all anti-spam system that handles normal private mailboxes together with addresses for public use, which are often exposed on web pages. Exposed addresses require cutting-edge spam protection that would be entirely inappropriate for mailboxes known only within an organisation or a small group of people.
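The first and last principles can be made concrete in a routing rule. This minimal sketch assumes each message has already been given a numeric spam score; the mailbox categories `exposed` and `private` and their thresholds are hypothetical. Suspect mail always goes to a spam bin for inspection rather than being rejected, and publicly exposed addresses get a stricter (lower) threshold than addresses known only to a small group.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MailboxPolicy:
    """Per-mailbox-type policy; the threshold values below are hypothetical."""
    spam_threshold: float  # score at or above which mail goes to the spam bin


# Exposed (web-published) addresses get stricter filtering than private ones.
POLICIES = {
    "exposed": MailboxPolicy(spam_threshold=5.0),
    "private": MailboxPolicy(spam_threshold=12.0),
}


def route(score: float, mailbox_type: str) -> str:
    """Return 'inbox' or 'spam_bin'; no branch ever rejects or deletes mail."""
    policy = POLICIES[mailbox_type]  # KeyError for an unknown mailbox type
    return "spam_bin" if score >= policy.spam_threshold else "inbox"
```

The same score of 7.0 sends mail to an exposed address into the spam bin but delivers it to a private mailbox, which is the separation of treatment the principle calls for.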
The AntEspam.co.uk approach to spam filtering
The secret of spam filtering is in quantum physics, cosmology and set theory.
The solution depends on an appreciation of the finite and the infinite. Do we live in a closed or open system? Is our universe finite, infinite - or circular? Are we deceived by what we see?
SPAM FILTERING BEYOND BAYES
Many have extolled the virtues of automatic, self-learning Bayesian programs for stopping spam, whilst others have recognised their fallacies and many debate them. Bayes provides a lazy solution, relying on infinite number crunching rather than the engagement of applied intelligence, and it can take thousands of CPUs in server farms to process, requiring significant energy consumption with implications for global warming. It can be successful, but, in common with Support Vector Machine analysis, its statistical approach can be, and is, attacked by spammers. The attraction of Bayes is automation and the thought that you can install it and forget about it. "Hands up who's not here!"
The reality is that complexity breeds complexity, and any automatic process is only as clever as a robot. Underlying the whole issue of spam are a number of assumptions. These include:
That spammers are infinitely adaptable in getting their messages across.
That, because of the infinite adaptability of spammers, complex machines have to be employed to perform complex calculations.
That spammers' subject matter is infinite.
That word frequencies have to be calculated to give a comprehensive overview of the email throughput, allowing for infinite variations. Looking for infinite variations is a distraction from the task of identifying the particular output of the spammers within the finite.
We challenge these assumptions:
The subject matter of spammers' activities is limited and confines itself to readily defined categories. Emails outside those categories do not need their word counts analysed, by Bayes or by any other method, to be passed as genuine emails.
There is only a fixed, finite number of ways in which spammers can convey their message. This means that even if Nigerian scammers reword their schemes, our linguistic analysis, described below, still catches the rewording.
Because message permutations are limited, non-self-learning algorithms are adequate, and they cannot be skewed by the spammers.
Systems relying on Bayes filters have loopholes and are unnecessarily complex. Even "difficult" emails can be cracked using some simple filters.
Word frequencies by themselves give only a partial picture of the message. Beyond known phrases, grammar and linguistic analyses are often better clues.
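The contrast drawn above between word counts and phrases can be illustrated with a small, non-self-learning matcher. This is not the AntEspam libraries, which are not published; the categories and patterns are hypothetical. Each pattern matches a group of words, with room for a few filler words where a rewording is likely, so a reworded scam still lands in its category, while mail outside every category simply scores nothing.

```python
from __future__ import annotations

import re

# Hypothetical phrase library. Each category holds patterns over groups of
# words, allowing a few filler words in between, so simple rewording still matches.
PHRASE_LIBRARY = {
    "advance_fee": [
        r"\bnext of kin\b",
        r"\bstrict(?:est)? confidence\b",
        r"\bsum of\W+(?:\w+\W+){0,3}(?:million|dollars|usd)\b",
    ],
    "pharmacy": [
        r"\bv[i1]agra\b",
        r"\bno prescription (?:needed|required)\b",
    ],
}

# Compile once, case-insensitively.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in PHRASE_LIBRARY.items()
}


def categorise(text: str) -> dict[str, int]:
    """Count how many phrase patterns from each category match the text."""
    hits = {}
    for category, patterns in COMPILED.items():
        count = sum(1 for p in patterns if p.search(text))
        if count:
            hits[category] = count
    return hits
```

A message such as "As next of kin you may claim the sum of 15.5 million dollars, kept in strict confidence." matches all three `advance_fee` patterns, while an ordinary enquiry returns an empty result. No word frequencies are computed and nothing is learned, so there is no model for a spammer to skew.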
It is a myth that the spammers are adaptable and inventive.
Much spam is sent by off-the-shelf systems bought or hired by idiots, and the same old rubbish is pumped out all the time. So much spam is flooded out by these systems and by zombie computers that we think it would be hard to find a "good", inventive spammer nowadays. Spam is the product of the same mass-production, mass-availability and mass-marketing syndromes as the standard solutions that big companies pump out to stop it. So the spammers win.
But the truth is that even the spammers at the cutting edge of their trade have a problem: they are running out of variations to turn to, and their permutations are ever diminishing. Their flaw is inherent in what they do: the nature of spam is finite and can be defined, whilst the nature of genuine email is infinite.
Because we identify the old rubbish so accurately and junk it, we see only the emails with unknown characteristics, which puts us at the cutting edge. We rapidly develop remedies for these and notify our SpamInsights subscribers.
We have libraries of all the current techniques the spammers use to fool Bayes and other filters. We gather libraries of material from emails on the edge of determinability and so are constantly sensitive to the spammers' latest ruses. Where the spam content is not obvious, we use the spammers' very use of those techniques to identify the spam. We rate the email, and if the score is borderline but with spammy indications, we have another library of further hallmarks against which an absolute judgment can be made. We classify this library as "PASSIONFRUITS", and whilst we do not believe others have come close to the detail of our libraries, we believe this library of criteria to be unique. It is the fruit of a passion for being accurate in the identification of spam . . .
OUR METHODOLOGY:- SPAM EQUALS PLUTONIUM
Like many, we started by using SpamAssassin on our server. The system examines email characteristics, from technical aspects to specific content, and assigns each a score; the scores are added together to give a result. It also provides opportunities for enhancement, and our configuration file is in excess of 700 KB.
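For readers who have not configured SpamAssassin, the additive model just described can be sketched in a few lines. This is a Python illustration, not SpamAssassin's own code or rule syntax; the rule names, patterns and scores are hypothetical, and only the default threshold of 5.0 is taken from SpamAssassin. Every matching rule contributes its score, and the message counts as spam when the total reaches the threshold.

```python
from __future__ import annotations

import re

# Hypothetical rules in the spirit of SpamAssassin: (name, pattern, score).
RULES = [
    ("LOCAL_VIAGRA", re.compile(r"\bv[i1]agra\b", re.IGNORECASE), 2.5),
    ("LOCAL_ALL_CAPS_SUBJ", re.compile(r"^Subject: [A-Z0-9 !]{10,}$", re.MULTILINE), 1.2),
    ("LOCAL_CLICK_HERE", re.compile(r"\bclick here\b", re.IGNORECASE), 0.8),
]

REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold


def additive_score(message: str) -> tuple[float, list[str]]:
    """Sum the scores of every matching rule; return the total and the rule names."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(message)]
    return sum(score for _, score in hits), [name for name, _ in hits]


def is_spam(message: str) -> bool:
    total, _ = additive_score(message)
    return total >= REQUIRED_SCORE
```

A message with the subject "CHEAP MEDS NOW!!!" and a body reading "Buy viagra, click here." trips all three rules yet totals only about 4.5, so it passes. Several weak indicators add up linearly, which is the behaviour the multiplication approach described later departs from.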
After applying the standard system to protect a growing number of clients who share a common business and whose email addresses are exposed on web pages, we embarked upon a programme of enhancements which has far outstripped the capabilities of the vanilla system.
Because the webmaster needed to monitor the effectiveness of the promotional work he was doing for the clients, and because those addresses were used only for the business and not for private correspondence, mail to the addresses was copied to the webmaster. Our experience, and the filters we have developed, result from some five years of work on the hundreds of spam-traps that this produced.
We were able to examine both the spam-traps, which held wanted and unwanted emails, and the spam bin of unwanted emails, which inevitably included some wanted emails from time to time.
In a normal configuration of clients and spam bins, the webmaster sees none of the false positives, and each client has to waste time reinventing the wheel, working out individual filter alterations as solutions to the same problems, with varying levels of perception and success.
In contrast, we were in a uniquely privileged position, overseeing email to a global community of mailboxes sharing a common spam bin within which spam characteristics became obvious.
It soon transpired that up to 90% of spam
comes from no more than around a dozen sources and
utilises no more than half a dozen methodologies.
Once one has cracked the nature of these, identifying spam reliably is still not simple, but it does become easier.
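A shared spam bin is what makes an observation like this measurable. A minimal sketch, assuming each binned message has already been labelled with a source identifier (how sources are identified is not specified here, and the labels are hypothetical): count the sources and compute what fraction of the bin the n most common of them account for. With `n=12` this gives the "around a dozen sources" figure.

```python
from __future__ import annotations

from collections import Counter


def top_sources_share(sources: list[str], n: int = 12) -> float:
    """Fraction of spam-bin messages attributable to the n most common sources."""
    if not sources:
        return 0.0
    counts = Counter(sources)
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / len(sources)
```

For a bin of ten messages where source "a" sent five, "b" three, and "c" and "d" one each, `top_sources_share(bin_sources, n=2)` returns 0.8. Run over real spam-bin labels, the same calculation shows how concentrated the sources are.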
In essence, we analyse emails and then apply sets of filters on quantum physical principles. We have a number of libraries of words, phrases, linguistics and grammar against which we test each email. These are split into rules named after fruits, which are broadly themed and overlap, and which carry very different classes of values.
Having identified the spammers' methodology or linguistic theme and assigned a probability to the email having arisen from one of them, one can then combine that probability with the probability of the email having been sent from a known spam source or group of sources. Rather than being added together, these probabilities can effectively be multiplied: if the indicators are independent, the chance of all of them being wrong about the same email is the product of their individual error rates. This probability multiplication approach enables us to be particularly certain about what is spam.
The resulting certainty of spam identification, based on combining probabilities, enables us to be confident. Combining probabilities derived from different types and classes of test adds to the confidence level, akin to atomic structure, in which electrons reside in defined shells according to their number and energy level. If the analysis shows enough layers of probability, like electron shells, we have found a spam as massive as a dangerous plutonium atom and know that we don't have to go near it. This means that we need inspect only the lighter atoms in our spam bins, which increases the probability of finding the tiny wanted hydrogen atom that needs to rise above the rest to be forwarded to its intended recipient.
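One way to read the multiplication idea above, offered as an interpretation rather than as AntEspam's actual formula: give each independent class of test an estimated false-positive rate, the chance that it fires on a wanted email. If the classes really are independent, the chance that a wanted email trips all of them is the product of those rates, so each extra "shell" shrinks it sharply, whereas adding scores only grows a total linearly. The rates and the cutoff below are hypothetical, and independence is an assumption that correlated tests would break.

```python
from __future__ import annotations

import math


def chance_all_fire_on_wanted(false_positive_rates: list[float]) -> float:
    """Probability that every test fires on a wanted email, assuming the tests
    are independent: the product of their false-positive rates."""
    return math.prod(false_positive_rates)


def needs_inspection(false_positive_rates: list[float], cutoff: float = 1e-6) -> bool:
    """Hypothetical cutoff: inspect the email unless a wanted one is this unlikely."""
    return chance_all_fire_on_wanted(false_positive_rates) >= cutoff
```

Three tests with rates 0.1, 0.2 and 0.05 give a product of about 0.001, so such an email is still worth a look. Four tests at 0.01, 0.02, 0.005 and 0.01 give about 1e-8, a "plutonium" email that needs no inspection under this cutoff. An empty list gives a product of 1, so an email with no evidence is always inspected.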
RESPONSIBLE ISPs HERALD THE BEGINNING OF THE END OF SPAM
The spam problem is being gently eased by certain of the "big player" ISPs, who are making it easier to identify emails genuinely sent through their systems. As ISPs harden against spam, the spammers' automated systems are gradually being marginalised in the sources from which, and through which, they can send.
AOL gets the prize for spam reduction:
we can confidently rely on no spam coming from AOL
AOL have adopted a consistent configuration across their servers, which eases identification of real senders
Whilst AOL's spam filtering processes are less than perfect, they cooperate with other ISPs in identifying reliable sources, ensuring better services between AOL customers and between them and other reliable ISPs
Once responsible ISPs have helped by making their outgoing mail identifiable, it's up to the spam filters to do the rest.
The nightmare ISP has been Yahoo, which hosts the bulk of genuine free email accounts. Much spam has originated from or passed through Yahoo, their reduction initiatives have been flawed, and many of our clients have lost business because Yahoo wrongly diagnosed wanted emails as spam. Yahoo do not apply their methods consistently across their servers, and they supply services to third parties such as BT Internet, which makes sensible filtration based on the Yahoo initiatives impossible. They have tried to make it easier to identify a genuine Yahoo email but have not adopted the method universally throughout their domains. In theory, the only spam we should get from Yahoo is the kind that cannot readily be automated, usually sent by overpayment and 419 scammers, but attempts to implement filtering based on Yahoo email structures have produced many false positives because of Yahoo's inconsistency. Content filtering is the cornerstone of defence against junk email.
Yahoo, however, are now taking a zero-tolerance approach to scammers and closing their accounts very rapidly, which means that we are beginning to be able to take a more relaxed approach to filtering Yahoo emails.
HOW WE ACHIEVE RADIOACTIVE REPROCESSING OF SPAM
Many of our spam diagnoses are given on our example spam email pages. If you are familiar with spam and conventional scoring systems, you will see how our filters pick up emails that pass through normal configurations.
Ordinary spam filters commonly block between 1% and 10% of wanted emails. These are the sort of filters crudely applied in "Hands up who's not here" systems. Although partially successful and academically sound, Bayesian heuristics often cannot work effectively. They are often applied only to single words rather than to groups of words, which are more easily, and often more effectively, spotted by humans than by machines. They also contain assumptions that the spammers have circumvented. We look at the ways in which spammers avoid the conventional filters, which they manage because the commercial computer industry merely mass-markets mass production. The mass involved becomes such a behemoth that little innovation appears to be desired. In contrast, the spammers are constantly innovating, and so they evade the conventional filters, endlessly peddling their wares. Only when spamming ceases to be effective will spammers cease to bother to spam . . .
However, the failure of the industry at large to match the level and speed of the spammers' innovation makes our task easier, as the spammers have become more complacent. Even where we see the spammers evolving, the number of permutations available to them is decreasing, so whilst many see the problem getting worse, we foresee that, if properly dealt with, the end of spam is nigh . . .
A training in physics drums in the maxim "Spot the Assumption", and like bloodhounds we follow the trail and publicise our findings to our SpamInsights subscribers . . . The latest of our discoveries is our PASSIONFRUIT filters, which enable a positive identification to be confirmed absolutely and a marginal email to be determined confidently.
You or your company could follow the trail . . . but without the advantage of our long experience, you'd end up spending fortunes reinventing and duplicating the wheel. Over the years we have accumulated libraries of filters, words and phrases, together with safe and effective probability ratings to apply.
Spam filtering can be a cloak and dagger affair between spammers and the filters. Our services of filtering and advice take us to the cutting edge of knowing what the spammers are doing.
OUR CONCEPTS AND CLASSES OF FILTER
Looking through our resources of example spam emails you'll see banks of colourful and appetizing filters that feed our engines - APPLES, CHERRIES, ORANGES, PIPS and PRICKLYPEARS to name a few, which sometimes get combined into TARTS and a GLUTTON or two at a FEAST.
APPLES are libraries of measurements of up to 280,000 entropic features within the email.
CHERRIES are nice things
ORANGES are libraries of ambiguities which can be nice but are acidic or dry depending on the conditions. Cart loads of APPLES and ORANGES are strewn across the road if spammers are in hot pursuit behind us...
PIPS are libraries of things we don't want, downright nuisances and
PRICKLYPEARS are easily identified and are best handled with care.
Further libraries and areas of linguistic analysis include items relating to Nigerian 419 scams (NIGER and WIDOW), viruses (DOOM), phishing and other financial scams (BANKSCAM).
Deliberate inexactitudes are included to take advantage of better accuracy, wider identification and control provided by Fuzzy Logic feeding into banks of multiplicational matrices of combinational rulesets.
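The text above does not spell out how the fuzzy logic works, so the following is only a generic illustration of the technique under that name, with hypothetical names, ranges and rule. A crisp score from one filter class is turned into a degree of membership between 0 and 1 (the "deliberate inexactitude"). Memberships are then combined with the product t-norm, a standard fuzzy AND that works by multiplication, or with its dual OR, the probabilistic sum.

```python
import math


def ramp(x: float, low: float, high: float) -> float:
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_and(*degrees: float) -> float:
    """Product t-norm: a standard fuzzy AND that multiplies memberships."""
    return math.prod(degrees)


def fuzzy_or(a: float, b: float) -> float:
    """Probabilistic sum: the dual fuzzy OR of the product t-norm."""
    return a + b - a * b


# Hypothetical combination rule: "ambiguous content" AND "nuisance markers".
def combined_rule(oranges_score: float, pips_score: float) -> float:
    return fuzzy_and(ramp(oranges_score, 1.0, 4.0), ramp(pips_score, 0.5, 2.0))
```

Raw scores of 2.5 and 1.25 are each half-way up their ramps, so `combined_rule(2.5, 1.25)` gives 0.25. A borderline value nudges the combined degree up or down gradually instead of flipping a rule on or off at a hard cut-off.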
If we don't like what we see, we blast it into outer cyberspace with our PHAZOR (you'll see OUTPHAZORED), but an email that ends up in the spam bin has to be pretty bad to be blasted beyond the bin into oblivion. We cease to look at emails scoring beyond 800, but we keep some control spam-traps to monitor example emails scoring beyond 4000.
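The handling bands described above can be written as a small triage function. The 800 and 4000 figures come from the text. The lower threshold separating delivered mail from the spam bin is hypothetical, since it is not stated, and the OUTPHAZORED action is left out because no score is given for it.

```python
SPAM_THRESHOLD = 5.0    # hypothetical: the text does not state this value
INSPECTION_LIMIT = 800  # from the text: emails scoring beyond 800 are not looked at
CONTROL_MONITOR = 4000  # from the text: control spam-traps watch emails beyond 4000


def triage(score: float, is_control_trap: bool = False) -> str:
    """Map a score to a handling band, using the text's figures where it gives them."""
    if score < SPAM_THRESHOLD:
        return "deliver"
    if score <= INSPECTION_LIMIT:
        return "spam_bin_inspect"
    if score > CONTROL_MONITOR and is_control_trap:
        return "control_monitor"
    return "spam_bin_uninspected"
```

An email scoring 120 lands in the part of the spam bin that is still inspected, one scoring 1500 is binned but not looked at, and one scoring 5000 is only examined if it arrived at a control spam-trap.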
Beyond our gluttonous tarts, and phazors, we identify a further layer of consideration for analysis by yet further sophisticated banks of PASSIONFRUIT filters which work on wholly new criteria.
Care and experience are necessary, together with knowledge of the assumptions underlying each component test, analysis and ruleset. The fuzzy-logic, combinational, multiple-probability approach is littered with checks and balances so that, as a matter of probability, no single series of factors can set off a chain of unfortunate events leading to the loss of a wanted email.
(Please note: the naming of classes of filters in our sorting and categorisation process as FRUIT of various kinds and PHAZORS is our copyright.)
We see so much spam and have developed filters for so many years that we are able to supply expert consultancy advice on spam reduction issues. Please contact us to enquire.
We are not a marketing company run on the American principles we see in spam, so we are not going to tell you what we are going to say, say what we want to tell you and then tell you what we have said.
We hope that, having read about how we go about what we do and browsed our pages of example emails that we catch, you'll appreciate that we have not bothered to waste your time on the Viagra type of easily caught rubbish, that we are forcing the spammers into the margins and that we know what we are doing!
Further links for interesting reading:
http://www.oreillynet.com/pub/wlg/3841 : "Content-based spam filtering is a dead-end path" Someone rather depressed by poor attempts at spam filtering - these result only from poor analysis and techniques
http://www.scanmail-software.com/support/spam.html An example of a configuration file defining spam for a naive content-based filter - why reinvent the wheel when www.AntEspam.co.uk has years of experience in defining filters?
http://www.poornam.com/articles_spam.php Proposed methods of providing email signatures - difficult to achieve universal compatibility - Yahoo has not been able to implement such a system universally on its systems, making their attempts worse than useless.