Spam remains a growing problem in cyberspace. Ferris Research, which studies messaging and content control, estimates that 40 trillion spam messages will be sent in 2008, up from 18 trillion in 2006 and 30 trillion in 2007.
In theory, email filtering software and appliances let 'good' or 'true' email messages through while blocking spam. But filters are not foolproof. They can occasionally let spam through, mistaking it for true email (a false negative), or block true email, mistaking it for spam (a false positive).
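The two error types can be made concrete with a small sketch that labels each filtering decision by comparing what a message actually is with what the filter decided. The type and function names here are hypothetical, chosen only for illustration.

```python
from enum import Enum


class Outcome(Enum):
    TRUE_POSITIVE = "spam correctly blocked"
    TRUE_NEGATIVE = "good email correctly delivered"
    FALSE_POSITIVE = "good email wrongly blocked"
    FALSE_NEGATIVE = "spam wrongly delivered"


def classify_outcome(is_spam: bool, flagged_as_spam: bool) -> Outcome:
    """Label one filtering decision by comparing the truth with the verdict."""
    if flagged_as_spam:
        return Outcome.TRUE_POSITIVE if is_spam else Outcome.FALSE_POSITIVE
    return Outcome.FALSE_NEGATIVE if is_spam else Outcome.TRUE_NEGATIVE


def tally(results: list[tuple[bool, bool]]) -> dict[Outcome, int]:
    """Count outcomes for (is_spam, flagged_as_spam) pairs."""
    counts = {outcome: 0 for outcome in Outcome}
    for is_spam, flagged in results:
        counts[classify_outcome(is_spam, flagged)] += 1
    return counts
```

A false positive is the case where `is_spam` is `False` but `flagged_as_spam` is `True`: the costly one, as the figures below show.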
Typically, after identifying a message as spam, the filtering software either blocks it outright or places it in a quarantine folder, allowing the recipient to review it later. Although the latter method provides a chance to retrieve false positives, it requires time and effort from the user, and some users never bother to check their quarantine folders at all.
Deleting spam costs $0.04 (2p) per message, according to Ferris Research. But Ferris analyst Richi Jennings points out that the cost of locating a missing true email is far greater: about $3.50 (£1.75) per message. (Ferris developed these figures from published data on factors such as workforce size and hourly labour costs, then applied its own estimates, such as the percentage of workers with email access and the volume of spam they receive.)
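The gap between those two figures is what makes false positives so expensive. The small calculation below uses only the two Ferris per-message figures plus hypothetical message counts. It assumes the $0.04 figure applies to each spam message that slips past the filter and is deleted by hand; Ferris's own methodology may differ. Costs are kept in whole cents to avoid floating-point rounding.

```python
# Ferris Research per-message figures quoted above, in US cents
DELETE_SPAM_CENTS = 4            # $0.04 to delete one spam message
RECOVER_TRUE_EMAIL_CENTS = 350   # $3.50 to locate one missing true email


def filtering_error_cost_cents(false_negatives: int, false_positives: int) -> int:
    """Total cost in cents, assuming each false negative is deleted by hand
    and each false positive has to be tracked down."""
    return (false_negatives * DELETE_SPAM_CENTS
            + false_positives * RECOVER_TRUE_EMAIL_CENTS)


# One false positive costs as much as deleting this many spam messages
break_even = RECOVER_TRUE_EMAIL_CENTS / DELETE_SPAM_CENTS

# Hypothetical daily error counts for two filter settings
lenient = filtering_error_cost_cents(false_negatives=100, false_positives=0)
strict = filtering_error_cost_cents(false_negatives=0, false_positives=2)

print(f"break-even ratio: {break_even}")                        # break-even ratio: 87.5
print(f"lenient: ${lenient / 100:.2f}, strict: ${strict / 100:.2f}")  # lenient: $4.00, strict: $7.00
```

On these assumptions, a filter that wrongly blocks just two true emails costs more than one that lets 100 spam messages through.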
Even worse, Jennings says, organisations can incur potentially greater costs through opportunities lost to false positives they never see. For example, a consulting firm might never receive a request for proposal.
To minimise the false positives caused by spam filters, it helps to know a bit about how they work. To keep up with increasingly sophisticated spam, filters have adopted a variety of techniques over the years, often in combination. Here is a bird's-eye view of some popular techniques, in rough chronological order:
Keyword-based and Bayesian filters
The earliest filters searched a message's subject line and body for particular words, such as 'Viagra'. More sophisticated versions use Bayesian analysis, which combines keyword searches with techniques such as calculating the ratio of 'good' to 'bad' words and assigning probability scores based on those ratios.
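Both approaches can be sketched in a few lines. The keyword check simply looks for a blocklisted word; the Bayesian part is a minimal naive Bayes classifier, one common way of turning word statistics into a spam probability. This is an illustrative sketch, not how any particular commercial filter works: the blocklist, class name and training messages are hypothetical, and the classifier assumes it has been trained on at least one message of each kind.

```python
import math
import re
from collections import Counter

SPAM_KEYWORDS = {"viagra"}  # hypothetical blocklist


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_filter(subject: str, body: str) -> bool:
    """Earliest approach: flag the message if any blocklisted word appears."""
    words = set(tokenize(subject + " " + body))
    return not SPAM_KEYWORDS.isdisjoint(words)


class NaiveBayesFilter:
    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    def train(self, text: str, label: str) -> None:
        """Record word frequencies for a message labelled 'spam' or 'good'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | words) under the naive independence assumption."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["good"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "good"):
            total_words = sum(self.word_counts[label].values())
            # Log prior: how common this class is in the training data
            score = math.log(self.message_counts[label] / total_messages)
            for word in tokenize(text):
                # Laplace (add-one) smoothing so unseen words do not zero the score
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert the two log scores into a probability; clamp to avoid overflow
        diff = min(log_scores["good"] - log_scores["spam"], 700.0)
        return 1.0 / (1.0 + math.exp(diff))


nb = NaiveBayesFilter()
nb.train("cheap viagra buy now", "spam")
nb.train("limited offer buy cheap pills now", "spam")
nb.train("meeting agenda for monday", "good")
nb.train("please review the project proposal", "good")
print(nb.spam_probability("buy cheap pills"))             # well above 0.5
print(nb.spam_probability("project meeting on monday"))   # well below 0.5
```

Unlike the keyword check, the Bayesian score can flag "buy cheap pills" without any blocklisted word, because it has learned which words are more common in spam. A real filter would then compare the score against a threshold, and where that threshold sits largely determines how many false positives it produces.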
Challenge/response filters
Unrecognised senders receive an automated reply asking them to confirm they are human by typing the letters and numbers shown in an image, a test known as a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). The test relies on the idea that humans can recognise and type such characters, while computers generally cannot. Once a sender has been validated, their messages are delivered directly, without the challenge step.
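The challenge/response flow can be sketched as a small state machine: approved senders go straight through, while mail from unknown senders is held until the sender answers a challenge. This is an illustrative sketch; the class and method names are hypothetical, and a random token stands in for the characters a real CAPTCHA would render as a distorted image.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    approved_senders: set[str] = field(default_factory=set)
    # Hypothetical store: sender -> (expected answer, messages held until validated)
    pending: dict[str, tuple[str, list[str]]] = field(default_factory=dict)

    def receive(self, sender: str, message: str) -> str:
        """Deliver mail from approved senders; hold anything else and challenge."""
        if sender in self.approved_senders:
            return "delivered"
        if sender not in self.pending:
            # Stand-in for the characters a CAPTCHA image would display
            answer = secrets.token_hex(3)
            self.pending[sender] = (answer, [])
        self.pending[sender][1].append(message)
        return "challenged"

    def answer_challenge(self, sender: str, response: str) -> list[str]:
        """Release held messages if the answer matches; otherwise release nothing."""
        if sender not in self.pending:
            return []
        expected, held = self.pending[sender]
        if not secrets.compare_digest(response.encode(), expected.encode()):
            return []
        # Validated: future mail from this sender skips the challenge step
        del self.pending[sender]
        self.approved_senders.add(sender)
        return held
```

The sketch also shows where false positives arise with this technique: any legitimate sender who ignores or never sees the challenge stays in `pending`, and their mail is never delivered.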