Antispam technology
The techniques spammers use to disguise unwanted mail keep growing more sophisticated. One of the hardest cases for spam filters is image spam, which may also contain noise that hinders text recognition. How do spam filter vendors counter this and other modern spammer tricks?
2020: The Ministry of Digital Development proposed developing a mechanism for withdrawing consent to personal data processing
On February 6, 2020, it became known that the Ministry of Digital Development, Communications and Mass Media of the Russian Federation had proposed developing a mechanism that would allow citizens to withdraw their consent to personal data processing. The ministry believes this would also let citizens protect themselves from spam.
2010
Methods of fighting spam
The fight against spam begins on the server that transfers the messages. Filtering there saves traffic, is more accurate, and is more effective than configuring a spam filter in an e-mail client.
The most widespread and oldest method of fighting spam is the use of DNSBL (DNS blacklists). Its principle is simple: all mail coming from an IP address that appears on the blacklist is blocked. Another long-established method is content filtering, in which a potentially unwanted message is checked for specific words, text fragments, pictures, and other features characteristic of spam. The third method, greylisting, relies on a temporary rejection. When a suspicious message arrives, the server automatically replies with an error code that all mail systems understand. A legitimate mail system resends the message after a while, which spam-sending programs do not do.
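As a rough illustration of the first and third methods, the sketch below builds a DNSBL lookup name and keeps a small greylisting table. It is a minimal sketch under stated assumptions, not any particular product's implementation: the zone name dnsbl.example.org, the 300-second delay, and the (client IP, sender, recipient) key are illustrative choices that the text above does not specify.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone has a record for this address.

    Performs a real DNS query; any resolution failure is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


class Greylist:
    """Temporarily reject the first attempt; accept a retry after a delay."""

    def __init__(self, min_delay_seconds: float = 300.0):
        # Assumed delay; real deployments choose their own value.
        self.min_delay = min_delay_seconds
        self.first_seen = {}

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        # Assumed key: the (client IP, sender, recipient) triplet.
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"  # the server would answer with a temporary SMTP error


# Hypothetical zone name, used only to show the shape of the query.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))

greylist = Greylist()
print(greylist.check("192.0.2.10", "a@example.com", "b@example.net", now=0.0))
print(greylist.check("192.0.2.10", "a@example.com", "b@example.net", now=400.0))
```

In this sketch the first delivery attempt receives a temporary failure and a retry made after the delay is accepted. The is_listed function performs a live DNS query, so its result depends on the zone that is queried.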
The methods described have both strengths and serious shortcomings. For example, content filtering can classify an important informational mailing as spam, so the messages never reach their recipients. DNSBL guarantees that 100% of the traffic from the listed IP addresses is blocked, but a spammer can easily change the address and continue mailing.
Image spam, which accounts for about 10% of all unwanted messages, is especially difficult to recognize. In this case the text is drawn on images. One solution found for fighting image spam was optical character recognition (OCR). But this approach has serious shortcomings. First, OCR is extremely resource-intensive and requires powerful machines. Second, such systems do not provide adequate detection accuracy. Third, in response to the use of image recognition in spam filtering, new junk messages began to appear in the form of images with a large amount of noise. The noise takes the form of characters of different sizes and text broken up by tables and lines. All this makes detecting spam with OCR impractical.
Suppress the noise, sort out the text
However, spam filters do not stand still either. To filter image spam that contains noise, a probabilistic statistical technique is used. In this case the decision on whether the image contains text is based on how the probable graphic images of words and lines are arranged and on whether detected images of letters and words appear in them. In other words, the program analyzes sequences of pixels in the image, estimates the probability of finding letters or words, and under certain conditions classifies the image as spam. Such conditions can include word length, number of characters, and others. Unlike OCR systems, the probabilistic method copes with different variants of slanted or distorted letters and words, which increases detection accuracy. The new method also processes images faster.
New methods of analyzing and fighting unwanted mail are also appearing in the field of text spam.
All content filtering methods can be divided into two classes. The first comprises methods based on analysis of the content itself; a classic example is searching for recurring phrases and expressions. The second comprises methods based on analysis of the context, that is, metadata: for example, analysis of attachments or other file attributes (size, type, and so on). The key characteristic of any content filtering engine is the quality of the decisions it makes. Two types of error are possible: a "good" verdict for a "bad" message and, conversely, a "bad" verdict for a "good" one. Older implementations of content filtering were slow, required large libraries, often failed, and could not cope with new spam that had not yet been identified manually. New-generation methods use a set of rules, or heuristics. The advantages of this approach are faster message processing, higher reliability, and, importantly, the ability to detect new, not yet recognized junk messages.
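The two classes of checks and the two kinds of error can be sketched as a small rule-based scorer. The rules, weights, threshold, and the Message type below are hypothetical values chosen for illustration; real filters rely on much larger, tuned rule sets.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    # Each attachment is a (filename, size_in_bytes) pair.
    attachments: list = field(default_factory=list)


# Content rules: phrases searched for in the text itself (hypothetical weights).
CONTENT_RULES = [
    (re.compile(r"\bfree\s+money\b", re.IGNORECASE), 2.0),
    (re.compile(r"\bclick\s+here\b", re.IGNORECASE), 1.0),
]


def content_score(msg: Message) -> float:
    """Sum the weights of all content rules that match the body."""
    return sum(weight for pattern, weight in CONTENT_RULES if pattern.search(msg.body))


def context_score(msg: Message) -> float:
    """Score metadata only: attachment type and size (hypothetical rules)."""
    score = 0.0
    for name, size in msg.attachments:
        if name.lower().endswith(".exe"):
            score += 3.0
        if size > 5_000_000:
            score += 1.0
    return score


def is_spam(msg: Message, threshold: float = 2.5) -> bool:
    """Flag the message when the combined score reaches the threshold."""
    return content_score(msg) + context_score(msg) >= threshold


def error_counts(labelled):
    """Return (false_positives, false_negatives) for (message, really_spam) pairs."""
    false_pos = sum(1 for msg, spam in labelled if is_spam(msg) and not spam)
    false_neg = sum(1 for msg, spam in labelled if not is_spam(msg) and spam)
    return false_pos, false_neg


sample = [
    (Message("Click here for FREE money"), True),
    (Message("Meeting notes attached", [("notes.pdf", 1000)]), False),
    (Message("Installer from IT", [("setup.exe", 2000)]), False),
    (Message("Please click here to confirm"), True),
]
print(error_counts(sample))
```

Raising the threshold trades one kind of error for the other: fewer legitimate messages are blocked, but more spam gets through.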
IBM X-Force: the .ru domain takes first place by number of websites spreading spam
In September, IBM Corporation published the 2010 Mid-Year Trend and Risk Report, prepared by its IBM X-Force information security research and development group. According to the research, since February 2010 the Russian Internet domain (.ru) has ranked first by the amount of unwanted content (spam) registered on it, overtaking such domains as .com, .net, .cn (China), and .info (see Table 1). By geographic location, spam sources were distributed as follows: the USA (9.7% of spam), Brazil (8.4%), India (8.1%), Russia (5.3%), and Vietnam (4.6%; see Table 2). At the same time, more than 60% of the spam-containing URLs registered in China have the top-level domain .ru. Thus, according to the research, a typical spam message is sent from a computer physically located in the USA, India, or Brazil, contains a URL in the .ru domain, and is hosted in China.
Table 1. Most common top-level domains containing spam, first half of 2010.
Table 2. Geographical distribution of spam sources, first half of 2010.