Ten years ago, antivirus companies the world over were about to face an important milestone in their history: the arrival of the millionth piece of malware known to man.

Back in the day, each so-called “virus” was important: it had to be isolated, documented formally and dealt with by dozens of researchers gathered around a desk. Today, anti-malware companies get a million pieces of malware or more every two days.

So it’s easy to understand why these teams of reverse engineers no longer have the time or resources to manually sift through this flood of malware. Collecting, inspecting and fixing things up for the customer are tedious tasks now offloaded to automated systems controlled by artificial intelligence. These systems, running machine learning algorithms, are now a must-have for any anti-malware company.

Machine learning technologies are everywhere around us, from driverless vehicles to personal assistants to Google searches. In the security industry, Bitdefender started integrating machine learning technologies into its detection systems in 2009. In a world where roughly 400,000 new malicious programs emerge daily, traditional, signature-based antivirus systems can’t be truly proactive.

How does the magic happen?

Security companies use dozens of trained machine learning algorithms designed to individually tackle specific types of malware.

Training an algorithm involves feeding it a large amount of information for it to cluster and classify later. In our case, the 400,000 pieces of malware received daily must be divided into groups based on the features that describe each file. This process, called “clustering,” is unsupervised: nobody has to label the samples in advance. One major advantage of clustering is that it can overcome obfuscation – a common technique malware creators use to “scramble” the code and trick the antivirus.
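To make the idea of clustering more concrete, here is a minimal sketch in Python. It is not Bitdefender’s method: it uses a simple distance-threshold (“leader”) clustering, and the sample names, the three features and the threshold value are all hypothetical, chosen only for illustration. The point is that files are grouped by what their features look like, not by their exact bytes.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(samples, threshold):
    """Group samples without labels.

    A sample joins the first cluster whose leader (its first member)
    lies within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of (name, features)
    for name, features in samples:
        for cluster in clusters:
            leader_features = cluster[0][1]
            if distance(features, leader_features) <= threshold:
                cluster.append((name, features))
                break
        else:
            # No existing cluster was close enough.
            clusters.append([(name, features)])
    return clusters


# Hypothetical features per file, each scaled to roughly 0..1:
# (byte entropy, fraction of packed sections, imported API count).
SAMPLES = [
    ("sample_a", (0.95, 0.90, 0.05)),
    ("sample_b", (0.93, 0.88, 0.07)),
    ("sample_c", (0.40, 0.00, 0.80)),
    ("sample_d", (0.42, 0.05, 0.78)),
    ("sample_e", (0.94, 0.92, 0.04)),
]

groups = leader_cluster(SAMPLES, threshold=0.2)
group_names = [[name for name, _ in cluster] for cluster in groups]
print(group_names)
```

In this toy data set, two scrambled variants of the same file would have different bytes but very similar features, so they would still land in the same group. A real system would use far more features and a more robust algorithm.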

Classification, on the other hand, is a supervised learning method that aims to build models that will detect currently unknown malware. The classifier (a neural network, for example) is trained on samples already labelled as malicious or clean, and the resulting model is then used in the security product. To be effective, it has to crunch large amounts of data and finish its training in a reasonable amount of time.
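As a rough illustration of supervised training, the sketch below fits the smallest possible “neural network”, a single neuron (logistic regression), on a handful of labelled samples and then scores files it has never seen. Everything here is an assumption made for the example: the features, the labels, the learning rate and the number of training passes. Production models are far larger and are trained on millions of samples.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(model, features):
    """Return the model's estimate that a file is malicious (0..1)."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, labels, epochs=500, learning_rate=0.5):
    """Fit a single-neuron classifier with stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict((weights, bias), features) - label
            # Nudge each weight against the direction of the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training set. Features are scaled to roughly 0..1:
# (byte entropy, fraction of packed sections, imported API count).
# Label 1 = malicious, 0 = clean.
TRAIN_FEATURES = [
    (0.95, 0.90, 0.05), (0.93, 0.88, 0.07), (0.94, 0.92, 0.04),
    (0.40, 0.00, 0.80), (0.42, 0.05, 0.78), (0.35, 0.02, 0.90),
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

model = train(TRAIN_FEATURES, TRAIN_LABELS)

# Two files the model has never seen before.
unknown_suspicious = (0.90, 0.85, 0.10)
unknown_benign = (0.45, 0.02, 0.75)
print(predict(model, unknown_suspicious) > 0.5)
print(predict(model, unknown_benign) > 0.5)
```

The key difference from clustering is the `TRAIN_LABELS` list: the model learns from examples whose verdict is already known, then generalises to new files.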

Why do we need artificial intelligence in a security product?

To date, the number of known malware samples in the world surpasses 600 million. Every day, more than 400,000 new variants pile up. That’s nearly 300 new pieces of malware a minute. Computer security has become so fast-paced that there is no time for human analysis or “signature updates”: by the time we’d have them, you’d already be infected and your financials or private information would end up in a bad neighbourhood on the internet.

Computer security has traditionally been regarded as a cat-and-mouse game between security researchers and cyber-criminals. But machine learning algorithms make it possible to predict the mouse’s next 10 moves, so the cat can catch the mouse long before it enters the room.

Bitdefender has been constantly developing and training machine learning algorithms for cyber-security purposes since 2009. With more than six patents for our machine learning technologies, it is precisely the knowledge of how malware behaves and how machine learning complements security researchers that makes the difference in offering the best protection against malware.

Sponsored by Bitdefender