For many years, information-security technology that detected and removed computer viruses relied upon “signatures” to identify malicious code on machines. While vendors leveraged various technologies to improve efficiency, security software essentially maintained a database of byte patterns found in known strains of malware, and scanned filesystems and memory for code matching those known signatures.
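To illustrate the concept, here is a minimal sketch of signature-based scanning in Python – the byte patterns, malware names, and scan path below are invented for illustration, not taken from any real anti-virus product:

```python
# A minimal sketch of signature-based scanning. The signatures below are
# hypothetical byte patterns, not real malware indicators.
from pathlib import Path

# Database of known-malware names mapped to byte patterns ("signatures").
SIGNATURE_DB: dict[str, bytes] = {
    "ExampleVirus.A": b"\xde\xad\xbe\xef\x13\x37",
    "ExampleWorm.B": b"\x90\x90\xeb\xfe\x41\x41",
}

def scan_file(path: Path) -> list[str]:
    """Return the names of any known signatures found in the file."""
    try:
        data = path.read_bytes()
    except OSError:
        return []  # unreadable file; a real scanner would log this
    return [name for name, pattern in SIGNATURE_DB.items() if pattern in data]

def scan_tree(root: Path) -> None:
    """Scan every regular file under a directory tree."""
    for path in root.rglob("*"):
        if path.is_file():
            for match in scan_file(path):
                print(f"ALERT: {path} matches signature for {match}")

if __name__ == "__main__":
    scan_tree(Path("/tmp/scan_target"))  # hypothetical scan root
```

The crucial point: this approach can only ever recognize malware whose pattern is already in the database.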
The signature approach worked well in the early days of computer malware – only a small number of new strains were created each day, viruses were not polymorphic, and viruses spread slowly (in those days, malware traveled by disk and BBS download, not over the Internet), allowing anti-virus vendors to keep their signature databases relatively up to date.
Of course, as time marched on, malware was increasingly often too new to be caught using signatures – such situations became more common as the Internet grew into its current commercial form and hackers crafted new strains more frequently. This happened to me in the late 1990s – I literally walked several blocks in downtown Manhattan to Egghead Software (remember it?) to buy anti-virus software because malware had gotten onto my then-employer’s network, and the package from another vendor already in use within the office could not yet detect the new strain.
Ultimately, to combat unknown malware, security software began to look for malware-like activities – an approach that over time converged, to some degree, with that of intrusion detection systems. In fact, today it is quite common for businesses seeking to prevent information-security disasters to utilize anomaly detection tools as an important layer of defense in case a hostile party penetrates their infrastructure. Looking for unusual activity can prevent malware, and hackers, from stealing data.
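As a toy illustration of activity-based detection, consider a rule that flags any process modifying an unusually large number of files in a short window – a crude, ransomware-like heuristic. The event format and thresholds here are invented:

```python
# A toy behavior-based rule: flag any process that modifies an unusually
# large number of files in a short window. The event format and thresholds
# are invented for illustration.
from collections import defaultdict

MAX_WRITES = 100      # files modified...
WINDOW_SECONDS = 60   # ...within this window

def detect_burst_writes(events):
    """events: iterable of (timestamp, process_name, action) tuples, time-ordered."""
    writes = defaultdict(list)  # process -> recent file-write timestamps
    for ts, proc, action in events:
        if action != "file_write":
            continue
        recent = [t for t in writes[proc] if ts - t <= WINDOW_SECONDS]
        recent.append(ts)
        writes[proc] = recent
        if len(recent) > MAX_WRITES:
            yield (ts, proc, len(recent))

# Example: a process writing 150 files within seconds trips the rule.
events = [(i * 0.1, "suspicious.exe", "file_write") for i in range(150)]
for ts, proc, count in detect_burst_writes(events):
    print(f"ALERT at t={ts:.1f}s: {proc} wrote {count} files in {WINDOW_SECONDS}s")
    break  # one alert is enough for the demo
```

Note that this rule knows nothing about any specific strain of malware – it triggers on the behavior itself.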
But anomaly detection also introduces a problem.
Often, an anomaly is defined simply as something that deviates from what is expected as “normal” – security professionals either configure rules defining normal, or machine learning “learns” over time what normal usage of systems and networks looks like. However, user and system behavior evolve over time, and “normal” is a moving target, often with many valid exceptions. As such, anomaly detection can produce large numbers of “false positives” – numerous “abnormal”, yet benign, activities that trigger suspicious-activity alerts. Due to resource limitations, the flood of alerts may overwhelm information-security personnel, who may not only be unable to look into every alert, but, as a result of being overwhelmed, may also miss the alerts that are most critical – i.e., those that represent actual hostile activity.
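To see how false positives arise, consider a minimal statistical baseline that learns a user’s “normal” daily data-transfer volume and flags sharp deviations – and note how a perfectly legitimate change in behavior trips the alarm anyway. All numbers are invented:

```python
# A minimal statistical baseline: learn "normal" daily data-transfer volume
# per user, then flag days that deviate sharply. All numbers are invented.
import statistics

def build_baseline(history_mb: list[float]) -> tuple[float, float]:
    """Return (mean, stdev) of observed daily transfer volumes."""
    return statistics.mean(history_mb), statistics.stdev(history_mb)

def is_anomalous(today_mb: float, mean: float, stdev: float, z: float = 3.0) -> bool:
    """Flag anything more than z standard deviations from the learned mean."""
    return abs(today_mb - mean) > z * stdev

# A user who normally moves roughly 200 MB per day...
history = [190.0, 205.0, 198.0, 210.0, 202.0, 195.0, 207.0]
mean, stdev = build_baseline(history)

# ...legitimately uploads a 2 GB month-end report: flagged, but benign.
print(is_anomalous(2048.0, mean, stdev))  # True -> a false positive
```

Multiply that single false positive across thousands of users and hundreds of monitored behaviors, and the alert flood described above becomes easy to imagine.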
Keep in mind that training systems to understand anomalous activity is not simple – there is no single database that can be loaded in, as there is, for example, for authentication, nor an absolute set of rules, as exists on a firewall. What should be considered anomalous – and which anomalous activities are likely indicators of a problem – may differ enormously between environments.
As such, one area where artificial intelligence may offer both security and efficiency benefits is in the analysis of anomalies – helping to determine what is truly hostile, irregular behavior, and what is simply acceptable, yet anomalous, activity. Integrating contextual information from threat intelligence feeds can also help clarify which unusual activities are likely symptomatic of a cyberattack, and which are more likely to represent perfectly acceptable activities.
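In principle, such triage might look something like the following sketch, which scores each alert using simple contextual signals – including a lookup against a (here, stand-in) threat-intelligence feed – so that the likeliest-hostile alerts rise to the top. The features and weights are invented; a real system would more likely use a trained model than fixed weights:

```python
# A sketch of alert triage: score each anomaly alert using contextual signals,
# including a hypothetical threat-intelligence lookup, so analysts see the
# likeliest-hostile alerts first. Features and weights are invented.
KNOWN_BAD_IPS = {"203.0.113.7"}  # stand-in for a real threat-intel feed

def score_alert(alert: dict) -> float:
    score = 0.0
    if alert.get("remote_ip") in KNOWN_BAD_IPS:
        score += 0.5   # destination appears in threat intelligence
    if alert.get("off_hours"):
        score += 0.2   # activity outside the user's normal working hours
    if alert.get("sensitive_data"):
        score += 0.3   # touched systems holding sensitive data
    return score

alerts = [
    {"id": 1, "remote_ip": "198.51.100.4", "off_hours": False, "sensitive_data": False},
    {"id": 2, "remote_ip": "203.0.113.7", "off_hours": True, "sensitive_data": True},
]
# Alert 2 (score 1.0) is presented to analysts before alert 1 (score 0.0).
for alert in sorted(alerts, key=score_alert, reverse=True):
    print(alert["id"], score_alert(alert))
```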
Of course, incorporating after-the-fact information – such as the fact that a particular anomaly was researched and its alert determined to be a false positive – can also diminish the number of false positives in the future.
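One simple way such feedback could be incorporated – purely illustrative, with an invented fingerprinting scheme and threshold – is to suppress alerts that analysts have repeatedly marked as false positives:

```python
# A sketch of an analyst-feedback loop: alerts the team has repeatedly marked
# as false positives get suppressed (or down-ranked) going forward. The
# fingerprinting scheme and threshold are invented for illustration.
from collections import Counter

fp_counts: Counter[str] = Counter()   # alert fingerprint -> false-positive count
FP_SUPPRESS_THRESHOLD = 3

def fingerprint(alert: dict) -> str:
    """Collapse an alert to the fields that make it 'the same' alert next time."""
    return f"{alert['rule']}|{alert['user']}|{alert['remote_ip']}"

def record_disposition(alert: dict, was_false_positive: bool) -> None:
    if was_false_positive:
        fp_counts[fingerprint(alert)] += 1

def should_suppress(alert: dict) -> bool:
    return fp_counts[fingerprint(alert)] >= FP_SUPPRESS_THRESHOLD

alert = {"rule": "large_upload", "user": "alice", "remote_ip": "198.51.100.4"}
for _ in range(3):
    record_disposition(alert, was_false_positive=True)
print(should_suppress(alert))  # True: the same benign alert no longer pages anyone
```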
Remember, detecting anomalous activity, and understanding anomalous activity, are two different things. For security, we need both.