Can you reliably detect malware threats from a tiny data set of in-the-wild samples, even when malware authors are constantly generating mutated or modified versions?
A new malware research paper by cybersecurity company Trend Micro and Federation University Australia shows that this is possible using machine learning (ML).
The study collected 3,254 in-the-wild OS X malware samples, and its most interesting result comes from looking beyond typical mutation patterns:
"An observation in the malware battlefront is that malware mutates over time to bypass static signature based detection by either upgrading its functions or applying new metamorphic (or obfuscation) techniques.
The downside for attackers is that malware mutation requires time and effort to do.
Due to this developmental cost, minor tactical modification to the original malware code frequently occurs and arrives in the form of an outbreak, while a major strategic code change rarely occurs across a longer period of time."
And that lack of strategic code change is exactly where the researchers found a crucial clue for detecting malware that might otherwise evade signature-based analysis.
The Federation University Australia and Trend Micro researchers found that the malware samples they analyzed share a trait that marks them as likely malware, even when the malware is trying to "hide" from detection.
Samples from the same campaign share a distinctive instruction sequence, regardless of whether a sample's changes come from unpacking routines, metamorphic components, or purely functional modifications.
In simple terms: however the samples are altered or mutated, the instruction pattern they share is a red flag that a sample is, in fact, malware.
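As a rough illustration of why a shared instruction sequence survives small mutations, the sketch below compares opcode sequences using n-gram Jaccard similarity. This is a common, simple similarity measure chosen here as an assumption; it is not the machine learning model described in the paper, and the opcode lists are hypothetical.

```python
def ngrams(seq, n=3):
    """Return the set of contiguous n-grams (as tuples) in an instruction sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of the n-gram sets of two instruction sequences."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0  # two empty (or too-short) sequences are treated as identical
    return len(ga & gb) / len(ga | gb)


# Hypothetical opcode sequences (mnemonics only, operands dropped).
original = ["push", "mov", "sub", "call", "test", "jz", "mov", "call", "add", "ret"]
# A "tactical" variant: one junk instruction inserted, otherwise unchanged.
variant = ["push", "mov", "sub", "nop", "call", "test", "jz", "mov", "call", "add", "ret"]
# An unrelated program with no shared 3-grams.
unrelated = ["xor", "inc", "cmp", "jne", "lea", "shl", "or", "dec", "jmp", "ret"]

print(round(jaccard(original, variant), 2))    # 0.55
print(round(jaccard(original, unrelated), 2))  # 0.0
```

A single inserted junk instruction still leaves the variant noticeably similar to the original, while the unrelated sequence scores zero. On real binaries with thousands of instructions, a minor tactical change would disturb only a small fraction of the n-grams.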
Researchers presented this chart:
Their interpretation of the chart led to the following conclusion:
"We noticed that all variants of the MAC.OSX.CallMe malware have had the same identical instruction sequences until a variation was introduced at instruction 5250.
Even after this variation was introduced, parts of the instructions for samples 2 and 3 simply moved from the instructions for the rest of the samples. This shows us that the instruction sequences for these different MAC.OSX.CallMe variants remain very much alike.
This proves that the program instruction sequence of malware samples is a vital component in identifying malware variants during outbreaks."
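The CallMe comparison describes two things: the point where variants first diverge (instruction 5250 in the paper) and the fact that later instructions are shifted rather than replaced. A toy sketch using Python's standard difflib module can illustrate both on hypothetical opcode lists; the helper names and data are assumptions for illustration, not the paper's analysis pipeline.

```python
from difflib import SequenceMatcher


def first_divergence(a, b):
    """Index of the first position where two instruction sequences differ.

    Returns None if the sequences are identical.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the overlapping part: they diverge where the shorter one ends.
    return None if len(a) == len(b) else min(len(a), len(b))


def aligned_blocks(a, b):
    """(start_a, start_b, length) for each run of identical instructions, in order."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size > 0]


def similarity(a, b):
    """difflib's ratio: 2 * matched / total length of both sequences, in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical opcode sequences for two variants of the same family.
base = ["push", "mov", "sub", "call", "test", "jz", "mov", "call", "add", "ret"]
variant = ["push", "mov", "sub", "call", "xor", "inc", "test", "jz", "mov", "call", "add", "ret"]

print(first_divergence(base, variant))  # 4
for a_start, b_start, size in aligned_blocks(base, variant):
    print(f"base[{a_start}:{a_start + size}] == variant[{b_start}:{b_start + size}]")
print(round(similarity(base, variant), 2))  # 0.91
```

Here the two sequences diverge at index 4, yet the remaining six instructions realign two positions later (base[4:10] matches variant[6:12]), which mirrors the "moved" instructions the researchers describe.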
The research appears to be a breakthrough, suggesting that machine learning can aid in analyzing a malware outbreak even when only a small data set is available.
The new 2019 malware research is carefully detailed; read the Generative Malware Outbreak Detection report for yourself.