DeepLocker: How AI Can Power a Stealthy New Breed of Malware
Cybersecurity is an arms race, where attackers and defenders play a constantly evolving cat-and-mouse game. Every new era of computing has served attackers with new capabilities and vulnerabilities to execute their nefarious actions.
In the PC era, we witnessed malware threats emerging from viruses and worms, and the security industry responded with antivirus software. In the web era, attacks such as cross-site request forgery (CSRF) and cross-site scripting (XSS) were challenging web applications. Now, we are in the cloud, analytics, mobile and social (CAMS) era — and advanced persistent threats (APTs) have been on the top of CIOs’ and CSOs’ minds.
But we are on the cusp of a new era: the artificial intelligence (AI) era. The shift to machine learning and AI is the next major progression in IT. However, cybercriminals are also studying AI to use it to their advantage — and weaponize it. How will the use of AI change cyberattacks? What are the characteristics of AI-powered attacks? And how can we defend against them?
At IBM Research, we are constantly studying the evolution of technologies, capabilities and techniques in order to identify and predict new threats and stay ahead of cybercriminals.
IBM Research developed DeepLocker to better understand how several existing AI models can be combined with current malware techniques to create a particularly challenging new breed of malware. This class of AI-powered evasive malware conceals its intent until it reaches a specific victim. It unleashes its malicious action as soon as the AI model identifies the target through indicators like facial recognition, geolocation and voice recognition.
You can think of this capability as similar to a sniper attack, in contrast to the “spray and pray” approach of traditional malware. DeepLocker is designed to be stealthy. It flies under the radar, avoiding detection until the precise moment it recognizes a specific target. This AI-powered malware is particularly dangerous because, like nation-state malware, it could infect millions of systems without being detected. But, unlike nation-state malware, it is feasible in the civilian and commercial realms.
A Bit of Evasive Malware History
The DeepLocker class of malware stands in stark contrast to existing evasion techniques used by malware seen in the wild. While many malware variants try to hide their presence and malicious intent, none are as effective at doing so as DeepLocker.
Let’s recap the evolution of evasive malware:
- In the late 1980s and early 1990s, the first variants of polymorphic and metamorphic viruses were designed to disrupt and destroy data. By means of obfuscation and mutating payloads, malware authors were avoiding antivirus systems that could easily screen files for known patterns using static signatures. Consequently, the antivirus industry gradually developed static code and malware-analysis capabilities to analyze obfuscated code and infer the malicious intent of code or files running on the endpoints they protected.
- In the 1990s, malware authors started to encrypt the malicious payload (using so-called packers), such that the malicious code would only be observable when it was decrypted into memory before its execution. The security industry responded with dynamic malware analysis, building initial versions of malware sandboxes, such as virtualized systems, in which suspicious executables (called samples) are run, their activities monitored and their nature deemed benign or malicious.
- Of course, attackers would not give in. In the 2000s, the first forms of evasive malware — malware trying to actively avoid analysis — were captured in the wild. For example, the malware used checks to identify whether it was running in a virtualized environment and whether other processes known to run in malware sandboxes were present. If any were found, the malware would stop executing its malicious payload in order to avoid analysis and keep its secrets encrypted. This approach is still prevalent today, as a May 2018 Security Week study found that 98 percent of the malware samples analyzed uses evasive techniques to varying extents.
- As malware sandboxes have become increasingly more sophisticated in the past few years — for example, using bare metal analysis systems, according to the Computer Security Group at the University of California, Santa Barbara, that run on real hardware and avoiding virtualization — adversaries have moved to a different strategy: targeted attacks. They section their infection routines to have an initial step to carefully inspect the environment they run in for any predefined “suspicious” features, such as usernames and security solution processes. Only if the target endpoint is found “clear” would the malware be fetched and executed, unleashing its nefarious activity. One well-known example of evasion is the Stuxnet worm, which was programmed to target and seek out only specific industrial control systems (ICS) from a particular manufacturer, and only with certain hardware and software configurations.
Nevertheless, although malware evasion keeps evolving, even very recent forms of targeted malware require predefined triggers that can be exposed by defenders by checking the code, packed code, configuration files or network activity. All of these triggers are observable to skilled malware analysts with the appropriate tools.
DeepLocker: Ultra-Targeted and Evasive Malware
DeepLocker has changed the game of malware evasion by taking a fundamentally different approach from any other current evasive and targeted malware. DeepLocker hides its malicious payload in benign carrier applications, such as a video conference software, to avoid detection by most antivirus and malware scanners.
What is unique about DeepLocker is that the use of AI makes the “trigger conditions” to unlock the attack almost impossible to reverse engineer. The malicious payload will only be unlocked if the intended target is reached. It achieves this by using a deep neural network (DNN) AI model.
The AI model is trained to behave normally unless it is presented with a specific input: the trigger conditions identifying specific victims. The neural network produces the “key” needed to unlock the attack. DeepLocker can leverage several attributes to identify its target, including visual, audio, geolocation and system-level features. As it is virtually impossible to exhaustively enumerate all possible trigger conditions for the AI model, this method would make it extremely challenging for malware analysts to reverse engineer the neural network and recover the mission-critical secrets, including the attack payload and the specifics of the target. When attackers attempt to infiltrate a target with malware, a stealthy, targeted attack needs to conceal two main components: the trigger condition(s) and the attack payload.
DeepLocker is able to leverage the “black-box” nature of the DNN AI model to conceal the trigger condition. A simple “if this, then that” trigger condition is transformed into a deep convolutional network of the AI model that is very hard to decipher. In addition to that, it is able to convert the concealed trigger condition itself into a “password” or “key” that is required to unlock the attack payload.
Technically, this method allows three layers of attack concealment. That is, given a DeepLocker AI model alone, it is extremely difficult for malware analysts to figure out what class of target it is looking for. Is it after people’s faces or some other visual clues? What specific instance of the target class is the valid trigger condition? And what is the ultimate goal of the attack payload?
Figure 1. DeepLocker – AI-Powered Concealment
To demonstrate the implications of DeepLocker’s capabilities, we designed a proof of concept in which we camouflage a well-known ransomware (WannaCry) in a benign video conferencing application so that it remains undetected by malware analysis tools, including antivirus engines and malware sandboxes. As a triggering condition, we trained the AI model to recognize the face of a specific person to unlock the ransomware and execute on the system.
Imagine that this video conferencing application is distributed and downloaded by millions of people, which is a plausible scenario nowadays on many public platforms. When launched, the app would surreptitiously feed camera snapshots into the embedded AI model, but otherwise behave normally for all users except the intended target. When the victim sits in front of the computer and uses the application, the camera would feed their face to the app, and the malicious payload will be secretly executed, thanks to the victim’s face, which was the preprogrammed key to unlock it.
It’s important to understand that DeepLocker describes an entirely new class of malware — any number of AI models could be plugged in to find the intended victim, and different types of malware could be used as the “payload” that is hidden within the application.