Network-based Malware Detection 2.0: Scaling NBMD
It is time to return to our Network-based Malware Detection (NBMD) 2.0 series. We have already covered how the attack space has changed over the past 18 months and how you can detect malware on the network. Let’s turn our attention to another challenge for this quickly evolving technology: scalability.
Much of the scaling problem has to do with the increasing sophistication of attackers and their tools. Even unsophisticated attackers can buy sophisticated malware on the Internet. There is a well-developed market for packaged malware and suppliers are capitalizing on it. Market-based economies are a two-edged sword. And that doesn’t even factor in advanced attackers, who routinely discover and weaponize 0-day attacks to gain footholds in victim networks. Altogether, this makes scalability a top requirement for network-based malware detection.
So why is it hard to scale up? There are a few issues:
- Operating systems: Unless you have a homogeneous operating system environment, you need to test each malware sample against numerous vulnerable operating systems. The one-to-many testing requirement means that every malware sample requires 3-4 (or more) virtual machines, running different operating systems, to adequately test the file.
- VM awareness: Even better, attackers now check whether their malware is executing within a virtual machine. If so, the malware either goes dormant or waits a couple of hours, in hopes it will be cleared through the testbed and onto vulnerable equipment before it starts executing for real. So to fully test malware the sandbox needs to let it cook for a while. That means you need to spin up multiple VMs and let them run for a while, which is very resource intensive.
- Network impact: Analyzing malware isn’t just about determining a file is malicious. You also need to understand how it uses the network to connect to command and control infrastructure and to perform internal reconnaissance for lateral movement. That requires watching the network stack on every VM and parsing network traffic patterns.
- Analyze everything: You can’t restrict your heavy analysis to only files that look obviously bad based on simple file characteristics. With the advanced obfuscation techniques in use today you need to analyze all unknown files. Given the number of files entering a typical enterprise network daily, you can see how the analysis requirements scale up quickly.
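To make the arithmetic behind these bullets concrete, here is a rough back-of-the-envelope sketch. The function name and every input (10,000 unknown files per day, 4 VMs per file, 5 versus 120 minutes of dwell time) are illustrative assumptions rather than measurements, and the model assumes files arrive evenly through the day, which real traffic does not.

```python
import math

MINUTES_PER_DAY = 24 * 60


def concurrent_vms_needed(files_per_day, vms_per_file, dwell_minutes):
    """Average number of sandbox VMs that must be running at any moment.

    Hypothetical helper: total VM-minutes of work per day divided by the
    minutes available in a day, assuming an even arrival rate.
    """
    vm_minutes_per_day = files_per_day * vms_per_file * dwell_minutes
    return math.ceil(vm_minutes_per_day / MINUTES_PER_DAY)


# Assumed workload: 10,000 unknown files per day, each tested on 4 operating systems.
quick = concurrent_vms_needed(files_per_day=10_000, vms_per_file=4, dwell_minutes=5)
patient = concurrent_vms_needed(files_per_day=10_000, vms_per_file=4, dwell_minutes=120)

print(quick)    # 139 VMs if each sample runs for 5 minutes
print(patient)  # 3334 VMs if each sample is left to cook for 2 hours
```

Under these assumptions, letting samples run long enough to catch VM-aware malware multiplies the VM count by the same factor as the dwell time, before any allowance for traffic peaks or network monitoring overhead.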
As you can see the computing requirements to fully test inbound files are substantial and growing exponentially. Of course many people choose to reduce their analysis. You could certainly make a risk-based decision not even to try detecting VM-aware malware, and just pass or block each file instantly. You might decide not to analyze documents or spreadsheets for macros. You may not worry about the network characteristics of malware. These are all legitimate choices to help network-based malware detection scale without a lot more iron. But each compromise weakens your ability to detect malware. Everything comes back to risk management and tradeoffs. But, for what it’s worth, we recommend not skipping malware tests.
Scaling the Malware Analysis Mountain
Historically the answer to most scaling problems has been to add computing power – generally more and/or bigger boxes. The vendors selling boxes love that answer, of course. Enterprise customers not as much. Scaling malware detection hardware raises two significant issues. First is cost. We aren’t just referring to the cost of the product – each box requires a threat update subscription and maintenance. Second is the additional operational cost of managing more devices. Setting and maintaining policies on multiple boxes can be challenging; ensuring the device is operational, properly configured, and patched is more overhead. You need to keep each device within the farm up to date. New malware indicators appear pretty much daily and need to be loaded onto each device to remain current.
We have seen this movie before. There was a time when organizations ran anti-spam devices within their own networks using enterprise-class (expensive) equipment. When the volume of spam mushroomed, enterprises needed to add devices to analyze all the inbound mail and keep it flowing. This was great for vendors but made customers cranky. The similarities to network-based malware detection are clear. We won’t keep you in suspense – the anti-spam story ends in the cloud. Organizations realized they could make scaling someone else’s problem by using a managed email security service. So they did, en masse. This shifted the onus onto providers to keep up with the flood of spam, and to keep devices operational and current. We expect a similar end to the NBMD game.
We understand that many organizations have already committed to on-premise devices. If you are one of them you need to figure out how to scale your existing infrastructure. This requires central management from your vendor and a clear operational process for updating devices daily. At this point customer premise NBMD devices are mature enough to have decent central management capabilities, allowing you to configure policies and deploy updates throughout the enterprise.
Keeping devices up to date requires a strong operational process. Some vendors offer the ability to have each device phone home to automatically download updates. Or you could use a central management console to update all devices. Either way you will want some human oversight of policy updates because most organizations remain uncomfortable with having policies and other device configurations managed and changed by a vendor or service provider. With good reason – it doesn’t happen often but bundled endpoint protection signature updates can brick devices. Bad network infrastructure updates don’t brick devices, but how useful is an endpoint without network access?
As we mentioned earlier, we expect organizations to increasingly consider and choose cloud-based analysis, in tandem with an on-premise enforcement device for collection and blocking. This shifts responsibility for scaling and updating onto the provider. That said, accountability cannot be outsourced, so you need to ensure both detection accuracy (next post) and reasonable sample analysis turnaround times. Make sure to build this oversight into your processes.
Another benefit of the cloud-based approach is the ability to share intelligence so any malicious file found in any protected customer network can be recognized by every other participant. This offers tremendous leverage, especially to smaller organizations, because security providers see far more malware than their customers. Benefiting from others’ misfortune makes good business sense – not least for threat intelligence.
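A minimal sketch of the sharing idea, assuming verdicts are keyed by the SHA-256 hash of the file contents. The `SharedVerdictStore` class and its method names are hypothetical, invented for illustration, and do not describe any particular vendor’s service.

```python
import hashlib


class SharedVerdictStore:
    """Hypothetical in-memory stand-in for a provider's shared verdict service."""

    def __init__(self):
        self._verdicts = {}  # SHA-256 hex digest -> verdict string

    @staticmethod
    def fingerprint(file_bytes):
        """Identify a file by the SHA-256 hash of its contents."""
        return hashlib.sha256(file_bytes).hexdigest()

    def publish(self, file_bytes, verdict):
        """Record a verdict once some customer's sample has been analyzed."""
        self._verdicts[self.fingerprint(file_bytes)] = verdict

    def lookup(self, file_bytes):
        """Return the known verdict, or None if nobody has seen this file."""
        return self._verdicts.get(self.fingerprint(file_bytes))


store = SharedVerdictStore()

# Customer A's sandbox flags a sample and the provider publishes the result.
sample = b"pretend these bytes are a malicious attachment"
store.publish(sample, "malicious")

# Customer B later receives the identical file and skips the full analysis.
print(store.lookup(sample))                      # malicious
print(store.lookup(b"a file nobody has seen"))   # None
```

Exact-hash matching only recognizes byte-identical files, so a repacked or otherwise altered variant would still need full analysis. That is a limitation of this simplified sketch.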
As great as this cloud stuff sounds, there are legitimate concerns with cloud-based malware analysis. Let’s start with latency. The laws of physics insist it will always take time to ship a malware file up to the cloud, have it analyzed, and then receive the verdict. This creates a potential exploit window which must be managed. You need an organizational decision on whether to hold each new file until a determination is made or let it pass, and clean up the mess later when a file turns out to be malicious.
Our research shows that users can be understanding if they are notified a potentially malicious file is undergoing analysis before being passed along. They don’t want their devices compromised (assuming they have been trained on why compromise is bad) and they certainly don’t want downtime while their devices are reimaged, so we expect they can largely accept reasonable delivery delays. But adding substantial latency without any kind of notification triggers batches of calls to the help desk complaining that the network is slow and work is suffering. The key to success here, as in so many things, is to effectively manage expectations.
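The hold-or-pass decision can be written down as a small policy function. This is a sketch under assumptions: the `Verdict` and `Action` names, the `hold_unknown` flag, and the idea of a maximum hold time are all hypothetical illustrations, not features of any specific product.

```python
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    MALICIOUS = "malicious"
    PENDING = "pending"  # cloud analysis has not returned yet


class Action(Enum):
    DELIVER = "deliver"
    BLOCK = "block"
    HOLD = "hold"                            # keep waiting for the verdict
    DELIVER_AND_TRACK = "deliver_and_track"  # pass now, clean up later if needed


def decide(verdict, hold_unknown, waited_seconds, max_hold_seconds):
    """Pick an enforcement action for one file (hypothetical policy logic)."""
    if verdict is Verdict.MALICIOUS:
        return Action.BLOCK
    if verdict is Verdict.CLEAN:
        return Action.DELIVER
    # Verdict is still pending: apply the organization's chosen policy.
    if hold_unknown and waited_seconds < max_hold_seconds:
        return Action.HOLD
    # Either the policy is to pass unknown files, or the hold timed out
    # (assumption: a timed-out file is delivered and tracked, not blocked).
    return Action.DELIVER_AND_TRACK


print(decide(Verdict.PENDING, hold_unknown=True, waited_seconds=30, max_hold_seconds=300))
print(decide(Verdict.PENDING, hold_unknown=False, waited_seconds=0, max_hold_seconds=300))
```

Tracking which endpoints received a file that was delivered before its verdict arrived is what makes cleanup possible if the verdict comes back malicious.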
Another issue is information sharing. Some organizations (such as the military and other high-security groups) remain reluctant to share information or send malware outside organizational boundaries. They may never be comfortable with the cloud so they stick to on-premise options. We expect vendors espousing cloud-based approaches to eventually recognize the importance of this use case and offer a customer-premise variant of their cloud technology. These malware-analysis private clouds are based around a central analysis device, and interact with enforcement points throughout the network. They have much less demand for NBMD devices at ingress points and can still share some intelligence internally. We also expect vendors to offer tightly controlled inbound-only updates of new indicators for the on-premise equipment.
We actually expect on-premise vendors to shift toward a hybrid approach as well, taking advantage of the cloud as appropriate, because small companies simply cannot afford to deploy everything they need internally. But even unbounded scalability doesn’t help if the device can’t identify malware. So our next post will talk about accuracy – both false positives and negatives create risk, and the growing sophistication of attacks demands an evolution in detection.