Network-based Malware Detection 2.0: Scaling NBMD
It is time to return to our Network-based Malware Detection (NBMD) 2.0 series. We have already covered how the attack space has changed over the past 18 months and how you can detect malware on the network. Let’s turn our attention to another challenge for this quickly evolving technology: scalability. Much of the scaling problem has to do with the increasing sophistication of attackers and their tools. Even unsophisticated attackers can buy sophisticated malware on the Internet. There is a well-developed market for packaged malware and suppliers are capitalizing on it. Market-based economies are a two-edged sword. And that doesn’t even factor in advanced attackers, who routinely discover and weaponize 0-day attacks to gain footholds in victim networks. All together, this makes scalability a top requirement for a network-based malware detection. So why is it hard to scale up? There are a few issues: Operating systems: Unless you have a homogenous operating system environment you need to test each malware sample against numerous vulnerable operating systems. The one-to-many testing requirement means that every malware sample requires 3-4 (or more) virtual machines, running different operating systems, to adequately test the file. VM awareness: Even better, attackers now check whether their malware is executing within a virtual machine. If so the malware either goes dormant or waits a couple hours, in hopes it will be cleared through the testbed and onto vulnerable equipment before it starts executing for real. So to fully test malware the sandbox needs to let it cook for a while. So you to spin up multiple VMs and need to let them run for a while – very resource intensive. Network impact: Analyzing malware isn’t just about determining a file is malicious. You also need to understand how it uses the network to connect to command and control infrastructure and perform internal reconnaissance to detect lateral movement. That requires watching the network stack on every VM and parsing network traffic patterns. Analyze everything: You can’t restrict your heavy analysis to only files that look obviously bad based on simple file characteristics. With the advanced obfuscation techniques in use today you need to analyze all unknown files. Given the number of files entering a typical enterprise network daily, you can see how the analysis requirements scale up quickly. As you can see the computing requirements to fully test inbound files are substantial and growing exponentially. Of course many people choose to reduce their analysis. You could certainly make a risk-based decision not even to try detecting VM-aware malware, and just pass or block each file instantly. You might decide not to analyze documents or spreadsheets for macros. You may not worry about the network characteristics of malware. These are all legitimate choices to help network-based malware detection scale without a lot more iron. But each compromise weakens your ability to detect malware. Everything comes back to risk management and tradeoffs. But, for what it’s worth, we recommend not skipping malware tests. Scaling the Malware Analysis Mountain Historically the answer to most scaling problems has been to add computing power – generally more and/or bigger boxes. The vendors selling boxes love that answer, of course. Enterprise customers not as much. Scaling malware detection hardware raises two significant issues. First is cost. We aren’t just referring to the cost of the product – each box requires a threat update subscription and maintenance. Second is the additional operational cost of managing more devices. Setting and maintaining policies on multiple boxes can be challenging; ensuring the device is operational, properly configured, and patched is more overhead. You need to keep each device within the farm up to date. New malware indicators appear pretty much daily and need to be loaded onto each device to remain current. We have seen this movie before. There was a time when organizations ran anti-spam devices within their own networks using enterprise-class (expensive) equipment. When the volume of spam mushroomed enterprises needed to add devices to analyze all the inbound mail and keep it flowing. This was great for vendors but made customers cranky. The similarities to network-based malware detection are clear. We won’t keep you in suspense – the anti-spam story ends in the cloud. Organizations realized they could make scaling someone else’s problem by using a managed email security service. So they did, en masse. This shifted the onus on providers to keep up with the flood of spam, and to keep devices operational and current. We expect a similar end to the NBMD game. We understand that many organizations have already committed to on-premise devices. If you are one of them you need to figure out how to scale your existing infrastructure. This requires central management from your vendor and a clear operational process for updating devices daily. At this point customer premise NBMD devices are mature enough to have decent central management capabilities, allowing you to configure policies and deploy updates throughout the enterprise. Keeping devices up to date requires a strong operational process. Some vendors offer the ability to have each device phone home to automatically download updates. Or you could use a central management console to update all devices. Either way you will want some human oversight of policy updates because most organizations remain uncomfortable with having policies and other device configurations managed and changed by a vendor or service provider. With good reason – it doesn’t happen often but bundled endpoint protection signature updates can brick devices. Bad network infrastructure updates don’t brick devices, but how useful is an endpoint without network access? As we mentioned earlier, we expect organizations to increasingly consider and choose cloud-based analysis, in tandem with an on-premise enforcement device for collection and blocking. This shifts responsibility for scaling and updating onto the provider. That said, accountability cannot be outsourced, so you need to ensure both detection accuracy (next post) and reasonable sample analysis turnaround times. Make sure to build this oversight into your processes. Another benefit of the cloud-based approach is the ability to share intelligence