Advanced Endpoint and Server Protection: Detection/Investigation
Our last AESP post covered a number of approaches to preventing attacks on endpoints and servers. Of course prevention remains the shiny object most practitioners hope to achieve. If they can stop the attack before the device is compromised there need be no clean-up. We continue to remind everyone that hope is not a strategy, and counting on blocking every attack before it reaches your devices always ends badly. As we detailed in the introduction, you need to plan for compromise because it will happen. Adversaries have gotten much better, attack surface has increased dramatically, and you aren’t going to prevent every attack. So pwnage will happen, and what you do next is critical, both to protecting the critical information in your environment and to your success as a security professional. So let’s reiterate one of our core security philosophies: Once the device is compromised, you need to shorten the window between compromise and when you know the device has been owned. Simple to say but very hard to do. The way to get there is to change your focus from prevention to a more inclusive process, including detection and investigation… Detection Our introduction described detection: You cannot prevent every attack, so you need a way to detect attacks after they get through your defenses. There are a number of different options for detection – most based on watching for patterns that indicate a compromised device. The key is to shorten the time between when the device is compromised and when you discover it has been compromised. To be fair, there is a gray area between detection and prevention, at least from an endpoint and server standpoint. With the exception of application control, the prevention techniques described in the last post depend on actually detecting the bad activity first. If you are looking at controls using advanced heuristics, you detect the malicious behavior first – then you block it. In an isolation context you run executables in the walled garden, but you don’t really do anything until you detect bad activity – then you kill the virtual machine or process under attack. But there is more to detection than just figuring out what to block. Detection in the broader sense needs to include finding attacks you missed during execution because: You didn’t know it was malware at the time, which happens frequently – especially given how quickly attackers innovate. Advanced attackers have stockpiles of unknown exploits (0-days) they use as needed. So your prevention technology could be working as designed, but still not recognize the attack. There is no shame in that. Alternatively, the prevention technology may have missed the attack. This is common as well because advanced adversaries specialize in evading known preventative controls. So how can you detect after compromise? Monitor other data sources for indicators that a device has been compromised. This series is focused on protecting endpoints and servers, but looking at devices is insufficient. You also need to monitor the network for a full perspective on what’s really happening, using a couple techniques: Network-based malware detection: One of the most reliable ways to identify compromised devices is to watch for communications with known botnets. You can look for specific traffic patterns, or for communications to known botnet IP addresses. We covered these concepts in both the NBMD 2.0 and TI+SM series. Egress/Content Filtering: You can also look for content that should not be leaving the confines of your network. This may involve a broad DLP deployment – or possibly looking for sensitive content on your web filters, email security gateways, and next generation firewalls. Keep in mind that every endpoint and server device has a network stack of some sort. Thus a subset of this monitoring can be performed within the device, by looking at traffic that enters and leaves the stack. As mentioned above, threat intelligence (TI) is making detection much more effective, facilitated by information sharing between vendors and organizations. With TI you can become aware of new attacks, emerging botnets, websites serving malware, and a variety of other things you haven’t seen yet and therefore don’t know are bad. Basically you leverage TI to look for attacks even after they enter your network and possibly compromise your devices. We call this retrospective searching. This works by either a) using file trajectory – tracking all file activity on all devices, looking for malware files/droppers as they appear and move through your network; or b) looking for attack indicators on devices with detailed activity searching on endpoints – assuming you collect sufficient endpoint data. Even though it may seem like it, you aren’t really getting ahead of the threat. Instead you are looking for likely attacks – the reuse of tactics and malware against different organizations gives you a good chance to see malware which has hit others before long. Once you identify a suspicious device you need to verify whether the device is really compromised. This verification involves scrutinizing what the endpoint has done recently for indicators of compromise or other such activity that would confirm a successful attack. We’ll describe how to capture that information later in this post. Investigation Once you validate the endpoint has been compromised, you go into incident response/containment mode. We described the investigation process in the introduction as: Once you detect an attack you need to verify the compromise and understand what it actually did. This typically involves a formal investigation, including a structured process to gather forensic data from devices, triage to determine the root cause of the attack, and searching to determine how broadly the attack has spread within your environment. As we described in React Faster and Better, there are a number of steps in a formal investigation. We won’t rehash them here, but to investigate a compromised endpoint and/or server you need to capture a bunch of forensic information from the device, including: Memory contents Process lists Disk images (to capture the state of the file system) Registry values Executables (to support malware analysis and reverse engineering) Network activity logs As part of the investigation you also need to understand the attack timeline. This enables you