Our last post defined what is needed to Really Respond Faster, so now let’s peel back the next layer of the onion to delve into collecting data that will be useful for investigation, both internally and externally. This starts with gathering threat intelligence to cover the external side. It also involves a systematic effort to gather forensic information from networks and endpoints while leveraging existing security information sources including events, logs, and configurations.

External View: Integrating Threat Intel

In the last post we described the kinds of threat intelligence at your disposal and how they can assist your response. But that doesn’t explain how you can gather this information or where to put it so it’s useful when you are knee-deep in response.

First let’s discuss the aggregation point. In Early Warning System we described a platform to aggregate threat intelligence. Those concepts are still relevant to what you need the platform to do. You need the platform to aggregate third-party intelligence feeds, and be able to scan your environment for indicators to find potentially compromised devices. To meet these goals a few major capabilities stand out:

  1. Open: The first job of any platform is to facilitate and accelerate investigation so you need the ability to aggregate threat intelligence and other security data quickly, easily, and flexibly. Intelligence feeds are typically just data (often XML), and increasingly distributed in industry-standard formats such as STIX – making integration relatively straightforward.
  2. Scalable: You will collect a lot of data during investigation, so scalability is essential. Keep in mind the difference between data scalability (the amount of stuff you can store) and computational scalability (your ability to analyze and search the collected data).
  3. Flexible search: Investigations still involve quite a bit of art, rather than being pure formal science. As tools improve and integrated threat intelligence helps narrow down targets for investigation, you will be less reliant on forensic ‘artists’. But you will always be mining collected data and searching for attack indications, regardless of the capabilities of the person with their hands on the keyboard. So your investigation platform must make it easy to search all your data sources, and then identify assets at risk based on what you found.
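To make the open and flexible search capabilities concrete, here is a minimal sketch of loading a machine-readable feed and searching several internal data sources for matches. The feed layout, the source names, the field names, and the sample values are all hypothetical placeholders chosen for illustration; a real feed (STIX, for instance) carries far more structure and would need a proper parser.

```python
import json

# Hypothetical, simplified feed: a JSON list of {"type", "value"} records.
# "aa11" is a placeholder, not a real SHA-256 digest.
FEED_JSON = """
[
  {"type": "ipv4", "value": "203.0.113.7"},
  {"type": "domain", "value": "bad.example"},
  {"type": "sha256", "value": "aa11"}
]
"""

def load_indicators(feed_text):
    """Group indicator values by type so each lookup is a set membership test."""
    indicators = {}
    for record in json.loads(feed_text):
        indicators.setdefault(record["type"], set()).add(record["value"].lower())
    return indicators

# Assumed mapping: which field of each data source holds which indicator type.
SOURCE_FIELDS = {
    "firewall": {"dst_ip": "ipv4"},
    "dns": {"query": "domain"},
    "endpoint": {"file_hash": "sha256"},
}

def assets_at_risk(indicators, events_by_source):
    """Return {asset: [(source, field, value), ...]} for every indicator hit."""
    hits = {}
    for source, events in events_by_source.items():
        for event in events:
            for field, ioc_type in SOURCE_FIELDS.get(source, {}).items():
                value = str(event.get(field, "")).lower()
                if value in indicators.get(ioc_type, set()):
                    hits.setdefault(event["asset"], []).append((source, field, value))
    return hits

# Made-up events from three internal sources.
events = {
    "firewall": [
        {"asset": "web-01", "dst_ip": "203.0.113.7"},
        {"asset": "web-02", "dst_ip": "198.51.100.20"},
    ],
    "dns": [{"asset": "laptop-7", "query": "BAD.example"}],
    "endpoint": [{"asset": "web-01", "file_hash": "aa11"}],
}

hits = assets_at_risk(load_indicators(FEED_JSON), events)
for asset, matches in sorted(hits.items()):
    print(asset, matches)
```

Keeping the source-to-field mapping in one table is a design choice for the sketch: adding a new data source means adding one entry rather than touching the search loop.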

The key to making this entire process run is automation. Yes, we at Securosis talk about automation a whole lot these days, and there is a reason for that. Things are happening too quickly for you to do much of anything manually, especially in the heat of an investigation. You need the ability to pull threat intelligence in a machine-readable format, and then pump it into an analysis platform without human intervention. Simple, right? So let’s dig into the threat intelligence sources to provide perspective on how to integrate that data into your platform.

  1. Compromised devices: The most actionable intelligence you can get is still a clear indication of compromised devices. This provides an excellent place to begin your investigation and manage your response. There are many ways you might conclude a device is compromised. The first is by seeing clear indicators of command and control traffic in the device’s network traffic, such as DNS requests whose frequency and content indicate a domain generation algorithm (DGA) for finding botnet controllers. Monitoring traffic from the device can also show files or other sensitive data being transmitted by the device, indicating exfiltration or (via network traffic analysis) a remote access trojan.
  2. Malware indicators: As described in our Malware Analysis Quant research, you can build a lab and perform both static and dynamic analysis of malware samples to identify specific indicators of how the malware compromises devices. This is not for the faint of heart – thorough and useful analysis requires significant investment, resources, and expertise. The good news is that numerous commercial services now offer those indicators in a format you can use to easily search through collected security data.
  3. Adversary networks: Using IP reputation data broken down into groups of adversaries can help you determine the extent of compromise. If during your initial investigation you find malware typically associated with Adversary A, you can then look for traffic going to networks associated with that adversary. Effective and efficient response requires focus, so knowing which of your compromised devices may have been compromised in a single attack helps you isolate and dig deeper into that attack.
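As one illustration of the first item, the sketch below flags devices whose DNS lookups look algorithmically generated. It uses a common heuristic (character entropy of the leftmost label, plus a minimum count of distinct domains) that is an assumption for this example, not the detection logic of any particular product. The thresholds, device names, and domains are made up, and a production detector would need much more than this to keep false positives down.

```python
import math
from collections import Counter, defaultdict

def label_entropy(label):
    """Shannon entropy, in bits per character, of a DNS label."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def dga_suspects(dns_log, entropy_threshold=3.0, min_domains=3):
    """Flag devices that queried several distinct random-looking domains.

    dns_log is an iterable of (device, queried_domain) pairs.
    Only the leftmost label is scored, which is a simplification.
    """
    random_looking = defaultdict(set)
    for device, domain in dns_log:
        first_label = domain.lower().split(".")[0]
        if label_entropy(first_label) >= entropy_threshold:
            random_looking[device].add(domain.lower())
    return {
        device: sorted(domains)
        for device, domains in random_looking.items()
        if len(domains) >= min_domains
    }

# Made-up DNS log entries.
dns_log = [
    ("laptop-7", "x7k2q9vbn4.example"),
    ("laptop-7", "p3m8zt1wq6.example"),
    ("laptop-7", "h5j0c2ry8d.example"),
    ("desktop-2", "www.example"),
    ("desktop-2", "mail.example"),
    ("desktop-2", "intranet.example"),
]

suspects = dga_suspects(dns_log)
print(suspects)
```

Requiring several distinct domains per device covers the frequency side of the signal, while the entropy score covers the content side.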

Given the demands of gathering sufficient information to analyze, and the challenge of detecting and codifying appropriate patterns and indicators of compromise, most organizations look for a commercial provider to develop and provide this threat intelligence. It is typically packaged as a feed for direct integration into incident response/monitoring platforms. Wrapping it all together we have the process map below. The map encompasses profiling the adversary as discussed in the last post, collecting intelligence, analyzing threats, and then integrating threat intelligence into the incident response process.

Internal View: Collecting Forensics

The other side of the coin is making sure you have sufficient information about what’s happening in your environment. We have researched selecting and deploying SIEM and Log Management extensively, and that information tends to be the low-hanging fruit for populating your internal security data repository. To aid investigation you should monitor the following sources (preferably continuously):

  • Perimeter networks and devices: The bad guys tend to be out there, meaning they need to cross your perimeter to achieve their mission. So look for issues on devices between them and their targets.
  • Identity: Who is as important as what, so analyze access to specific resources – especially within a privileged user context.
  • Servers: We are big fans of anomaly detection, configuration assessment, and whitelisting on critical servers such as domain controllers and app servers, to alert you to funky stuff to investigate at the server level.
  • Databases: Likewise, correlating database anomalies against other types of traffic (such as reconnaissance and network exfiltration) can indicate a breach in progress. Better to know that before your credit card brand notifies you.
  • File integrity: Most attacks change key system files, so by monitoring their integrity you can pinpoint when an attacker tries to make changes. You can even block these attacks using technology like HIPS, but that’s a story for another day.
  • Applications: Finally, you should be able to profile normal transactions and user interactions for your key applications (those accessing protected data) and watch for non-standard activities. That doesn’t necessarily indicate a problem, but does help prioritize investigation.
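The file integrity item above boils down to recording a known-good digest for each key file and comparing later snapshots against it. The following is a minimal sketch of that idea, assuming SHA-256 digests and a throwaway temporary file standing in for a real system file; the function names are hypothetical, and real file integrity tools also track permissions, ownership, and timestamps.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def snapshot(paths):
    """Map each path to its SHA-256 digest, or None if the file is missing."""
    return {str(p): (sha256_of(p) if Path(p).is_file() else None) for p in paths}

def changed_files(baseline, current):
    """Return paths whose digest differs between the two snapshots."""
    return sorted(
        p for p in baseline.keys() | current.keys()
        if baseline.get(p) != current.get(p)
    )

# Demonstration on a temporary file rather than a real system file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hosts"
    target.write_text("127.0.0.1 localhost\n")
    baseline = snapshot([target])
    unchanged = changed_files(baseline, snapshot([target]))

    # Simulate an attacker appending an entry.
    target.write_text("127.0.0.1 localhost\n203.0.113.7 update.example\n")
    tampered = changed_files(baseline, snapshot([target]))

print(unchanged, tampered)
```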

Network Forensics

Now let’s go one level deeper, into what’s happening in your environment, and we can start at the lowest level of the stack: the network. Network forensics tools basically capture all traffic on a given network segment, because the only way to really piece together exactly what happened is to review real network traffic. In a forensic investigation this is absolutely crucial, providing detail you cannot get from log records. Capturing all network traffic isn’t really practical in any organization of scale, but perimeter traffic should be feasible.

Along with traffic into and out of the perimeter, we recommend capturing packets on critical internal segments as well, typically within the data center. Eventually adversaries need to get to the important data to achieve their mission, so if you capture data from key internal networks as well as perimeter traffic – what we call the full packet capture sandwich – you have a better chance to piece together what happened.

What about less critical internal networks? You can minimize the amount of data collected from them by focusing on smaller data streams like IDS alerts, device logs, and NetFlow records; together those provide sufficient detail to pinpoint egregious issues for investigation and subsequent full packet capture. All other things being equal, it would be ideal to collect all traffic, but that is often not practical. This approach enables you to focus on areas where attackers are sure to be active (data center and perimeter) without going beyond the point of diminishing returns.
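As a rough sketch of using lightweight flow records to decide where full packet capture is worth turning on, the example below totals outbound bytes per internal host and flags the heavy senders. The flow tuple layout, the 10.0.0.0/8 internal range, the threshold, and the sample flows are all assumptions for illustration; real NetFlow records have many more fields, and a fixed byte threshold is only one of many possible triggers.

```python
import ipaddress
from collections import defaultdict

# Assumed internal address space for this example.
INTERNAL = ipaddress.ip_network("10.0.0.0/8")

def outbound_bytes(flows):
    """Sum bytes sent from internal hosts to external destinations.

    flows is an iterable of (src_ip, dst_ip, byte_count) tuples.
    """
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        src_internal = ipaddress.ip_address(src) in INTERNAL
        dst_internal = ipaddress.ip_address(dst) in INTERNAL
        if src_internal and not dst_internal:
            totals[src] += nbytes
    return dict(totals)

def capture_candidates(flows, byte_threshold):
    """Hosts whose outbound volume exceeds the threshold."""
    return sorted(
        host for host, total in outbound_bytes(flows).items()
        if total > byte_threshold
    )

# Made-up flow summaries.
flows = [
    ("10.1.1.5", "203.0.113.9", 400_000_000),
    ("10.1.1.5", "203.0.113.9", 700_000_000),
    ("10.1.1.8", "198.51.100.2", 2_000_000),
    ("10.1.1.8", "10.1.2.3", 5_000_000_000),  # internal-to-internal, ignored
]

candidates = capture_candidates(flows, byte_threshold=1_000_000_000)
print(candidates)
```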

Endpoint Forensics

The hard truth of today’s malware is that malicious code might not look malicious when it enters your network. But at some later point you might determine it is malicious based on new threat intelligence, and then you will want to know where (if anywhere) that file has been active in your environment. The objective of endpoint forensics, like network forensics, is to track activity on each endpoint at a very granular level at all times, so you can pinpoint what malware did in your environment and on which devices. Yes, this imposes a compute burden on devices and generates a large amount of data, but fortunately we have fancy big data and cloud analytics technologies to ingest and analyze all that data, right? During incident response, if you have a malware profile, you can query endpoint forensic data to identify devices which show indications of that infection – regardless of when they were infected.

This kind of information is critical in trying to contain the damage of an attack. Having the ability to search all devices for indicators found on other compromised devices, or from a threat intelligence feed, gets you past playing Whac-A-Mole, where you find and deal with a single compromised device at a time.
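The retrospective query described above can be sketched as a search of stored endpoint records for file hashes that are only now known to be bad. The record fields, device names, timestamps, and hash values below are hypothetical placeholders; the point is simply that the lookup works regardless of when the file first appeared.

```python
from datetime import datetime

def infected_devices(endpoint_records, bad_hashes):
    """Return {device: earliest time a known-bad file was seen on it}."""
    first_seen = {}
    for record in endpoint_records:
        if record["sha256"] not in bad_hashes:
            continue
        seen = datetime.fromisoformat(record["seen"])
        device = record["device"]
        if device not in first_seen or seen < first_seen[device]:
            first_seen[device] = seen
    return first_seen

# Made-up endpoint telemetry; "aa11" and "bb22" are placeholder hashes.
endpoint_records = [
    {"device": "laptop-7", "sha256": "aa11", "seen": "2014-03-02T09:15:00"},
    {"device": "laptop-7", "sha256": "aa11", "seen": "2014-02-11T16:40:00"},
    {"device": "web-01", "sha256": "bb22", "seen": "2014-03-05T08:00:00"},
    {"device": "desktop-2", "sha256": "aa11", "seen": "2014-03-09T12:30:00"},
]

# New threat intelligence says "aa11" is malicious.
infected = infected_devices(endpoint_records, bad_hashes={"aa11"})
for device, when in sorted(infected.items()):
    print(device, when.isoformat())
```

Returning the earliest sighting per device gives responders a starting point for the timeline on each affected machine.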

Single Point of Aggregation

We have used the word aggregation multiple times in multiple contexts in this series. Given that incident response/management is driven by data, you shouldn’t be surprised by the need to aggregate data at multiple layers of this process. You are aggregating threat intelligence and internal security data. At some point you will probably need to bring the internal and external data together somewhere you can mine them to accelerate your investigation.

To clarify this a bit, there is physical integration – putting all your data into a single repository and using it as the central store for response. There is also logical integration, where valuable pieces of threat intelligence are used to search for issues within your environment, but internal and external data stay in disparate systems. As with most things, we are not religious about how you do this. Clearly there are advantages to having all data centralized in one place. But as long as you can do your job, which is to collect threat intelligence and use it to focus investigation, either way works. Vendors providing security big data environments all want to be that physical aggregation point, but it’s about results, not where your data resides.

For sizing data collection efforts, it is important to be able to analyze log data over at least a 90-day period, with network and endpoint forensic data for 30 days or more. Today’s attackers are patient and persistent, meaning they aren’t just trying to do smash-and-grab attacks – they stretch attack timelines to 30, 60, and even 90 days. So you have two vectors for sizing your system: the number of critical segments to analyze, and how long to keep the data. We prefer longer retention for the most critical resources, such as perimeter and data center devices, over collecting from everything and discarding data quickly to make room for newer records.
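A back-of-the-envelope calculation shows how the two sizing vectors interact. The sketch below estimates raw storage for packet capture on one segment and for log retention. Every input (link speed, average utilization, event rate, event size) is an assumed example value rather than a recommendation, and the result ignores compression, indexing overhead, and replication.

```python
SECONDS_PER_DAY = 86_400

def capture_storage_tb(link_mbps, avg_utilization, retention_days):
    """Raw packet capture storage, in terabytes, for one segment."""
    bytes_per_second = link_mbps * 1_000_000 / 8 * avg_utilization
    return bytes_per_second * SECONDS_PER_DAY * retention_days / 1e12

def log_storage_tb(events_per_second, avg_event_bytes, retention_days):
    """Raw log storage, in terabytes."""
    bytes_per_second = events_per_second * avg_event_bytes
    return bytes_per_second * SECONDS_PER_DAY * retention_days / 1e12

# Assumed example: one 1 Gbps perimeter link at 20% average utilization,
# kept for 30 days, plus 5,000 events/second at 500 bytes each for 90 days.
perimeter_tb = capture_storage_tb(link_mbps=1000, avg_utilization=0.2, retention_days=30)
logs_tb = log_storage_tb(events_per_second=5000, avg_event_bytes=500, retention_days=90)

print(f"packet capture: {perimeter_tb:.1f} TB, logs: {logs_tb:.1f} TB")
```

Under these assumptions a single moderately busy gigabit segment needs roughly 65 TB for 30 days of full capture, which is why the number of segments you capture matters as much as the retention period.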

Now that you have both internal and external data to help accelerate incident response and management, it is time to actually respond and manage incidents. Our next post will lay out an incident response/management process map to leverage all this internal and external data.