Incident Response in the Cloud Age: More Data, No Data, or Both?
As we discussed in the first post of this series, incident response needs to change, given disruptions such as cloud computing and the availability of new data sources, including external threat intelligence. We wrote a paper called Leveraging Threat Intelligence in Incident Response (TI+IR) back in 2014 to update our existing I/R process map. Here is what we came up with:

So what has changed in the two years since we published that paper? Back then the cloud was nascent and we didn't know if DevOps was going to work. Today both the cloud and DevOps are widely acknowledged as the future of computing and how applications will be developed and deployed. Of course it will take a while to get there, but they are clearly real already, and they are upending pretty much all the existing ways security works, including incident response.

The good news is that our process map still shows how I/R can leverage additional data sources, and the other functions involved in performing a complete and thorough investigation. It is hard to find sufficient staff to fill all the functions described on the map, but we'll deal with that in our next post. For now let's focus on integrating additional data sources, including external threat intelligence, and handling emerging cloud architectures.

More Data (Threat Intel)

We explained why threat intelligence matters to incident response in our TI+IR paper:

To really respond faster you need to streamline investigations and make the most of your resources, a message we've been delivering for years. This starts with an understanding of what information would interest attackers. From there you can identify potential adversaries and gather threat intelligence to anticipate their targets and tactics. With that information you can protect yourself, monitor for indicators of compromise, and streamline your response when an attack is (inevitably) successful.

You need to figure out the right threat intelligence sources, and how to aggregate the data and run the analytics. We don't want to rehash a lot of what's in the TI+IR paper, but the most useful information sources include:

Compromised Devices: This data source provides external notification that a device is acting suspiciously by communicating with known bad sites or participating in botnet-like activities. Services are emerging to mine large volumes of Internet traffic to identify such devices.

Malware Indicators: Malware analysis continues to mature rapidly, getting better and better at understanding exactly what malicious code does to devices. This enables you to define both technical and behavioral indicators, across all platforms and devices, to search for within your environment, as described in gory detail in Malware Analysis Quant.

IP Reputation: The most common reputation data is based on IP addresses and provides a dynamic list of known bad and/or suspicious addresses, based on data such as spam sources, torrent usage, DDoS traffic indicators, and web attack origins. IP reputation has evolved since its introduction, and now features scores comparing the relative maliciousness of different addresses, factoring in additional context such as Tor nodes/anonymous proxies, geolocation, and device ID to further refine reputation.

Malicious Infrastructure: One specialized type of reputation, often packaged as a separate feed, is intelligence on Command and Control (C&C) networks and other servers/sources of malicious activity. These feeds track global C&C traffic and pinpoint malware originators, botnet controllers, compromised proxies, and other IP addresses and sites to watch for as you monitor your environment.

Phishing Messages: Most advanced attacks seem to start with a simple email. Given the ubiquity of email and the ease of adding links to messages, attackers typically find email the path of least resistance to a foothold in your environment. Isolating and analyzing phishing email can yield valuable information about attackers and their tactics.

As depicted in the process map above, you integrate both external and internal security data sources, then perform analytics to isolate the root cause of the attacks and figure out the damage and extent of the compromise. Critical success factors in dealing with all this data are the ability to aggregate it somewhere, and then to perform the necessary analysis. This aggregation happens at multiple layers of the I/R process, so you'll need to store and integrate all the I/R-relevant data. Physical integration means putting all your data into a single store and using it as a central repository for response. Logical integration uses valuable pieces of threat intelligence to search for issues within your environment, using separate systems for internal and external data. We are not religious about how you handle it; there are advantages to centralizing all data in one place, but as long as you can do your job – collecting TI and using it to focus the investigation – either way works. Vendors providing big data security all want to be your physical aggregation point, but results are what matters, not where you store data. Of course we are talking about a huge amount of data, so your choices for both data sources and I/R aggregation platform are critical parts of building an effective response process.
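To make the logical integration idea a bit more concrete, here is a minimal sketch (in Python) of using an external indicator feed to search internal data. The file names, log columns, and formats are illustrative assumptions rather than any particular vendor's output; in practice this kind of lookup would run inside your SIEM or aggregation platform rather than against flat files.

```python
# Minimal sketch of "logical integration": match an external indicator feed
# against internal telemetry. Inputs are hypothetical examples:
#   bad_ips.txt - one known-bad IP per line, exported from a reputation feed
#   fw_log.csv  - firewall/proxy log with columns: timestamp, src_ip, dst_ip, action

import csv

def load_indicators(path):
    """Load known-bad IPs from a feed export into a set for fast lookup."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip() and not line.startswith("#")}

def find_hits(log_path, indicators):
    """Yield log records whose destination matches a known-bad indicator."""
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["dst_ip"] in indicators:
                yield row

if __name__ == "__main__":
    bad_ips = load_indicators("bad_ips.txt")
    for hit in find_hits("fw_log.csv", bad_ips):
        # Each hit is a triage candidate: which internal host talked to a
        # known-bad address, and when.
        print(f"{hit['timestamp']}  {hit['src_ip']} -> {hit['dst_ip']}  ({hit['action']})")
```

The same pattern holds however you integrate: pull the indicators, normalize them, and search whatever internal telemetry you have for matches worth investigating, whether the data lives in one big store or several.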
These feeds track global C&C traffic and pinpoint malware originators, botnet controllers, compromised proxies, and other IP addresses and sites to watch for as you monitor your environment. Phishing Messages: Most advanced attacks seem to start with a simple email. Given the ubiquity of email and the ease of adding links to messages, attackers typically find email the path of least resistance to a foothold in your environment. Isolating and analyzing phishing email can yield valuable information about attackers and tactics. As depicted in the process map above, you integrate both external and internal security data sources, then perform analytics to isolate the root cause of the attacks and figure out the damage and extent of the compromise. Critical success factors in dealing with all this data are the ability to aggregate it somewhere, and then to perform the necessary analysis. This aggregation happens at multiple layers of the I/R process, so you’ll need to store and integrate all the I/R-relevant data. Physical integration is putting all your data into a single store, and then using it as a central repository for response. Logical integration uses valuable pieces of threat intelligence to search for issues within your environment, using separate systems for internal and external data. We are not religious about how you handle it, but there are advantages to centralizing all data in one place. So as long as you can do your job, though – collecting TI and using it to focus investigation – either way works. Vendors providing big data security all want to be your physical aggregation point, but results are what matters, not where you store data. Of course we are talking about a huge amount of data, so your choices for both data sources and I/R aggregation platform are critical parts of building an effective response process. No Data (Cloud) So what happens to response now that you don’t control a lot of the data used by your corporate systems? The data may reside with a Software as a Service (SaaS) provider, or your application may be deployed in a cloud computing service. In data centers with traditional networks it’s pretty straightforward to run traffic through inspection points, capture data as needed, and then perform forensic investigation. In the cloud, not so much. To be clear, moving your computing to the cloud doesn’t totally eliminate your ability to monitor and investigate your systems, but your visibility into what’s happening on those systems using traditional technologies is dramatically limited. So the first step for I/R in the cloud has nothing to do with technology. It’s all about governance. Ugh. I know most security professionals just felt a wave of nausea hit. The G word is not what anyone wants to hear. But it’s pretty much the only way to establish the rules of engagement with cloud service providers. What kinds of things need to be defined? SLAs: One