Applied Threat Intelligence: Use Case #2, Incident Response/Management
By Mike Rothman
As we continue with our Applied Threat Intelligence series, let us now look at the next use case: incident response/management. Similar to the way threat intelligence helps with security monitoring, you can use TI to focus investigations on the devices most likely to be impacted, and help to identify adversaries and their tactics to streamline response.
TI + IR/M
As in our last post, we will revisit the incident response and management process, and then figure out which types of TI data can be most useful and where.
You can find a full description of every process step in our Leveraging TI in Incident Response/Management paper.
Trigger and escalate
The incident management process starts with a trigger kicking off a response, and the basic information you need to figure out what’s going on depends on what triggered the alert. You may get alerts from all over the place, including monitoring systems and the help desk. But not all alerts require a full incident response – much of what you deal with on a day-to-day basis is handled by existing security processes.
Where do you draw the line between a full response and a cursory look? That depends entirely on your organization. Regardless of the criteria you choose, all parties (including management, ops, security, etc.) must be clear on which situations require a full investigation and which do not, before you can decide whether to pull the trigger. Once you escalate, an appropriate resource is assigned and triage begins.
Before you do anything, you need to define accountabilities within the team. That means specifying the incident handler and lining up resources based on the expertise needed. Perhaps you need some Windows gurus to isolate a specific vulnerability in XP. Or a Linux jockey to understand how the system configurations were changed. Every response varies a bit, and you want to make sure you have the right team in place.
As you narrow down the scope of data needing analysis, you might filter on the segments attacked or logs of the application in question. You might collect forensics from all endpoints at a certain office, if you believe the incident was contained. Data reduction is necessary to keep the data set to investigate manageable.
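As a rough illustration of this kind of data reduction, here is a minimal Python sketch that keeps only events from the attacked segments within a time window of interest. The event fields (`time`, `src_ip`, `msg`), the subnet, and the dates are hypothetical; your log platform will have its own schema and query language.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

def reduce_events(events, segments, start, end):
    """Keep events whose source IP is in an attacked segment and whose
    timestamp falls inside the window under investigation."""
    networks = [ip_network(segment) for segment in segments]
    kept = []
    for event in events:
        source = ip_address(event["src_ip"])
        in_scope = any(source in network for network in networks)
        in_window = start <= event["time"] <= end
        if in_scope and in_window:
            kept.append(event)
    return kept

# Hypothetical sample events; field names are assumptions for illustration.
events = [
    {"time": datetime(2015, 2, 3, 9, 15), "src_ip": "10.1.4.20", "msg": "login failure"},
    {"time": datetime(2015, 2, 3, 9, 40), "src_ip": "10.9.0.7", "msg": "login failure"},
    {"time": datetime(2015, 2, 1, 8, 0), "src_ip": "10.1.4.33", "msg": "service restart"},
]
subset = reduce_events(
    events, ["10.1.4.0/24"], datetime(2015, 2, 3), datetime(2015, 2, 4)
)
print(len(subset), "of", len(events), "events left to investigate")
```

Only the first event survives: the second comes from a different segment and the third falls outside the window.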
You may have an initial idea of who is attacking you, how they are doing it, and their mission based on the alert that triggered the response, but now you need to prove that hypothesis. This is where threat intelligence plays a huge role in accelerating your response. Based on the indicators you found, you can use a TI service to help identify a potentially responsible party, or more likely a handful of candidates. You don’t need legal attribution, but this information can help you understand the attacker and their tactics.
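To make the "handful of candidates" idea concrete, the sketch below ranks adversaries by how many of your observed indicators overlap with what a TI service associates with each of them. The adversary names, indicator values, and the `profiles` structure are all made up for illustration; a real TI service would supply this mapping in its own format.

```python
def rank_adversaries(observed, profiles):
    """Return (adversary, matched indicators) pairs, strongest overlap first."""
    observed = set(observed)
    matches = []
    for name, indicators in profiles.items():
        hits = observed & set(indicators)
        if hits:
            matches.append((name, sorted(hits)))
    # Most overlapping indicators first; ties broken alphabetically by name.
    matches.sort(key=lambda match: (-len(match[1]), match[0]))
    return matches

# Hypothetical adversary profiles, as a TI service might describe them.
profiles = {
    "Adversary A": {"203.0.113.10", "evil-cdn.example"},
    "Adversary B": {"198.51.100.7", "mail-relay.example"},
    "Adversary C": {"192.0.2.55"},
}
observed = ["203.0.113.10", "evil-cdn.example", "198.51.100.7"]

for name, hits in rank_adversaries(observed, profiles):
    print(name, hits)
```

A simple overlap count is a crude score, since adversaries share infrastructure and tooling. Treat the result as a list of hypotheses to test, not as attribution.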
Then you need to size up and scope out the damage. The goal here is to take the initial information provided and supplement it quickly to determine the extent and scope of the incident. To determine scope, dig into the collected data to establish the systems, networks, and data involved. Don’t worry about pinpointing every affected device at this point – your goal is to size the incident and generate ideas for how best to mitigate it. Finally, based on the initial assessment, use your predefined criteria to decide whether a formal investigation is in order. If yes, start thinking about chain of custody and using some kind of case management system to track the evidence.
Quarantine and image
Once you have a handle (however tenuous) on the situation, you need to figure out how to contain the damage. This usually involves taking the device offline and starting the investigation. You could move it onto a separate network with access to nothing real, or disconnect it from the network altogether. You could turn the device off. Regardless of what you decide, do not act rashly – you need to make sure things do not get worse, and avoid destroying evidence. Many malware kits (and attackers) will wipe a device if it is powered down or disconnected from the network, so be careful.
Next you take a forensic image of the affected devices. You need to make sure your responders understand how the law works in case of prosecution, especially what provides a basis for reasonable doubt in court.
All this work is really a precursor to the full investigation, when you dig deep into the attack to understand what exactly happened. We like timelines to structure your investigation, as they help you understand what happened and when. Start with the initial attack vector and follow the adversary as they systematically moved to achieve their mission. To ensure a complete cleanup, the investigation must include pinpointing exactly which devices were affected and reviewing exfiltrated data via full packet capture from perimeter networks.
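Here is a minimal sketch of a timeline in Python, assuming each data source can be reduced to records with a timestamp, a device, and a description. The sources, device names, and events are invented for illustration. The second helper flags long quiet stretches, which are often where you still need to dig.

```python
from datetime import datetime, timedelta

def build_timeline(*sources):
    """Merge events from several sources into one list ordered by time."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event["time"])

def find_gaps(timeline, threshold):
    """Return consecutive event pairs separated by more than `threshold`."""
    return [
        (earlier, later)
        for earlier, later in zip(timeline, timeline[1:])
        if later["time"] - earlier["time"] > threshold
    ]

# Hypothetical events from three different sources.
proxy_logs = [
    {"time": datetime(2015, 2, 3, 9, 5), "device": "ws-114", "what": "suspicious attachment downloaded"},
]
endpoint_forensics = [
    {"time": datetime(2015, 2, 3, 14, 30), "device": "db-02", "what": "new admin account created"},
    {"time": datetime(2015, 2, 3, 9, 7), "device": "ws-114", "what": "unknown process started"},
]
packet_capture = [
    {"time": datetime(2015, 2, 4, 1, 12), "device": "db-02", "what": "large outbound transfer"},
]

timeline = build_timeline(proxy_logs, endpoint_forensics, packet_capture)
for event in timeline:
    print(event["time"].isoformat(), event["device"], event["what"])

for earlier, later in find_gaps(timeline, timedelta(hours=4)):
    print("unexplained gap between:", earlier["what"], "and", later["what"])
```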
It turns out investigation is more art than science, and you will never actually know everything, so focus on what you do know. At some point a device was compromised. At some later point data was exfiltrated. Systematically fill in gaps to understand what the attacker did and how. Focus on completeness of the investigation – a missed compromised device is sure to mean reinfection somewhere down the line. Then perform a damage assessment to determine (to the degree possible) what was lost.
There are many ways to ensure the attack doesn’t happen again. Some temporary measures include shutting down access to certain devices via specific protocols or locking down traffic in and out of critical servers. Or possibly blocking outbound communication to certain regions based on adversary intelligence. Also consider more ‘permanent’ mitigations, such as putting in place a service or product to block denial of service attacks.
Once you have a list of mitigation activities, you marshal operational resources to work through it. We favor remediating affected devices in one fell swoop (big bang), rather than incremental cleaning/reimaging. We have found it more effective to eradicate the adversary from your environment as quickly as possible, because a slow cleanup gives them an opportunity to dig deeper.
The mitigation is complete once you have halted the damage and regained the ability to continue operations. Your environment may not be pretty as you finish the mitigation, with a bunch of temporary workarounds to protect information and make sure devices are no longer affected. Make sure to always favor speed over style because time is of the essence.
Now take a step back and clean up any disruptions to normal business operations, making sure you are confident that particular attack will never happen again. Incident managers focus on completing the investigation and cleaning out temporary controls, while Operations handles updating software and restoring normal operations. This could mean updating patches on all systems, checking for and cleaning up malware, restoring systems from backup and bringing them back up to date, etc.
Your last step is to analyze the response process itself. What can you identify as opportunities for improvement? Should you change the team or your response technology (tools)? Don’t make the same mistakes again, and be honest with yourselves about what needs to improve.
You cannot completely prevent attacks, so the key is to optimize your response process to detect and manage problems as quickly and efficiently as possible, which brings us full circle back to threat intelligence. You also need to learn about your adversary during this process. You were attacked once and will likely be attacked again. Use threat intelligence to drive the feedback loop and make sure your controls change as often as needed to be ready for adversaries.
Now let’s delve into collecting the external data that will be useful to streamline investigation. This involves gathering threat intelligence, including the following types:
- Compromised devices: The most actionable intelligence you can get is a clear indication of compromised devices. This provides an excellent place to begin investigation and manage your response. There are many ways you might conclude a device is compromised. The first is clear indicators of command and control traffic coming from the device, such as DNS requests whose frequency and content indicate a domain generation algorithm (DGA) to locate botnet controllers. Monitoring network traffic from the device can also catch files or other sensitive data being transmitted, indicating exfiltration or a remote access trojan.
- Malware indicators: You can build a lab and perform both static and dynamic analysis of malware samples to identify specific indications of how malware compromises devices. This is a major commitment (as we described in Malware Analysis Quant) – thorough and useful analysis requires significant investment, resources, and expertise. The good news is that numerous commercial services now offer those indicators in formats you can use to easily search through collected security data.
- Adversary networks: IP reputation data can help you determine the extent of compromise, especially if it is broken up into groups of adversaries. If during your initial investigation you find malware typically associated with Adversary A, you can look for traffic going to networks associated with that adversary. Effective and efficient response requires focus, and knowing which devices may have been compromised in a single attack helps isolate and dig deeper into that attack.
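To illustrate the DNS example from the first bullet, here is a simple heuristic sketch in Python: it flags devices that repeatedly look up long, random-looking hostnames, using the Shannon entropy of the leftmost label as a rough proxy for "generated by an algorithm". The thresholds, device names, and domains are assumptions chosen for illustration. Commercial detection is far more sophisticated, and this heuristic will produce false positives (for example, on some CDN hostnames).

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(text):
    """Bits of entropy per character in `text`."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_generated(domain, min_length=12, min_entropy=3.5):
    """Heuristic: a long, high-entropy leftmost label may come from a DGA."""
    label = domain.lower().split(".")[0]
    return len(label) >= min_length and shannon_entropy(label) >= min_entropy

def suspect_devices(dns_queries, min_hits=3):
    """Return devices with at least `min_hits` generated-looking lookups."""
    hits = defaultdict(int)
    for device, domain in dns_queries:
        if looks_generated(domain):
            hits[device] += 1
    return sorted(device for device, count in hits.items() if count >= min_hits)

# Hypothetical (device, queried domain) pairs pulled from DNS logs.
dns_queries = [
    ("ws-114", "xk2j9qpl4vz7mwt8.example"),
    ("ws-114", "q7w3e9r1t5y2u8i4.example"),
    ("ws-114", "zb4n8m1c6x0v5l2k.example"),
    ("ws-114", "www.example"),
    ("ws-207", "mail.example"),
    ("ws-207", "intranet-portal.example"),
]
print(suspect_devices(dns_queries))
```

Requiring several hits per device reflects the "frequency and content" point above: one odd lookup means little, but a burst of them from the same device is a good place to start investigating.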
Given the reality of scarce resources on the security team, many organizations select a commercial provider to develop and provide this threat intelligence, or leverage data provided as part of a product or service. Stand-alone threat intelligence is typically packaged as a feed for direct integration into incident response/monitoring platforms. Wrapping it all together produces the process map above. This map encompasses profiling the adversary, collecting intelligence, analyzing threats, and then integrating threat intelligence into incident response.
Action requires automation
The key to making this entire process run is automation. We talk about automation a lot these days, with good reason. Things happen too quickly in technology infrastructure to do much of anything manually, especially in the heat of an investigation. You need to pull threat intelligence in a machine-readable format, and pump it into an analysis platform without human intervention.
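As a small sketch of what "without human intervention" can look like, the Python below parses a machine-readable feed and matches it against connection records to list the devices that talked to known adversary infrastructure. The JSON layout, field names, and sample values are hypothetical; real feeds typically arrive in a vendor-specific or standardized format, and your analysis platform will likely handle the ingestion for you.

```python
import json

# Hypothetical feed content; in practice this would be fetched on a schedule.
FEED_JSON = """
[
  {"type": "ip", "value": "203.0.113.10", "adversary": "Adversary A"},
  {"type": "domain", "value": "evil-cdn.example", "adversary": "Adversary A"},
  {"type": "ip", "value": "198.51.100.7", "adversary": "Adversary B"}
]
"""

def load_indicators(feed_text):
    """Index feed entries by (type, value) for fast lookups."""
    return {
        (entry["type"], entry["value"].lower()): entry["adversary"]
        for entry in json.loads(feed_text)
    }

def match_connections(connections, indicators):
    """Return (device, indicator, adversary) for each connection that hits the feed."""
    findings = []
    for conn in connections:
        for kind, field in (("ip", "dst_ip"), ("domain", "dst_host")):
            value = conn.get(field)
            if value is None:
                continue
            adversary = indicators.get((kind, value.lower()))
            if adversary:
                findings.append((conn["device"], value, adversary))
    return findings

# Hypothetical connection records from network monitoring.
connections = [
    {"device": "ws-114", "dst_ip": "203.0.113.10", "dst_host": "evil-cdn.example"},
    {"device": "ws-207", "dst_ip": "192.0.2.44", "dst_host": "www.example"},
    {"device": "db-02", "dst_ip": "198.51.100.7"},
]

indicators = load_indicators(FEED_JSON)
for device, indicator, adversary in match_connections(connections, indicators):
    print(device, "contacted", indicator, "associated with", adversary)
```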
But that’s not all. In our next post we will discuss how to use TI within active controls to proactively block attacks. What? That was pretty difficult to write, given our general skepticism about really preventing attacks, but new technology is beginning to make a difference.