Incident Response Fundamentals: Contain, Investigate, and Mitigate
In our last post, we covered the first steps of incident response – the trigger, escalation, and size up. Today we’re going to move on to the next three steps – containment, investigation, and mitigation. Now that I’m thinking bigger picture, incident response really breaks down into three large phases. The first phase covers your initial response – the trigger, escalation, size up, and containment. It’s the part when the incident starts, you get your resources assembled and responding, and you take a stab at minimizing the damage from the incident. The next phase is the active management of the incident. We investigate the situation and actively mitigate the problem. The final phase is the clean up; where we make sure we really stopped the incident, recover from the after effects, and try and figure out why this happened and how we can prevent it in the future. This includes the mop up (cleaning the remnants of the incident and making sure there aren’t any surprises left behind), your full investigation and root cause analysis, and Q/A (quality assurance) of your response process. Now since we’re writing this as we go, I technically should have included containment in the previous post, but didn’t think of it at the time. I’ll make sure it’s all fixed before we hit the whitepaper. Contain Containing an ongoing incident is one of the most challenging tasks in incident response. You lack a complete picture of what’s going on, yet you have to take proactive actions to minimize damage and potential incident growth. And you have to do it fast. Adding to the difficulty is the fact that in some cases your instincts to stop the problem may actually exacerbate the situation. This is where training, knowledge, and experience are absolutely essential. Specific plans for certain major incident categories are also important. For example: For “standard” virus infections and attacks your policy might be to isolate those systems on the network so the infection doesn’t spread. This might include full isolation of a laptop, or blocking any email attachments on a mail server. For those of you dealing with a well-funded persistent attacker (yeah, APT), the last thing you want to do is start taking known infected systems offline. This usually leads the attacker to trigger a series of deeper exploits, and you might end up with 5 compromised systems for every one you clean. In this case your containment may be to stop putting new sensitive data in any location accessed by those compromised systems (this is just an example, responding to these kinds of attackers is most definitely a complex art in and of itself). For employee data theft, you first get HR, legal, and physical security involved. They may direct you to you to instantly lock them out or perhaps just monitor their device and/or limit access to sensitive information while they build a case. For compromise of a financial system (like credit card processing), you may decide to suspend processing and/or migrate to an alternative platform until you can determine the cause later in your response. These are just a few quick examples, but the goal is clear – make sure things do not get worse. But you have to temper this defensive instinct with any needs for later investigation/enforcement, the possibility that your actions might make the situation worse, and the potential business impact. And although it’s not possible to build scenarios for every possible incident, you want to map out your intended responses for the top dozen or so, to make sure that everyone knows what they should be doing to contain the damage. Investigate At this point you have a general idea of what’s going on and have hopefully limited the damage. Now it’s time to really dig in and figure out exactly what you are facing. Remember – at this point you are in the middle of an active incident; your focus is to gather just as much information as you need to mitigate the problem (stop the bad guys, since this series is security-focused) and to collect it in a way that doesn’t preclude subsequent legal (or other) action. Now isn’t the time to jump down the rabbit hole and determine every detail of what occurred, since that may draw valuable resources from the actual mitigation of the problem. The nature of the incident will define what tools and data you need for your investigation, and there’s no way we can cover them all in this series. But here are some of the major options, some of which we’ll discuss in more detail as we discuss deeper investigation and root cause analysis later in the process. Network security monitoring tools: This includes a range of network security tools such as network forensics, DLP, IDS/IPS, application control, and next-generation firewalls. The key is that the more useful tools not only collect a ton of information, but also include analysis and/or correlation engines that help you quickly sift through massive volumes of information quickly. Log Management and SIEM: These tools collect a lot of data from heterogenous sources you can use to support investigations. Log Management and SIEM are converging, which is why we include both of them here. You can check out our report on this technology to learn more. System Forensics: A good forensics tool(s) is one of your best friends in an investigation. While you might not use it to its complete capabilities until later in the process, the forensics tool allows you to collect forensically-valid images of systems to support later investigations while providing valuable immediate information. Endpoint OS and EPP logs: Operating systems collect a fair bit of log information that may be useful to pinpoint issues, as does your endpoint protection platform (most of the EPP data is likely synced to its server). Access logs, if available, may be particularly useful in any incident involving potential data loss. Application and Database Logs: Including data from security tools like Database Activity Monitoring and Web Application Firewalls. Identity, Directory and DHCP logs: To determine who