React Faster and Better: Contain and Respond
In our last post, we covered the first level of incident response: validating and filtering the initial alert. When that alert triggers and your frontline personnel analyze the incident, they’ll either handle it on the spot or gather essential data and send it up the chain. These roles and responsibilities represent a generalization of best practices we have seen across various organizations, and your process and activities may vary. But probably not too much.

Tier 2: Respond and contain

The bulk of your incident response will happen within this second tier. While Tier 1 deals with a higher number of alerts (because they see everything), anything that requires any significant response moves quickly to Tier 2, where an incident manager/commander is assigned and the hard work begins.

In terms of process, Tier 2 focuses on the short-term, immediate response steps:

Size-up: Rapidly scope the incident to determine the appropriate response. If the incident might result in material losses (something execs need to know about), require law enforcement and/or external help, or require specialized resources such as malware analysis, it will be escalated to Tier 3. The goal here is to characterize the incident and gather the information to support containment.

Contain: Based on your size-up, try to prevent the situation from getting worse. In some cases this might mean not containing everything, so you can continue to observe the bad guys until you know exactly what’s happening and who is doing it, but you’ll still do your best to minimize further damage.

Investigate: After you set the initial incident perimeter, dig in to the next level of information to better understand the full scope and nature of the incident and set up your remediation plan.

Remediate: Finish closing the holes and start the recovery process. The goal at this level is to get operations back up and running (and/or stop the attack), which may involve workarounds or temporary measures.
This is different from a full recovery.

If an incident doesn’t need to escalate any higher, at this level you’ll generally also handle the root cause analysis/investigation and manage the full recovery. This depends on resources, team structure, and expertise.

The Team

If Tier 1 represents your dispatchers, Tier 2 are the firefighters who lead the investigation. They are responsible for more-complex incidents that involve unusual activity beyond simple signatures, multi-system/network issues, and issues with personnel that might result in HR/legal action. Basically, any kind of non-trivial incident ends up in the lap of Tier 2. While these team members may still specialize to some degree, it’s important for them to keep a broad perspective because any incident that reaches this level involves the complexity of multiple systems and factors. They focus more on incident handling and less on longer, deeper investigations.

Primary responsibilities: Primary incident handling. More advanced investigations that may involve multiple factors. For example, a Tier 1 analyst notes egress activity, and the Tier 2 analyst then takes over and coordinates a more complete network analysis, as well as checking endpoint data where the egress originated, to identify/characterize/prioritize any exfiltration. This person has overall responsibility for managing the incident and pulling in specialist resources, as needed. They are completely dedicated to incident response. As the primary incident handlers, they are responsible for quickly characterizing and scoping the incident (beyond what they got from Tier 1), managing containment, and escalating when required. They are the ones who play the biggest role in closing the attacker’s window of malicious opportunity.

Incidents they manage: Multi-system/factor incidents and investigations of personnel. Incidents are more complex and involve more coordination, but don’t require direct executive team involvement.
When they escalate: Any activities involving material losses, potential law enforcement involvement, or specialized resources, and those requiring an all-hands response. They may even still play the principal management and coordination role for these incidents, but at that point senior management and specialized expertise need to be in the loop and potentially involved.

The Tools

These responders have a broader skill set, but generally rely on a variety of monitoring tools to classify and investigate incidents as quickly as possible. Most people we talk with focus more on network analysis at this level because it provides the broadest scope to identify the breadth of the incident via “touch points” (devices involved in the incident). They may then delve into log analysis for deeper insight into events involving endpoints, applications, and servers, although they often work with a platform specialist – who may not be formally part of the incident response team – when they need deeper non-security expertise.

Full packet capture (forensics): As in a Tier 1 response, the network is the first place to look to scope intrusions. The key difference is that in Tier 2 the responder digs deeper, and may use more specialized tools and scripts. Rather than looking at IDS for alerts, they mine it for indications of a broader attack. They are more likely to dig into network forensics tools to map out the intrusion/incident, as that provides the most data – especially if it includes effective analysis and visualization (crawling through packets by hand is a much slower process, and something to avoid at this level if possible). As discussed in our last post, simple network monitoring tools are helpful, but not sufficient to do real analysis of incident data. So full packet capture is one of the critical pieces in the response toolkit.
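
As a rough illustration of what mapping “touch points” from network data can look like, here is a minimal sketch. It assumes you can export connection summaries as (timestamp, source, destination) records. The `Flow` schema and the `touch_points` function are hypothetical names for this example, not part of any particular forensics product, and the addresses are made-up sample data.

```python
from collections import deque
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One summarized connection (hypothetical schema, not a real tool's export)."""
    timestamp: float  # seconds since epoch
    src: str
    dst: str


def touch_points(flows: Iterable[Flow], seed: str, since: float) -> set[str]:
    """Return every host linked to `seed` by flows at or after `since`."""
    # Build an undirected adjacency map from flows inside the time window.
    neighbors: dict[str, set[str]] = {}
    for flow in flows:
        if flow.timestamp < since:
            continue
        neighbors.setdefault(flow.src, set()).add(flow.dst)
        neighbors.setdefault(flow.dst, set()).add(flow.src)

    # Breadth-first walk outward from the host named in the original alert.
    seen = {seed}
    queue = deque([seed])
    while queue:
        host = queue.popleft()
        for peer in neighbors.get(host, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


# Made-up sample data for illustration only.
flows = [
    Flow(100.0, "10.0.0.5", "10.0.0.9"),
    Flow(120.0, "10.0.0.9", "203.0.113.7"),
    Flow(130.0, "10.0.0.20", "10.0.0.21"),  # unrelated traffic
    Flow(50.0, "10.0.0.5", "10.0.0.30"),    # before the window
]
print(sorted(touch_points(flows, seed="10.0.0.5", since=90.0)))
```

On a real network a walk like this will over-collect, because nearly every host talks to shared infrastructure such as DNS or directory servers. In practice you would likely exclude those hosts or limit the number of hops before treating the result as the incident perimeter.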
Location-specific log management: We’re using this as a catch-all for digging into logs, although it may not necessarily involve a centralized log management tool. For application attacks, it means looking at the app logs. For system-level attacks, it means looking at the system logs. This also likely involves cross-referencing with authentication history, or anything else that helps characterize the attack and provide clues as to what is happening. In the size-up, the focus is on finding major indicators rather than digging out every bit of data.

Specialized tools: DLP, WAF, DAM, email/web security gateways, endpoint