Everyone’s process is a bit different, but through our research we have found that the best teams tend to organize themselves into three general levels of response, each staffed with increasing expertise. Once an alert triggers, your goal is to filter out the day-to-day crud junior staffers are fully capable of handling, while escalating the most serious incidents through the response levels as quickly as possible. Having a killer investigation team doesn’t do any good if an incident never reaches them, or if their time is wasted on the daily detritus junior folks can easily handle.
As mentioned in our last post, Organizing for Response, these tiers should be organized by skills and responsibilities, with clear guidelines and processes for moving incidents up (and sometimes down) the ladder. Using a tiered structure allows you to more quickly and seamlessly funnel incidents to the right handlers – keeping those with the most experience and skills from being distracted by lower-level events.
An incident might be handled completely at any given level, so we won’t repeat the usual incident response fundamentals, but instead focus on what to do at each level, who staffs it, and when to escalate.
Tier 1: Validate and filter
After an incident triggers, the first step is to validate and filter. This means performing a rapid analysis of the alert and either handling it on the spot or passing it up the chain of command. While incidents might be triggered by the help desk or another non-security source, the initial analysis is always performed by a dedicated security analyst or incident responder. The analyst receives the alert, and it’s his or her job to figure out whether the incident is real, and if so, how severe it might be.
These folks are typically in your Security Operations Center and focus on “desk analysis”. In other words, they handle everything right then and there, and aren’t running into data centers or down hallways. The alert comes in, they perform a quick analysis, and either close it out or pass it on. For simple or common alerts they might handle the incident themselves, depending on your team’s guidelines.
The team
These are initial incident handlers, who may be dedicated to incident response or, more frequently, carry other security responsibilities (e.g., network security analyst) as well. They tend to focus on a single tool or a collection of tools within their coverage area (network vs. endpoint), and they are the team monitoring the SIEM and network monitors. Higher tiers focus more on investigation, while this tier focuses more on initial identification.
- Primary responsibilities: Their main responsibility is initial incident identification, information gathering, and classification. They are the first human filter: they handle smaller incidents and identify problems that need greater attention. It is far more important for them to pass information up the chain quickly than to play Top Gun and tackle things over their heads on their own. Good junior analysts are extremely important for quickly identifying more serious incidents for rapid response.
- Incidents they handle themselves: Basic network/SIEM alerts, password lockouts/failures on critical systems, standard virus/malware. Typically limited to their own coverage area – e.g., a network analyst handling network alerts.
- When they escalate: Activity requiring HR or legal involvement, incidents that require further investigation, alerts that could indicate a larger problem, and so on. (A simple sketch of this escalation logic follows the list.)
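To make that filter concrete, here is a minimal sketch of the kind of escalation decision a Tier 1 handler (or the runbook they follow) applies. The alert fields, categories, and thresholds are hypothetical; your own guidelines define the real ones.

```python
# Hypothetical Tier 1 triage sketch: decide whether to close, handle, or escalate.
# The alert fields and category names are illustrative, not from any specific product.

HANDLE_LOCALLY = {"basic_network_alert", "password_lockout", "standard_malware"}
ALWAYS_ESCALATE = {"insider_activity", "data_exfiltration", "lateral_movement"}

def triage(alert: dict) -> str:
    """Return 'close', 'handle', or 'escalate' for a normalized alert."""
    if alert.get("known_false_positive"):
        return "close"                      # obvious noise gets closed out on the spot
    category = alert.get("category", "unknown")
    if category in ALWAYS_ESCALATE or alert.get("requires_hr_or_legal"):
        return "escalate"                   # anything needing HR/legal or deeper investigation
    if category in HANDLE_LOCALLY and alert.get("severity", 0) <= 3:
        return "handle"                     # day-to-day crud the Tier 1 analyst handles directly
    return "escalate"                       # when in doubt, pass it up the chain quickly

# A lockout on a critical system with elevated severity goes up the ladder
print(triage({"category": "password_lockout", "severity": 7}))   # -> escalate
```

The key design choice is the final fallback: anything that doesn’t clearly match a known, low-severity pattern gets passed up the chain rather than handled on the spot.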
The tools
The goal at this level is triage, so these tools focus on collecting and presenting alerts, and providing the basic investigative information we discussed in the fundamentals series.
- SIEM: SIEMs aren’t always very useful for full investigations, but they do a good job of collecting and presenting top-level alerts and factoring in data from a variety of sources. Many teams use the SIEM as their main tool for initial reduction and scoping of alerts from other tools, and for filtering out the low-level crud, including obvious false positives. Central management of alerts from other tools helps to identify what’s really happening, even though the rest of the investigation and response will be handled at the original source. This reduces the number of eyeballs needed to monitor everything and makes the team more efficient. (A rough sketch of this kind of alert reduction appears after this list.)
- Network monitoring: A variety of network monitoring tools are in common use. They tend to be pretty cheap (and there are a few good open source options) and provide good bang for the buck, so you can get a feel for what’s really happening on your network. Network monitoring typically includes NetFlow, collected device logs, and perhaps even your IDS. Many organizations use these monitoring tools either as an extension of their SIEM environment or as a first step toward deeper network monitoring. (See the flow summary sketch after this list.)
- Full packet network capture (forensics): If network monitoring represents baby steps, full packet capture is your first bike. A large percentage of incidents involve the network, so capturing what happens on the wire is the linchpin of any analysis and response. Any type of external attack, and most internal attacks, eventually involve the network. The more heavily you monitor, the greater your ability to characterize incidents quickly, because you have the data to reconstruct exactly what happened. Unlike endpoints, databases, or applications, you can monitor a network deeply, passively, and securely, using tools that (hopefully) aren’t involved in the successful compromise (less chance of the bad guys erasing your network logs). You’ll use the information from your network forensics infrastructure to scope the incident and identify “touch points” for deeper investigation. At this level you need a full packet capture tool with good analysis capabilities – especially given the massive amount of data involved – even if you feed alerts to a SIEM. Just having the packets to look at, without some sort of analysis on top of them, isn’t very useful. Getting back to our locomotion example, deep analysis of full packet capture data is akin to jumping in the car. (A minimal conversation summary sketch follows the list.)
- Endpoint Protection Platform (EPP) management console: This is often your first source for incidents involving endpoints. It should provide up-to-date information on the endpoint as well as activity logs.
- Data Loss Prevention (DLP): While not deployed as commonly as SIEM or network monitors, DLP is one of the top tools for identifying potential incidents involving sensitive information and unauthorized insider/employee activity.
- Web Application Firewalls (WAF) and Database Activity Monitoring (DAM): These are your go-to tools for monitoring applications and databases. Alerts are often fed to a SIEM, but even at this level your analysts will often go directly to the WAF or DAM for more initial information on the incident.
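To illustrate the SIEM bullet above, here is a minimal sketch of the kind of initial reduction that happens before an analyst ever sees an alert: suppress known false positives, then collapse duplicates into one consolidated entry per source and signature. The alert format, field names, and suppression list are all assumptions for illustration; real SIEMs do this through their own rule and correlation engines.

```python
# Sketch of SIEM-style alert reduction: drop known false positives, then collapse
# duplicates into one consolidated entry per (source, signature) pair for triage.
# The alert dictionaries and field names here are hypothetical.
from collections import defaultdict

KNOWN_FALSE_POSITIVES = {"vuln_scanner_probe", "backup_job_login"}

def reduce_alerts(raw_alerts):
    kept = [a for a in raw_alerts if a["signature"] not in KNOWN_FALSE_POSITIVES]
    grouped = defaultdict(list)
    for alert in kept:
        grouped[(alert["source_ip"], alert["signature"])].append(alert)
    return [
        {"source_ip": src, "signature": sig, "count": len(events),
         "max_severity": max(e["severity"] for e in events)}
        for (src, sig), events in grouped.items()
    ]

raw = [
    {"source_ip": "10.1.1.5", "signature": "vuln_scanner_probe", "severity": 2},
    {"source_ip": "10.2.2.9", "signature": "brute_force_ssh", "severity": 6},
    {"source_ip": "10.2.2.9", "signature": "brute_force_ssh", "severity": 7},
]
for consolidated in reduce_alerts(raw):
    print(consolidated)   # one brute_force_ssh entry with count=2, max_severity=7
```

The point isn’t the code itself; it’s that this reduction is what keeps Tier 1 looking at a handful of consolidated items instead of thousands of raw events.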
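For the network monitoring bullet, here is an equally rough sketch of the “feel for what’s really happening on your network” that flow data provides: totaling bytes per source to find your top talkers. The CSV layout is a made-up stand-in for whatever your collector actually exports.

```python
# Sketch: summarize exported NetFlow-style records into top talkers by bytes sent.
# The CSV columns (src_ip, dst_ip, bytes) are an assumed export format.
import csv
from collections import Counter

def top_talkers(flow_csv_path, n=10):
    totals = Counter()
    with open(flow_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["src_ip"]] += int(row["bytes"])
    return totals.most_common(n)

# Usage, assuming a flows.csv export from your collector:
# for ip, total_bytes in top_talkers("flows.csv"):
#     print(ip, total_bytes)
```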
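And for full packet capture, a minimal sketch of the kind of scoping you do against captured traffic: pull the conversations out of a capture file to identify the “touch points” worth deeper investigation. This uses the open source Scapy library purely as an example; commercial full packet capture tools provide far richer analysis than counting packets.

```python
# Sketch: summarize conversations from a capture file to spot the hosts involved.
# Uses Scapy (open source) as an illustrative pcap reader; the file path is assumed.
from collections import Counter
from scapy.all import rdpcap, IP

def conversation_summary(pcap_path, n=10):
    conversations = Counter()
    for pkt in rdpcap(pcap_path):
        if IP in pkt:
            conversations[(pkt[IP].src, pkt[IP].dst)] += 1
    return conversations.most_common(n)

# Usage, assuming a capture exported from your network forensics tool:
# for (src, dst), packets in conversation_summary("incident.pcap"):
#     print(f"{src} -> {dst}: {packets} packets")
```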
The name of the game at Tier 1 is speed: timely alerts, rapid assessment, and seamless handling or escalation depending on the nature of the incident. It’s more important here to identify something that might be more severe and get the next tier of expertise involved, than to waste time digging in with a deep investigation that may be beyond the capabilities of the Tier 1 team.
You also probably noticed that we didn’t mention containment. That’s because at this point the handlers are closing basic incidents and anything that requires containment would be handed off to Tier 2. This is another reason quickly validating and filtering incidents is so important – you can’t start containment until this step is complete.
The other goal at this step is to collect the most valuable and pertinent information to pass on to the higher-level handlers. The better the data they receive, the more quickly they can focus the next level of investigation and response, which we will discuss in the next post.
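As a concrete (and entirely hypothetical) illustration of what that handoff might contain, here is a sketch of an escalation record; the exact fields will depend on your process and tooling.

```python
# Hypothetical escalation handoff record: the pertinent details Tier 1 packages up
# so Tier 2 can focus its investigation immediately. Fields are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EscalationHandoff:
    alert_id: str
    detected_at: str               # timestamp of the original alert
    source_tool: str               # SIEM, network monitor, DLP, WAF, EPP console...
    affected_assets: List[str]     # hosts, accounts, or applications touched so far
    initial_classification: str    # e.g., "suspected malware", "possible data loss"
    severity_estimate: int         # Tier 1's best guess, to be revised by Tier 2
    evidence_collected: List[str] = field(default_factory=list)  # log extracts, pcap references
    notes: str = ""                # what was checked, and why it was escalated

handoff = EscalationHandoff(
    alert_id="SIEM-20481",
    detected_at="2013-02-12T14:05:00Z",
    source_tool="SIEM",
    affected_assets=["db-server-07", "jsmith"],
    initial_classification="suspicious privileged login activity",
    severity_estimate=7,
    evidence_collected=["auth log extract", "NetFlow summary for db-server-07"],
    notes="Multiple failed then successful logins outside business hours; escalating to Tier 2.",
)
```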