Incident Response Fundamentals: Trigger, Escalate, and Size upBy Rich
Okay, your incident response process is in place, you have a team, and you are hanging out in the security operations center, watching for Bad Things to happen. Then, surprise surprise, an alert triggers: what’s next?
Trigger and Escalate
The first thing you need to do is determine the basic parameters of the incident, and assign resources (people) to investigate and manage it. This is merely a quick and dirty step to get the incident response process kicked off, and the basic information you gather will vary based on what triggered the alert.
Not all alerts require a full incident response – much of what you already deal with on a day to day basis is handled by your existing security processes. Incident response is for situations that cannot be adequately handled by your standard processes. Most IDS, DLP, or other alerts/help desk calls don’t warrant any special response – this series is about incidents that fall outside the range of your normal background noise.
Where do you draw the line? That depends entirely on your organization. In a small business a single system infected with malware might lead to a response, while the same infection in a larger company could be handled as a matter of course. Technically these smaller issues (in smaller companies) are “incidents” and follow the full response process, but that entire process would be managed by a single individual with a few clicks. Regardless of where the line is drawn, communication is still critical. All parties must be clear on the specifics of which situations require a full incident investigation and which do not.
For any incident, you will need a few key pieces of information early to guide next steps. These include:
- What triggered the alert?
- If someone was involved or reported it, who are they?
- What is the reported nature of the incident?
- What is the reported scope of the incident? This is basically the number and nature of systems/data/people involved.
- Are any critical assets involved?
- When did the incident occur, and is it ongoing?
- Are there any known precipitating events for the incident? In other words, is there a clear cause?
All this should be collected in a matter of seconds or minutes through the alerting process, and provides your initial picture of what’s going on.
When an incident does look more serious, it’s time to escalate. We suggest you have guidelines for initiating this escalation, such as:
- Involvement of designated critical data or systems.
- Malware infecting a certain number of systems.
- Sensitive data detected leaving the organization.
- Unusual traffic/behavior that could indicate an external compromise.
Once you escalate it’s time to assign an appropriate resource, request additional resources (if needed), and begin the response. Remember that per our incident response principles, whoever first detects and evaluates the incident is in charge of it until they hand it off to someone else of equal or greater authority.
The term size up comes from the world of emergency services. It refers to the initial impressions of the responder as they roll up to the scene. They may be estimating the size of a cloud of black smoke coming out of a house, or a pile of cars in the middle of a highway. The goal here is to take the initial information provided and expand on it as quickly as possible to determine the true nature of the incident.
For an IT response, this involves determining specific criteria – some of which you might already know:
- Scope: Which systems/networks/data are involved? While the full scope of an IT incident may take some time to determine, right now we need to go beyond the initial information provided and learn as much about the extent of the incident as possible. This includes systems, networks, and data. Don’t worry about getting all the details of each of them yet – the goal is merely to get a handle on how big a problem you might be facing.
- Nature: What kind of incident are you dealing with? If it’s on the network, look at packets or reports from your tools. For an endpoint, start digging into the logs or whatever triggered the alert. If it involves data loss, what data might be involved? Be careful not to assume it’s only what you detected going out, or what you think was inappropriately accessed.
- People: If this is some sort of an external attack, you probably aren’t going to spend much time figuring out the home address of the people involved at this stage. But for internal incidents it’s important to put names to IP addresses for both suspected perpetrators and victims. You also want to figure out what business units are involved. All of this affects investigation and response.
Yes, I could have just said, “who, what, when, where, and how”.
We aren’t performing more than the most cursory examination at this point, so you’ll need to limit your analysis to basics such as security tool alerts, and system and application logs. The point here is to get a quick analysis of the situation, and that means relying on tools and data you already have.
The OODA Loop
Many incident responders are familiar with the OODA Loop originally developed by Colonel Boyd of the US Air Force. The concept is that in an incident, or any decision-making process, we follow a recurring cycle of Observe, Orient, Decide, and Act. These cycles, some of which are nearly instantaneous and practically subconscious, describe the process of continually collecting, evaluating, and acting on information.
The OODA Loop maps well to our Incident Response Fundamentals process. While we technically follow multiple OODA cycles in each phase of incident response, at a macro level trigger and escalate are a full OODA loop (gathering basic information and deciding to escalate), while size up maps to a somewhat larger loop that increases the scope of our observations, and closes with the action of requesting additional resources or moving on to the next response phase.
Once you have collected and reviewed the basics, you should have a reasonable idea of what you’re dealing with. At this point it’s time to call in additional resources (if necessary) and move on to containment, which we will discuss next. This is often the point where you engage with non-security operational personnel because further actions are likely to start affecting their systems.