We have been through all the pieces of our advanced incident response method, React Faster and Better, so it is time to wrap up this series. The best way to do that is to actually run through a sample incident with some commentary to provide the context you need to apply the method to something tangible. It’s a bit like watching a movie while listening to the director’s commentary. But those guys are actually talented.
For brevity we will use an extremely simple high-level example of how the three response tiers evaluate, escalate, and manage incidents:
The alert
- It’s Wednesday morning and the network analyst has already handled a dozen or so network/IDS/SIEM alerts. Most indicate probing from standard network script-kiddie tools and are quickly blocked and closed (often automatically). He handles those himself, just another day in the office.
- The network monitoring tool pings an alert for an outbound request on a high port to an IP range located in a country known for intellectual property theft. The analyst needs to validate the origin of the packet, so he looks and sees the source IP is in Engineering. Ruh-roh.
- The tier 1 analyst passes the information along to a tier 2 responder. Important intellectual property may be involved and he suspects malicious activity, so he also phones the on-call handler to confirm the potential seriousness of the incident. Tier 2 takes over, and the tier 1 analyst goes back to his normal duties.
This is the first indication that something may be funky. Probing is nothing new and tier 1 needs to handle that kind of activity itself. But the outbound request very well may indicate an exfiltration attempt. And tracing it back to a device that does have access to sensitive data means it’s definitely something to investigate more closely. This kind of situation is why we believe egress monitoring and filtering are so important. Monitoring is generally the only way you can tell if data is actually leaking. At this point the tier 1 analyst should know he is in deep water. He has confirmed the issue and pinpointed the device in question. Now it’s time to hand it off to tier 2. Note that the tier 1 analyst follows up with a phone call to ensure the hand-off happens and that there is no confusion.
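The egress-monitoring check that tier 1 relies on here can be sketched in a few lines. This is a hypothetical illustration, not any particular product's logic: all the addresses, ranges, and the `flag_egress` helper are made up for the example. It scans simplified flow records for outbound connections on high ports to a watchlisted address range, then maps the source back to an internal subnet so the analyst knows whose box is talking.

```python
from ipaddress import ip_address, ip_network

# Illustrative values only -- a real watchlist would come from threat intel feeds.
WATCHLIST = [ip_network("203.0.113.0/24")]            # ranges tied to IP-theft activity
INTERNAL_SUBNETS = {ip_network("10.20.0.0/16"): "Engineering"}

def flag_egress(flows):
    """Return alerts for outbound flows on high ports to watchlisted ranges.

    Each flow is a (source_ip, dest_ip, dest_port) tuple from flow records.
    """
    alerts = []
    for src, dst, dport in flows:
        dst_ip = ip_address(dst)
        if dport >= 1024 and any(dst_ip in net for net in WATCHLIST):
            src_ip = ip_address(src)
            # Map the source back to an internal subnet owner -- the "Ruh-roh" step.
            owner = next((name for net, name in INTERNAL_SUBNETS.items()
                          if src_ip in net), "unknown")
            alerts.append({"src": src, "dst": dst, "dport": dport, "owner": owner})
    return alerts

flows = [("10.20.5.17", "203.0.113.44", 31337),   # suspicious outbound high port
         ("10.20.5.17", "198.51.100.9", 443)]     # routine HTTPS, ignored
print(flag_egress(flows))
```

Running this flags only the first flow and attributes it to Engineering, which is exactly the context the tier 1 analyst hands to tier 2.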
How bad is bad?
- The tier 2 analyst opens an investigation and begins a full analysis of network communications from the system in question. The system is no longer actively leaking data, but she blocks any traffic to that destination on the perimeter firewall by submitting a high priority request to the firewall management team. After that change is made, she verifies that traffic is in fact being blocked.
- She sets an alert for any other network traffic from that system and calls or visits the user, who predictably denies knowing anything about it. She also learns that system normally doesn’t have access to sensitive intellectual property, which may indicate privilege escalation – another bad sign. Endpoint protection platform (EPP) logs for that system don’t indicate any known malware.
- She notifies her tier 3 manager of the incident and begins a deeper investigation of previous network traffic from the network forensics data. She also starts looking into system logs to begin isolating the root cause.
- Once the responder notices outbound requests to a similar destination from other systems on the same subnet, she informs incident response leadership that they may be experiencing a serious compromise.
- Then she finds that the system in question connected to a sensitive file server it normally doesn’t access, and transferred/copied some entire directories. It’s going to be a long night.
As we have been discussing, tier 2 tends to focus on network forensics because it’s usually the quickest way to pinpoint attack proliferation and severity. The first step is to contain the issue, which entails blocking traffic to the external IP – this should temporarily eliminate any data leakage. Remember, you might not actually know the extent of the compromise, but that shouldn’t stop you from taking decisive action to contain the damage as quickly as possible. At this point, tier 3 is notified – not necessarily to take action, but so they are aware there might be a more serious issue. It’s this kind of proactive communication that streamlines escalation between response tiers.
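The containment step, blocking the destination at the perimeter and then verifying the block, might look something like the following on a Linux-based firewall. This is a sketch only: the address is a documentation range standing in for the real destination, and the exact commands depend entirely on your firewall platform and change process.

```shell
# Drop all forwarded traffic to the suspect range (illustrative address).
iptables -I FORWARD -d 203.0.113.0/24 -j DROP

# Verify the rule exists and watch the packet counters to confirm hits.
iptables -L FORWARD -n -v | grep 203.0.113.0
```

The verification step matters as much as the block: the tier 2 analyst in the scenario doesn't consider containment done until she sees traffic actually being dropped.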
Next, the tier 2 analyst needs to determine how far the issue has spread within the environment. So she searches the logs and finds similar outbound requests from other systems, which is not good: more than one device is compromised, and this could represent a major breach. Worse yet, she sees that at least one of the involved systems purposely connected to a sensitive file store and removed a big chunk of content. So it's time to escalate and fully engage tier 3. Not that it hasn't been fun thus far, but now the fun really begins.
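The scoping pass is conceptually simple: look for other internal hosts talking to the same destination neighborhood. A hedged sketch, again with made-up flow records and a hypothetical helper name, that groups outbound flows by a flagged destination block and lists every internal source seen talking to it:

```python
from ipaddress import ip_address, ip_network

def hosts_talking_to(flows, flagged_block):
    """Collect internal sources seen connecting into flagged_block.

    Each flow is a (source_ip, dest_ip) tuple from flow or firewall logs.
    """
    block = ip_network(flagged_block)
    sources = set()
    for src, dst in flows:
        if ip_address(dst) in block:
            sources.add(src)
    return sorted(sources)

flows = [("10.20.5.17", "203.0.113.44"),
         ("10.20.5.23", "203.0.113.61"),   # second host, same destination block
         ("10.20.7.9",  "198.51.100.2")]   # unrelated traffic
print(hosts_talking_to(flows, "203.0.113.0/24"))
```

More than one source in that output is the tier 2 analyst's "not good" moment: the compromise has spread beyond the first device.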
Bring in the big guns
- Tier 3 steps in and begins in-depth analysis of the involved endpoints and associated network activity. They identify the involvement of custom malware that initially infected a user’s system via drive-by download after clicking a phishing link. No wonder the user didn’t know anything – they didn’t have a chance against this kind of attack.
- An endpoint forensics analyst then discovers what appears to be the remains of an encrypted RAR file on one of the affected systems. The network analysis shows no evidence the file was transferred out. It seems they dodged a bullet and detected the command and control traffic before the data exfiltration took place.
- The decision is made to allow what appears to be encrypted command and control traffic over a non-standard port, while blocking all outbound file transfers (except those known to be part of normal business process). Yes, they run the risk of blocking something legit, but senior management is now involved and has decided this is a worthwhile risk, given the breach in progress.
- To limit potential data loss through the C&C channels left open, they carefully monitor bandwidth usage. Given the advanced nature of the attack, they are trying to contain the problem without tipping off the attackers that they know what's going on. Prior experience tells them that merely cutting off all communications will only escalate the attack before they can identify and clean the involved systems.
- Sensitive data is slowly removed from the servers on that subnet and replaced. Forensics investigators turn over an infected system to the malware analysts for reverse engineering. The goal is to prepare a coordinated cleaning method and expulsion of the attacker, but to do that they need to fully understand the depth of the compromise and identify all involved systems and malware variants.
- IDS/IPS specialists write a new network alert signature to identify similar traffic, and create a new malware signature for evaluating endpoints in the future.
Tier 3 immediately starts to analyze what the attack is and how it works. Once you have identified custom malware you know you aren’t dealing with amateurs. So the decision to allow C&C traffic but block file transfers is not surprising, albeit a little risky. Until the malware analysts understand how to eliminate the threat it doesn’t make sense to give away any hints that you know about the attack.
At the same time outbound transfers are stopped, the response team acts decisively to remove sensitive data from the reach of the attackers. This again serves to contain the damage until the threat can be neutralized, which involves a set of custom network rules to block this particular attack.
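The new detection content the IDS/IPS specialists write could look something like this Snort/Suricata-style rule. Everything here is a placeholder: the address range, port, and sid are illustrative, and a real signature would key on payload characteristics of the observed C&C traffic rather than just the destination.

```
alert tcp $HOME_NET any -> 203.0.113.0/24 31337 ( \
    msg:"Suspected custom C&C beacon to flagged range"; \
    flow:to_server,established; \
    sid:1000001; rev:1; )
```

The point of writing the rule is forward-looking: even after this incident is cleaned up, similar traffic anywhere on the network trips an alert immediately instead of waiting for a sharp-eyed analyst.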
To be clear, sophisticated attacks in the real world are rarely this cut and dried, but the response team's tactics are consistent. The objective is always to contain the damage while figuring out the extent of the compromise. Then you have options for how to ultimately remediate it.
Post-mortem
To reiterate: the key points in the scenario above are rapid identification of a serious issue (outbound exfiltration attempt), quick escalation to tier 2 for scoping and initial investigation, and rapid coordinated investigation and response with top-level resources once it becomes clear this is a sophisticated and advanced attack. The initial handler did a good job of recognizing the problem and understanding he couldn't handle it himself. The second-level responder didn't fall into the trap of focusing too much on the first device, and thereby missing the bigger picture. The containment plan provided breathing room for a full cleansing of the incident without tipping off the attackers, which could have prompted them to burrow deeper in an emergency or grab additional important assets on the way out.
We need to React Faster and Better because we now face sophisticated attacks we have not seen before. That makes detecting every attack before it happens a pipe dream. By focusing on shortening the window between attack and detection, and having a solid plan to contain and then remediate the attack, you give yourself the best chance to live to fight another day. That’s one of the most significant epiphanies security folks can have. You cannot win, so success is about minimizing the damage. Yeah, that’s crappy, but it is realistic.
To position yourself most effectively to RFaB, we advocate an institutional commitment to data collection at all levels of the computing stack. Given the usefulness of network-level data throughout the incident response process, we believe monitoring tools such as full packet capture, Database Activity Monitoring, and Data Leak Prevention provide the best chance of detecting, containing, isolating, and remediating today's sophisticated attacks.
But no collection of tools will ever replace a skilled team of incident handlers and investigators. Get the right people, establish the right processes, and then give them the tools and support to do what they do best.
Reader interactions
2 Replies to “React Faster and Better: Piecing It Together”
@Loner, where to start. First off, Bounty is the quicker picker upper. Oy.
To address your questions:
1) Depends on which tier you are talking about. Tier 3 needs to have access to everything, and then some. Tiers 1 and 2 probably don’t need access to systems, since they are trying to isolate the issue and gauge proliferation. And godlike access to anything is something you want to restrict where possible.
2) You definitely could have short timer syndrome in tier 1. I’d address that by looking at it from a career progression standpoint. You don’t get to tier 2 w/o going through tier 1 and doing a good job. Don’t see it as much different than entry level sysadmins, who have to do lots of grunt work before they can exercise their kung fu.
3) Ops is usually brought into the process to implement the specific remediations prescribed by the team. The IR team tends not to control the fix, so they need to get help from ops (more likely a pre-defined specialist within ops) to make the changes needed STAT.
And yes, your observation is on the money. You’d like to think outbound ports would be blocked and egress filtering happening, but we can’t assume anything and we also wanted to make the example simple enough to follow.
Mike.
Movie with director’s commentary? You understate! This is more like watching a hot porn while a mistress goes down on you! Actually, if I were manager of a team/system like this, I don’t think I’d need a mistress…but I would need a constant supply of towels under my desk…
(Excellent series of posts, by the way!)
question 1: You have a lot of network visibility here, but do you think a team of analysts like this should also have godlike access to the systems under their purview? In other words, the ability to dive into the files, processes, kernel, and configs of equipment in question? I’m not sure I can ask that question properly without feeling like I’m leading it…but there it is.
question 2: Do you see this as possibly suffering from similar issues that typical customer support suffers from, namely tier 1 is lower-paid and thus a significant drop in quality? Then again, maybe that’s the point, and we’re assuming pretty darn good tools/alerts for the tier 1 to interpret.
question 3: Does this team have much integration with ops? This is a crazy vague question, but wanted to shotgun it out in case there is some easy “duh” answer.
silly observation 1: I’d hope that someone with this mature a process would block outbound ports that don’t have a pre-defined business need. That aside, I understand the point of the scenario regardless. 🙂