ESF: Endpoint Incident ResponseBy Mike Rothman
Nowadays, the endpoint is the path of least resistance for the bad guys to get a foothold in your organization. Which means we have to have a structured plan and process for dealing with endpoint compromises. The high level process we’ll lay out here focuses on: confirming the attack, containing the damage, and then performing a post-mortem.
To be clear, incident response and forensics is a very specialized discipline, and hairy issues are best left to the experts. But that being said, there are things you as a security professional need to understand, to ensure the forensics guys can do their jobs.
Confirming the attack
There are lots of ways your spidey-sense should start tingling that something is amiss. Maybe it’s the user calling up and saying their machine is slow. Maybe it’s your SIEM detecting some weird log records. It could be your configuration management system reverting inexplicable changes or noting the presence of strange executables. Or perhaps your network flow analysis shows some reconnaissance activities from the device. A big part of the security management process is about being able to fire alerts when something suspicious is happening.
Then we make like bloodhounds and investigate the issue. We’ve got to find the machine and isolate it. Yes, that usually means interrupting the user and ‘inviting’ them to grab a cup of coffee, while you figure out what a mess they’ve made. The first step is likely to do a scan and compare with your standard builds (you remember the standard build, right?). Basically we look for obvious changes that cause issues.
If it’s not an obvious issue (think tons of pop-ups), then you’ve got to go deeper. This usually requires forensics tools, including stuff to analyze disks and memory to look for corruption or other compromise. There are lots of good tools – both open source and commercial – available for your forensics toolkit.
We do recommend you take a course in simple forensics as you get started, for a simple reason. You can really screw up an investigation by doing something wrong, in the wrong order, or using the wrong tools. If it’s truly an attack, your organization may want to prosecute at some point, and that means you have to maintain chain of custody on any evidence you gather. You should consult a forensics expert and probably your general counsel to identify the stuff you need to gather from a prosecution standpoint.
Containing the damage
“Houston, we have a problem…” Yup, your fears were justified and an endpoint or 200 have been compromised – so what to do? First off, you should inherently know what to do because you have a documented incident response plan, and you’ve practiced the process countless times, and your team springs into action without prompting, right? OK, this is the real world, so hopefully you have a plan and your team doesn’t look at you like an alien when you take it to DEFCON 4.
In all seriousness, you need to have an incident response plan. And you need to practice it. The time to figure out your plan stinks is not while a worm is proliferating through your innards at an alarming rate. We aren’t going to go into depth on that process (we’ll be doing a series later this year on incident response), but the general process is as follows:
- Quarantine – Bad stuff doesn’t spread through osmosis – you need a network in place to allow malware to find new targets and spread like wildfire, so first isolate the compromised device. Yes, user grumpiness may follow, but whatever. They got pwned, so they can grab a coffee while you figure out how to contain the damage.
- Assess – How bad is it? How far has it spread? What are your options to fix it? The next step in the process is to understand what you are dealing with. When you confirm the attack, you probably have a pretty good idea what’s going on. But now you have to figure out what the best option(s) is to fix it.
- Workaround – Are there settings that can be deployed on the perimeter or at the network layer that can provide a short term fix? Maybe it’s blocking communication to the botnet’s command and control. Or possibly blocking inbound traffic on a certain port or some specific non-standard protocol that is the issue. Obviously be wary of the ripple effect of any workaround (what else does it break?), but allowing folks to get back to work quickly is paramount, so long as you can avoid the risk of further damage.
- Remediate – Is it a matter of changing a setting or uninstalling some bad stuff? That would be optimistic, eh? Now is when you figure out how to fix the issue, and increasingly these days re-imaging is the best answer. Today’s malware hides so well it’s almost impossible to entirely inoculate a compromised device, and impossible to know you got it all. Which means part of your incident response plan should be a leveraged way to re-image machines.
At some point you have to figure out if this is an incident you can handle yourself, or if you need to bring in the artillery, in the form of forensics experts or law enforcement. Your IR plan needs to be identify scenarios which call for experts, and which call for the law. You don’t want that to be a judgement call in the heat of battle. So define the scenarios, establish the contacts (at both forensics firms and law enforcement), and be ready. That’s what IR is all about.
Once most folks get done cleaning up an incident, they think the job is done. Well, not so much. The reality is that the job has just begun, since you need to figure out what happened and make sure it doesn’t happen again. It’s OK to get nailed by something you haven’t seen before (fool me once, shame on you). It’s not OK to get nailed by the same thing again (fool me twice, shame on me). So you’ve got to schedule a post-mortem.
The post-mortem is not about laying blame – it’s about making sure it doesn’t happen again. So you need someone to candidly and in great detail understand what happened and where the existing defenses failed. Again, it is what it is and it’s critical that the organization can accept failures and move on. But not before you figure out whether process, controls, product, or people need to change.
By the way, it’s very hard to fight human nature and build a culture where failure is OK and post mortems are learning experiences, as opposed to a venue for everyone to cover their respective asses. But we don’t believe you can be successful at security without a strong incident response plan and that requires unemotional post-mortem analysis.
And with that, we come to the conclusion of the Endpoint Security Fundamentals series. We’ll be packaging it up in white paper form over the next week, and it will then be posted to the research library. As always, if there are things we missed or ideas you disagree with, please continue to contribute. Securosis research is an ongoing process, so things will change and we’ll update the documents as required.
Other posts in the Endpoint Security Fundamentals Series
- Prioritize: Finding the Leaky Buckets
- Triage: Fixing the Leaky Buckets
- Controls: Update and Patch
- Controls: Secure Configurations
- Controls: Anti-Malware
- Controls: Firewall, HIPS, and Device Control
- Controls: Full Disk Encryption
- Building the Endpoint Security Program
- Endpoint Compliance Reporting