Incident Response in the Cloud Age: In Action
When we do a process-centric research project, it works best to wrap up with a scenario that illuminates the concepts we've discussed throughout the series and makes things a bit more tangible. In this situation, imagine you work for a mid-sized retailer that uses a mixture of in-house technology and SaaS, and has recently moved a key warehousing system to an IaaS provider after rebuilding the application for the cloud. You have a modest-sized security team of 10, which is not enough, but a bit more than many of your peers have. Senior management understands why security is important (to a point) and gives you decent leeway, especially regarding the new IaaS application. In fact, you were consulted during the IaaS architecture phase and provided some guidance (with help from your friends at Securosis) on building a Resilient Cloud Network Architecture and securing the cloud control plane. You also had the opportunity to integrate some orchestration and automation technology into the cloud technology stack.

## The Trigger

You have your team on pretty high alert because a number of your competitors have recently been targeted by an organized crime ring that gained a foothold in their environments and proceeded to steal a ton of information about customers, pricing, and merchandising strategies. This isn't your first rodeo, and you know that where there is smoke there is usually fire, so you task one of your more talented security admins with a little proactive _hunting_ in your environment, just to make sure nothing is going on.

The admin starts poking around by searching internal security data for some of the more recent samples of malware found in the attacks on the other retailers, provided by the retail industry's ISAC (Information Sharing and Analysis Center). The admin gets a hit on one of the samples, confirming what your gut told you: you have an active adversary on the network. So now you need to engage the incident response process.

## Job 1: Initial Triage

Now that you know there is a _situation_, you assemble the response team. There aren't a lot of you, and half the team has to stay focused on operational tasks, since taking systems down wouldn't make you popular with senior management or the investors. You also don't want to jump the gun until you know what you're dealing with, so you inform the senior team of the situation but don't take any systems offline. Yet.

Since the adversary is active on the internal network, they most likely entered via phishing or another social engineering attack. The admin's searches showed five devices with indications of the malware, so those devices are taken off the network immediately. They aren't shut down, but moved to a separate network with Internet access, so as not to tip off the adversary that you have discovered their presence.

Then you check the network forensics tool, looking for indications that data has been leaking. There are a few suspicious file transfers, and fortunately you integrated the firewall's egress filtering capability with the network forensics tool. So once the firewall showed anomalous traffic being sent to known bad sites (via a threat intelligence integration on the firewall), you started capturing the network traffic originating from the devices that triggered the firewall alert. Automatically. That automation stuff sure makes things easier than doing everything manually.
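To make that hand-off a bit more concrete, here is a minimal sketch of what alert-driven capture could look like. It is illustrative only, not any vendor's integration: the JSON fields (`src_ip`, `case_id`), the listener port, the capture interface, and the storage path are all assumptions, and a real deployment would lean on the firewall's and forensics tool's own APIs rather than a hand-rolled listener.

```python
# Minimal sketch: a tiny listener that receives a firewall alert (as JSON) and
# kicks off a time-boxed packet capture scoped to the flagged internal device.
# Assumes tcpdump is installed and the listener runs with capture privileges;
# field names, paths, and the interface are hypothetical.
import json
import subprocess
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

CAPTURE_DIR = "/var/forensics/pcaps"   # hypothetical evidence location
CAPTURE_IFACE = "eth0"                 # span/tap interface feeding the sensor
CAPTURE_SECONDS = "600"                # capture window per alert (10 minutes)

def start_capture(case_id: str, suspect_ip: str) -> subprocess.Popen:
    """Launch a time-boxed tcpdump limited to traffic to/from the suspect device."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    pcap_path = f"{CAPTURE_DIR}/{case_id}_{suspect_ip}_{stamp}.pcap"
    cmd = [
        "tcpdump", "-i", CAPTURE_IFACE,
        "-G", CAPTURE_SECONDS, "-W", "1",   # stop after one capture window
        "-w", pcap_path,
        "host", suspect_ip,
    ]
    return subprocess.Popen(cmd)

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        alert = json.loads(body)
        # Hypothetical alert fields -- adapt to whatever your firewall actually sends.
        suspect_ip = alert["src_ip"]
        case_id = alert.get("case_id", "case-unassigned")
        start_capture(case_id, suspect_ip)
        self.send_response(202)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), AlertHandler).serve_forever()
```

The point isn't the specific plumbing; it's that the capture starts the moment the egress alert fires and the pcap is already tagged with a case identifier, so the evidence is waiting for the responders instead of the other way around.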
As part of your initial triage, you have endpoint telemetry telling you there are issues, and network forensics data to give you a clue about what's leaking. That is enough to know you not only have an active adversary, but have also more than likely lost data. So you fire up the case management system, which will structure the investigation and store all of its artifacts. The team members are assigned their responsibilities and sent on their way to get things done, and you make the trek to the executive floor to keep senior management updated on the incident.

## Check the Cloud

The attack seems to have started on the internal network, but you don't want to take chances, and you need to make sure the new cloud-based application isn't at risk. A quick check of the cloud console shows strange activity on one of the instances. An instance in the presentation layer of the cloud stack was flagged by the IaaS provider's monitoring system because of an unauthorized change on that specific instance. It looks like the time you spent setting up the configuration monitoring service was well spent.

Since security was involved in the architecture of the cloud stack, you are in good shape. The application was built to be isolated, so even though the presentation layer appears to have been compromised, the adversaries can't get to anything of value. And the clean-up has _already happened_. Once the IaaS monitoring system threw an alert, the instance in question was taken offline and put into a special security group accessible only to the investigators. A forensic server was spun up and some additional analysis was done. Another example of orchestration and automation really facilitating the incident response process.

The presentation layer has to handle large variances in traffic, so it was built using auto-scaling technology and immutable servers. Once the (potentially) compromised instance was removed from the group, another instance with a clean configuration was spun up and took on the workload.

It's not clear whether this attack is related to the other incident, so you pull down the information about the cloud attack and feed it into the case management system. But even if it is related, this attack isn't presenting a danger at this point, so it is set aside so you can focus on the internal attack and the probable exfiltration.

## Building the Timeline

Now that you've done the initial triage, it's