Applied Network Security Analysis: The Forensics Use Case
Most organizations don't really learn about the limitations of event logs until forensic investigators hold up their hands and explain that they know what happened, but aren't really sure how. Huh? How could that happen? It's pretty simple: logs are a backward-looking indicator. They can help you piece together what happened, but you can only infer how. In a forensic investigation, inferring anything is suboptimal. You want to know, especially given the need to isolate the root cause of the attack and to establish remediations that ensure it doesn't happen again. So you need additional data sources to fill the gaps in what the logs tell you. Let's walk through a simplified scenario to illuminate the issues, first from the standpoint of a log-only analysis and then with a few other data sources added. For a more detailed incident response scenario, check out our React Faster and Better paper.

The Forensic Limitations of Logs

It's the call you never want to get. The Special Agent on the other end of the line called to give you a heads-up: they found some of your customer data as part of another investigation into cyber-crime activity that helps fund a domestic terrorist ring. Normally the Feds aren't interested in giving you a heads-up until their investigation is done, but you have a relationship with this agent from your work together in the local InfraGard chapter, so he did you a huge favor.

The first thing you need to do is figure out what was lost and how. To the logs! You aren't sure how it happened, but you see some strange log records indicating changes on an application server in the DMZ. Given the nature of the data your agent friend passed along, you check the logs on the database server where that data resides as well. Interestingly enough, you find a gap in the logs on the database server: your system collected no log records for a five-minute period a few days ago. You aren't sure exactly what happened, but you know with reasonable certainty that something happened. And it probably wasn't good.

Now you work backwards and isolate the additional systems compromised as the attackers made their way through the infrastructure to reach their target. It's resource-intensive, but by searching in the log manager you can isolate devices with gaps in their logs during the window you identified. The attackers were effective, taking advantage of unpatched vulnerabilities (Damn, Ops!) and covering their tracks by turning off logging where necessary.

At this point you know the attack path, and at least part of what was stolen, thanks to the FBI. Beyond that you are blind. So what can you do to make sure you aren't similarly surprised somewhere down the line? You can set the logging system to alert if you don't get any log records from critical assets in any two-minute period. That isn't perfect either, and it will generate more alerts, but at least you'll know something is amiss before the FBI calls.
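To make that gap check concrete, here is a minimal sketch of the idea, assuming a hypothetical export of collected log records as (timestamp, hostname) pairs. The asset names and threshold are illustrative only; a real log manager or SIEM would expose this kind of watchdog differently.

```python
from datetime import datetime, timedelta

GAP_THRESHOLD = timedelta(minutes=2)
CRITICAL_ASSETS = {"dmz-app01", "db-prod01"}  # hypothetical hostnames

def find_gaps(records):
    """records: iterable of (datetime, hostname) tuples, sorted by time."""
    last_seen = {}
    gaps = []
    for ts, host in records:
        if host not in CRITICAL_ASSETS:
            continue
        prev = last_seen.get(host)
        if prev is not None and ts - prev > GAP_THRESHOLD:
            gaps.append((host, prev, ts))  # silent window worth alerting on
        last_seen[host] = ts
    return gaps

# Example: a five-minute silence on the database server gets flagged.
sample = [
    (datetime(2011, 11, 1, 2, 0), "db-prod01"),
    (datetime(2011, 11, 1, 2, 5), "db-prod01"),
]
for host, start, end in find_gaps(sample):
    print(f"ALERT: no logs from {host} between {start} and {end}")
```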
With only log data you can identify what was attacked, but probably not how the attack happened.

Forensics Driven by Broader Data

Let's take a look at an alternative scenario with a few other data sources, such as full network packet capture, network flow records, and configuration files. It is still a bad day when you get the call from your pal the Special Agent, and Applied Network Security Analysis cannot magically make you omniscient, but how you investigate the breach changes.

You still start with the logs on the perimeter server and identify the device that served as the attacker's initial foothold. But you've implemented the Full Packet Capture Sandwich architecture described in the last post, so you are capturing the network traffic in your DMZ. You proceed to the network analysis console (using the full packet capture stream) and search all the traffic to and from the compromised server. Most sessions to that server are typical: standard application traffic. But you find some reconnaissance, and then something strange: an executable injected into the server via faulty field validation on the web app (Damn, Developers!). Okay, this confirms the first point of exploit.

Next you go to the target (keeping in mind what data was compromised) and do a similar analysis. Again, with the full packet capture sandwich in place, you captured traffic to and from the database server as well. As in the log-only scenario, you pinpoint the time period when logging was turned off, then search the analysis console to figure out what happened during that five-minute period on that segment. Yep, a privileged account turned off logging on the database server and added an admin account to the database. Awesome. Using that account, the attacker dumped the database table and moved the data to a staging server elsewhere on your network.

Now you know which data was taken, but how? You aren't capturing all the traffic on your network (that's infeasible), so you have some blind spots, but with your additional data sources you can pinpoint the attack path. The NetFlow records coming from the compromised database server show the path to the staging server. The configuration records from the staging server indicate which executables were installed, enabling the attacker to package and encrypt the payload for exfiltration. Further analysis of the NetFlow data shows the exfiltration itself, presumably to yet another staging server on another compromised network elsewhere. (A rough sketch of this kind of flow query appears after the wrap-up below.) It's not perfect, because you are reconstructing what already happened. But now you can get back to your FBI buddy with much more information about the tactics the attacker used, and maybe even evidence that will be helpful in prosecution.

Can't Everyone Get Along?

Clearly this is a simplified scenario, but it demonstrates the need to collect additional data sources to isolate the root cause and attack path of any attack.
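As referenced above, here is a minimal sketch of the kind of flow query used to trace the staging and exfiltration hops. The record format, field names, IP addresses, and byte threshold are all assumptions for illustration; they are not tied to any particular NetFlow collector's schema.

```python
import csv

DB_SERVER = "10.1.20.15"           # hypothetical address of the compromised database server
STAGING_SERVER = "10.1.40.77"      # hypothetical address of the internal staging server
LARGE_TRANSFER_BYTES = 50_000_000  # arbitrary threshold for "payload-sized" flows

def large_flows(path, src_ip):
    """Yield flows from src_ip that moved more than the threshold number of bytes."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # expects columns: start, src, dst, bytes
            if row["src"] == src_ip and int(row["bytes"]) > LARGE_TRANSFER_BYTES:
                yield row["start"], row["src"], row["dst"], int(row["bytes"])

# First hop: database server to the staging server.
for start, src, dst, nbytes in large_flows("flows.csv", DB_SERVER):
    print(f"{start}: {src} -> {dst} ({nbytes} bytes)")

# Second hop: staging server outbound, the presumed exfiltration.
for start, src, dst, nbytes in large_flows("flows.csv", STAGING_SERVER):
    print(f"{start}: {src} -> {dst} ({nbytes} bytes)  possible exfiltration")
```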