Most organizations don’t really learn about the limitations of event logs until forensic investigators hold up their hands and explain that they know what happened but aren’t really sure how. Huh? How could that happen? It’s pretty simple: logs are a backward-looking indicator. They can help you piece together what happened, but you can only infer how.
In a forensic investigation, inferring anything is suboptimal. You want to know, especially given the need to isolate the root cause of the attack and establish remediations to ensure it doesn’t happen again. So you need to look at additional data sources to fill the gaps in what the logs tell you. Let’s take a look at a simplified scenario to illuminate the issues. We’ll look at the scenario first from the standpoint of a log-only analysis, and then with a few other data sources added. For a more detailed incident response scenario, check out our React Faster and Better paper.
The Forensic Limitations of Logs
It’s the call you never want to get. The Special Agent on the other end of the line called to give you a heads-up: they found some of your customer data as part of another investigation into some cyber-crime activity that helps fund a domestic terrorist ring. Normally the Feds aren’t interested in giving you a heads-up until their investigation is done, but you have a relationship with this agent from your work together in the local InfraGard chapter. So he did you a huge favor.
The first thing you need to do is figure out what was lost and how. To the logs! You aren’t sure how it happened, but you see some strange log records indicating changes on an application server in the DMZ. Given the nature of the data your agent friend passed along, you check the logs on the database server where that data resides as well. Interestingly enough, you find a gap in the logs on the database server, where your system collected no log records for a five-minute period a few days ago. You aren’t sure exactly what happened, but you know with reasonable certainty that something happened. And it probably wasn’t good.
Now you work backwards and isolate the additional systems compromised as the attackers made their way through the infrastructure to reach their target. It’s pretty resource-intensive, but by searching in the log manager you can isolate devices with gaps in their logs during the window you identified. The attackers were pretty effective, taking advantage of unpatched vulnerabilities (Damn, Ops!) and covering their tracks by turning off logging where necessary. At this point you know the attack path, and at least part of what was stolen, thanks to the FBI. Beyond that you are blind. So what can you do to make sure you aren’t similarly surprised somewhere down the line? You can set the logging system to alert if you don’t get any log records from critical assets in any 2-minute period. Again, this isn’t perfect and will result in a bunch more alerts, but at least you’ll know something is amiss before the FBI calls.
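To make the "alert on silence" idea concrete, here is a minimal Python sketch of the gap check. It is not how any particular log manager implements it. The function name, the timestamps, and the 2-minute threshold as a default are all assumptions for illustration; a real deployment would read timestamps from the log manager and run the check per critical asset.

```python
from datetime import datetime, timedelta

def find_log_gaps(timestamps, max_silence=timedelta(minutes=2)):
    """Return (last_seen, next_seen) pairs where a host went quiet
    for longer than max_silence between consecutive log records."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps

# Hypothetical record times for the database server: one record every
# 30 seconds, with five minutes of silence in the middle.
base = datetime(2011, 1, 1, 3, 0, 0)
db_server_log_times = [base + timedelta(seconds=30 * i) for i in range(4)]
resume = base + timedelta(minutes=6, seconds=30)
db_server_log_times += [resume + timedelta(seconds=30 * i) for i in range(4)]

gaps = find_log_gaps(db_server_log_times)
for last_seen, next_seen in gaps:
    print(f"No log records between {last_seen} and {next_seen}")
```

Run after the fact, the same check finds the gaps you hunted for manually in the log manager. Run continuously against the current time, it becomes the alert described above.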
With only log data you can identify what was attacked but probably not how the attack happened.
Forensics Driven by Broader Data
Let’s take a look at an alternative scenario with a few other data sources, such as full network packet capture, network flow records, and configuration files. Of course it is still a bad day when you get the call from your pal the Special Agent. Applied Network Security Analysis cannot magically make you omniscient, but it does change how you investigate breaches. You still start with the logs on the perimeter server and identify the device that served as the attacker’s initial foothold. But you’ve implemented the Full Packet Capture Sandwich architecture described in the last post, so you are capturing the network traffic in your DMZ. You proceed to the network analysis console (using the full packet capture stream) and search all the traffic to and from the compromised server. Most sessions to that server are typical: standard application traffic. But you find some reconnaissance, and then something pretty strange: an executable injected into the server via faulty field validation on the web app (Damn, Developers!). Okay, this confirms the first point of exploit.
Next you go to the target (keeping in mind what data was compromised) and do a similar analysis. Again, with the full packet capture sandwich in place, you captured traffic to and from the database server as well. As in the log-only scenario, you pinpoint the time period when logging was turned off, then perform a search in your analysis console to figure out what happened during that 5-minute period on that segment. Yep, a privileged account turned off logging on the database server and added an admin account to the database. Awesome. Using that account, the attacker dumped the database table and moved the data to a staging server elsewhere on your network. Now you know which data was taken, but how did it get out?
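The search itself is conceptually simple: take the window where the logs went dark and pull every captured session that touched the database server during it. The sketch below shows only that idea. The Session record, the addresses, and the summaries are hypothetical stand-ins for whatever your analysis console exports, not any product’s real data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Session:
    start: datetime
    end: datetime
    src: str
    dst: str
    summary: str  # hypothetical one-line description of the session

def sessions_in_window(sessions, host, window_start, window_end):
    """Return sessions that involve host and overlap the time window."""
    return [
        s for s in sessions
        if host in (s.src, s.dst)
        and s.start <= window_end
        and s.end >= window_start
    ]

# Hypothetical addresses and session records.
APP, DB, STAGING, OTHER = "10.0.1.10", "10.0.2.20", "10.0.3.30", "10.0.4.40"
day = datetime(2011, 1, 1)
at = lambda h, m: day + timedelta(hours=h, minutes=m)

captured = [
    Session(at(2, 50), at(2, 55), APP, DB, "routine application query"),
    Session(at(3, 2), at(3, 3), APP, DB, "privileged login, logging disabled"),
    Session(at(3, 3), at(3, 4), APP, OTHER, "unrelated traffic"),
    Session(at(3, 4), at(3, 6), DB, STAGING, "bulk transfer of table contents"),
]

# The window comes from the gap in the database server's logs.
gap_start = day + timedelta(hours=3, minutes=1, seconds=30)
gap_end = gap_start + timedelta(minutes=5)

hits = sessions_in_window(captured, DB, gap_start, gap_end)
for s in hits:
    print(s.start.time(), s.src, "->", s.dst, ":", s.summary)
```

The overlap test (start before the window closes, end after it opens) matters because a long-running session that began before logging was turned off may still be the interesting one.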
You aren’t capturing all the traffic on your network (infeasible), so you have some blind spots, but with your additional data sources you are able to pinpoint the attack path. The NetFlow records coming from the compromised database server show the path to the staging server. The configuration records from the staging server indicate what executables were installed, which enabled the attacker to package and encrypt the payload for exfiltration.
Further analysis of the NetFlow data shows the exfiltration, presumably to yet another staging server on another compromised network elsewhere. It’s not perfect, because you are figuring out what already happened. But now you can get back to your FBI buddy with a lot more information about what tactics the attacker used, and maybe even evidence that might be helpful in prosecution.
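As a rough illustration of how flow records can reveal that path, the sketch below starts at the compromised database server and repeatedly follows the largest outbound transfer to the next host. The flow tuples, addresses, and byte counts are invented, and the "follow the biggest transfer" rule is a simplifying assumption. Real NetFlow analysis would also weigh timing, ports, and each host’s baseline behavior.

```python
from collections import defaultdict

def bytes_by_peer(flows, src):
    """Total bytes sent from src to each destination."""
    totals = defaultdict(int)
    for flow_src, flow_dst, nbytes in flows:
        if flow_src == src:
            totals[flow_dst] += nbytes
    return dict(totals)

def largest_transfer_path(flows, start, max_hops=5):
    """Greedy heuristic: from each host, hop to the not-yet-visited
    destination that received the most bytes from it."""
    path = [start]
    current = start
    for _ in range(max_hops):
        totals = bytes_by_peer(flows, current)
        candidates = {dst: b for dst, b in totals.items() if dst not in path}
        if not candidates:
            break
        current = max(candidates, key=candidates.get)
        path.append(current)
    return path

# Hypothetical flow records: (source, destination, bytes).
APP, DB, STAGING = "10.0.1.10", "10.0.2.20", "10.0.3.30"
EXTERNAL = "203.0.113.50"  # documentation-range address, not a real host

flows = [
    (APP, DB, 15_000),
    (DB, APP, 20_000),
    (DB, STAGING, 500_000_000),
    (STAGING, DB, 10_000),
    (STAGING, EXTERNAL, 480_000_000),
]

path = largest_transfer_path(flows, DB)
print(" -> ".join(path))
```

Flow records carry no payload, so this shows only where the data went. What was in the transfer comes from the packet capture and configuration data.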
Can’t Everyone Get Along?
Clearly this is a simplified scenario that perfectly demonstrates the need to collect additional data sources to isolate the root cause and attack path of any compromise. The real world is rarely so tidy. Our point here is that no one data source stands alone. We aren’t claiming that logs are not important – they certainly are. As are the full packet capture stream, the configuration files, the NetFlow records, and a bunch of other stuff. By harnessing all this data you can more effectively figure out what happened, contain the damage, and figure out how to properly remediate.
Security tools (and security people, for that matter) aren’t very good at sharing. But the whole industry needs to get collectively better at it. Bidirectional integration between the SIEM/Log Manager, Full Packet Capture gear, Network Behavioral Analysis, and Configuration Tracking products makes all the tools more powerful, and enables us to take better advantage of our data. The security team needs to figure out which tool and repository will be primary and use the other tools as needed. But without the ability to share data and build alerts that draw on many of these data sources, the job of a security professional gets much harder.
This is a good segue to our next post, where we’ll discuss how to leverage these additional data sources to provide advanced security alerting goodness. The forensics capability is great, but as our contrived scenario showed, it’s already too late and your data is gone. The next step is to figure out how to shorten that window between attack and detection.