The forensics use case we discussed previously is about taking a look at something that already happened. You presume the data is already lost, the horse is out of the barn, and Pandora’s Box is open. But what if we tried to look at some of these additional data types in terms of making security alerts better, with the clear goal of reducing the window between exploit and detection: reacting faster?

Can we leverage something like network full packet capture to learn sooner when something is amiss and to improve security? Yes, but this presents many of the same challenges as using log-based analysis to detect what is going on. You still need to know what you are looking for, and you need an analysis engine that can not only correlate behavior across multiple types of logs, but also analyze a massive amount of network traffic for signs of attack.

So when we made the point in Collection and Analysis that these Network Security Analysis platforms need to be better SIEMs than a SIEM, this is what we were talking about.

Pattern Matching and Correlation

Assuming that you are collecting some of these additional data sources, the next step is to turn said data into actionable information, which means some kind of alerting and correlation. We need to be careful when using the ‘C’ word (correlation), given the nightmare most organizations have when they try to correlate data on SIEM platforms. Unfortunately the job doesn’t get any easier when extending the data types to include network traffic, network flow records, etc. So we continue to advocate a realistic and incremental approach to analysis. Much of this approach was presented (in gory detail) in our Network Security Operations Quant project.

  • Identify high-value data: This is key – you probably cannot collect from every network, nor should you. So figure out the highest profile targets and start with them.
  • Build a realistic threat model: Next put on your hacker hat and build a threat model for how you’d attack that high-value data. It won’t be comprehensive but that’s okay. You need to start somewhere. Figure out how you would attack the data if you needed to.
  • Enumerate those threats in the tool: With the threat models, design rules to trigger based on the specific attacks you are looking for.
  • Refine the rules and thresholds: The only thing we can know for certain is that your rules will be wrong. So you will go through a tuning process to home in on the types of attacks you are looking for.
  • Wash, rinse, repeat: Add another target or threat and build more rules as above.

With the additional traffic analysis you can look for specific attacks. Whether it’s looking for known malware (which we will talk about in the next post), traffic destined for a known command and control network, or tracking a buffer overflow targeted at an application residing in the DMZ, you get a lot more precision in refining rules to identify what you are looking for. Done correctly this reduces false positives and helps to zero in on specific attacks.

Of course the magic words are “done correctly”. It is essential to build the rule base incrementally – test the rules and keep refining the alerting thresholds – especially given the more granular attacks you can look for.
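
To make the idea of a rule with a tunable threshold a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Flow record, the KNOWN_C2 watch list (filled with documentation-reserved addresses), and the c2_alerts function are hypothetical names, not part of any particular product. It flags internal sources that talk to a watch-listed destination at least a given number of times.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record (not a real NetFlow schema)."""
    src: str
    dst: str
    dst_port: int


# Assumed watch list of command and control addresses.
# These are documentation-reserved IPs, used here only as placeholders.
KNOWN_C2 = {"203.0.113.10", "198.51.100.7"}


def c2_alerts(flows, threshold=3):
    """Return the sources with at least `threshold` flows to a watch-listed destination."""
    hits = Counter(f.src for f in flows if f.dst in KNOWN_C2)
    return sorted(src for src, count in hits.items() if count >= threshold)


flows = [Flow("10.0.0.5", "203.0.113.10", 443)] * 3 + [
    Flow("10.0.0.6", "198.51.100.7", 80),   # a single hit, below the default threshold
    Flow("10.0.0.7", "192.0.2.1", 80),      # destination is not on the watch list
]
print(c2_alerts(flows))
```

The threshold parameter is the knob you would expect to keep adjusting during tuning: set it too low and a single stray connection raises an alert, set it too high and a slow beacon slips by.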


The other key aspect of leveraging this broader data collection capability is understanding how baselines change from what you may be used to with SIEM. Using logs (or more likely NetFlow), you can get a feel for what is normal behavior and use that to kickstart your rule building. Basically, you assume what is happening when you first implement the system is what should be happening, and alert if something varies too far from that normal. That’s not actually a safe assumption but you need to start somewhere.

As with correlation this process is incremental. Your baselines will be wrong when you start, and you adjust them over time based on operational experience responding to alerts. But the most important step is the start, and baselines help to get things going.
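
As a rough illustration of baselining, the sketch below treats a handful of observed traffic volumes as "normal" and alerts when a new value strays too far from them. The function names, the sample numbers, and the choice of mean and standard deviation as the measure of normal are all assumptions made for the example; real tools may model normal behavior quite differently.

```python
from statistics import mean, pstdev


def build_baseline(samples):
    """Summarize observed values (e.g. MB per hour for one host) as (mean, std dev)."""
    return mean(samples), pstdev(samples)


def is_anomalous(value, baseline, max_sigma=3.0):
    """Return True when `value` is more than `max_sigma` deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation was observed, so any change at all counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > max_sigma


# Assumed sample data: hourly outbound traffic in MB during the initial observation period.
hourly_mb = [100, 110, 95, 105, 90, 100]
baseline = build_baseline(hourly_mb)

print(is_anomalous(104, baseline))  # close to the observed norm
print(is_anomalous(400, baseline))  # far outside the observed norm
```

Here max_sigma plays the same role as a rule threshold: it is the value you adjust as you learn which deviations actually matter, and the baseline itself should be rebuilt as your picture of normal improves.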

Revisiting the Scenario

Getting back to the scenario presented in the Forensics use case, how would some of this more pseudo-real-time analysis help reduce the window between attack and detection? To recap that scenario briefly, a friend at the FBI informed you that some of your customer data showed up as part of a cybercrime investigation. Of course by the time you get that call it is too late. The forensic analysis revealed an injection attack enabled by faulty field validation on a public-facing web app.

If you were looking at network full packet capture, you might find that attack by creating a rule to look for executables entered into the form fields of POST transactions, or some other characteristic signature of the attack. Since you are capturing the traffic on the key database segment, you could establish a content rule looking for content strings you know are important (as a poor man’s DLP), and alert when you see that type of data being sent anywhere but the application servers that should have access to it. You could also, for instance, set up alerts on seeing an encrypted RAR file on an egress network path. There are multiple places you could detect the attack if you know what to look for.
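
The three rules just described could be sketched roughly as follows. This is an illustration under stated assumptions, not how any specific capture product expresses rules: the function names, the marker string, and the IP addresses are hypothetical, and the checks work on payload bytes that are assumed to be already reassembled. The executable check looks for the well-known Windows PE ("MZ") and ELF file signatures, and the archive check looks only for the RAR signature; deciding whether the archive is actually encrypted would require parsing its headers, which this sketch does not attempt.

```python
from urllib.parse import unquote_to_bytes

EXECUTABLE_MAGIC = (b"MZ", b"\x7fELF")   # Windows PE and ELF file signatures
RAR_MAGIC = b"Rar!\x1a\x07"              # prefix shared by RAR 4 and RAR 5 archives


def form_fields(body: bytes):
    """Decode a URL-encoded POST body into {field name: raw value bytes}."""
    fields = {}
    for pair in body.split(b"&"):
        name, _, value = pair.partition(b"=")
        fields[unquote_to_bytes(name.replace(b"+", b" "))] = unquote_to_bytes(
            value.replace(b"+", b" ")
        )
    return fields


def executable_in_post(body: bytes) -> bool:
    """Rule 1: a form field value begins with an executable file signature."""
    return any(v.startswith(EXECUTABLE_MAGIC) for v in form_fields(body).values())


def sensitive_leak(payload: bytes, dst: str, allowed_dsts, markers) -> bool:
    """Rule 2 (poor man's DLP): a known content string heads to an unapproved host."""
    return dst not in allowed_dsts and any(m in payload for m in markers)


def rar_on_egress(payload: bytes) -> bool:
    """Rule 3: a payload on an egress path begins with the RAR archive signature."""
    return payload.startswith(RAR_MAGIC)


# Hypothetical values, for illustration only.
APP_SERVERS = {"10.1.1.20"}
MARKERS = [b"CUST-ACCT-"]

print(executable_in_post(b"name=alice&upload=MZ%90%00%03"))
print(sensitive_leak(b"id=CUST-ACCT-0001", "10.1.1.99", APP_SERVERS, MARKERS))
print(rar_on_egress(b"Rar!\x1a\x07\x01\x00..."))
```

Rules this simple will misfire (a text field that happens to start with "MZ", for example), which is exactly why the tuning process described earlier matters.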

Of course that example is contrived and depends on your ability to predict the future, figuring out the vectors before the attack hits. But a lot of this discipline is based on a basic concept: “Fool me once, shame on you. Fool me twice, shame on me.” Once you have seen this kind of attack – especially if it succeeds – make sure it doesn’t work again. It’s a bit of solving yesterday’s problems tomorrow, but many security attacks use very similar tactics. So if you can enumerate a specific attack vector based on what you saw, there is an excellent chance that you will have another opportunity to recognize and block that attack in the future. So it is worth looking, and using other controls to protect against attacks you haven’t seen before, as much as you can.

Next we will tackle the malware analysis use case. Traditional security defenses are mostly blind to malware until it makes its way to the endpoint (or server) – but by then it’s likely too late, given the capabilities of today’s anti-malware tools. So we will talk about detecting malware at the edge of the network.