Network Security Fundamentals: Looking for Not Normal
To state the obvious (as I tend to do), we all have too much to protect. No one gets through their list every day, which means perhaps the most critical skill for any professional is the ability to prioritize. We’ve got to focus on the issues that present the most significant risk to the organization (whatever you mean by risk) and act accordingly. I have’t explicitly said it, but the key to network security fundamentals is figuring out how to prioritize. And to be clear, though I’m specifically talking about network security in this series, the tactics discussed can (and need to) be applied to all the other security domains. To recap how the fundamentals enable this prioritization, first we talked about implementing default deny on your perimeter. Next we discussed monitoring everything to provide a foundation of data for analysis. In the last post, correlation was presented to start analyzing that data. By the way, I agree with Adrian, who is annoyed with having to do correlation at all. But it is what it is, and maybe someday we’ll get all the context we need to make a decision based on log data, but we certainly can’t wait for that. So to the degree you do correlate, you need to do it effectively. Pattern Matching Going hand in hand with prioritization is the ability to match patterns. Most of the good security folks out there do this naturally, in terms of consuming a number of data points, understanding how they fit together, and then making a decision about what that means, how it will change things and what action is required. The patterns help you to understand what you need to focus on at any given time. The first fundamental step in matching patterns involves knowing your current state. Let’s call that the baseline. The baseline gives you perspective on what is happening in your environment. The good news is that a “monitor everything” approach gives you sufficient data to establish the baseline. Let’s just take a few examples of typical data types and what their baselines look like: Firewall Logs: You’ll see attacks in the firewall logs, so your baseline consists of the normal number/frequency of attacks, time distribution, and origin. So if all of a sudden you are attacked at a different time from a different place, or much more often than normal, it’s time to investigate. Network Flows: Network flows show network traffic dynamics on key segments, so your baseline tells you which devices communicate with which other devices – both internal and external to your network. So if you suddenly start seeing a lot of flow from an internal device (on a sensitive network) to an external FTP site, it could be trouble. Device Configurations: If a security device is compromised, there will usually be some type of configuration and/or policy change. The baseline in this case is the last known good configuration. If something changes, and it’s not authorized or in the change log, that’s a problem. Again, these examples are not meant to be exhaustive or comprehensive, just to give an idea about the types of data you are already collecting and what the baseline could look like. Next you set up the set of initial alerts to detect attacks that you deem important. Each management console for every device (or class of devices) gives you the ability to set alerts. There is leverage in aggregating all this data (see the correlation post), but it’s not necessary. Now I’ll get back to something discussed in the correlation post, and that’s the importance of planning your use cases before implementing your alerts. You need to rely on those thresholds to tell you when something is wrong. Over time, you tune the thresholds to refine how and when you get alerted. Don’t expect this tuning process to go quickly or easily. Getting this right really is an art, and you’ll need to iterate for a while to get there – think months, not days. You can’t look for everything, so the use cases need to cover the main data sources you collect and set appropriate alerts for when something is outside normal parameters. I call this looking for not normal, and yes it’s really anomaly detection. But most folks don’t think favorably of the term “anomaly detection”, so I use it sparingly. Learning from Mistakes You can learn something is wrong in a number of ways. Optimally, you get an alert from one of your management consoles. But that is not always the case. Perhaps your users tell you something is wrong. Or (worst case) a third party informs you of an issue. How you learn you’ve been pwned is less important than what you do once you are pwned. Once you jump into action, you’re looking at the logs, jumping into management consoles, and isolating the issues. How quickly you identify the root cause has everything to with the data you collect, and how effectively you analyze it. We’ll talk more about incident response later this year, but suffice it to say your only job is to contain the damage and remediate the problem. Once the crisis ends, it’s time to learn from experience. The key, in terms of “looking for not normal”, is to make sure it doesn’t happen again. The attackers do their jobs well and you will be compromised at some point. Make sure they don’t get you the same way twice. The old adage, “Fool me once, shame on you – fool me twice, shame on me,” is very true. So part of the post-mortem process is to define what happened, but also to look for that pattern again. Remember that attackers are fairly predictable. Like the direct marketers that fill your mailbox with crap every holiday season, if something works, they’ll keep doing it. Thus, when you see an attack, you’ll need to expect to see it again. Build another set of rules/policies to make sure that same attack is detected quickly and accurately. Yes, I know this is a black list mindset, and there are limitations to this approach since you