Fact-based Network Security: Outcomes and Operational Data
In our first post on Fact-based Network Security, we talked about the need to make decisions based on data, as opposed to instinct. Then we went in search of the context to know what’s important, because in order to prioritize effectively you need to know what presents the most value to your organization. Now let’s dig a little deeper into the next step, which is determining the operational metrics on which to base decisions. But security metrics can be a slippery slope.

First let’s draw a distinction between outcome-based metrics and operational metrics. Outcomes are the issues central to business performance, and as such are both visible and important to senior management. Examples may include uptime/availability, incidents, disclosures, etc. Basically, outcomes are the end results of your efforts: where you are trying to get to, or (for negative outcomes) stay away from. We recommend you start by establishing some goals for improvement of these outcomes. This gives you an idea of what you are trying to achieve and defines success.

To illustrate this we can examine availability as an outcome – it’s never bad to improve availability of key business systems. Of course we are simplifying a bit – availability depends on more than just security. But we can think about availability in the context of security, and count issues/downtimes due to security problems.

Obviously many types of activities impact availability. Device configuration changes can cause downtime. So can vulnerabilities that result in successful attacks. Don’t forget application problems that may cause performance anomalies. Traffic spikes (perhaps resulting from a DDoS) can also take down business systems. Even seemingly harmless changes to a routing table can open up an attack path from external networks. That’s just scratching the surface. The good news is that you can leverage operational data to isolate the root causes of these issues. What kinds of operational data do we need?

Configuration data: Tracking configurations of network and security devices can yield important information about attack paths through your network and/or exploitable services running on these devices.

Change information: Understanding when changes and/or patches take place helps isolate when devices need to be checked or scanned again to ensure new issues have not been introduced.

Vulnerabilities: Figuring out the soft spots of any device can yield valuable information about possible attacks.

Network traffic: Keeping track of who is communicating with whom can help baseline an environment, which is important for detecting anomalous traffic and deciding whether it requires investigation.

Obviously as you go deeper into the data center, applications, and even endpoints, there is much more operational data that can be gathered and analyzed. But remember the goal. You need to answer the core question of “what to do first,” establishing priorities among an infinite number of possible activities. We want to focus efforts on the activities that will yield the biggest favorable impact on security posture.

A simple structure for this comes from the Securosis Data Breach Triangle. In order to have a breach, you need data that someone wants, an exploit to expose that data, and an egress path to exfiltrate it. If you break any leg of the triangle, you prevent a successful breach.

Data (Attack Path)

If the attacker can’t see the data, they can’t steal it, right? So we can focus some of our efforts on ensuring direct attack paths don’t make it easy for an attacker to access the data they want. Since you know your most critical business systems and their associated assets, you can watch to make sure attack paths don’t develop which expose this data. How? Start with proper network segmentation to separate important data from unauthorized people, systems, and applications.
Then constantly monitor your network and security devices to ensure attack paths don’t put your systems at risk. Operational data such as router and firewall configurations is a key source for this analysis. You can also leverage network maps and ongoing discovery activities to check for new paths. Any time there is a change to a firewall setting or a network device, revisit your attack path analysis. That way you ensure there’s no ripple effect from a change that opens an exposure. Think of it as regression testing for network changes.

Given the complexity of most enterprise-class networks, this isn’t something you can do manually, and it’s most effective in a visual context. Yes, in this case a picture is worth a million log records. A class of analysis tools has emerged to address this. Some look at firewall and network configurations to build and display a topology of your network. These tools constantly discover new devices and keep the topology up to date. We also see evolution of automated penetration testing tools, which focus on continuously trying to find attack paths to critical data, without requiring a human operator. There is no lack of technology to help model and track attack paths.

Regardless of the technology you select to analyze the attack paths, this is key to understanding what to fix first. If a direct path to important data results from a configuration change, you know what to do (roll it back!). Likewise, if a rogue access point emerges on a critical network (with a direct path to important data), you need to get rid of it. These are the kinds of activities that make an impact and need to be prioritized.

Exploit

Even if an attack path exists, it may not be practical to exploit the target device. This is where server configuration, as well as patch and vulnerability monitoring, are very useful.
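
As a rough illustration of how attack path data and vulnerability data can feed a "what to fix first" decision, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the device names, the `ALLOWED_FLOWS` graph (which in practice would have to be derived from firewall and router configurations), the scores, and the helper functions `reachable_from` and `prioritize`. It treats permitted connections as a directed graph, finds what an outside attacker could reach, and ranks findings on reachable devices ahead of the rest.

```python
from collections import deque

# Hypothetical "allowed flow" graph: an edge A -> B means the current
# firewall/router configuration lets A open connections to B.
ALLOWED_FLOWS = {
    "internet": ["dmz-web"],
    "dmz-web": ["app-server"],
    "app-server": ["customer-db"],
    "hr-laptop": ["file-server"],
}

# Hypothetical vulnerability findings: (device, severity score from 0 to 10).
FINDINGS = [
    ("customer-db", 9.8),
    ("file-server", 9.1),
    ("dmz-web", 6.5),
]


def reachable_from(graph, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # the starting point itself is not a target
    return seen


def prioritize(findings, exposed):
    """Sort findings: devices on an attack path first, then by severity."""
    return sorted(findings, key=lambda f: (f[0] not in exposed, -f[1]))


exposed = reachable_from(ALLOWED_FLOWS, "internet")
ranked = prioritize(FINDINGS, exposed)

for device, score in ranked:
    status = "on attack path" if device in exposed else "no attack path"
    print(f"{device}: severity {score} ({status})")
```

Re-running a check like this after every firewall or routing change is one way to approximate the regression testing described above. Real analysis tools model far more (ports, protocols, NAT, credentials) than this sketch does.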
Changes that happen outside of authorized maintenance windows tend to be suspicious, especially on devices either containing or providing access to important data. Likewise, the presence of an exploitable critical vulnerability should bubble to the top of the priority list. Again, if there is no attack path to the vulnerable device, the priority of fixing the issue is reduced. But overall you must track what needs to be fixed on