Building an Early Warning System: Internal Data Collection and Baselining
Now that we have explained why you need to start thinking about an Early Warning System (EWS), and given a high-level idea of the process involved, it’s time to dig into the different parts of the process. Third-party intelligence, which we’ll discuss in the next post, will tell you what kinds of attacks you are more likely to see, based on what else is happening in the world. But monitoring your own environment and looking for variation from normal activity tells you whether those attacks actually ARE hitting you.

Internal Data Collection

The process starts with collecting data from your internal sources for analysis. Most of you already have data aggregated in a log management environment because compliance has been mandating log management for years. More advanced organizations may have a Security Operations Center (SOC) leveraging a SIEM platform to do more security-oriented correlation and forensics to pinpoint and investigate attacks. Either way, you are likely collecting data which will provide the basis for the internal side of your EWS. Let’s take a quick look at the kinds of data you are likely already collecting and their relevance to the EWS:

Network Security Devices: Your firewalls and IPS devices generate huge logs of what’s blocked, what’s not, and which rules are effective. The EWS will be matching attack patterns and traffic to what is known about other attacks, so recognizing port/protocol/destination combinations, or application identifiers for next-generation firewalls, will be helpful.

Identity: Similarly, information about logins, authentication failures, and other identity-related data is useful for matching against attack profiles received from third-party threat intelligence providers.

Application/Database Logs: Application-specific logs are generally less relevant, unless they come from standard applications or components likely to be specifically targeted by attackers.
Database transaction logs are generally more useful for identifying unusual database transactions – which might represent bulk data removal, injection attempts, or efforts to bring applications down. Database Activity Monitoring (DAM) logs are useful for determining the patterns of database requests, particularly when monitoring traffic within the database (or on the database server) consumes too many resources.

NetFlow: Another data type commonly used in SIEM environments is NetFlow, which provides information on protocols, sources, and destinations for network traffic as it traverses devices. NetFlow records are similar to firewall logs but far smaller, making them more useful for high-speed networks. Network flows can identify lateral movement by attackers, as well as large file transfers.

Vulnerability Scans: Scans offer an idea of which devices are vulnerable to specific attacks, which is critical for the EWS to help pinpoint which devices would be potential targets for which attacks. You don’t need to worry about Windows exploits against Linux servers, so this information enables you to focus monitoring, investigations, and workarounds on the devices more likely to be successfully attacked.

Configuration Data: The last major security data category is configuration data, which provides information on changes to monitored devices. This is also critical for an EWS, because one of the most important intelligence types identifies specific malware attacks by their telltale indicators of compromise. Matching these indicators against your configuration database enables you to detect successful (and even better, in-progress) attacks on devices in your environment.

After figuring out which data you will collect, you need to decide where to put it. That means selecting a platform for your Early Warning System. You already have a SIEM/Log Management offering, so that’s one possibility.
You also likely have a vulnerability management platform, so that’s another choice. We are not religious about which technology gets the nod, but a few capabilities are essential for an EWS. Let’s not put the cart before the horse, though – we don’t yet have enough context on other aspects of the process to understand which platform(s) might make sense. So we will defer the platform decision until later in this series.

Baseline

Once the data is collected, you need to define normal before the data is useful to the EWS. As we mentioned earlier, ‘normal’ does not necessarily mean secure. If you are anything like almost every other enterprise, you likely have malware and compromised devices on your network already. Sorry to burst your bubble. You need to identify indications of something different: something that could represent an attack, an outbreak, or an exfiltration attempt. It might be another false positive, or it could represent a new normal to accept, but either way the baseline will need to adapt and evolve. Let’s highlight a simple process for building a baseline:

Pick data source(s): Start by picking a single data source and collect some data. Then determine the ranges you see within the data set. As an example we will use firewall logs. From the log you typically get the type of traffic (ports, protocols, applications, etc.), the destination IP address, the time of day, and whether the packet was blocked. You can pick numerous data sources and do sophisticated data mining, but we will keep it simple for illustration purposes.

Monitor the patterns: Collect traffic for a while, typically a few days to a week, then start analyzing it. Get your inner statistician on and start calculating means, medians, and frequencies for your data set. In our example you might determine that 15% of your inbound web traffic during lunchtime is SSL destined for your shipping cart application.
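To make the "monitor the patterns" step a little more concrete, here is a minimal Python sketch that computes per-hour traffic shares, their mean and median, and port frequencies. The record layout and the field names (hour, port, blocked) are hypothetical and hand-written for illustration; they are not any firewall's actual log format, and real logs would need to be parsed into this shape first.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical, hand-written records standing in for parsed firewall logs.
records = [
    {"hour": 12, "port": 443, "blocked": False},
    {"hour": 12, "port": 80, "blocked": False},
    {"hour": 12, "port": 80, "blocked": False},
    {"hour": 12, "port": 80, "blocked": False},
    {"hour": 12, "port": 443, "blocked": True},
    {"hour": 13, "port": 443, "blocked": False},
    {"hour": 13, "port": 80, "blocked": False},
]


def port_share_by_hour(records, port):
    """Return, per hour, the fraction of allowed connections using `port`."""
    totals = Counter()
    matches = Counter()
    for rec in records:
        if rec["blocked"]:
            continue  # only traffic that was let through counts here
        totals[rec["hour"]] += 1
        if rec["port"] == port:
            matches[rec["hour"]] += 1
    return {hour: matches[hour] / totals[hour] for hour in totals}


def summarize(values):
    """Basic descriptive statistics for a collection of numbers."""
    values = list(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Share of allowed traffic on port 443 (used here as a stand-in for SSL).
ssl_share = port_share_by_hour(records, port=443)
ssl_summary = summarize(ssl_share.values())

# How often each port appears among allowed connections.
port_frequency = Counter(r["port"] for r in records if not r["blocked"])

print(ssl_share)
print(ssl_summary)
print(port_frequency.most_common())
```

With a few days to a week of real data, the same kind of summary could be computed per traffic type and per time window; which dimensions matter will depend on your environment.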
Define the initial thresholds: From the initial patterns you can set thresholds, outside which traffic indicates a potential problem. Maybe you set the initial thresholds two standard deviations above the mean for a traffic type. You look at the medians and means to figure out which initial threshold makes sense. The initial threshold doesn’t need to be precise, because you don’t yet have enough data or knowledge to know what represents an attack; it just needs to give you a place to start. Getting back to our firewall example, a spike in outbound SSL traffic to 30% might indicate an exfiltration. Or it could indicate a bunch