In the introduction to our Applied Network Security Analysis series, we talked about monitoring everything and the limitations of a log-centric data collection approach, in our battle to improve security operational processes. Now let’s dig in a little deeper and understand what kind of data collection foundation makes sense, given the types of analysis we need to deal with our adversaries.
Let’s define the critical data types for our analysis. First are the foundational elements, which were covered ad nauseam in our Monitoring Up the Stack paper. These include event logs from network devices, security devices, databases, and applications. We have already pointed out that log data is not enough, but you still need it. The logs provide a historical view of what happened, as well as the basis for the rule base needed for actionable alerts. Next we’ll want to add other data commonly used by SIEM devices, including network flows, configuration data, and some identity information. These additional data types provide increased context to detect patterns of potential badness. But this is not enough. We need to look beyond these data types for more detail.
Full Packet Capture
As we wrote in the React Faster and Better paper:
One emerging advanced monitoring capability – the most interesting to us – is full packet capture. These devices basically capture all traffic on a given network segment. Why? The only way you can really piece together exactly what happened is to use the actual traffic. In a forensic investigation this is absolutely crucial, providing detail you cannot get from log records.
Going back to a concept we call the Data Breach Triangle, you need three components for a real breach: an attack vector, something to steal, and a way to exfiltrate it. It’s impossible to stop all potential attacks, and you can’t simply delete all your data, so we advocate heavy perimeter egress filtering and monitoring, to (hopefully) prevent valuable data from escaping your network.
So why is having the packet stream so important? It is a critical facet of heavy perimeter monitoring. The full stream can provide a smoking gun for an actual breach, showing whether data actually left the organization, and which data. If you look at ingress traffic, the network capture enables you to pinpoint the specific attack vector(s) as well. We will discuss both these use cases, and more, in additional detail later in this series, but for now it’s enough to say that full network packet capture data is the cornerstone of Applied Network Security Analysis.
Intelligence and Context
Two additional data sources bear mentioning: reputation and malware. Both these data types provide extra context to understand what is happening on your networks and are invaluable for refining alerts.
- Reputation: Wouldn’t it be great if you knew some devices and/or destinations were up to no good? If you could infer some intent from just an IP address or other identifying characteristics? Well you can, at least a bit. By leveraging some of the services that aggregate data on command and control networks, and on other known bad actors, you can refine your alerts and optimize your packet capture based on behavior, not just on luck. Reputation made a huge difference in both email and web security, and we expect a similar impact on more general network security. This data helps focus monitoring and investigation on areas likely to cause problems.
- Malware samples: A log file won’t tell you that a packet carried a payload with known malware. But samples of known malware are invaluable when scrutinizing traffic as it enters the network, before it has a chance to do any damage. Of course nothing is foolproof, but we are trying to get smarter and optimize our efforts. Recognizing something that looks bad as it enters the network gives you a substantial head start on blocking malware, especially compared to folks whose game is all about cleaning up the mess after they fail to block it.
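To make the reputation idea a bit more concrete, here is a minimal sketch of tagging flow records against a list of known-bad networks. Everything in it is an assumption for illustration: the `BAD_NETWORKS` list stands in for whatever reputation service you subscribe to, the addresses are reserved documentation ranges rather than real threat data, and the flow record fields (`src`, `dst`, `bytes`) are hypothetical.

```python
import ipaddress

# Hypothetical reputation feed. These are reserved documentation ranges,
# standing in for networks a real service would flag as command and control.
BAD_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.64/26"),
]


def is_bad(ip: str) -> bool:
    """Return True if the address falls inside any flagged network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BAD_NETWORKS)


def prioritize(flows):
    """Return only the flows that touch a flagged address, tagged with
    which side of the conversation ("src" and/or "dst") matched."""
    flagged = []
    for flow in flows:
        hits = [side for side in ("src", "dst") if is_bad(flow[side])]
        if hits:
            flagged.append({**flow, "reputation_hits": hits})
    return flagged


# Hypothetical flow records, as a collector might export them.
flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.77", "bytes": 48213},
    {"src": "10.0.0.9", "dst": "192.0.2.10", "bytes": 1200},
    {"src": "198.51.100.70", "dst": "10.0.0.5", "bytes": 640},
]

flagged = prioritize(flows)
for flow in flagged:
    print(flow["src"], "->", flow["dst"], "matched on", flow["reputation_hits"])
```

In practice the flagged flows are the ones you would alert on first, or the conversations whose packets you would retain longest.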
Later in the series we will dive into how to leverage these data types by walking through the use cases where this data pays dividends. But for now our point is that more data is better than less, and without a foundation of data collection, analysis is likely futile.
Digesting Massive Amounts of Data
The challenge of collecting and analyzing a multi-gigabit network stream is significant, and each vendor is likely to have its own special sauce to collect, index, and analyze the data stream in real time. We won’t get into specific technologies or approaches (after all, beauty is in the eye of the beholder), but there are a couple of things to look for:
- Collection Integrity: A network packet capture system that drops packets isn’t very useful, so the first and foremost requirement is the ability to collect network traffic at the speeds your network runs. Given that you are looking to use this data for investigation, it is also important to maintain traffic integrity and be able to prove packets weren’t dropped.
- Purpose-built data store: Unfortunately MySQL won’t get it done as a data store. The rate of insertions required to deal with 10Gbps traffic demands something built specifically for that purpose. Again, there will be lots of puffery about this data store or that one. Your objective is simply to ensure the platform or product you choose will scale to your needs.
- High-speed indexing: Once you get the data into the store you need to make sense of it. This is where indexing and deriving metadata become critical. Remember this has to happen at wire speeds, is likely to involve identifying applications (like an application-aware firewall or IDS/IPS), and enriching the data with geolocation and/or identity information.
- Scalable storage: Capturing high-speed network traffic demands a lot of storage. And we mean a lot. So you need to calibrate onboard storage against archiving approaches, optimizing the amount of storage on the capture devices based on the number of days of traffic to keep. Keep in mind that the metadata you extract from the traffic doesn’t go away when you age out the captured packets, but you still want to size the system properly.
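To put rough numbers on "a lot", here is a back-of-the-envelope sizing sketch. The link speeds, the 30% average utilization, and the 7-day retention window are assumptions picked for illustration, and the calculation ignores capture file overhead, indexes, and extracted metadata, so treat the result as a floor rather than a quote.

```python
SECONDS_PER_DAY = 86_400
TB = 1e12  # decimal terabyte


def capture_storage_bytes(link_gbps, avg_utilization, retention_days):
    """Estimate raw bytes needed to keep every packet for the retention window.

    link_gbps:        nominal link speed in gigabits per second
    avg_utilization:  average fraction of the link actually in use (0.0 to 1.0)
    retention_days:   number of days of full packet data to keep online
    """
    bits_per_second = link_gbps * 1e9 * avg_utilization
    bytes_per_day = bits_per_second / 8 * SECONDS_PER_DAY
    return bytes_per_day * retention_days


# Assumed scenarios: a 1Gbps perimeter link and a 10Gbps core link,
# both averaging 30% utilization, with a week of retention.
perimeter = capture_storage_bytes(link_gbps=1, avg_utilization=0.30, retention_days=7)
core = capture_storage_bytes(link_gbps=10, avg_utilization=0.30, retention_days=7)

print(f"perimeter: {perimeter / TB:.1f} TB")  # about 22.7 TB
print(f"core:      {core / TB:.1f} TB")       # about 226.8 TB
```

Even under these modest assumptions, a single 10Gbps segment needs hundreds of terabytes for one week, which is why the number of days you keep drives the sizing.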
The collection foundation has to be a better SIEM than SIEM (since you are collecting many of the same data types), effectively an IDS/IPS, and a massive storage array. We don’t see many of these products on the market because of the technical challenge of providing all these capabilities. So do your homework and make sure any technology you look at will scale to your needs.
Nor do we expect full packet capture gear to supplant SIEMs or IDS/IPS gear any time soon. We are just pointing out that the analysis required is pretty similar, but must happen at wire speed with a much broader traffic stream.
Phases of Collection
We understand it’s unlikely that you’ll install this kind of collection infrastructure overnight. As valuable as full network capture data is, we are pragmatists. Nothing says you have to capture everything at the flip of a switch. Realistically you probably can’t, so how do you start? We suggest starting with what we call the Full Packet Capture Sandwich (FPCS).
The FPCS starts by capturing traffic from the perimeter. Most perimeters run at far lower rates than their core networks, so they serve as manageable starting points for traffic capture. And given the number of attacks originating from out there, it’s good to track what we can, both coming in and going out. As you are ready, complement perimeter capture with traffic from the most critical internal segments as well. You know, those with the high-value assets attackers go for (such as transaction systems, databases, and intellectual property).
So if you capture data from key internal networks, as well as perimeter traffic (which is why we call this the sandwich), you have a better chance to piece together what happened. Over time you can capture more internal segments to get as broad a sampling of captured data as you can. That said, you can’t collect everything – so you need the ability to capture on an ad hoc basis. Basically it’s a SWAT capture capability, which may mean a portable packet capture rig or some network hocus-pocus to tap critical segments on demand. Either way, investigating something will likely involve packet capture.
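One way to think about the phasing is as a simple prioritization problem: given a fixed amount of capture capacity, cover the perimeter first, then the critical internal segments, and leave whatever doesn’t fit for ad hoc capture. The sketch below is only an illustration of that ordering. The segment names, traffic rates, and the 5Gbps capacity figure are all made up, and the greedy selection is our simplification, not a feature of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    kind: str         # "perimeter", "critical", or "other"
    peak_gbps: float  # assumed peak traffic rate on the segment


# Sandwich order: perimeter first, then critical internal segments, then the rest.
PHASE_ORDER = {"perimeter": 0, "critical": 1, "other": 2}


def plan_capture(segments, capacity_gbps):
    """Assign segments to always-on capture in sandwich order until the
    capture capacity is used up. Segments that don't fit go to ad hoc."""
    always_on, ad_hoc = [], []
    remaining = capacity_gbps
    for seg in sorted(segments, key=lambda s: PHASE_ORDER[s.kind]):
        if seg.peak_gbps <= remaining:
            always_on.append(seg.name)
            remaining -= seg.peak_gbps
        else:
            ad_hoc.append(seg.name)
    return always_on, ad_hoc


# Hypothetical environment.
segments = [
    Segment("user-lan", "other", 10.0),
    Segment("internet-edge", "perimeter", 1.0),
    Segment("db-tier", "critical", 2.0),
    Segment("payments", "critical", 1.0),
]

always_on, ad_hoc = plan_capture(segments, capacity_gbps=5.0)
print("always-on:", always_on)
print("ad hoc:   ", ad_hoc)
```

With these numbers the perimeter and both critical segments get continuous capture, while the user LAN is left for the SWAT-style rig when an investigation calls for it.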
Next we will examine an Applied Network Security Analysis forensics use case. This is typically the main driver for the first network packet capture implementation, and thus a key funding lever for investment in this technology. So we will start with forensics before moving on to other interesting use cases.