The key to dealing with advanced attackers is not closing off every window of vulnerability. As we have discussed throughout this series, advanced attackers will figure out a way to gain a foothold in your environment. Actually, they will find multiple ways into your environment. So if you hope for any semblance of success, your goal cannot be to stop them – instead you need to work on shortening the window between compromise and detection. We have called that Reacting Faster and Better for years. Five years to be exact, but who’s counting?
The general concept is that you want to monitor your environment, gathering key security information that lets you either identify typical attack patterns as they are happening (yes, a SIEM-like capability) or, more likely, search for indicators identified via intelligence activities.
Collecting All the Security Data
We say “all the security data” a bit tongue-in-cheek, but not too much. We have been saying Monitor Everything almost as long as we have been talking about Reacting Faster, because if you fail to collect data you won’t have an opportunity to get it later. Unfortunately most organizations don’t realize their security data collection leaves huge gaps until the high-priced forensics folks tell them they can’t truly isolate the attack, or the perpetrator, or the malware, or much of anything, because they just don’t have the data.
Most folks only need to learn that lesson once.
So the first order of business is to lay down a collection infrastructure to store all your security data. The good news is that you have likely been collecting security data for quite some time, and your existing investment and infrastructure should be directly useful for dealing with advanced attackers. This means your existing log management system may be useful after all. But perhaps not – you might have tools that aren’t at all suited to helping you find advanced attackers in your midst. One step at a time – now let’s delve into the data you need to collect.
- Network Security Devices: Your firewalls and IPS devices generate huge logs of what’s blocked, what’s not, and which rules are effective. You will receive intelligence that typically involves port/protocol/destination combinations or application identifiers for next-generation firewalls, which can identify potential attack traffic.
- Configuration Data: One key area to mine for indicators is the configuration data from your devices. It enables you to look for very specific files and/or configurations that have been identified as indicators of compromise.
- Identity: Similarly, information about logins, authentication failures, and other identity-related data is useful for matching against attack profiles from third-party threat intelligence providers.
- NetFlow: This is another data type commonly used in SIEM environments; it provides information on protocols, sources, and destinations for network traffic as it traverses devices. NetFlow records are similar to firewall logs but far smaller, making them more useful for high-speed networks. Flows can identify lateral movement by attackers, as well as large exfiltration file transfers.
- Network Packet Capture: The next frontier for security data collection is actually to capture all network traffic on key segments. Forensics folks have been doing this for years during investigations, but proactive continuous full packet capture – for the inevitable incident responses which haven’t even started yet – is still an early market. For more detail on how full packet capture impacts security operations check out our Network Security Analytics research.
- Application/Database Logs: Application and database logs are generally less relevant, unless they come from standard applications or components likely to be specifically targeted by attackers. But you might be able to discover unusual application and/or database transactions – which might represent bulk data removal, injection attempts, or efforts to attack your critical data.
- Vulnerability Scans: This is another information source with limited value, detailing which devices are vulnerable to specific attacks. They help eliminate devices from your search criteria to streamline search activities.
Of course this isn’t an exhaustive list, and you are likely already capturing much of this data. That’s a good thing, but capturing and analyzing data within the context of a compliance audit is fundamentally different than trying to detect advanced attacker activity.
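To make the NetFlow item above a little more concrete, here is a minimal sketch of the two flow checks it mentions: large outbound transfers and one internal host fanning out to many internal peers. It assumes flow records have already been exported and parsed. The `Flow` fields, the `INTERNAL_NET` range, and both thresholds are hypothetical and would need tuning for any real environment.

```python
from collections import defaultdict
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Assumption: everything in this range counts as "internal".
INTERNAL_NET = ip_network("10.0.0.0/8")


@dataclass(frozen=True)
class Flow:
    """Hypothetical, already-parsed flow record."""
    src: str
    dst: str
    dst_port: int
    bytes_sent: int


def is_internal(addr: str) -> bool:
    return ip_address(addr) in INTERNAL_NET


def large_outbound_transfers(flows, byte_threshold):
    """Sum bytes per (internal src, external dst) pair and keep
    the pairs whose total reaches byte_threshold."""
    totals = defaultdict(int)
    for f in flows:
        if is_internal(f.src) and not is_internal(f.dst):
            totals[(f.src, f.dst)] += f.bytes_sent
    return {pair: n for pair, n in totals.items() if n >= byte_threshold}


def possible_lateral_movement(flows, peer_threshold):
    """Internal hosts that contacted at least peer_threshold
    distinct internal peers."""
    peers = defaultdict(set)
    for f in flows:
        if is_internal(f.src) and is_internal(f.dst) and f.src != f.dst:
            peers[f.src].add(f.dst)
    return {src: sorted(dsts) for src, dsts in peers.items()
            if len(dsts) >= peer_threshold}
```

Neither check proves anything on its own. Backup jobs and management servers will trip both, so treat the output as a short list for an analyst to review, not as an alert.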
We are sticking to the CISO view for this series so we won’t dig into the technical nuances of the collection infrastructure. But it must be built on a strong analytical foundation which provides a threat-centric view of the world rather than one focused on compliance reporting. More advanced organizations may already have a Security Operations Center (SOC) leveraging a SIEM platform for more security-oriented correlation and forensics to pinpoint and investigate attacks. That’s a start, but you will likely require some kind of Big Data thing, which should be clear after we discuss what we need this detection platform to do.
Attack Patterns FTW
As much as we have talked about the futility of blocking every advanced attack, that doesn’t mean we shouldn’t learn from both the past and the misfortune of others. We spent time early in this process on sizing up the adversary for some insight into what is likely to be attacked, and perhaps even how. That enables you to look for those attack patterns within your security data – the promise of SIEM technology for years.
The ultimate disconnect with SIEM was the hard truth that you needed to know what you were looking for. Far too many vendors forgot to mention that little requirement when selling you a bill of goods. Perhaps they expected attackers to post their plans on Facebook or something? But once you do the work to model the likely attacks on your key information, and then enumerate those attack patterns in your tool, you can get tremendous value. Just don’t expect it to be fully automated.
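As an illustration of what "enumerating an attack pattern" can look like, here is a minimal sketch of one modeled pattern: a run of failed logins followed by a success for the same user from the same source. The `AuthEvent` fields and the default thresholds are assumptions made for the example. They are not taken from any particular SIEM product.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthEvent:
    """Hypothetical normalized authentication event."""
    timestamp: float  # seconds since epoch
    user: str
    source_ip: str
    success: bool


def failures_then_success(events, min_failures=5, window_seconds=300):
    """Return successful logins preceded by at least min_failures failed
    logins for the same (user, source_ip) within window_seconds."""
    recent_failures = defaultdict(deque)  # (user, source_ip) -> timestamps
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        failures = recent_failures[(ev.user, ev.source_ip)]
        # Forget failures that are older than the window.
        while failures and ev.timestamp - failures[0] > window_seconds:
            failures.popleft()
        if ev.success:
            if len(failures) >= min_failures:
                alerts.append(ev)
            failures.clear()  # a success resets the pattern
        else:
            failures.append(ev.timestamp)
    return alerts
```

The rule only fires on the pattern you told it about, which is exactly the limitation described above. An attacker who spreads attempts across many sources or stays under the threshold walks right past it.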
The best case is that you receive an alert about a very likely attack because it’s something you were looking for. But the quickest way to get killed is to plan for the best case. So we also need to ensure we are ready for the worst case: advanced attackers using attacks you haven’t seen before, in ways you don’t expect. That’s when all of your gathered intelligence comes into play.
Mining for Indicator Gold
We have already listed a number of different threat intelligence feeds, which can be used to search for specific malware files, command and control traffic, DNS request patterns, and a variety of other indicators. Mining for indicators isn’t that much different from early gold prospecting. They were trying to find gold among millions of rocks moving down the stream. Their main tool was a metal strainer – a filter.
The advantage that we have today in security is that we can tune our filters to search through billions of ‘rocks’ at a pretty fast clip. So you can search your security data infrastructure for almost anything you are collecting – or even better, for a series of events and/or files within your environment – quickly and accurately to narrow down your searches to the most likely attacks.
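Here is a minimal sketch of that kind of filter: match collected records against a set of known indicators, then narrow the results to hosts that hit more than one kind of indicator. The record layout (dicts with a `host` key) and the indicator field names are hypothetical, and the indicator values are assumed to be stored in lowercase.

```python
from collections import defaultdict


def match_indicators(records, indicators):
    """Return (record, field, value) for every record field whose value
    appears in the indicator set for that field.

    records:    iterable of dicts, e.g. {"host": ..., "domain": ...}
    indicators: dict mapping a field name to a set of lowercase bad values
    """
    hits = []
    for record in records:
        for field, bad_values in indicators.items():
            value = record.get(field)
            if value is not None and str(value).lower() in bad_values:
                hits.append((record, field, value))
    return hits


def hosts_by_indicator_types(hits, min_types=2):
    """Keep only hosts that matched at least min_types different
    kinds of indicators (a rough 'series of events' filter)."""
    types_per_host = defaultdict(set)
    for record, field, _ in hits:
        types_per_host[record.get("host", "unknown")].add(field)
    return {host: sorted(kinds) for host, kinds in types_per_host.items()
            if len(kinds) >= min_types}
```

Requiring several indicator types per host is one simple way to cut false positives. A single stale IP address in a feed matches plenty of innocent traffic, while a host that hits a bad domain and a bad address deserves a closer look.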
Which brings us to the overhyped use of Big Data Analytics in the process of mining for attack indicators. Not that some of the hype isn’t justified. The use of tools to more effectively index and search huge security data sets is critical to finding advanced attackers quickly. But like SIEM, its predecessor technology, Big Data vendors are especially prone to hyperbole about the short-term value of their security analytics platforms.
Keeping this discussion at a high level, we recently summed up how Big Data will impact security analytics:
We have every confidence that big data holds promise for security intelligence, both because we have witnessed attacker behavior captured in event data just waiting to be pulled out, and because we have also seen miraculous ideas sprout from people just playing around with database queries. In the same way hackers stumble on vulnerabilities while playing with protocols, security analysts stumble on interesting data just by asking the same question (query) different ways. The data holds promise. The mining of most data, and all of the work that will be required in writing M-R (MapReduce) scripts to locate actionable intelligence, is not yet here. It will take years of dedicated work – and it will take script development on different data types for different NoSQL varieties.
In other words it is still early days for this technology to solve these problems. You are clearly constrained in terms of internal capabilities (you will be looking for a lot of data scientists over the next few years), as well as the lack of maturity of technologies such as Hadoop, MapReduce, Pig, Hive, and a variety of others in the security context. So remain skeptical about promises of a magic box that will ingest scads of security data and pop out advanced attackers.
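To give a sense of the script development the quote refers to, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python, with no Hadoop involved. It looks for domains that only a handful of clients have ever resolved, which is one simple question an analyst might ask of DNS logs. The `timestamp client domain` line format is a hypothetical stand-in for whatever your DNS logging actually produces.

```python
from collections import defaultdict


def map_dns_line(line):
    """Map step: emit (domain, client) from a 'timestamp client domain'
    line. The format is an assumption; malformed lines are skipped."""
    parts = line.split()
    if len(parts) != 3:
        return []
    _, client, domain = parts
    return [(domain.lower(), client)]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_distinct_clients(domain, clients):
    """Reduce step: count the distinct clients that resolved the domain."""
    return domain, len(set(clients))


def rare_domains(lines, max_clients=1):
    """Domains resolved by at most max_clients distinct clients."""
    pairs = [pair for line in lines for pair in map_dns_line(line)]
    counts = dict(reduce_distinct_clients(domain, clients)
                  for domain, clients in shuffle(pairs).items())
    return sorted(d for d, n in counts.items() if n <= max_clients)
```

Even this toy version shows where the effort goes: every data type needs its own map step, and someone has to decide which question is worth asking. That part does not come in the box.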
But companies seriously looking to detect advanced attackers within their environments will be capturing packets to supplement the other data they already collect, and subsequently starting to use Big Data technologies to mine it all. Sounds easy, right? Unfortunately it is thankless work, so make sure you swing by the cubes of your forensics folks to give them a big thank-you. They spend a lot of time chasing down false positives, all for the occasional times they find an active attack. That brings us to the next step in finding advanced attackers: verification of the attack.
Photo credit: “There’s Gold In Them There Hills!” originally uploaded by Podknox
Reader interactions
4 Replies to “The CISO’s Guide to Advanced Attackers: Mining for Indicators”
It looks like your link in item #5 goes to a blank page
Mike, thanks for your reply. Forgive me, but I can feel an incoherent rant is about to flow through my fingers.
One of the main obstacles I face at this organisation is a judicial requirement to archive all collected data for 5 years. Collecting everything is a huge problem storage wise, as you can probably imagine; especially when the client installation is around 20k computers, that is also excluding the hundreds upon hundreds of physical and virtual servers.
Even if we did continue to collect everything and kept it for, say, a week, maybe two, what good does that do? Looking at breach reports such as the DBIR, intrusions aren’t discovered until months after initial compromise. The two weeks’ worth of stored data will be mostly useless?
Perhaps if the security program is extremely effective at discovery, then maybe two weeks is enough. Maybe I’m naive but does such an organisation even exist? Those that do collect everything, how do they measure the effectiveness? What attack patterns have they defined and which can they actually discover?
Guess I’m simply not convinced that we should collect everything if we can’t even establish what we are looking for. Then again, perhaps tools have matured to the point where they really are amazing at discovering anomalies, but well… can’t shake the feeling of wanting more human intelligence in the process.
Really wish I could vent this in person rather than this fairly ineffective message board!
Any who, I’ve been reading the blog for about 2 years now, and I read almost every post. Commenting is a rarity, but I guess one should, if anything, leave a message highlighting the fact that one has enjoyed the content.
How about implementing a feature such as Flattr to show some appreciation!
All good @Christoffer. The points you make are relevant here. We favor more data, rather than less data, simply because you can’t get it later. Obviously if the customer doesn’t have the ability to analyze much data at all, then more data isn’t going to help them. But that will really impact the ability of any third party forensics folks to do their job, when the inevitable breach happens.
There are tons of folks that actually collect EVERYTHING. They use full network packet capture on their key network segments and have the ability to parse through that when needed. They typically don’t keep more than a week’s worth of traffic, but if the security program is effective, that’s usually enough.
What you suggest is truly a retroactive view of the world. Not that it doesn’t work in certain instances, but it won’t give you much of a chance against advanced attackers. Though I guess the maturity level of the folks you’re referring to puts them clearly in the not very sophisticated tier of practitioners.
Go back and read our original Understanding and Selecting SIEM paper. That goes through the process of setting up the threat models, which become the rule base for the data collection. That’s as good a place as any to start.
But obviously dealing with advanced attackers is another ballgame altogether.
Mike.
Apologies if this comment should have been associated with another of your often excellent posts.
I’m working with a client that is attempting to establish a capability for the discovery of intrusions. They’ve followed the principle of collecting everything, and, well, are failing pretty badly. Not only are they already drowning in data, they have absolutely no idea as to what they should be looking for. Say what you want, but tools aren’t all that magical…
My take on this whole debacle has been to actually reverse the collect everything principle. Instead I have suggested that they attempt to establish indicators of a potential compromise. For these indicators I’ve suggested they begin their data collection.
I guess what I’m trying to get at here is some kind of advice. What are your respective experiences with collecting security data? What has worked, what has not? I’ve always maintained that those who suggest collecting everything haven’t really done this for real before.
But, I’m afraid of attaching myself to ideals and ideas of the previous century and I guess the world has changed and perhaps now it really is possible to collect everything and magically discover anomalies and other indicators that may suggest a compromise.
Would be very much interested in learning more from you guys, if you’ve got time to flesh out some of your own experiences.