Building an Early Warning System: External Threat Feeds
So far we have talked about the need for Early Warning and the Early Warning Process to set the stage for the details. We started with the internal side of the equation, gaining awareness of your environment via internal data collection and baselining. This is a great beginning, but still puts you in a reactive mode. Even if you can detect an anomaly in your environment – it’s already happened and you may be too late to prevent data loss.
The next step for Early Warning is to look outside your own environment to figure out what’s happening externally. Leverage external threat intelligence for a sense of current attacks, and get an idea of the patterns you should be looking for in your internal data feeds. Of course these threat feeds aren’t a fancy crystal ball that will tell you about an attack before it happens. The attack has already happened, but not to you. We have never bought the idea that you can get ahead of an attack without a time machine. But you can become aware of an attack in the wild before it’s aimed at you, to ensure you are protected against it.
Types of threat intelligence
There are many different types of threat intelligence, and we are likely to see more emerge as the hype machine engages. Let’s quickly review the kinds of intel at your disposal and how they can help with the Early Warning process.
Threats and Malware
Malware analysis is maturing rapidly, and it is becoming commonplace to quickly and thoroughly understand exactly what a malicious code sample does and how to identify its behavioral indicators. We described this process in detail in Malware Analysis Quant. For now, suffice it to say you aren’t looking for a specific file – but rather indicators that a file did something to a device. Fortunately a number of third parties have built information services that provide data on specific pieces of malware. You can get an analysis based on a hash of the malware file, or upload a file if it hasn’t been seen before. Then the service runs the malware through a sandbox to figure out what it does, profile it, and deliver that data back to you.
What do you do with indicators of compromise? Search your devices for evidence that the malware has executed. Obviously that requires a significant and intrusive search of the configuration files, executables, and registry settings on each device, which typically requires some kind of endpoint forensics agent. If that kind of access is available, then malware intelligence can provide a smoking gun for identification of compromised devices.
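To make the indicator search a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical endpoint agent has already collected an inventory of artifacts (file paths, registry keys) per device, and that the malware intelligence service delivers indicators as simple kind/value pairs. The `Indicator` class, the inventory layout, and every name in the sample data are our assumptions for illustration, not any vendor's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    kind: str     # hypothetical categories, e.g. "file_path" or "registry_key"
    value: str    # the artifact the malware leaves behind
    malware: str  # name of the malware family the indicator belongs to


def normalize(kind, value):
    # Windows paths and registry keys are case-insensitive, so compare lowercased.
    return (kind, value.strip().lower())


def find_compromised(indicators, inventory):
    """Return {device: set of malware names} for devices showing known indicators.

    inventory maps a device name to an iterable of (kind, value) artifacts
    collected by the (assumed) endpoint forensics agent.
    """
    index = {}
    for ind in indicators:
        index.setdefault(normalize(ind.kind, ind.value), set()).add(ind.malware)

    hits = {}
    for device, artifacts in inventory.items():
        for kind, value in artifacts:
            for malware in index.get(normalize(kind, value), ()):
                hits.setdefault(device, set()).add(malware)
    return hits


# Made-up sample data for illustration only.
indicators = [
    Indicator("registry_key", r"HKCU\Software\ExampleBot\Run", "ExampleBot"),
    Indicator("file_path", r"C:\ProgramData\examplebot\svc.dll", "ExampleBot"),
]
inventory = {
    "laptop-17": [("registry_key", r"hkcu\software\examplebot\run")],
    "server-02": [("file_path", r"C:\Windows\notepad.exe")],
}
compromised = find_compromised(indicators, inventory)
```

In this sample only `laptop-17` is flagged. A real deployment would deal with far richer indicator types, but the shape of the problem is the same: an index of known-bad artifacts checked against what each device actually contains.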
Most folks never see the feed of new vulnerabilities that show up on a weekly or daily basis. Each scanner vendor updates their products behind the scenes and uses the most current updates to figure out whether devices are vulnerable to each new attack. But the ability to detect a new attack is directly related to how often the devices get scanned. A slightly different approach involves cross-referencing threat data (which attacks are being used) with vulnerability data to identify devices at risk. For example, if weaponized malware emerges that targets a specific vulnerability, it would be extremely useful to have an integrated way to dump out a list of devices that are vulnerable to the attack. Of course you can do this manually by reading threat intelligence and then searching vulnerability scanner output to create a list of impacted devices, but will you? Anything that requires additional effort all too often ends up not getting done. That’s why the Early Warning System needs to be driven by a platform integrating all this intelligence, correlating it, and providing actionable information.
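The cross-referencing idea can be sketched in a few lines of Python. This assumes the threat feed can be reduced to a mapping from threat name to the vulnerability IDs it exploits, and that scanner output can be exported as (device, vulnerability ID) pairs. Both layouts, the function name, and the placeholder IDs below are hypothetical.

```python
def devices_at_risk(active_threats, scan_findings):
    """Cross-reference threat data with vulnerability scan results.

    active_threats: {threat name: set of vulnerability IDs it exploits}
    scan_findings:  iterable of (device, vulnerability ID) from the scanner
    Returns {device: set of threat names the device is exposed to}.
    """
    # Invert the threat feed so each vulnerability ID points at its threats.
    vuln_to_threats = {}
    for threat, vuln_ids in active_threats.items():
        for vuln_id in vuln_ids:
            vuln_to_threats.setdefault(vuln_id.upper(), set()).add(threat)

    at_risk = {}
    for device, vuln_id in scan_findings:
        threats = vuln_to_threats.get(vuln_id.upper())
        if threats:
            at_risk.setdefault(device, set()).update(threats)
    return at_risk


# Placeholder identifiers, not real vulnerabilities or malware.
active_threats = {"ExampleWorm": {"CVE-0000-0001"}}
scan_findings = [
    ("web-01", "CVE-0000-0001"),
    ("web-01", "CVE-0000-0002"),
    ("db-01", "cve-0000-0001"),
    ("mail-01", "CVE-0000-0002"),
]
exposed = devices_at_risk(active_threats, scan_findings)
```

Here `web-01` and `db-01` come back as exposed to `ExampleWorm`, while `mail-01` does not. The join itself is trivial; the hard part is getting both data sets into one place automatically, which is the argument for a platform.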
Since its emergence as a key data source in the battle against spam, reputation data has rapidly become a component of seemingly every security control. For example, the ability to see that an IP address in one of your partner networks is compromised should set off alarms, especially if that partner has a direct connection to your environment. Basically anything can (and should) have a reputation. Devices, IP addresses, URLs, and domains for starters. If you have traffic going to a known bad site, that’s a problem. If one of your devices gets a bad reputation – perhaps as a spam relay or DoS attacker – you want to know ASAP.
One specialization of reputation emerging as a separate intelligence feed is botnet intelligence. These feeds track command and control traffic globally and use that information to pinpoint malware originators, botnet controllers, and other IP addresses and sites your devices should avoid. Integrating this kind of feed with a firewall or web filter could prevent exfiltration traffic or communications with a controller, and identify an active bot. Factoring this kind of data into the Early Warning System enables you to use evidence of bad behavior to prioritize remediation activities.
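As a rough illustration of putting reputation or botnet data to work, the Python sketch below checks outbound connections against a list of known-bad addresses and networks. It assumes the feed can be exported as plain IP or CIDR strings and that connection logs reduce to (device, destination IP) pairs. The function names and sample data are ours, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress


def load_blocklist(entries):
    # Accept single addresses ("198.51.100.7") and CIDR ranges ("203.0.113.0/24").
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def flag_connections(connections, blocklist):
    """Return (device, destination, matching network) for each bad destination.

    connections: iterable of (device name, destination IP string)
    """
    flagged = []
    for device, destination in connections:
        address = ipaddress.ip_address(destination)
        for network in blocklist:
            if address.version == network.version and address in network:
                flagged.append((device, destination, str(network)))
                break  # one match is enough to flag the connection
    return flagged


# Sample data using documentation-only address ranges.
blocklist = load_blocklist(["203.0.113.0/24", "198.51.100.7"])
connections = [
    ("ws-1", "203.0.113.45"),
    ("ws-2", "192.0.2.10"),
    ("ws-3", "198.51.100.7"),
]
suspects = flag_connections(connections, blocklist)
```

In this sample `ws-1` and `ws-3` are flagged as talking to listed destinations. A linear scan is fine for a sketch; a feed with millions of entries would need a proper lookup structure.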
It would be good to get a heads up if a hacktivist group targets your organization, or a band of pirates is stealing your copyrights, so a number of services have emerged to track mentions of companies on the Internet and infer whether they are good or bad. Copyright violations, brand squatters, and all sorts of other shenanigans can be tracked and trigger alerts to your organization, hopefully before extensive damage is done. How does this help with Early Warning? If your organization is a target, you are likely to see several different attack vectors. Think of these services as providing the information to go from DEFCON 5 to DEFCON 3, which might involve tightening the thresholds on your other intelligence feeds and monitoring sources in preparation for imminent attack.
Managing the Overlap
With all these disparate data sources, it becomes a significant challenge to make sure you don’t get the same alerts multiple times. Unless your organization has a money tree in the courtyard, you likely had to rob Peter to pay Paul just to get the first intelligence service. There isn’t any point in paying for the same stuff twice. The first step in determining overlap is to actually understand how the intelligence vendor gets their data. Do they use honeypots? Do they mine DNS traffic and track new domain registrations? Have they built a cloud-based malware analysis/sandboxing capability? Then you can categorize the intelligence vendors based on their tactics and pick the best.
In order to determine the best service, you’ll need to compare their services for comprehensiveness, timeliness, and accuracy. Yes, we’re talking about a bake-off. Sign up for trials for a number of services, and monitor their feeds for a week or so. Does one provider consistently identify new threats earlier? Is their information correct? Do they provide more detailed and actionable analysis?
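Two of the bake-off questions, overlap and timeliness, are easy to measure once the trial feeds are captured. The Python sketch below assumes each feed has been reduced to a mapping from indicator to the time that provider first reported it. That layout, the function names, and the sample data are assumptions for illustration. Accuracy still has to be judged by checking the indicators themselves.

```python
from datetime import datetime


def overlap(feed_a, feed_b):
    """Jaccard similarity of two feeds: shared indicators / all indicators."""
    a, b = set(feed_a), set(feed_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def earlier_counts(feed_a, feed_b):
    """For indicators both feeds report, count who reported first.

    Each feed maps indicator -> datetime first reported by that provider.
    Returns (a_first, b_first, ties).
    """
    a_first = b_first = ties = 0
    for indicator in set(feed_a) & set(feed_b):
        if feed_a[indicator] < feed_b[indicator]:
            a_first += 1
        elif feed_b[indicator] < feed_a[indicator]:
            b_first += 1
        else:
            ties += 1
    return a_first, b_first, ties


# Made-up trial data from two hypothetical providers.
feed_a = {
    "bad.example": datetime(2024, 1, 1),
    "203.0.113.9": datetime(2024, 1, 3),
    "evil.example": datetime(2024, 1, 2),
}
feed_b = {
    "bad.example": datetime(2024, 1, 2),
    "203.0.113.9": datetime(2024, 1, 3),
    "other.example": datetime(2024, 1, 1),
}
similarity = overlap(feed_a, feed_b)
timeliness = earlier_counts(feed_a, feed_b)
```

A high similarity score suggests you would be paying twice for much of the same data. The timeliness counts show which provider tends to get there first on the indicators they share.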
Don’t fall for the marketing hyperbole about proprietary algorithms, Big Data analysis, and on-staff linguists penetrating hacker dens and other stories straight out of a spy novel. Ultimately it’s about the data and how useful it is to your Early Warning efforts. There are also more discreet ways to test these offerings. Send them a real malware sample you found. Do they identify it correctly? Quickly? And you can find a blacklisted IP (or a million) to test the reputation services. Buyer beware, and make sure you put each intelligence provider through its paces before you commit.
The good news is that, given the level of analysis and resources required to manage these intelligence feeds, aggregators will emerge sooner rather than later to take many of these different sources and do Q/A, deduplication, normalization, and validation in order to provide a (hopefully) cleaner feed. Test these aggregated services just like anything else. Put each through its paces and evaluate on the same criteria.
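To give a feel for what normalization and deduplication involve, here is a deliberately small Python sketch. It assumes each source delivers raw domain or IP strings, and it handles only a few common inconsistencies (case, stray whitespace, "defanged" dots, trailing dots). The function names and sample feeds are hypothetical, and a real aggregator would do far more, including validation.

```python
def normalize_indicator(raw):
    # Intended for domains and IP addresses only; lowercasing a full URL
    # could change the meaning of its path.
    value = raw.strip().lower()
    value = value.replace("[.]", ".")  # undo a common "defanged" notation
    return value.rstrip(".")           # drop a trailing root dot on domains


def merge_feeds(feeds):
    """Merge several feeds into one deduplicated view.

    feeds: {source name: iterable of raw indicator strings}
    Returns {normalized indicator: set of sources that reported it}.
    """
    merged = {}
    for source, indicators in feeds.items():
        for raw in indicators:
            key = normalize_indicator(raw)
            if key:  # skip entries that are empty after cleanup
                merged.setdefault(key, set()).add(source)
    return merged


# Made-up feeds from two hypothetical vendors.
feeds = {
    "vendor_a": ["Bad.Example", "203.0.113.9 "],
    "vendor_b": ["bad[.]example", "other.example.", "  "],
}
merged = merge_feeds(feeds)
```

Keeping the set of sources for each indicator, rather than just the indicator, also shows at a glance where providers overlap.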
The last point to make here is the importance of short agreements, especially up front. You cannot know how these services will work for you until you actually start using them. Many of these intelligence companies are startups, and may not be around in 3-4 years. Once you identify a set of core intelligence feeds then longer deals can be cut, but we recommend not doing that until your Early Warning process matures and the intelligence vendor establishes a longer term track record.
As cool as this concept seems, we need to reiterate the limitations of threat intelligence. First, increasingly targeted attacks mean you may be facing something that has been custom built for your environment. A third-party feed cannot help with that. We are referring to tactics such as custom phishing emails targeting your CEO or someone in your finance department. If they fall for it and click the link, it’s game over. No threat intelligence service can tell you to monitor “Bob in Finance.”
Similarly, if your organization is targeted by a persistent attacker (you know who you are) you may be facing novel zero-day attacks. No intelligence service will see these before they hit you – they should send you a thank-you note for the brand-new malware sample.
But nobody’s perfect, so at times the intelligence feed will be wrong. It could create false positives that spin you around in circles and waste precious time. If you are only buying data your downside risk is contained. Threat intelligence errors are more problematic if you have an active control that starts blocking traffic based on a false positive. We have already seen situations where one vendor identifies another vendor’s honeypot as a botnet controller and raises all sorts of alarms. Or where an endpoint protection product has flagged a legitimate application update (Skype) as malware. This is an emerging, inexact science. Remember the old adage, trust but verify.
You will hear the term threat intelligence in a variety of contexts. Endpoint protection vendors talk about how their threat intelligence helps them defend against advanced malware; firewall vendors claim their reputation networks help block attackers at the perimeter. Once you stop laughing at those claims, the point is that unless you get a feed of threat information you can integrate into other tools and check against your internal data, the intelligence is useless to you for Early Warning.
Threat intelligence is great, but it rarely provides a true smoking gun. You need to systematically interpret the intelligence within the context of your environment and situation to determine the appropriate course of action. There is no machine that does that, so you need HUMINT (human intelligence) to get it done. The next post will talk about how to separate the signal from the noise with alerts and thresholds, and detail what your human analysts will need to do to apply the Early Warning concept in your environment.