As we discussed in Industrial Phishing Tactics, phishing is a precursor to many attacks in the wild. Phishing attacks are designed to get victims to click something, and then to share their account credentials or download malware. Like everything else, they leave a trail. Following that trail can help you prioritize remediation activities, identify adversaries, and ultimately take action to protect both your environment and your customers. But first you must be able to analyze the email to identify the patterns to look for. And that requires a lot of email. A whole lot.

Sampling all the phish in the sea

Email-based threat intelligence entails analyzing scads of spam emails using Big Data Analytics. You didn’t think we’d be able to resist that buzzword, did you? Of course not! But whether you call it big data or just “a lot of data,” the first step in implementing an email-based threat intelligence program is to aggregate as much email as you can.

Which brings up a reasonable question: where do you get that kind of email? If you are a private enterprise it can be hard. There are various spam archives available on the Internet (Google is your friend), but not many fresh ones. Alternatively, you could establish partnerships with email service providers, who tend to have millions of blocked emails from internal spam filters lying around. Another source would be other consumer brands. Perhaps some of them would be willing to swap: you give them a copy of your spam mailbox in exchange for theirs. Besides the email addresses, there isn’t likely to be sensitive data in obvious spam, so this shouldn’t trip the general security aversion to sharing data.

But you will likely need an intelligence feed from a third-party analysis provider. As discussed in both the Early Warning System and Network-based Threat Intelligence (NBTI) research, we see a market emerging for intelligence providers specializing in aggregating and analyzing these data sources.
They provide intelligence that enterprises can use to shorten the window between attack and detection.

Let’s dig into the kinds of intelligence we are looking for in phishing emails, and get back to the metaphor we introduced in the NBTI paper: the Who, What, Where, and When of phishing.

Who?

Establishing the ‘who’ behind phishing is probably the most important intelligence you can receive, because a select few highly effective phishers (hundreds, not thousands) are behind many of the attacks you will see in the wild. So the ability to identify the author of attacks can yield all sorts of information, enabling you to profile and analyze your adversaries. Why is adversary analysis important?

Motive: Is the phish part of a targeted attack (spear phishing)? Is it part of a widespread attack on a financial institution to harvest account information? Is it to steal intellectual property? Knowing your adversary allows you to determine their motives, and thus to more effectively judge the true threat of the attack to your organization.

Tactics: Does this phisher use malware extensively? Do they just harvest account information? Is key logging their main technique? Understanding and profiling the adversary can indicate which controls to implement to ensure protection. Also keep in mind that the ultimate target of the phishing attack is usually your customers, rather than your employees. So this helps you decide whether helping customers protect themselves is a worthwhile expense for you.

Capabilities: Finally, isolating your adversary and tracking them over time (as discussed below) provides clues to their capabilities. Do they rely purely on commercial phishing kits? Are they able to package 0-day attacks? Is it something in the middle?

The more you know about the attackers, the more effectively you can make decisions about how to react. The ultimate objective of adversary analysis is to more effectively prioritize remediation activities.
Knowing who you are dealing with and what they are capable of is key, and can help you determine the urgency of response.

What?

So how do you find a specific attacker within a corpus of millions of spam and phishing messages? It all comes back to profiles. The links embedded in the phishing messages indicate the locations of phishing sites, and you will see patterns in the domains and IP addresses used in attacks. Working backwards, you can analyze the phishing site to determine the attack(s) in use, the tactics and capabilities involved, and, if you get lucky, perhaps the attacker’s identity. This next level of analysis involves looking at ‘what’ the attack does.

A key development that made phishing far more accessible to unsophisticated attackers was phishing attack kits. These kits provide everything an attacker needs to launch a phishing campaign, including images, email copy, and specific tactics to capture account credentials. Of course phishing messages still need to evade an organization’s spam filters, but that tends to be reasonably straightforward, given that phishing messages are built to look exactly like legitimate messages. That takes most of the sophisticated content analysis done by anti-spam filters out of play.

But these kits leave a trail, in the form of the source code used to install the kit on a compromised server. If you get an actual phishing kit you can analyze it just like any other malware (as discussed ad nauseam in Malware Analysis Quant) for clues to the malware used and what is ultimately done with stolen account credentials, all of which can help identify the attacker’s email. Even better, profiling attack kits enables you to look for similar attack profiles, to identify the attacker far more quickly next time. Given a sufficient corpus of spam and phishing messages, you can mine the data for patterns of IP addresses and domains, to help identify the adversary and choose appropriate remediations.

Where?
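Since the links in a phishing message point to where the phishing site lives, here is a minimal Python sketch of the mining step described in the previous section. It assumes the corpus is available as raw RFC 5322 message strings. The function names, the URL regular expression, and the sample messages are illustrative assumptions rather than any provider's actual method, and the hosts use reserved documentation domains and addresses.

```python
import re
from collections import Counter
from email import message_from_string
from urllib.parse import urlsplit

# Deliberately simple URL pattern (an assumption); real corpora need more care.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_hosts(raw_message: str) -> set:
    """Return the set of hostnames or IPs linked from one raw message."""
    msg = message_from_string(raw_message)
    hosts = set()
    for part in msg.walk():
        # Only text parts (plain or HTML) can carry clickable links.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # charset label unknown to Python
            text = payload.decode("utf-8", errors="replace")
        for url in URL_RE.findall(text):
            try:
                host = urlsplit(url).hostname
            except ValueError:  # malformed URL, skip it
                continue
            if host:
                hosts.add(host)
    return hosts


def host_frequencies(raw_messages) -> Counter:
    """Count how many messages link to each host (once per message)."""
    counts = Counter()
    for raw in raw_messages:
        counts.update(link_hosts(raw))
    return counts


# Hypothetical two-message corpus for illustration only.
SAMPLES = [
    "From: a@example.com\nSubject: Verify your account\n\n"
    "Sign in at http://login.example.test/verify now.\n",
    "From: b@example.com\nSubject: Account alert\n\n"
    "Visit http://login.example.test/alert or http://203.0.113.7/kit/\n",
]

if __name__ == "__main__":
    for host, seen in host_frequencies(SAMPLES).most_common():
        print(host, seen)
```

Counting each host once per message keeps a single message stuffed with repeated links from dominating the tally. Hosts that recur across many otherwise unrelated messages are the kind of pattern worth profiling further.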
As described above, phishing messages look like legitimate email, so much of anti-spam filters’ content analysis cannot detect them. But you can (and need to) analyze email headers to figure out where the messages come from and the path they take to your gateway, and to look for clues in the links to phishing sites. That brings