As we discussed in Industrial Phishing Tactics, phishing is a precursor to many attacks in the wild. Phishing attacks are designed to get victims to click something, and then to hand over their account credentials or download malware; and of course they leave a trail like everything else. Following that trail can help you prioritize remediation activities, identify adversaries, and ultimately take action to protect both your environment and your customers. But first you must be able to analyze the email to identify the patterns to look for. And that requires a lot of email – a whole lot.

Sampling all the phish in the sea

Email-based threat intelligence entails analyzing scads of spam emails using Big Data Analytics. You didn’t think we’d be able to resist that buzzword, did you? Of course not! But whether you call it big data or just “a lot of data,” the first step in implementing an email-based threat intelligence program is to aggregate as much email as you can. Which brings up a reasonable question: where do you get that kind of email? If you are a private enterprise it can be hard. There are various spam archives available on the Internet (Google is your friend), but not many fresh ones.

Alternatively, you could establish partnerships with email service providers, who tend to have millions of blocked emails from internal spam filters lying around. Another source would be other consumer brands – perhaps some of them would be willing to swap: you give them a copy of your spam mailbox in exchange for theirs. Aside from the email addresses, there isn’t likely to be sensitive data in obvious spam, so this shouldn’t trip the general security aversion to sharing data.

But you will likely need an intelligence feed from a third-party analysis provider. As discussed in both the Early Warning System and Network-based Threat Intelligence (NBTI) research, we see a market emerging for intelligence providers specializing in aggregating and analyzing these data sources. They provide intelligence that can be used by enterprises to shorten the window between attack and detection.

Let’s dig into the kinds of intelligence we are looking for in phishing emails, and get back to the metaphor we introduced in the NBTI paper: the Who, What, Where, and When of phishing.

Who


Establishing the ‘who’ behind phishing is probably the most important intelligence you can receive, because a select few highly effective phishers (hundreds, not thousands) are behind many of the attacks you will see in the wild. So the ability to identify the author of attacks can yield all sorts of information, enabling you to profile and analyze your adversaries. Why is adversary analysis important?

  • Motive: Is the phish part of a targeted attack (spear phishing)? Is it part of a widespread attack on a financial institution to harvest account information? Is it to steal intellectual property? Knowing your adversary allows you to determine his/her motives, and thus to more effectively judge the true threat of the attack to your organization.
  • Tactics: Does this phisher use malware extensively? Do they just harvest account information? Is key logging their main technique? Understanding and profiling the adversary can indicate which controls to implement to ensure protection. Also keep in mind that the ultimate target of the phishing attack is usually your customers, rather than your employees. So this helps you decide whether helping customers protect themselves is a worthwhile expense for you.
  • Capabilities: Finally, isolating your adversary and tracking them over time (as discussed below) provides clues to their capabilities. Do they rely purely on commercial phishing kits? Are they able to package 0-day attacks? Is it something in the middle? The more you know about the attackers, the more effectively you can make decisions about how to react.

The ultimate objective of adversary analysis is to more effectively prioritize remediation activities. Knowing who you are dealing with and what they are capable of is key, and can help you determine the urgency of response.

What


So how do you find a specific attacker within a corpus of millions of spam and phishing messages? It all comes back to profiles. The links embedded in the phishing messages indicate the locations of phishing sites, and you will see patterns in the domains and IP addresses used in attacks. Working backwards, you can analyze the phishing site to determine the attack(s) in use, the tactics and capabilities used, and, if you get lucky, perhaps the attacker’s identity. This next level of analysis involves looking at ‘what’ the attack does.

A key development that made phishing far more accessible to unsophisticated attackers was phishing attack kits. These kits provide everything an attacker needs to launch a phishing campaign – including images, email copy, and specific tactics to capture account credentials. Of course phishing messages still need to evade an organization’s spam filters, but that tends to be reasonably straightforward given that phishing messages should look exactly like legitimate messages. That takes most of the sophisticated content analysis done by anti-spam filters out of play.

But these kits leave a trail, in the form of the source code used to install the kit on a compromised server. If you get an actual phishing kit you can analyze it just like any other malware (as discussed ad nauseam in Malware Analysis Quant) for clues to the malware used and what is ultimately done with stolen account credentials – all of which can help identify the attacker’s email.

Even better, profiling attack kits enables you to look for similar attack profiles, to identify the attacker far more quickly next time. Given a sufficient corpus of spam and phishing messages, you can mine the data for patterns of IP addresses and domains, to help identify the adversary and assist in identifying appropriate remediations.
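As a rough illustration of mining a corpus for domain patterns, here is a minimal Python sketch that counts how many messages link to each hostname. The function names and the deliberately simple URL pattern are assumptions made for this example; a real pipeline would need HTML-aware link extraction and much more careful normalization.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Deliberately simple URL pattern (an assumption for this sketch);
# real messages need HTML-aware parsing to find every link.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_hosts(body):
    """Return the set of hostnames linked from one message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            # Malformed URL (for example a broken IPv6 literal); skip it.
            continue
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def host_frequencies(bodies):
    """Count how many messages link to each hostname."""
    counts = Counter()
    for body in bodies:
        # Each message contributes at most once per hostname.
        counts.update(link_hosts(body))
    return counts
```

Calling `host_frequencies(bodies).most_common(20)` on a list of message bodies would surface the hostnames that recur most often, which is one starting point for spotting a campaign.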

Where


As described above, phishing messages look like legitimate email, so much of anti-spam filters’ content analysis cannot detect them. But you can (and need to) analyze email headers to figure out where the messages come from and the path they take to your gateway, and to find clues in links to phishing sites. That brings us to the discussion of ‘where’ the victim is directed, once they take the bait and click on a phishing link.
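To make the header analysis concrete, here is a small Python sketch that reads the Received headers of a raw message and lists the relay IP addresses from the earliest hop to the latest. The function name and the bracketed-IPv4 pattern are assumptions for this example. Keep in mind that senders can forge Received headers, so only the hops added by your own infrastructure should be treated as trustworthy.

```python
import re
from email import message_from_string

# Matches an IPv4 address in square brackets, as many mail servers
# record it in Received headers (an assumption; formats vary).
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_path(raw_message):
    """Return IPv4 addresses from Received headers, earliest hop first."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    path = []
    # Each relay prepends its own header, so the list is newest first;
    # reverse it to read from the origin toward your gateway.
    for header in reversed(received):
        match = IP_RE.search(header)
        if match:
            path.append(match.group(1))
    return path
```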

Remember that the ‘location’ of any web site is an IP address and likely a domain name; both are important when analyzing phishing messages. Phishers leverage compromised sites as phishing and malware distribution locations, so it is common to see multiple phishing sites reside on the same (usually shared hosting) server with a single IP address. If many sites reside on a single server and IP address, the urgency of taking the site down and analyzing its contents increases.
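One way to spot shared hosting servers carrying several phishing sites is to group observed hostnames by the IP address they resolved to. The Python sketch below assumes you already have (hostname, IP) pairs from your own resolution or a passive DNS source; the function names and the `min_sites` threshold are hypothetical choices for this example.

```python
from collections import defaultdict


def sites_by_ip(resolved):
    """Group phishing hostnames by the IP address they resolved to."""
    groups = defaultdict(set)
    for host, ip in resolved:
        groups[ip].add(host)
    return groups


def shared_hosts(resolved, min_sites=2):
    """Return IPs hosting at least min_sites hostnames, busiest first."""
    groups = sites_by_ip(resolved)
    busy = [
        (ip, sorted(hosts))
        for ip, hosts in groups.items()
        if len(hosts) >= min_sites
    ]
    # Sort by number of sites (descending), then by IP for stable output.
    return sorted(busy, key=lambda item: (-len(item[1]), item[0]))
```

The IP addresses at the top of that list are the servers where a takedown or content analysis is likely to pay off most.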

Likewise, the domain name structure can yield useful information in terms of the domain generation algorithms and other mechanisms attackers use to obfuscate their phishing domain to make it look legitimate. Again, this kind of intelligence enables you to identify useful patterns as you watch for and block new sites using similar domain names. We will discuss how to leverage this kind of information in the next post, when we look at Quick Wins.
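As one simple example of a domain structure check, the Python sketch below flags hostnames that contain a brand name but do not belong to the brand's real domain. The function name, the brand string, and the domain are hypothetical, and a plain substring test is only a starting point: it will not catch typos, homoglyphs, or algorithmically generated names.

```python
def is_lookalike(host, brand, legit_domain):
    """Return True if host mentions the brand but is not under legit_domain."""
    host = host.lower().rstrip(".")
    legit = legit_domain.lower()
    # The real domain and its subdomains are not look-alikes.
    if host == legit or host.endswith("." + legit):
        return False
    # Anything else carrying the brand string is suspicious.
    return brand.lower() in host
```

For instance, with the hypothetical brand `examplebank` and real domain `examplebank.test`, the hostname `examplebank.test.evil.example` is flagged, while `www.examplebank.test` is not.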

We should also mention the pros and cons of reputation in the battle to identify phishing email. Of course, with enough data, phishing IP addresses and domains can receive negative reputation and be blocked by anti-spam and web filters. But many phishing sites initially appear on recently compromised servers, using clean domain names and IP addresses with good reputations. Over time reputation is invaluable for disrupting attacks, but its value is minimal in the first wave.

When


You can also examine the history of pretty much any aspect of the phishing message you’re evaluating – including the phishing kit, IP addresses, domains, and specific attackers. History can show you how tactics have changed, and inform guesses about what will happen next. As we discussed in the Early Warning research, this historical context is not something a tool can provide in an automated fashion. You need HUMINT (human intelligence) to do that. But understanding your adversaries can help you more effectively plan defenses and responses.

Keep in mind that the velocity of phishing attacks continues to increase, while the life span of the attacks declines. With better and more sophisticated phishing kits, we will see more attacks launched at common brands, and given the attackers’ need to stay one step ahead, sites will be mined and then abandoned more quickly. With a phishing site up for a matter of hours rather than days, sometimes historical information is all you will have access to, because the attacker will move on before you have enough information for a full analysis.
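To track how briefly sites stay in play, you could record when each phishing hostname is first and last seen in your corpus. The Python sketch below assumes a list of (hostname, timestamp) sightings; the function names and the 24-hour default are hypothetical. Note that it measures only the window between the first and last message you observed, not how long the site was actually up.

```python
from datetime import datetime, timedelta


def lifespans(sightings):
    """Map each hostname to (first_seen, last_seen, observed duration)."""
    seen = {}
    for host, when in sightings:
        first, last = seen.get(host, (when, when))
        seen[host] = (min(first, when), max(last, when))
    return {
        host: (first, last, last - first)
        for host, (first, last) in seen.items()
    }


def short_lived(sightings, max_age=timedelta(hours=24)):
    """Return hostnames whose observed window is at most max_age."""
    spans = lifespans(sightings)
    return sorted(
        host for host, (_, _, age) in spans.items() if age <= max_age
    )
```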

It is now time to put the wealth of intelligence gathered from phishing messages to use. So we will wrap up this series by looking for a Quick Win with Email-based Threat Intelligence. We will dig into what you can do with intelligence, as well as pros and cons of traditional tactics such as phishing site takedowns.