I first started tracking data breaches back in December of 2000 when I received my very first breach notification email, from Egghead Software. When Egghead went bankrupt in 2001 and was acquired by Amazon, rather than assuming the breach caused the bankruptcy, I did some additional research and learned they were on a downward spiral long before their little security incident. This broke with the conventional wisdom floating around the security rubber-chicken circuit at the time, and was a fine example of the difference between correlation and causation.

Since then I’ve kept trying to translate what little breach material we’ve been able to get our collective hands on into as accurate a picture as possible of the real state of security. We don’t really have a lot to work with, despite the heroic efforts of the Open Security Foundation Data Loss Database (for a long time the only source on breach statistics). As with the rest of us, the Data Loss DB is completely reliant on public breach disclosures. Thanks to California S.B. 1386 and the mishmash of breach notification laws that have developed since 2005, we have a lot more information than we used to, but anyone in the security industry knows only a portion of breaches are reported (despite notification laws), and we often don’t get any details of how the intrusions occurred.

The problem with the Data Loss DB is that it’s based on incomplete information. They do their best, but more often than not we lack the real meat needed to make appropriate security and risk decisions. For example, we’ve seen plenty of vendor press releases on how lost laptops, backup tapes, and other media are the biggest source of data breaches. In reality, lost laptops and media are merely the greatest source of reported potential exposures. As I’ve talked about before, there is little or no correlation between these lost devices and any actual fraud. All those stats mean is a physical thing was lost or stolen… no more, no less, unless we find a case where we can correlate a loss with actual fraud.

On the research side I try to compensate for the statistics problem by taking more of a case study approach, as best I can using public resources. Even with the limited information released, as time passes we tend to dig up more and more details about breaches, especially once cases make it into court. That’s how we know, for example, that both CardSystems and Heartland Payment Systems were breached (5 years apart) using SQL injection against a web application (the xp_cmdshell command in a poorly configured version of SQL Server, to be specific).

In the past year or two we’ve gained some additional data sources, most notably the Verizon Data Breach Investigations Report, which provides real, anonymized data regarding breaches. It’s limited in that it only reflects those incidents where Verizon participated in the investigation, and by the standardized information they collected, but it starts to give us better insight beyond public breach reports.

Yet we still only have a fraction of the information we need to make appropriate risk management decisions. Even after 20 years in the security world (if you count my physical security work), I’m still astounded that the bad guys share more real information on means and methods than we do.

We are thus extremely limited in assessing macro trends in security breaches. We’re forced to use far more anecdotal information than a skeptic like myself is comfortable with. We don’t even have a standard for assessing breach costs (as I’ve proposed), never mind more accurate crime and investigative statistics that could help craft our prioritization of security defenses.

Seriously – decades into the practice of security we don’t have any fracking idea if forcing users to change passwords every 90 days provides more benefit than burden.

All that said, we can’t sit on our asses and wait for the data. As unscientific as it may be, we still need to decide which security controls to apply where and when.

In the past couple weeks we’ve seen enough information emerging that I believe we now have a good idea of two major methods of attack:

  1. As we discussed here on the blog, SQL injection via web applications is one of the top attack vectors identified in recent breaches. These attacks are not only against transaction processing systems, but are also used to gain a toehold on internal networks to execute more invasive attacks.
  2. Brian Krebs has identified another major attack vector, where malware is installed on insecure consumer and business PCs, then used to gather information to facilitate illicit account transfers. I’ve seen additional reports that suggest this is also a major form of attack.
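To make the first vector concrete, here is a minimal sketch of how a query built by string concatenation gets hijacked, and how a parameterized query closes the hole. It uses Python's built-in sqlite3 module as a stand-in; the table, the data, and the function names are invented for illustration, and this is not a reconstruction of any of the breaches mentioned above (those involved SQL Server, which this example does not touch).

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with made-up account data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (owner TEXT, card TEXT)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("alice", "4111-0000"), ("bob", "4222-0000")],
    )
    return conn


def lookup_unsafe(conn, owner):
    # Vulnerable: user input is pasted straight into the SQL text,
    # so a quote character in the input changes the query itself.
    query = "SELECT card FROM accounts WHERE owner = '" + owner + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, owner):
    # Parameterized: the driver passes the input as data, never as SQL.
    query = "SELECT card FROM accounts WHERE owner = ?"
    return [row[0] for row in conn.execute(query, (owner,))]


conn = make_db()
payload = "nobody' OR '1'='1"  # classic injection string

leaked = lookup_unsafe(conn, payload)   # returns every card in the table
blocked = lookup_safe(conn, payload)    # returns nothing: no owner has that name
```

The same input that dumps the whole table through the concatenated query matches no rows through the parameterized one. Parameterization is only one layer; it does nothing about an over-privileged database account or dangerous stored procedures left enabled.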

I’d love to back these with better statistics, but until those are available we have to rely on a mix of public disclosure and anecdotal information. We hear rumors of other vectors, such as customized malware (to avoid AV filters) and the ever-present-and-all-powerful insider threat, but there isn’t enough to validate those as a major trend quite yet.

If we look across all our sources, we see a consistent picture emerging. The vast majority of cybercrime still seems to take advantage of known vulnerabilities that can be addressed using common practices. The Verizon report certainly calls out unpatched systems, configuration errors, and default passwords as the most common breach sources.

While we can’t state with complete certainty that patching systems, blocking SQL injection, removing default passwords, and enforcing secure configurations will prevent most breaches, the information we have does indicate that’s a reasonable direction. Combine that with following the Data Breach Triangle by reducing use of sensitive data (and using something like DLP to find it), and tightening up egress filtering on transaction processing networks and other sensitive data locations, and you are probably in pretty good shape.
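As a rough illustration of what tightening egress filtering means in practice, the sketch below models a default-deny outbound policy: a connection leaving the transaction network is permitted only if its destination and port appear on an explicit allowlist. The networks, ports, and function name are hypothetical (the addresses come from the reserved documentation ranges), and a real deployment would enforce this in a firewall or proxy rather than in application code.

```python
import ipaddress

# Hypothetical allowlist: (destination network, destination port).
ALLOWED_EGRESS = [
    (ipaddress.ip_network("203.0.113.0/24"), 443),    # e.g. a payment processor
    (ipaddress.ip_network("198.51.100.10/32"), 53),   # e.g. an internal resolver
]


def egress_allowed(dest_ip, dest_port):
    """Default deny: allow only destinations explicitly listed above."""
    addr = ipaddress.ip_address(dest_ip)
    return any(
        addr in network and dest_port == port
        for network, port in ALLOWED_EGRESS
    )


ok = egress_allowed("203.0.113.25", 443)        # listed network and port
wrong_port = egress_allowed("203.0.113.25", 80) # listed network, unlisted port
unknown = egress_allowed("192.0.2.7", 443)      # unlisted destination
```

The point of the default-deny shape is that stolen data has to leave somehow; an attacker who gets in through a web application still needs an outbound path the policy permits.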

Financial institutions struggling with their clients being breached can add out-of-band transaction verification (phone calls or even automated text messages), and/or consider using something like Trusteer that helps secure browser sessions (note – I only mention them specifically because I don’t know of any competitor).
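A minimal sketch of the out-of-band idea, under some loud assumptions: the class and method names are invented, delivery over the second channel (text message or phone call) is left out entirely, and the one-time code is handed back to the caller only so the example can run on its own. The transfer is held until the customer repeats a code that was sent alongside the payee and amount, so malware that silently alters a transfer in the browser would show up in the confirmation message.

```python
import hmac
import secrets


class PendingTransfers:
    """Hold transfers until they are confirmed with a one-time code."""

    def __init__(self):
        self._pending = {}

    def request(self, account, payee, amount_cents):
        transfer_id = secrets.token_hex(8)
        code = f"{secrets.randbelow(10**6):06d}"  # six-digit one-time code
        self._pending[transfer_id] = (account, payee, amount_cents, code)
        # The message names the payee and amount so the customer can spot
        # a transfer they did not intend. Sending it is out of scope here.
        message = (
            f"Confirm transfer of {amount_cents / 100:.2f} to {payee}: "
            f"code {code}"
        )
        return transfer_id, message, code  # code returned for the demo only

    def confirm(self, transfer_id, code):
        # One attempt per transfer: the entry is removed whether or not
        # the code matches, so a wrong guess cancels the transfer.
        entry = self._pending.pop(transfer_id, None)
        if entry is None:
            return False
        return hmac.compare_digest(entry[3], code)


transfers = PendingTransfers()
tid, sms_text, sent_code = transfers.request("acct-1", "ACME Supplies", 125000)
first_try = transfers.confirm(tid, sent_code)    # correct code, first use
second_try = transfers.confirm(tid, sent_code)   # same code again is rejected
```

This only helps if the second channel is genuinely separate from the compromised PC, which is the whole premise of the control.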

None of this necessarily correlates with other kinds of security incidents, but based on the various information sources we do have access to, it seems a reasonable understanding of current means and methods is emerging, and we know which security controls can mitigate those attacks. This is all based on an extremely small sample set, but unfortunately that’s all we have.

The bad guys will, of course, change attacks once the current batch becomes less profitable, but that’s the way the world works. There’s nothing we can possibly do that can eliminate every potential attack method.

It’s also possible all the public information and reports are steering us in the wrong direction, but we need to make the best decisions we can until new data emerges.

Hopefully this is helpful. I know my recommendations have started to change based on the information that’s come out in the past year.