Thoughts on Data Breach History

By Rich

I’ve been writing about data breaches for a long time now – ever since I received my first notification in 2002. For about 4 or 5 years now I’ve been giving various versions of my “Involuntary Case Studies in Data Breaches” presentation, where we dig into the history of data breaches and spend time detailing some of the more notable ones, from breach to resolution.

Two weeks ago I presented the latest iteration at the Source Boston conference (video here), and it is materially different from the version I gave at the first Source event. I did some wicked cool 3D visualization in the presentation, making it too big to post, so I thought I should at least post some of the conclusions and lessons. (I plan to make a video of the content, but that’s going to take a while.)

Here are some interesting points that arise when we look over the entire history of data breaches:

  • Without compliance, there are no economic incentives to report breaches. When personally identifiable information (PII) is lost, the breached entity suffers losses only from fines and breach reporting costs. The rest of the system spreads out the cost of the fraud. For loss of intellectual property, there is no incentive to make the breach public.
  • Lost business is a myth. Consumers rarely change companies after a breach, even if that’s what they claim when responding to surveys.
  • I know of no cases where a lost laptop, backup tape, or other media resulted in fraud, even though that’s the most commonly reported breach category. Web application hacking and malware are the top categories for breaches that result in fraud.
  • SQL injection using xp_cmdshell was the source of the biggest pre-TJX credit card breach (CardSystems Solutions in 2004: 40 million transactions). This is the same technique Albert Gonzalez used for Heartland, Hannaford, and a handful of other companies in 2008. We never learn, even when there are plenty of warning signs.
  • Our controls are poorly aligned with the threat – for example, nearly all DLP deployments focus on email, even though that’s one of the least common vectors for breaches and other losses.
  • The more a company tries to spin and wheedle out of a breach, the worse the PR (and possibly legal) consequences.
  • We will never be perfect, but most of our security relies on us never making a mistake. Defense in depth is broken, since every layer is its own little spear to the heart.
  • Most breaches are discovered by outsiders – not the breached company (real breaches, not lost media).

The history is pretty clear – we have no chance of being perfect, and since we focus too much on walls and not enough on response, the bad guys get to act with near impunity. We do catch some of them, but only in the biggest breaches and mostly due to greed and mistakes (just like meatspace crime).

If you think this is interesting, I highly recommend you support the Open Security Foundation, which produces the DataLossDB. I found out that only a handful of hard-working volunteers maintain our only public record of breaches. Once I get our PayPal account fixed (it’s tied to my corporate credit card, which was used in some fraud – ironic, yes, I know!) we’ll be sending some beer money their way.



Good point, but I still think the analysis is reasonable. I’m not basing that conclusion on the DataLossDB alone, but also on the Verizon/Trustwave reports and client work.

No argument a ton of information leaks out over email, but is that information that causes *harm*? It pretty much never shows up for the financial fraud issues, and I’ve only seen one case of IP loss (I’m sure there are more though).

I think email is a major channel for customer lists and other kinds of internal issues, but until we tie those to actual corporate loss events are we making the right decision?

Now, if we had better metrics and people used a real risk-based approach, which you and I have talked about in other contexts, I think we’d both be happier and maybe email would be higher on the list.

But you’ve highlighted the main problem here: overall, we are generally stabbing in the dark with a dull spoon. Anything we do to gain greater visibility is good.

By Rich

Excellent work as usual, and really interesting analysis.  You make one statement I disagree with: “Our controls are poorly aligned with the threat—for example, nearly all DLP deployments focus on email, even though that’s one of the least common vectors for breaches and other losses.”

Rather, it’s one of the least common vectors *that we know about*. The problem is that we all know when data is lost because it’s sitting inside a laptop, thumb-drive or other media, because we have to replace that media. A procurement event necessitates an examination of what the thing had in it. So while it’s true that the DLDB shows email as the teeniest, weeniest piece of the overall pie, that’s because by definition the DLDB is listing known losses, and no one knows what was sent out on email before they installed DLP.

Your statement, therefore, is just a little inaccurate: it’s not that we’re protecting the wrong thing, it’s that we’re protecting the thing which provides the worst comparative metrics (loss before DLP versus loss after). We all did that mainly because a) that’s what the bulk of the (relatively) rapidly deployable technology supported, and b) information security people had a gut feeling that a great deal of data loss volume was egressing through email.

While it’s hard to find the good numbers on email leakage (and its subsequent impact on fraud), you said yourself elsewhere that a firm starts to see the stuff as soon as it turns on the DLP box. That is an indication that the gut feeling that email is a large-scale leakage vector is spot on.

Thanks for this and other valuable work on the subject.


By Nick Selby


Yep- I’ve seen your work and have always been very impressed!

By Rich

Very interesting presentation. The OSF is doing amazing work in two areas:  data breaches and vulnerabilities.  It is amazing what they have accomplished with a volunteer community.  They are definitely a worthwhile cause that merits broad support from all of us who benefit from their work.

You and other interested folks in the Securosis community may be interested in some of the quantitative analysis I have done using the OSF DataLossDB.  You can see it at (No login necessary.) Just go to the Dashboards area of the site.  I have posted two that are based on the DataLossDB. 

The first dashboard is titled Public Data Breaches which is solely based on the DataLossDB and presents some basic stats. 

The second dashboard is titled Stock Price Impact.  This looks at mashing up data from the DataLossDB with Google Finance data to get insight on the question “What is the impact of a breach on a public company’s stock price”.

Hope you find it interesting.

By Betsy Nichols


I’m thinking more about switching banks or retail stores, not software/infrastructure. Although I think one common principle applies to both: the more breaches over time, the higher the odds someone will switch. If IE had only one big flaw/exploit, no one would switch. It was a string of problems that affected market share.

By Rich

How are consumers reacting after the McAfee update bug fiasco?

I agree that breaches may not be getting people to switch their providers, but surely it’s a differentiator for new business? Or perhaps it just takes quite a long time to matter. Look at IE’s market share.

By Marisa
