I spend a heck of a lot of time researching, writing, and speaking about data security. One area that’s been very disappointing is the quality of many of the surveys. Most either try to quantify losses (without using a verifiable loss model), measure general attitudes to inspire some BS hype press release, or assess some other fuzzy aspect you can spin any way you want.

This bugs me, and it’s been on my to-do list to run a better survey myself. When a vendor (Imperva) proposed the same thing back at RSA (meaning we’d have funding) and agreed to our Totally Transparent Research process, it was time to promote it to the top of the stack.

So we are kicking off our first big data security study. Following in the footsteps of the one we did for patch management, this survey will focus on hard metrics – our goal is to avoid general attitude questions and unquantifiable loss guesses, and focus on figuring out what people are really doing about data security.

As with all our surveys, we are soliciting ideas and feedback before we run it, and will release all the raw results.

Here are my initial ideas on how we might structure the questions:

  • We will group the questions to match the phases in the Pragmatic Data Security Cycle, since we need some structure to start with.
  • For each phase, we will list out the major technologies and processes, then ask which ones organizations have adopted.
  • For technologies, we will ask which they’ve researched, budgeted for, purchased, deployed in a limited manner (such as testing), deployed in initial production, and deployed in full production (organization wide).
  • For processes, we will ask about maturity from ad-hoc through fully formalized and documented, similar to what we did for patch management.
  • For the tools and processes, we’ll ask if they were implemented due to a specific compliance deficiency found during an assessment.
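
To make that structure a bit more concrete, here is a rough Python sketch of how the question set might be generated. It is only an illustration, not our actual survey tooling. The names `AdoptionStage`, `Maturity`, `Question`, and `build_questions` are made up for this example, the phase and item names are placeholders rather than the real phases of the cycle, and the maturity scale lists only the two endpoints mentioned above, since the levels in between aren't defined yet.

```python
from dataclasses import dataclass
from enum import Enum


class AdoptionStage(Enum):
    """Technology adoption stages, in the order listed in the post."""
    RESEARCHED = "researched"
    BUDGETED = "budgeted for"
    PURCHASED = "purchased"
    LIMITED = "deployed in a limited manner (such as testing)"
    INITIAL_PRODUCTION = "deployed in initial production"
    FULL_PRODUCTION = "deployed in full production (organization wide)"


class Maturity(Enum):
    """Process maturity; only the endpoints are known, so only those appear."""
    AD_HOC = "ad hoc"
    FORMALIZED = "fully formalized and documented"


# Follow-up asked for every tool and process.
COMPLIANCE_FOLLOW_UP = (
    "Was this implemented due to a specific compliance deficiency "
    "found during an assessment?"
)


@dataclass(frozen=True)
class Question:
    phase: str       # phase of the cycle (placeholder names in this sketch)
    item: str        # the technology or process being asked about
    kind: str        # "technology" or "process"
    choices: tuple   # answer options shown to the respondent
    follow_up: str = COMPLIANCE_FOLLOW_UP


def build_questions(phases):
    """Turn {phase: {"technologies": [...], "processes": [...]}} into questions."""
    questions = []
    for phase, items in phases.items():
        for tech in items.get("technologies", []):
            choices = tuple(stage.value for stage in AdoptionStage)
            questions.append(Question(phase, tech, "technology", choices))
        for proc in items.get("processes", []):
            choices = tuple(level.value for level in Maturity)
            questions.append(Question(phase, proc, "process", choices))
    return questions


# Placeholder input: hypothetical phase and item names.
example = {
    "Phase A (placeholder)": {
        "technologies": ["Tool X (placeholder)"],
        "processes": ["Process Y (placeholder)"],
    }
}

for q in build_questions(example):
    print(q.phase, "|", q.kind, "|", q.item, "|", len(q.choices), "choices")
```

Keeping the adoption stages and maturity levels as fixed lists means every technology and every process gets the same answer scale, which is what would make the raw results comparable across phases.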

I’m also wondering if we should ask how many breaches or breach disclosures were directly prevented by the tool (estimates). I’m on the fence about this, because we would need to tightly constrain the question to avoid the results being abused in some way.

Those are my rough ideas – what do you think? Anything else you want to see? Is this even in the right direction? And remember – raw (anonymized) results will be released, so it’s kind of like your chance to run a survey and have someone else bear the costs and do all the work…

FYI: the sponsor gets an exclusive on the raw results for 45 days or so, but they will be released free after that. We have to pay for these things somehow.