We are pleased to announce the release of our white paper on securing big data environments. This research project provides a high-level overview of the security challenges these environments face. We cover the ways big data differs from traditional relational databases, both architecturally and operationally. We look at some of the built-in and third-party security solutions for big data clusters, and how they work with – and against – big data installations. Finally, we offer a baseline set of recommendations for securing big data installations: we recommend several technologies to address specific threats to both the data and the cluster itself, preferring options that can scale with the cluster. After all, security should support big data clusters, not break or hamper them.

Somewhat to our surprise, a major task for this research project was simply defining big data. None of our past topics gave us so much trouble pinning down the subject itself. Big data clusters exhibit a handful of essential characteristics, but there are hundreds of possible functional configurations for building one. A concrete definition is elusive because there is an exception to almost every rule. One popular label for big data is ‘NoSQL’, which highlights its freedom from traditional relational constraints, yet some big data clusters are relational. In general we are talking about self-organizing clusters built on a distributed file system model, such as Hadoop, which can handle insertion and analysis of massive amounts of data. Beyond that it gets a bit fuzzy, and the range of potential uses is nearly limitless. So we developed a definition we think you will find helpful.

Finally, I would like to thank our sponsor for this research: Vormetric. Without sponsorship like this we could not bring you quality research free to the public! We hope you find this research – and the definition – helpful in understanding big data and its associated security challenges. Download the research paper: Securing Big Data.