Big Data: massively scalable distributed data environments.

Big data systems have become incredibly popular because they offer a low-cost way to analyze enormous sets of rapidly changing data. The sad fact is that Hadoop, Mongo, Couch, and Riak have almost no built-in security capabilities, leaving data exposed on every storage node. This research paper discusses how to deploy the most fundamental data security controls – including encryption, isolation, and access controls/identity management – for a big data system. Before we discuss how to secure big data, however, we have to decide what big data is. So we start with a definition of big data, what it provides, and how its security challenges differ from those of earlier data storage clusters and database systems. From there we branch out into two major areas of concern: high-level architectural considerations and tactical operational options. Finally, we close with several recommendations for security technologies that solve specific big data security problems while meeting the design challenges of scalability and distributed management, which are fundamental to big data clusters.

We would like to thank Vormetric for sponsoring this research. Sponsorship allows us to bring our research to the public free of charge.