NoSQL Security 2.0 [New Series] *updated*

NoSQL, both the technology and the industry, has taken off. We are past the point where we can call big data a fad, and we recognize that we are staring straight into the face of the next generation of data storage platforms. About two years ago we started the first Securosis research project on big data security, and a lot has changed since then. At that point many people had heard of Hadoop, but could not describe what characteristics made big data different from relational databases, other than storing a lot of data. Now there is no question that NoSQL, as a data management platform, is here to stay; enterprises have jumped into large scale analysis projects with both feet, and people understand the advantages of leveraging analytics for business, operations, and security use cases.

But as with all types of databases (and make no mistake, big data systems are databases), high quality data produces better analysis results. That is why, in the majority of cases we have witnessed, a key ingredient is sensitive data. It may be customer data, transactional data, intellectual property, or financial information, but it is a critical ingredient. It is not really a question of whether sensitive data is stored within the cluster, but of which sensitive data it contains. Given broad adoption, rapidly advancing platforms, and sensitive data, it is time to re-examine how to secure these systems and the data they store.

But this paper will be different from the last one. We will offer much more on big data security strategies in addition to tools and technologies. We will spend less time defining big data and more looking at trends. We will offer more explanation of security building blocks including data encryption, logging, network encryption, and access controls/identity management in big data ecosystems. We will discuss the types of threats to big data and look at some of the use cases driving security discussions.
And just like last time, we will offer a frank discussion of limitations in platforms and vendor offerings, which leave holes in security or fail to mesh with the inherent performance and scalability of big data.

I keep getting one question from enterprise customers and security vendors. People ask repeatedly for a short discussion of data-centric security, so this paper provides one. In the last year I have gotten far fewer questions on how to protect a NoSQL cluster, and far more on how to protect data before it is stored in the cluster. This was a surprise, and it is not clear from my conversations whether the cause is that users simply don’t trust big data technology, worry about data propagation, don’t feel they can meet compliance obligations, or fear the double whammy of big data atop cloud services. All these explanations are plausible, and they have all come up. But regardless of driver, companies are looking for advice around encryption and wondering if tokenization and masking are viable alternatives for their use cases. The nature of the questions tells me that is where the market is looking for guidance, so I will cover both cluster security and data-centric security approaches.

Here is our current outline:

  • Big Data Overview and Trends: This post will provide a refresher on what big data is, how it differs from relational databases, and how companies are leveraging its intrinsic advantages. We will also provide references on how the market has changed and matured over the last 24 months, as this bears on how to approach security.
  • Big Data Security Challenges: We will discuss why big data is different architecturally and operationally, and how the platform bundles and approaches differ from traditional relational databases. We will discuss what traditional tools, technologies, and security controls are present, and how usage of these tools differs in big data environments.
  • Big Data Security Approaches: We will outline the approaches companies take when implementing big data security programs, as reference architectures. We will outline walled-garden models, cluster security approaches, data-centric security, and cloud strategies.
  • Cluster Security: An examination of how to secure a big data cluster. This will be a threat-centric examination of how to secure a cluster from attackers, rogue admins, and application programmers.
  • Data (Centric) Security: We will look at tools and technologies that protect data regardless of where it is stored or moved, for use when you don’t trust the database or its repository.
  • Application Security: An executive summary of application security controls and approaches.
  • Big Data in Cloud Environments: Several cloud providers offer big data as part of Platform or Infrastructure as a Service offerings. These environments come with security controls from the cloud vendor, which provide optional approaches to securing the cluster and meeting compliance requirements.
  • Operational Considerations: Day-to-day management of the cluster differs from management of relational databases, so the focus of security efforts changes too. This post will examine how daily security tasks change and how to adjust operational controls and processes to compensate. We will also offer advice on integration with existing security systems such as SIEM and IAM.

As with all our papers, you have a voice in what we cover. So I would like feedback from readers, particularly on whether you want a short section on application layer security as well. It is (tentatively) included in the current outline. Obviously this would be a brief overview, since application security itself is a very large topic. That said, I would like input on that and any other areas you feel need addressing.


Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
  • Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments and input factor into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
  • Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
  • For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
  • Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.