Securosis

Research

NoSQL Security: Understanding NoSQL Platforms

I started this series on recommendations for securing NoSQL clusters a couple weeks ago, so sorry for the delay posting the rest of the series. I had some difficulty contacting the people I spoke with during the first part of this “big data” research project, and some vendors were been slow to respond with current product capabilities. As I hoped, launching this series “shook the tree of knowledge”, and several people responded to my inquiries. It has taken a little more time than I thought to schedule calls and parse through the data, but I am finally restarting, and should be able to quickly post the rest of the research. The first step is to describe what NoSQL is and how it differs from the other databases you have been protecting, so you can understand the challenges. Let’s get started… NoSQL Overview In our last research paper on this subject, we defined NoSQL platforms with a set of characteristics which differentiate them from big iron/proprietary MPP/”cloud in the box” platforms. Yes, some folks slapped “big data” stickers on the same ‘ol stuff they have always sold, but most people we speak with now understand that those platforms are not Hadoop-like, so we can dispense with discussion of the essential characteristics. If you need to characterize NoSQL platforms please review our introductory sections on securing big data clusters. Fundamentally, NoSQL is clustered data analytics and management. And please stop calling it “big data” – we are really discussing a building-block approach to databases. Rather than the packaged relational systems we have grown accustomed to over the last two decades, we now assemble different pieces (data management, data storage, orchestration, etc.) to satisfy specific requirements. The early architects of this trend had a chip on their collective shoulders, and they were trying to escape the shadow of ubiquitous relational databases, so they called their movement ‘NoSQL’. That term nicely illustrates their opposition to relational platforms. Not that they do not support SQL – many in fact do support Structured Query Language syntax. Worse, the term “big data” was used by the press to describe this trend, assigning a label taken from the most obvious – but not most important – characteristic of these platforms. Unfortunately that term serves the movement poorly. ‘NoSQL’ is not much better but it is a step in the right direction. These databases can be tailored to focus on speed, size, analytic capabilities, failsafe operation, or some other goal, and they enable computation on a massive scale for very little money. But just as importantly, they are fully customizable to meet different needs – often simultaneously! So what does a NoSQL database look like? The “poster child” is Hadoop. The Hadoop framework (the combination of Hadoop File System (HDFS) with other services such as YARN, Hive, Pig, etc.) is the general architecture employed by most NoSQL clusters, but many more are in wide use – including Cassandra, MongoDB, Couch, AWS SimpleDB, and Riak. There are over 125 known variations, but those account for the majority of customer usage today. NoSQL platforms scale and perform so well because of two key principles: distribution of data management and query processing across many servers (possibly thousands), combined with a modular architecture that allows different services to be swapped in as needed. Architecturally a Hadoop cluster looks like this: It is useful to think of the Hadoop framework as a ‘stack’, much like the famous LAMP stack. These pieces are normally grouped together, but you can mix and match and add to the stack as needed. For example Sqoop and Hive are replacement data access services. You can select a big data environment specifically to support columnar, graph, document, XML, or multidimensional data – all collectively called ‘NoSQL’ because they are not constrained by relational database constructs or a relational query parser. You can install different query engines depending on the type of data being stored. Or you can extend HDFS functionality with logging tools like Scribe. The entire stack can be configured and extended as needed. I have included Lustre, GFS, and GPFS, as they are all technically alternatives to HDFS but not as widely used. But the point is that this modular approach offers great flexibility at the expense of more difficult security, because each option brings its own security options and deficiencies. We get big, cheap, and easy data management and processing – with lagging security capabilities. This diagram illustrates some of the many variables in play. It seems like every customer uses a slightly different setup – tweaking for performance, manageability, and programmer preference. Many of the people we spoke with are on their second or third stack architecture, having replaced components which did not scale or perform as needed. This flexibility is great for functionality, but makes it much more difficult for third-party vendors to produce monitoring or configuration assessment tools. There are few constants (or even knowns) for NoSQL clusters, and things are too chaotic to definitively identify best or worst practices across all configurations. For convenient reference, we offer a list of key differences between relational and NoSQL platforms which impact security. Relational platforms typically scale by replacing a server with a larger one, rather than by adding many new servers. Relational systems have a “walled garden” security model: you attach to the database through well-defined interfaces, but internal workings are generally not exposed. Relational platforms come with many tools such as built-in encryption, SQL validation, centralized administration, full support for identity management, built-in roles, administrative segregation of duties, and labeling capabilities. You can add many of these features to a NoSQL cluster, but still face the fundamental problem of securing a dynamic constellation of many servers. This makes configuration management, patching, and server validation particularly challenging. Despite a few security detractors, NoSQL facilitates data management and very fast analysis on large-scale data warehouses, at very low cost. Cheap, fast, and easy are the three pillars this movement has been built upon – data analytics for the masses. A NoSQL

Share:
Read Post

Firestarter: The Verizon DBIR

After missing a week, Rich, Mike, and Adrian return to talk about birthdays, the annual Verizon Data Breach Investigations Report, and child-induced alcohol consumption. The audio-only version is up too.   Share:

Share:
Read Post

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.