Securosis

Research

Securing Hadoop: Enterprise Security For NoSQL

Hadoop is now enterprise software. There, I said it. I know lots of readers in the IT space still look at Hadoop as an interloper, or worse, part of the rogue IT problem. But better than 50% of the enterprises we spoke with are running Hadoop somewhere within the organization. A small percentage are running Mongo, Cassandra or Riak in parallel with Hadoop, for specific projects. Discussions on what ‘big data’ is, if it is a viable technology, or even if open source can be considered ‘enterprise software’ are long past. What began as proof of concept projects have matured into critical application services. And with that change, IT teams are now tasked with getting a handle on Hadoop security, to which they response with questions like “How do I secure Hadoop?” and “How do I map existing data governance policies to NoSQL databases?” Security vendors will tell you both attacks on corporate IT systems and data breaches are prevalent, so with gobs of data under management, Hadoop provides a tempting target for ‘Hackers’. All of which is true, but as of today, there really have not been major data breaches where Hadoop play a part of the story. As such this sort of ‘FUD’ carries little weight with IT operations. But make no mistake, security is a requirement! As sensitive information, customer data, medical histories, intellectual property and just about every type of data used in enterprise computing is now commonly used in Hadoop clusters, the ‘C’ word (i.e.: Compliance) has become part of their daily vocabulary. One of the big changes we’ve seen in the last couple of years with Hadoop becoming business critical infrastructure, and another – directly cause by the first – is IT is being tasked with bringing existing clusters in line with enterprise compliance requirements. This is somewhat challenging as a fresh install of Hadoop suffers all the same weak points traditional IT systems have, so it takes work to get security set up and the reports being created. For clusters that are already up and running, no need to choose technologies and a deployment roadmap that does not upset ongoing operations. On top of that, there is the additional challenge that the in-house tools you use to secure things like SAP, or the SIEM infrastructure you use for compliance reporting, may not be suitable when it comes to NoSQL. Building security into the cluster The number of security solutions that are compatible – if not outright built for – Hadoop is the biggest change since 2012. All of the major security pillars – authentication, authorization, encryption, key management and configuration management – are covered and the tools are viable. Most of the advancement have come from the firms that provide enterprise distributions of Hadoop. They have built, and in many cases contributed back to the open source community, security tools that accomplish the basics of cluster security. When you look at the threat-response models introduced in the previous two posts, every compensating security control is now available. Better still, they have done a lot of the integration legwork for services like Kerberos, taking a lot of the pain out of deployments. Here are some of the components and functions that were not available – or not viable – in 2012. LDAP/AD Integration – Technically AD and LDAP integration were available in 2012, but these services have both been advanced, and are easier to integrate than before. In fact, this area has received the most attention, and integration is as simple as a setup wizard with some of the commercial platforms. The benefits are obvious, as firms can leverage existing access and authorization schemes, and defer user and role management to external sources. Apache Ranger – Ranger is one of the more interesting technologies to come available, and it closes the biggest gap: Module security policies and configuration management. It provides a tool for cluster administrators to set policies for different modules like Hive, Kafka, HBase or Yarn. What’s more, those policies are in context to the module, so it sets policies for files and directories when in HDSF, SQL policies when in Hive, and so on. This helps with data governance and compliance as administrators set how a cluster should be used, or how data is to be accessed, in ways that simple role based access controls cannot. Apache Knox – You can think of Knox in it’s simplest form as a Hadoop firewall. More correctly, it is an API gateway. It handles HTTP and REST-ful requests, enforcing authentication and usage policies of inbound requests, and blocking everything else. Knox can be used as a virtual moat’ around a cluster, or used with network segmentation to further reduce network attack surface. Apache Atlas – Atlas is a proposed open source governence framework for Hadoop. It allows for annotation of files and tables, set relationships between data sets, and even import meta-data from other sources. These features are helpful for reporting, data discovery and for controlling access. Atlas is new and we expect it to see significant maturing in coming years, but for now it offers some valuable tools for basic data governance and reporting. Apache Ambari – Ambari is a facility for provisioning and managing Hadoop clusters. It helps admins set configurations and propagate changes to the entire cluster. During our interviews we we only spoke to two firms using this capability, but we received positive feedback by both. Additionally we spoke with a handful of companies who had written their own configuration and launch scripts, with pre-deployment validation checks, usually for cloud and virtual machine deployments. This later approach was more time consuming to create, but offered greater capabilities, with each function orchestrated within IT operational processes (e.g.: continuous deployment, failure recovery, DevOps). For most, Ambari’s ability to get you up and running quickly and provide consistent cluster management is a big win and a suitable choice. Monitoring – Hive, PIQL, Impala, Spark SQL and similar modules offer SQL or pseudo-SQL syntax. This means that the activity monitoring, dynamic masking, redaction and tokenization technologies originally developed for

Share:
Read Post

Totally Transparent Research is the embodiment of how we work at Securosis. It’s our core operating philosophy, our research policy, and a specific process. We initially developed it to help maintain objectivity while producing licensed research, but its benefits extend to all aspects of our business.

Going beyond Open Source Research, and a far cry from the traditional syndicated research model, we think it’s the best way to produce independent, objective, quality research.

Here’s how it works:

  • Content is developed ‘live’ on the blog. Primary research is generally released in pieces, as a series of posts, so we can digest and integrate feedback, making the end results much stronger than traditional “ivory tower” research.
  • Comments are enabled for posts. All comments are kept except for spam, personal insults of a clearly inflammatory nature, and completely off-topic content that distracts from the discussion. We welcome comments critical of the work, even if somewhat insulting to the authors. Really.
  • Anyone can comment, and no registration is required. Vendors or consultants with a relevant product or offering must properly identify themselves. While their comments won’t be deleted, the writer/moderator will “call out”, identify, and possibly ridicule vendors who fail to do so.
  • Vendors considering licensing the content are welcome to provide feedback, but it must be posted in the comments - just like everyone else. There is no back channel influence on the research findings or posts.
    Analysts must reply to comments and defend the research position, or agree to modify the content.
  • At the end of the post series, the analyst compiles the posts into a paper, presentation, or other delivery vehicle. Public comments/input factors into the research, where appropriate.
  • If the research is distributed as a paper, significant commenters/contributors are acknowledged in the opening of the report. If they did not post their real names, handles used for comments are listed. Commenters do not retain any rights to the report, but their contributions will be recognized.
  • All primary research will be released under a Creative Commons license. The current license is Non-Commercial, Attribution. The analyst, at their discretion, may add a Derivative Works or Share Alike condition.
  • Securosis primary research does not discuss specific vendors or specific products/offerings, unless used to provide context, contrast or to make a point (which is very very rare).
    Although quotes from published primary research (and published primary research only) may be used in press releases, said quotes may never mention a specific vendor, even if the vendor is mentioned in the source report. Securosis must approve any quote to appear in any vendor marketing collateral.
  • Final primary research will be posted on the blog with open comments.
  • Research will be updated periodically to reflect market realities, based on the discretion of the primary analyst. Updated research will be dated and given a version number.
    For research that cannot be developed using this model, such as complex principles or models that are unsuited for a series of blog posts, the content will be chunked up and posted at or before release of the paper to solicit public feedback, and provide an open venue for comments and criticisms.
  • In rare cases Securosis may write papers outside of the primary research agenda, but only if the end result can be non-biased and valuable to the user community to supplement industry-wide efforts or advances. A “Radically Transparent Research” process will be followed in developing these papers, where absolutely all materials are public at all stages of development, including communications (email, call notes).
    Only the free primary research released on our site can be licensed. We will not accept licensing fees on research we charge users to access.
  • All licensed research will be clearly labeled with the licensees. No licensed research will be released without indicating the sources of licensing fees. Again, there will be no back channel influence. We’re open and transparent about our revenue sources.

In essence, we develop all of our research out in the open, and not only seek public comments, but keep those comments indefinitely as a record of the research creation process. If you believe we are biased or not doing our homework, you can call us out on it and it will be there in the record. Our philosophy involves cracking open the research process, and using our readers to eliminate bias and enhance the quality of the work.

On the back end, here’s how we handle this approach with licensees:

  • Licensees may propose paper topics. The topic may be accepted if it is consistent with the Securosis research agenda and goals, but only if it can be covered without bias and will be valuable to the end user community.
  • Analysts produce research according to their own research agendas, and may offer licensing under the same objectivity requirements.
  • The potential licensee will be provided an outline of our research positions and the potential research product so they can determine if it is likely to meet their objectives.
  • Once the licensee agrees, development of the primary research content begins, following the Totally Transparent Research process as outlined above. At this point, there is no money exchanged.
  • Upon completion of the paper, the licensee will receive a release candidate to determine whether the final result still meets their needs.
  • If the content does not meet their needs, the licensee is not required to pay, and the research will be released without licensing or with alternate licensees.
  • Licensees may host and reuse the content for the length of the license (typically one year). This includes placing the content behind a registration process, posting on white paper networks, or translation into other languages. The research will always be hosted at Securosis for free without registration.

Here is the language we currently place in our research project agreements:

Content will be created independently of LICENSEE with no obligations for payment. Once content is complete, LICENSEE will have a 3 day review period to determine if the content meets corporate objectives. If the content is unsuitable, LICENSEE will not be obligated for any payment and Securosis is free to distribute the whitepaper without branding or with alternate licensees, and will not complete any associated webcasts for the declining LICENSEE. Content licensing, webcasts and payment are contingent on the content being acceptable to LICENSEE. This maintains objectivity while limiting the risk to LICENSEE. Securosis maintains all rights to the content and to include Securosis branding in addition to any licensee branding.

Even this process itself is open to criticism. If you have questions or comments, you can email us or comment on the blog.